
RedisClusterConnection stops operation after RedisCommandTimeoutException #42

Closed
taer opened this issue Feb 26, 2015 · 11 comments
Labels: type: bug (A general bug)
Milestone: Lettuce 3.1

Comments

taer (Contributor) commented Feb 26, 2015

Setup: a current master checkout of 3.1-SNAPSHOT, with one thread holding a single RedisClusterConnection<String, String>.

The thread calls hgetall in a loop. To force a timeout, place a VM breakpoint (pause all threads) in Utf8StringCodec.decode() and let it hang for a while (60 seconds is the default timeout, I believe).

Now, once you're sure the timeout has expired, remove the breakpoint and resume the JVM. The command you paused will throw from FutureSyncInvocationHandler:59:

    return LettuceFutures.await(command, timeout, unit);

It doesn't seem to occur on a normal RedisConnection.

This is as far as I have gotten. The original caller gets an exception, which is perfectly fine. However, every future call to hgetall on the same connection will now simply time out:

! com.lambdaworks.redis.RedisCommandTimeoutException: Command timed out
! at com.lambdaworks.redis.LettuceFutures.await(LettuceFutures.java:68) ~[tanknew1.jar:na]
! at com.lambdaworks.redis.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:59) ~[tanknew1.jar:na]
! at com.lambdaworks.com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87) ~[tanknew1.jar:na]
! at com.sun.proxy.$Proxy95.hgetall(Unknown Source) ~[na:na]

Additional info: the "pause all threads" setting didn't seem to matter. I updated the breakpoint to pause just the single thread in decode(), and it broke the same way.
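
For reference, the reproduction loop looks roughly like this (a sketch against the 3.x API; exact imports and constructors may differ slightly, and the host, port, and key are placeholders):

    import com.lambdaworks.redis.RedisClusterConnection;
    import com.lambdaworks.redis.RedisURI;
    import com.lambdaworks.redis.cluster.RedisClusterClient;

    public class TimeoutRepro {
        public static void main(String[] args) {
            RedisClusterClient client = new RedisClusterClient(
                    RedisURI.Builder.redis("localhost", 7379).build());
            RedisClusterConnection<String, String> connection = client.connectCluster();

            // While this loop runs, hold a breakpoint in Utf8StringCodec.decode()
            // past the 60-second default timeout, then resume. The paused call
            // throws RedisCommandTimeoutException, and every subsequent hgetall
            // on this connection times out as well.
            while (true) {
                connection.hgetall("somekey");
            }
        }
    }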

That was how to reproduce the error. Here is all I've managed to dig up so far.

I did a thread dump. Numerous requests are blocked waiting for the lock on

    com.lambdaworks.redis.RedisAsyncConnectionImpl.dispatch()

The thread holding that monitor is blocked at

    at com.lambdaworks.redis.protocol.Command.get(Command.java:109)

which is a CountDownLatch.await().
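
In other words, the dispatching thread holds the connection's monitor while it parks on the command's latch, so every other caller piles up behind it. A simplified illustration of that pattern (not the actual lettuce source):

    import java.util.concurrent.CountDownLatch;

    // dispatch() is synchronized, so while one caller is parked inside it on
    // the command's latch, every other caller blocks on the object monitor.
    class StuckConnection {
        synchronized Object dispatch(StuckCommand cmd) throws InterruptedException {
            // (channel write elided)
            return cmd.get();
        }
    }

    class StuckCommand {
        private final CountDownLatch latch = new CountDownLatch(1);
        private volatile Object output;

        Object get() throws InterruptedException {
            latch.await(); // never counted down: the decoder abandoned this command
            return output;
        }
    }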

taer (Contributor, Author) commented Feb 26, 2015

OK, I see the problem now, but I'm not sure how this happens. In CommandHandler:118:

        while (!queue.isEmpty() && rsm.decode(buffer, queue.peek(), queue.peek().getOutput())) {

The queue is NOT empty. The next time through, after I break the connection via the Utf8StringCodec pause, it has two things in it. The head of the queue is my original ClusterCommand:

    Command [type=HGETALL, output=null]

Next in the queue is the one I just retried:

    Command [type=HGETALL, output=MapOutput [output={}, error='null']]

rsm.decode() is getting a null output but returning false, which exits the while loop and abandons the command I just sent.

Inverting the return at RedisStateMachine:81 to:

        if (output == null) {
            return !stack.isEmpty();
        }

sort of solves the problem (future commands against that key work again). The old command is taken from the queue and completed, and the loop runs again, sending the result to my real command. But I haven't dug through the RSM too much. I think the queue in CommandHandler should be removing timed-out Commands when they time out. Is that the intention of the stack.isEmpty() above?

Also, with that small hack above, isMoved() in ClusterCommand NPEs due to output being null. That could probably be checked for.
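
Something like this would avoid the NPE (a hypothetical sketch; the real field and accessor names in ClusterCommand may differ):

    // Hypothetical null guard for ClusterCommand.isMoved(); names may not
    // match the actual lettuce source.
    public boolean isMoved() {
        if (getOutput() == null || getOutput().getError() == null) {
            return false;
        }
        return getOutput().getError().startsWith("MOVED");
    }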

taer (Contributor, Author) commented Feb 26, 2015

e5fbd1e has a patch that makes it not freeze. However, as I said above, I have no idea whether it breaks other things.

taer added a commit to taer/lettuce that referenced this issue Feb 27, 2015
taer (Contributor, Author) commented Feb 27, 2015

I have a simple test case that reproduces this on both standalone and cluster: a80dfdd

mp911de (Collaborator) commented Feb 27, 2015

Hi @taer, I'll take a look at this.

taer added a commit to taer/lettuce that referenced this issue Feb 27, 2015
On interruption, the Command class nulls the output, but this leaves the state machine not knowing what to do with the actual output it eventually gets. If we leave the output in place, the RSM can still update it, even though the client has moved on.
taer (Contributor, Author) commented Feb 27, 2015

@mp911de I think I found a solution (and updated the test as well). The inversion of that line in the RSM was definitely incorrect.

I pushed a new version of BreakServer (it should really be called BreakClient): bd5227f. The fix is 3e20467.
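
The gist of the fix, per the commit message above: leave the output in place when the waiting caller gives up. A hypothetical sketch of the shape (see 3e20467 for the real change):

    // Sketch only: when the caller is interrupted (or times out), leave
    // `output` intact so the state machine can still decode the late reply
    // and pop this command off the queue.
    public T get() {
        try {
            latch.await();
        } catch (InterruptedException e) {
            // Before the fix, the output was nulled here, which stalled
            // RedisStateMachine.decode() on the late response and hung
            // every subsequent command on the connection.
            throw new RedisCommandInterruptedException(e);
        }
        return output.get();
    }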

mp911de added the type: bug label Feb 27, 2015
mp911de (Collaborator) commented Feb 27, 2015

You're faster than me. Would you like to create a PR that contains the fix and the cluster setup (your test, and perhaps the makefile changes as well)?

taer (Contributor, Author) commented Feb 27, 2015

Sure. I'm updating the test now to fit in with the rest of the integration tests, and I'm putting it someplace sane.

Since the Command fix covers both standalone and cluster, and a cluster setup in a test is much more error-prone, I removed the cluster part.

mp911de (Collaborator) commented Feb 27, 2015

There is a cluster setup rule (a JUnit TestRule; see the existing cluster tests) that might help you.
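
Usage would look roughly like this (a sketch of the JUnit 4 TestRule pattern; ClusterRule here is a stand-in name, so check the existing cluster tests for the real rule class and its constructor):

    import org.junit.Rule;
    import org.junit.Test;

    public class ClusterTimeoutTest {
        // Hypothetical rule name; the lettuce test sources define the
        // actual rule that provisions the cluster nodes for a test.
        @Rule
        public ClusterRule clusterRule = new ClusterRule();

        @Test
        public void timedOutCommandDoesNotStallConnection() {
            // exercise the timeout scenario against the rule-managed cluster
        }
    }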

mp911de added this to the Lettuce 3.1 milestone Feb 27, 2015
mp911de (Collaborator) commented Feb 27, 2015

Can I close this issue after #43 is merged?

taer (Contributor, Author) commented Feb 27, 2015

It satisfies this case, and I don't see it breaking anything else. I don't see it causing leaks either: if there is an output, the command will be taken off the queue in CommandHandler.java, letting both be GCed.

mp911de (Collaborator) commented Feb 27, 2015

Closing, everything is fine. Thanks @taer

mp911de closed this as completed Feb 27, 2015