RedisClusterConnection stops operation after .RedisCommandTimeoutException #42
Comments
Ok, I see the problem now, but I'm not sure how this happens. In CommandHandler:118, the queue is NOT empty. On the next pass after I break the connection via the UTF8Decoder pause, it has 2 entries. rsm.decode is getting a null output but returning false, which exits the while loop and abandons the command I just sent.
Inverting the return at RedisStateMachine:81 sort of solves this problem (future commands to that key work): the old command is taken from the queue, completed, and the loop continues, sending the result to my real command. But I haven't dug through RSM too much. I think the queue in CommandHandler should be removing timed-out commands when they time out. Is that the intention of the stack.isEmpty check above? Also, with that small hack, isMoved() in ClusterCommand throws an NPE because output is null. That could probably be checked for.
e5fbd1e has a patch that makes it not freeze. However, as I said above, I have no idea whether this breaks other things.
Have a simple test case that reproduces this on standalone and cluster.
Hi @taer, I'll take a look at this.
On interruption, the Command class is nulling the output, but this causes the state machine to not know what to do with the actual output it eventually gets. If we leave the output there, the RSM can still update the output, even though the client has moved on.
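The fix described above can be sketched in plain Java (names are illustrative, not the real lettuce internals): if the timed-out command keeps its output buffer instead of nulling it, the state machine can still drain the late reply, the stale command leaves the queue, and the next command receives the correct response.

```java
import java.util.ArrayDeque;

// Sketch of the fix: on timeout the client stops waiting, but the Command
// keeps its output buffer so the late reply can still be consumed.
public class KeepOutputSketch {
    static class Command {
        StringBuilder output = new StringBuilder(); // intentionally NOT nulled
        boolean completed;
        void complete() { completed = true; }
    }

    // Stand-in for the state machine: succeeds as long as output is writable.
    static boolean decode(Command cmd, String bytes) {
        if (cmd.output == null) {
            return false;
        }
        cmd.output.append(bytes);
        return true;
    }

    public static void main(String[] args) {
        ArrayDeque<Command> queue = new ArrayDeque<>();
        Command timedOut = new Command(); // caller already gave up on this one
        Command fresh = new Command();
        queue.add(timedOut);
        queue.add(fresh);

        for (String reply : new String[]{"late-reply", "real-reply"}) {
            Command head = queue.peek();
            if (decode(head, reply)) {
                queue.poll().complete(); // stale command drains; queue advances
            }
        }
        System.out.println("remaining: " + queue.size()); // prints 0
        System.out.println("fresh got: " + fresh.output); // prints real-reply
    }
}
```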
You're faster than me. Would you like to create a PR that contains the fix and the cluster setup (your test, and perhaps makefile changes as well)?
Sure. I'm updating the test now to fit with the rest of the integration tests, and putting it someplace sane. Since the command fix fixed both (standalone and cluster), and a cluster setup in a test is much more prone to errors, I removed the cluster part.
There is a cluster setup rule (a JUnit Test Rule; see the existing cluster tests) which might help you.
Can I close this issue after #43 is merged?
It satisfies this case. I don't see it breaking anything else. I don't see it causing leaks, since if there is an output, the command will be taken from the queue in CommandHandler.java, allowing both to be GCed.
Closing, everything is fine. Thanks @taer |
Setup: a current master checkout of 3.1-SNAPSHOT, with one thread holding a single RedisClusterConnection<String,String>.
The thread does a hgetall in a loop. You want to force a timeout on this thread. Do so by placing a VM breakpoint (pause all threads) in Utf8StringCodec.decode(). Let this hang for a bit (60 seconds, I believe, is the default timeout).
Now, once you're sure the timeout has expired, remove the breakpoint and resume the JVM. The command you paused will throw from
FutureSyncInvocationHandler:59 return LettuceFutures.await(command, timeout, unit);
It doesn't seem to occur on a normal RedisConnection.
This is as far as I have made it. What happens next is the original caller gets an exception, which is perfectly fine. However, future calls to hgetall on the same connection will now simply timeout.
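The symptom above can be simulated in plain Java (all names are illustrative, not lettuce's real types): replies are applied strictly in queue order, so once the timed-out command is stuck at the head of the connection's shared queue, every later command queued behind it never completes either.

```java
import java.util.ArrayDeque;
import java.util.concurrent.CompletableFuture;

// Simulation: one stuck command at the head of the queue starves all
// commands queued behind it on the same connection.
public class StuckConnectionSketch {
    public static void main(String[] args) {
        ArrayDeque<CompletableFuture<String>> queue = new ArrayDeque<>();

        CompletableFuture<String> first = new CompletableFuture<>();
        queue.add(first); // never completed: the "timed out" command

        CompletableFuture<String> second = new CompletableFuture<>();
        queue.add(second); // the next hgetall on the same connection

        // Replies can only be delivered in order; the stuck head blocks
        // delivery to everything behind it.
        if (queue.peek().isDone()) {
            queue.poll();
            queue.peek().complete("real-reply");
        }

        System.out.println("second done: " + second.isDone()); // prints false
    }
}
```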
Additional info: the "pause all threads" setting didn't seem to matter. I updated the breakpoint to pause only the single thread in decode, and it broke the same way.
That was how to reproduce the error. Here is all I've managed to dig up so far.
Did a thread dump. I have numerous requests blocked waiting for the lock on
com.lambdaworks.redis.RedisAsyncConnectionImpl.dispatch()
The thread holding that monitor is blocked at
at com.lambdaworks.redis.protocol.Command.get(Command.java:109)
which is a CountDownLatch.await()
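The blocking pattern in that thread dump boils down to a timed wait on a latch that is never counted down. A minimal sketch (names are illustrative, not lettuce's actual Command.get implementation):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// A command result gated on a CountDownLatch: callers wait with a timeout,
// and if the latch is never counted down, await returns false on expiry,
// which a client would surface as a timeout exception.
public class LatchAwaitSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1); // never counted down

        boolean completed = latch.await(100, TimeUnit.MILLISECONDS);
        System.out.println("completed in time: " + completed); // prints false
    }
}
```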