Redis commands stay buffered (Question) #466
Comments
By default, commands get buffered if a connection gets disconnected, and these commands are replayed once the connection is restored. You can tweak that behavior; see also the wiki.
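As a minimal sketch (assuming lettuce 4.x with the `com.lambdaworks` packages, as used in this thread), the buffer-while-disconnected behavior described above can be switched off so commands fail immediately instead of queueing:

```java
import com.lambdaworks.redis.ClientOptions;
import com.lambdaworks.redis.cluster.ClusterClientOptions;
import com.lambdaworks.redis.cluster.RedisClusterClient;

// Hedged sketch, not the reporter's actual setup: commands issued
// while a node connection is down are buffered by default;
// REJECT_COMMANDS makes them fail fast instead of accumulating.
public class DisconnectedBehaviorConfig {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");
        client.setOptions(ClusterClientOptions.builder()
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                .build());
    }
}
```

The host/port above are placeholders; the relevant part is the `disconnectedBehavior` option.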
Thanks for the wiki link. I think it should auto-reconnect (default true); after restarting there is no problem connecting to all nodes. This keeps growing for days, while the other servers have no problems. I'm curious whether it's possible that auto-reconnect doesn't work (or maybe it tries reconnecting but always fails; step 15 at reconnect)... I will change
You might want to inspect the connection target (host/port) and the topology view.
Zero attempts is unusual. Take a look into how reconnections are performed; they are handled by `ConnectionWatchdog`.
In my log, I didn't get the warning at https://github.com/mp911de/lettuce/blob/master/src/main/java/com/lambdaworks/redis/protocol/ConnectionWatchdog.java#L277 Hope this helps (I'm still trying too..)
I can't reproduce it locally, but I'm curious about https://github.com/mp911de/lettuce/blob/master/src/main/java/com/lambdaworks/redis/protocol/ConnectionWatchdog.java#L215 — is it possible that the timeout's run executes and reconnectWorkers.submit runs before the assignment of reconnectScheduleTimeout?
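The suspected ordering can be modeled deterministically in plain JDK code (this is a toy model, not lettuce's actual implementation): a submitted task reads a field that the scheduling thread only assigns after the submit, so the task can observe the field as still null.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Deterministic model of the suspected race: the worker task runs
// (and reads the handle field) before the scheduling thread assigns it.
public class ScheduleRaceDemo {
    static volatile Object scheduledHandle;            // stands in for reconnectScheduleTimeout
    static final AtomicReference<Object> seenByTask = new AtomicReference<>();

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch taskRan = new CountDownLatch(1);

        Future<?> f = worker.submit(() -> {
            seenByTask.set(scheduledHandle);           // reads before the assignment below
            taskRan.countDown();
        });
        taskRan.await();                               // force the task to finish first
        scheduledHandle = new Object();                // assignment happens too late

        f.get();
        System.out.println("task saw handle: " + seenByTask.get()); // prints "task saw handle: null"
        worker.shutdown();
    }
}
```

In the real code path, hitting this window would need a very small timeout plus unlucky scheduling; the latch here only makes the interleaving reproducible for illustration.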
It is possible if the timeout is small enough and a lot of things come together (the right timing, a GC pause), although it's very unlikely. For now, I'd like to close this ticket because there's effectively nothing we can do about it right now. At a later time, if we find some clues, we can still reopen the ticket. Does this make sense?
OK, I just suspect that situation may cause the state I found in the heap dump. Let me close this; if I find something new, I will reopen it.
We hit this problem again, so let me reopen this to see whether my thoughts can provide a clue. I'm not sure if this can be treated as reproducible, e.g.
at https://github.com/lettuce-io/lettuce-core/blob/master/src/main/java/com/lambdaworks/redis/protocol/ConnectionWatchdog.java#L217. And in the current code base, maybe it would be OK to remove
I think it could make sense to improve the state handling to prevent multiple attempts at reconnect initialization. Have you tried enabling debug logging?
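A guard like the one suggested above can be sketched with a compare-and-set flag (this is an illustrative pattern, not lettuce's actual code): only one reconnect initialization can be in flight at a time, no matter how many callers race to trigger it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard: compareAndSet ensures duplicate reconnect
// triggers are suppressed until the current attempt finishes.
public class ReconnectGuard {
    private final AtomicBoolean reconnectScheduled = new AtomicBoolean(false);
    final AtomicInteger attemptsStarted = new AtomicInteger();

    public boolean scheduleReconnect() {
        if (!reconnectScheduled.compareAndSet(false, true)) {
            return false;                  // a reconnect is already in flight
        }
        attemptsStarted.incrementAndGet(); // stand-in for the real scheduling work
        return true;
    }

    public void reconnectFinished() {
        reconnectScheduled.set(false);     // allow the next attempt
    }

    public static void main(String[] args) {
        ReconnectGuard guard = new ReconnectGuard();
        System.out.println(guard.scheduleReconnect());   // true
        System.out.println(guard.scheduleReconnect());   // false: duplicate suppressed
        guard.reconnectFinished();
        System.out.println(guard.scheduleReconnect());   // true
        System.out.println(guard.attemptsStarted.get()); // 2
    }
}
```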
Hm, let me try. But it's hard to reproduce; the last time was in February and this time in June.
I think I face the same problem pretty often in production. I have a limited command queue. Often, after some network work with possible blips, I get into a state:
A number of such exceptions makes me think that Lettuce likely can't run commands against just one node in the cluster. At the same time, the nodes are definitely accessible, and a restart resolves this state. My configuration code for the client:
Lettuce 4.4.1.Final + Netty 4.0.51.Final. Watchdog logging:
This doesn't make much sense: the node is accessible, and a connection opens from the same host via the command line in 5 ms. And after an application restart, this connection is established successfully with the same parameters.
@Spikhalskiy Interesting to hear that Lettuce can't reconnect to the node (timeout) but a
@mp911de I will take a thread dump the next time I get into this state. I hit it pretty often, once a week or two. Currently I cure it by shutting down and recreating the Lettuce client.
@mp911de Here's why I posted here and think it can be related: if I remove the queue limit from the client configuration in my case, the issue and its outcome look very similar to the original one.
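The queue limit mentioned above can be sketched as follows (a hedged example assuming lettuce 4.x; host/port and the limit value are placeholders, not the commenter's real configuration): capping the request queue makes commands fail fast with a RedisException once the limit is reached, instead of buffering without bound while a connection is down.

```java
import com.lambdaworks.redis.cluster.ClusterClientOptions;
import com.lambdaworks.redis.cluster.RedisClusterClient;

// Sketch: once 1,000 commands are queued for a dead connection,
// further commands are rejected rather than accumulated in memory.
public class QueueLimitConfig {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");
        client.setOptions(ClusterClientOptions.builder()
                .requestQueueSize(1_000) // hypothetical limit; tune to your workload
                .build());
    }
}
```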
Closing this one as per #679. |
My environment is 4.3.0.Final, using Redis Cluster. Recently one of our servers ran out of memory (OOM) because of Lettuce. Below is an image of the dumped heap.
It looks like there are lots of commands buffered. I'm not sure, but I suspect something is wrong with part of the connections even though the nodes are still alive in the cluster. I suspect this is the reason the commands are buffered (https://github.com/mp911de/lettuce/blob/master/src/main/java/com/lambdaworks/redis/protocol/CommandHandler.java#L307).
But I saw that the connection watchdog should help with reconnecting..
Below is my configuration.
Everything else is kept at the defaults.
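The buffering-until-OOM scenario described in this report can be modeled with a minimal plain-Java sketch (a toy model, not lettuce's actual code): while reconnection never succeeds, every issued command is appended to an in-memory buffer that nothing ever drains.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the suspected failure mode: a connection that stays
// disconnected buffers every command, so the buffer only ever grows.
public class DisconnectBufferModel {
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean connected = false;

    void write(String command) {
        if (connected) {
            // a live connection would flush the command to the socket here
        } else {
            buffer.add(command); // buffered until the watchdog reconnects
        }
    }

    public static void main(String[] args) {
        DisconnectBufferModel conn = new DisconnectBufferModel();
        for (int i = 0; i < 100_000; i++) {
            conn.write("GET key:" + i); // reconnect never happens...
        }
        // ...so nothing is drained and memory usage keeps climbing:
        System.out.println(conn.buffer.size()); // prints 100000
    }
}
```

With no reconnect and no queue limit, the buffer's growth is bounded only by the heap, which matches the heap dump above.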