EVALSHA broken when a master node goes down #417
Thanks for digging into this. So there's nothing unusual here. Which API are you using (the synchronous or the asynchronous one)? Do you have a bit of code showing how you use Lettuce, mainly the command invocation? I didn't fully understand what you meant there.
Is it something like this?
I am using the sync API. Yeah, the code more or less looks like what you have. It does not make sense to me either, looking at the code and the log, but it is in fact happening. Not sure how to debug further. Is there any more logging we can add to help with this?
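For context, here is a minimal sketch of what such a synchronous cluster invocation typically looks like with Lettuce 4.x; the host, the script digest and the key below are placeholders rather than values from this thread:

import com.lambdaworks.redis.RedisURI;
import com.lambdaworks.redis.ScriptOutputType;
import com.lambdaworks.redis.cluster.RedisClusterClient;
import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;
import com.lambdaworks.redis.cluster.api.sync.RedisAdvancedClusterCommands;

public class EvalshaExample {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://node1:6379"));
        StatefulRedisClusterConnection<String, String> connection = client.connect();
        RedisAdvancedClusterCommands<String, String> sync = connection.sync();

        // EVALSHA of a script registered earlier via SCRIPT LOAD; routed by the slot of the first key
        Long result = sync.evalsha("<sha1-digest>", ScriptOutputType.INTEGER, "somekey");
        System.out.println(result);

        connection.close();
        client.shutdown();
    }
}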
There are a couple of things you could do.
Are you suggesting there was a MOVED or ASK response coming back from Redis? There was no slot migration going on at that time, so I am not sure that is true. Is Lettuce unable to handle the case when there is a MOVED or ASK response?
I suggest checking every step on the way towards command completion to find out why commands fail although they receive a completing response.
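One way to check the completion step directly is to issue the same EVALSHA through the asynchronous API and inspect the resulting RedisFuture, independently of the synchronous facade's timeout handling. The sketch below assumes the Lettuce 4.x async API; the digest and key are placeholders:

import java.util.concurrent.TimeUnit;
import com.lambdaworks.redis.RedisFuture;
import com.lambdaworks.redis.ScriptOutputType;
import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

public final class EvalshaProbe {

    // Reports whether the command future ever completes and whether Redis returned an error.
    public static void probe(StatefulRedisClusterConnection<String, String> connection)
            throws InterruptedException {
        RedisFuture<Long> future = connection.async()
                .evalsha("<sha1-digest>", ScriptOutputType.INTEGER, "somekey");
        boolean completed = future.await(10, TimeUnit.SECONDS); // waits without cancelling the command
        System.out.println("completed=" + completed + ", error=" + future.getError());
    }
}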
So I took another look at the log. It seems like the root issue here is that the connection to the new master node is somehow stale, or that commands written to it are lost. To summarize:
Now, between steps 5 and 6 (about 8 minutes), HGET to S2 works just fine. EVALSHA to S2 gets re-routed to new-M1 (either with MOVED or ASK, not sure which one). When Lettuce gets the MOVED or ASK, it writes the command to the existing connection to new-M1 (or S1), and it seems like there is no response coming back at all. Only when the background save is done do I see the response show up in the log.
Out of 72 hosts that run Lettuce clients, about half have this issue, while the other hosts can query new-M1 just fine, so I think this is a client-side issue rather than a Redis one. Do you have any insight into why this is happening? Could it be a problem with the way the role of the connection to S1 has changed (slave --> master)? Thanks,
It seems like all EVALSHA requests on a slave node are re-routed to the master node, which is why we saw HGET going through and not EVALSHA (HGET was successful on the slave node while all requests to the new master node were just dropped). So I think we've narrowed the problem down to a connection to the newly elected master where requests somehow do not make it to Redis. I have topology refresh turned on and can see a new connection being spun up from the host to the new master node and working just fine (the commands are CLIENT and CLUSTER). Any suggestion on what the next step should be?
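For reference, a sketch of how periodic topology refresh is typically enabled with Lettuce 4.2's ClusterTopologyRefreshOptions; exact builder methods may differ by version, and the refresh period here is a placeholder, not the poster's actual setting:

import java.util.concurrent.TimeUnit;

import com.lambdaworks.redis.cluster.ClusterClientOptions;
import com.lambdaworks.redis.cluster.ClusterTopologyRefreshOptions;
import com.lambdaworks.redis.cluster.RedisClusterClient;

public final class TopologyRefreshSetup {

    // Periodic refresh lets the client pick up a newly promoted master on its own.
    public static void configure(RedisClusterClient clusterClient) {
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(true)
                .refreshPeriod(60, TimeUnit.SECONDS) // placeholder period
                .build();

        clusterClient.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}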
Thanks for further investigation. I think it's important now to find out more about the flow of the redirected command: when a command receives a MOVED or ASK response, it is retried on a connection to the node named in the redirect. This also means that you should be able to find the redirected command in the logs where it's retried with a different connection. Tracing these connections can be a hassle, but you could introduce a command counter (command id) in ClusterCommand. Something like:

import java.util.concurrent.atomic.AtomicLong;

class ClusterCommand {

    private static final AtomicLong counter = new AtomicLong();
    private final long commandId = counter.incrementAndGet();

    // ... existing fields such as command, redirections and maxRedirections ...

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder();
        sb.append(getClass().getSimpleName());
        sb.append(" [command=").append(command);
        sb.append(", commandId=").append(commandId);
        sb.append(", redirections=").append(redirections);
        sb.append(", maxRedirections=").append(maxRedirections);
        sb.append(']');
        return sb.toString();
    }
}
Here is the log that shows a redirect.
First, the EVALSHA goes to a slave node (S2).
Note that I also opened an issue against Redis for not being able to run read-only Lua scripts on slave nodes: redis/redis#3665
That's interesting. So an existing connection to the slave S2, which gets promoted to master, seems to cause the issue. The command is written to the connection but the command does not receive a response. Can you reproduce that behavior with redis-cli?
I can't reproduce it with redis-cli. Here is the sequence of events:
I should also note that our Redis cluster is configured to kill idle connections after 60 seconds of inactivity. C1 had been going in and out of the channelActive()/channelInactive() state every minute before it got the first EVALSHA request. Every time it goes into channelInactive(), the ConnectionWatchDog's previously scheduled reconnection task kicks in and reconnects immediately. Now, I see that between the last successful EVALSHA and the next unsuccessful ZCARD request, about 85 seconds passed. However, I did not see any channelInactive() event coming in. My current theory is that somehow the inactive event was not received properly on the client side (dropped packet ...) or that some exception is thrown in the channel pipeline before it gets to CommandHandler. Do you have any suggestion on how to detect when a connection goes bad and trigger a manual reconnection? I was thinking of adding a ChannelHandler that measures the time between the last channelRead() and channelWrite(), and if this gap is more than a configurable period of time, closes the channel. Will that work? Will that cause the connection to be completely closed and not reconnected? Thanks,
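On that last idea: netty's IdleStateHandler already does the bookkeeping of the last read/write timestamps, so a small handler reacting to its reader-idle event can close the channel, and the close produces a regular channelInactive(), which is what the reconnect logic listens to. This is only a sketch of the netty mechanism; how to hook it into Lettuce's pipeline is left open here, and the class name is made up:

import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class CloseOnReaderIdle extends ChannelDuplexHandler {

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent && ((IdleStateEvent) evt).state() == IdleState.READER_IDLE) {
            ctx.close(); // fires channelInactive(), so a reconnect can be scheduled as usual
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }

    // Wiring: the IdleStateHandler must sit before this handler so its idle events reach it.
    public static void install(ChannelPipeline pipeline, int readerIdleSeconds) {
        pipeline.addLast(new IdleStateHandler(readerIdleSeconds, 0, 0));
        pipeline.addLast(new CloseOnReaderIdle());
    }
}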
Thanks for the awesome, detailed analysis. Connection lifecycle signals follow the TCP packets that come with lifecycle changes. If a TCP packet with the FIN or RST bit is received by the client, then the connection is reset (connection reset by peer) or closed (regular close). RST packets are sent by the server when a client sends a packet to a closed port. You can plug a LoggingHandler into netty's pipeline as the first handler to enable really detailed logging of the channel activity. Having a time window of 85 seconds is very little to inspect what's going on, but at least it's reproducible. At the moment I don't have any clue besides further logging and inspecting what happens on the channel. Additionally, you could run tcpdump (or better, Wireshark) on the server side to inspect the traffic between the client and the newly elected master. I will attempt to reproduce the issue on my side. I don't know why the client either doesn't get an answer or why the received data doesn't get processed. /cc @badboy @itamarhaber: Have you seen a similar issue where connection traffic gets stuck after a slave-to-master promotion in Redis Cluster?
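A sketch of the LoggingHandler suggestion; obtaining the pipeline from Lettuce is not shown (it would need a hook or a custom build), so only the netty side is illustrated:

import io.netty.channel.ChannelPipeline;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;

public final class PipelineLogging {

    // Added as the very first handler so every inbound/outbound event and payload dump is logged.
    public static void install(ChannelPipeline pipeline) {
        pipeline.addFirst(new LoggingHandler(LogLevel.TRACE));
    }
}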
I have not run Cluster for anything other than testing yet, and I never experienced such an issue there. Itamar might know more, or @antirez himself.
@long-xuan-nguyen I tried to reproduce the issue by using … The only kind of error I could reproduce was …
Any update on this ticket? |
Hi there. So I turned on a little more logging to help debug this issue. I saw this happening again yesterday. This time, there was no change in topology; i.e., all nodes were performing just fine. Here is the sequence of events that happened:
See that "Reconnect scheduling disabled" is printed. I took a heap dump and can see that in ConnectionWatchDog, listenOnChannelInactive is true but reconnectionHandler.reconnectSuspended = false. Also note that clientOptions.isSuspendReconnectOnProtocolFailure = false. This indicates ClusterNodeCommandHandler.prepareClose() must have been called by PooledClusterConnectionProvider.reconfigurePartitions() previously. However, it doesn't explain why the channel is still open up to this point. Am I missing something? FYI, I see that ConnectionWatchDog.setReconnectSuspended() does not make use of the parameter; it blindly calls reconnectionHandler.setReconnectSuspended(true). I was hoping there was some code that would call reconnectionHandler.setReconnectSuspended(false) but couldn't find any. Is there some code hidden behind reflection somewhere?
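To make the described parameter issue concrete, here is a minimal sketch with stand-in types; the names follow this thread, not necessarily the actual Lettuce source:

// ReconnectionHandler is a stub standing in for the real class.
class ReconnectionHandler {

    private volatile boolean reconnectSuspended;

    void setReconnectSuspended(boolean suspended) {
        this.reconnectSuspended = suspended;
    }

    boolean isReconnectSuspended() {
        return reconnectSuspended;
    }
}

class ConnectionWatchDog {

    private final ReconnectionHandler reconnectionHandler = new ReconnectionHandler();

    // As described above, the original implementation ignored the argument:
    //     reconnectionHandler.setReconnectSuspended(true);
    // Passing the flag through allows reconnection to be re-enabled as well:
    public void setReconnectSuspended(boolean reconnectSuspended) {
        reconnectionHandler.setReconnectSuspended(reconnectSuspended);
    }
}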
I fixed the parameter issue, thanks for the hint.
I am sorry; I meant listenOnChannelInactive=true and reconnectSuspended=true. As a result, when the connection is killed on the server side, no reconnection is scheduled, causing all writes to that connection to go into the buffer.
Any idea how this could happen? Thanks,
I have no clue how these things happen with the given context. At this stage, the ticket scope gets quite unclear and I don't have any idea how to continue.
Sorry if this has been confusing; I'll break it down. The issue is that, for some reason, the ConnectionWatchDog for the CommandHandler associated with a master node no longer schedules reconnection. Here is my setup: the Redis server side is set to kill idle connections after 60 seconds. A connection to a master node has been working fine for a day and has regular writes going out successfully. Then, during a low-traffic period, the connection gets killed by Redis. My expectation is that the ConnectionWatchDog for that ClusterNodeCommandHandler should reconnect, but I see "Reconnect scheduling disabled" in the log. As a result, all future writes to this ClusterNodeCommandHandler end up in the buffer. I took a heap dump when this occurred and, looking at it, I see that ConnectionWatchDog.listenOnChannelInactive == true and ConnectionWatchDog.ReconnectionHandler.reconnectSuspended == true. Now, based on the code, the only case where this would happen is when ClusterNodeCommandHandler.prepareClose() has been called. I am not sure if there is any other case where this happens, and I am unable to find the root cause. Do you have any insight into this issue, or is there any workaround? I was thinking of not consulting isReconnectSuspended() in ConnectionWatchDog at all, so that it always schedules the reconnection. Would appreciate your thoughts on this. Thanks,
Closing this one as this issue seems resolved. |
My setup:
+ Make the EVALSHA intent READ so we can utilize a slave node for reads (our Lua script is strictly reading, no writing)
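As a point of reference, the read-routing half of this setup with Lettuce 4.x's ReadFrom API might look like the sketch below; the EVALSHA-as-READ change itself is a custom modification and is not shown:

import com.lambdaworks.redis.ReadFrom;
import com.lambdaworks.redis.cluster.RedisClusterClient;
import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

public final class SlaveReadSetup {

    // Commands with READ intent are routed to slave nodes; with EVALSHA marked as READ,
    // the read-only script can then execute on a slave.
    public static StatefulRedisClusterConnection<String, String> connect(RedisClusterClient clusterClient) {
        StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();
        connection.setReadFrom(ReadFrom.SLAVE);
        return connection;
    }
}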
Problem: When a master node goes down (let us call it M1, with its two slaves S1 and S2), any EVALSHA request that hits M1, S1, or S2 always times out with RedisCommandTimeoutException. I expect the call to M1 to time out, but on retry RedisClusterClient calls S1 or S2, and even though the call returns with a proper response and I can see in CommandHandler that command.complete() is called, Lettuce still times out and is unable to retrieve the result.
Here is some log:
The request id that runs on my main thread is ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62. The last log line indicates the EVALSHA request is retried (there are 9 other segments of log similar to this one, but I am only pasting the first).
The Redis node being queried here is 10.91.214.2, which is S1. M1 went down a couple of seconds before this request reached our server.
You can see the log line "Received response ...", which is a log statement I added after line 211 here: https://github.com/mp911de/lettuce/blob/4.2.x/src/main/java/com/lambdaworks/redis/protocol/CommandHandler.java#L211 . It indicates command.complete() has already been called successfully with no exception bubbling up.
Another interesting thing is that in between the retries there is an HGET request (02 Dec 2016 13:41:27,915) to the same node, and it works fine.
There are many other instances of this with different retry sequences, such as M1 -> S2 -> S1 -> M1 or S1 -> M1 -> S2 ...; all result in failures after 10 retries.
Another interesting fact is that I randomly see failures once in a while (3 to 5 every hour or two) with EVALSHA, where all nodes are queried as part of the retries but the command never succeeds. I wonder if it is the same issue, and whether a master node going down just makes it much worse (2,000+ failures every 5 minutes).
Any help would be greatly appreciated.
Thanks,