EVALSHA broken when a master node goes down #417

Closed
long-xuan-nguyen opened this issue Dec 3, 2016 · 24 comments
Labels: for: stackoverflow (A question that is better suited to stackoverflow.com) · status: cannot-reproduce

Comments


long-xuan-nguyen commented Dec 3, 2016

My setup:

  • 50 master nodes; each with 2 slaves making it 150 nodes in total
  • Lettuce 4.2.2 with some internal patches:
    + Make EVALSHA intent READ so we can utilize a slave node to read (our Lua script is strictly reading, no writing)
  • READFROM is our custom ReadFrom implementation that picks a random node (sketched below)
  • RedisClusterClient timeout is 300 ms
  • RedisClient is wrapped with a retry policy that retries 10 times.
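
For illustration, the random-node ReadFrom looks roughly like the sketch below (not our exact code; it assumes the Lettuce 4.x ReadFrom SPI in com.lambdaworks.redis):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import com.lambdaworks.redis.ReadFrom;
import com.lambdaworks.redis.models.role.RedisNodeDescription;

// Sketch only: pick one random node (master or slave) for every read.
public class RandomNodeReadFrom extends ReadFrom {

    @Override
    public List<RedisNodeDescription> select(Nodes nodes) {
        List<RedisNodeDescription> candidates = nodes.getNodes();
        int index = ThreadLocalRandom.current().nextInt(candidates.size());
        return Collections.singletonList(candidates.get(index));
    }
}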

Problem: When a master node goes down (call it M1, with its two slaves S1 and S2), any EVALSHA request that hits M1, S1, or S2 always times out with RedisCommandTimeoutException. I expect the call to M1 to time out, but on retry RedisClusterClient calls S1 or S2, and even though the call returns a proper response and I can see in CommandHandler that command.complete() is called, Lettuce still times out and is unable to retrieve the result.

Here is some log:

02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) XXX : Running lua script for RedisRangeQueryRequest(scanForward=false, pageSize=3, previousKey=, previousValue=, fastMode=false, numberOfSameScoreItemsThreshold=50) against the Redis key 1:1/1/AAA
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.RedisChannelHandler: dispatching command AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command] with args 1:1/1/AAA
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.cluster.PooledClusterConnectionProvider: getConnection(READ, 11598)
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.cluster.PooledClusterConnectionProvider: Returned connection for (READ,11598) is [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379]
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Command wrtten: ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=1:1/1/AAA
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write() writeAndFlush command ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=1:1/1/AAA
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write() done
02 Dec 2016 13:41:27,638 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.AbstractRedisAsyncCommands: Dispatched to the underlying connection Command [type=EVALSHA, output=ValueOutput [output=null, error='null']]
02 Dec 2016 13:41:27,638 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write(ctx, ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=1:1/1/AAA, promise)
02 Dec 2016 13:41:27,638 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received: 33 bytes, 1 queued commands
02 Dec 2016 13:41:27,638 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Queue contains: 1 commands. Processing ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=1:1/1/AAA
02 Dec 2016 13:41:27,638 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received response for ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=1:1/1/AAA
02 Dec 2016 13:41:27,915 [DEBUG] 00eaef77-f926-4a88-bb54-a357ab82c63c (pool-1-thread-8963) com.lambdaworks.redis.cluster.PooledClusterConnectionProvider: Returned connection for (READ,11544) is [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379]
02 Dec 2016 13:41:27,915 [DEBUG] 00eaef77-f926-4a88-bb54-a357ab82c63c (pool-1-thread-8963) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Command wrtten: ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:7920:d
02 Dec 2016 13:41:27,915 [DEBUG] 00eaef77-f926-4a88-bb54-a357ab82c63c (pool-1-thread-8963) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write() writeAndFlush command ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:7920:d
02 Dec 2016 13:41:27,915 [DEBUG] 00eaef77-f926-4a88-bb54-a357ab82c63c (pool-1-thread-8963) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write() done
02 Dec 2016 13:41:27,915 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write(ctx, ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:7920:d, promise)
02 Dec 2016 13:41:27,915 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received: 5 bytes, 1 queued commands
02 Dec 2016 13:41:27,915 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Queue contains: 1 commands. Processing ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:7920:d
02 Dec 2016 13:41:27,915 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received response for ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:7920:d
02 Dec 2016 13:41:27,938 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.RedisChannelHandler: dispatching command AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command] with args 1:1/1/38132768605
02 Dec 2016 13:41:27,938 [DEBUG] ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62 (pool-1-thread-8977) com.lambdaworks.redis.cluster.PooledClusterConnectionProvider: getConnection(READ, 11598)

The request id that runs on my main thread is ee7a2eb7-cd0b-4b7c-ba24-a9fcda448d62. The last log line indicates the EVALSHA request is retried (there are 9 other segments of logs similar to this one, but I am only pasting the first).

The Redis node being queried here is 10.91.214.25, which is S1. M1 went down a couple of seconds before this request came into our server.

You can see the log line "Received response ...", which is a log statement I added after line 211 here: https://github.com/mp911de/lettuce/blob/4.2.x/src/main/java/com/lambdaworks/redis/protocol/CommandHandler.java#L211 . That indicates command.complete() has already been called successfully with no exception bubbling up.

Another interesting thing is that in between the retries there is an HGET request (02 Dec 2016 13:41:27,915) to the same node, and it works fine.

There are many other instances of this with different retry sequences, such as M1 -> S2 -> S1 -> M1 or S1 -> M1 -> S2 ...; all result in failures after 10 retries.

Another interesting fact is that I randomly see failures once in a while (3 to 5 every hour or two) with EVALSHA where all nodes are queried as part of the retries but the command never succeeds. I wonder if it is due to the same issue, and whether a master node going down simply makes it much worse (2000+ failures every 5 minutes).

Any help would be greatly appreciated.

Thanks,


mp911de commented Dec 3, 2016

Thanks for digging into CommandHandler. Whether a node is available or unreachable doesn't matter much once PooledClusterConnectionProvider returns a connection. The cluster facade uses individual connections under the hood that run separate queues and TCP connections. I see from your log that Lettuce works normally and Redis responds; commands are executed in (sub-)millisecond time ranges.

So there's nothing unusual here. EVALSHA has been marked as a read-only-capable command since Lettuce 4.3.0 (see #366).

Which API are you using, the synchronous or the asynchronous one? Do you have a bit of code showing how you use Lettuce, mainly the command invocation? I didn't fully understand what you mean by

RedisClient is wrapped with a retry policy that retries 10 times.

Is it something like

for (int i = 0; i < 10; i++) {
    try {
        return redis.evalsha(…);
    } catch (RedisException e) {
        // ignore
    }
}

@long-xuan-nguyen (Author)

I am using the sync API.

Yeah, the code more or less looks like what you have.

Looking at the code and the log, it does not make sense to me either, but it is in fact happening. I am not sure how to debug further. Is there any more logging we can add to help with this?


mp911de commented Dec 3, 2016

There are a couple of things you could do:

  1. Use the async API and inspect the thrown exception (connection.async().evalsha(…).get(300, TimeUnit.MILLISECONDS)); see the sketch after this list.
  2. ClusterCommand has special completion handling. You could make sure that super.complete() is called (see ClusterCommand.java:53).
  3. Inspect the data that is returned by Redis. The debug log shows the outgoing payload, but the result of the command is not in the logs (see CommandHandler.java:194). Payload logging creates severe additional load, but you could limit it to EVALSHA inside the decode loop (if (command.getType().name().equals("EVALSHA"))).
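
For option 1, a minimal sketch (it assumes the Lettuce 4.x async cluster API; the script SHA and the key are placeholders):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import com.lambdaworks.redis.ScriptOutputType;
import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

public class EvalshaProbe {

    // Runs EVALSHA through the async API and waits at most 300 ms so the actual
    // failure mode (timeout vs. exceptional completion) becomes visible.
    public static Object probe(StatefulRedisClusterConnection<String, String> connection,
                               String sha, String key) throws Exception {
        try {
            return connection.async()
                    .evalsha(sha, ScriptOutputType.VALUE, key)
                    .get(300, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The future never completed: no response was decoded or complete() was never called.
            throw e;
        } catch (ExecutionException e) {
            // The command completed exceptionally, e.g. NOSCRIPT or a redirection problem.
            throw e;
        }
    }
}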

@long-xuan-nguyen (Author)

Are you suggesting there was a MOVED or ASK response coming back from Redis? There was no slot migration going on at that time, so I am not sure that is the case.

Is Lettuce not able to handle the case where there is a MOVED or ASK response?


mp911de commented Dec 4, 2016

I suggest checking every step on the way towards command completion to find out why commands fail even though they receive a completing response.


long-xuan-nguyen commented Dec 5, 2016

So I took another look at the log. It seems like the root issue here is that the connection to the new master node is somehow stale, or that commands written to it get lost.

To summarize:

  1. We have M1, S1 and S2
  2. M1 went down
  3. S1 got promoted to be new-M1
  4. S2 asks for a full-sync from S1 or new-M1
  5. New-M1 kicks off a background dump of its data
  6. Data is fully synced to S2

Now, between steps 5 and 6 (about 8 minutes), HGET to S2 works just fine. EVALSHA to S2 gets re-routed to new-M1 (either with MOVED or ASK, I am not sure which). When Lettuce gets the MOVED or ASK, it writes the command to the existing connection to new-M1 (formerly S1), and it seems like no response ever comes back.

Only when the background save is done do I see this in the log:

02 Dec 2016 13:44:04,045 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] Received response for ClusterCommand [command=AsyncCommand [type=HGET, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:2188:d
02 Dec 2016 13:50:36,855 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] channelInactive()
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] deactivating channel handler
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] channelInactive() done
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.ConnectionWatchdog: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] channelInactive(ChannelHandlerContext(ConnectionWatchdog#0, [id: 0x80fc6c70, L:/10.160.12.233:42118 ! R:/10.88.137.23:6379]))
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.ConnectionWatchdog: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] scheduleReconnect()
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.ConnectionWatchdog: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] Reconnect attempt 1, delay 1ms
02 Dec 2016 13:50:36,856 [DEBUG]  (lettuce-nioEventLoop-4-77) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x179, channel=0x80fc6c70, /10.160.12.233:42118 -> /10.88.137.23:6379] channelUnregistered()
02 Dec 2016 13:50:36,933 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, [id: 0x9f6314e7] (inactive)] channelRegistered()
02 Dec 2016 13:50:36,949 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] channelActive()
02 Dec 2016 13:50:36,949 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] moveQueuedCommandsToCommandBuffer 0 command(s) added to buffer
02 Dec 2016 13:50:36,949 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] activateCommandHandlerAndExecuteBufferedCommands 1245 command(s) buffered
02 Dec 2016 13:50:36,949 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] activating channel handler
02 Dec 2016 13:50:36,949 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] flushCommands()
02 Dec 2016 13:50:36,950 [DEBUG]  (lettuce-nioEventLoop-4-165) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f6314e7, /10.160.12.233:37464 -> /10.88.137.23:6379] flushCommands() Flushing 1245 commands

Out of 72 hosts that run Lettuce clients, about half have this issue while the other hosts can query new-M1 just fine, so I think this is more a client-side issue than a Redis one.

Do you have any insight into why this is happening? Could it be a problem with the way the role of the connection to S1 changed (slave --> master)?

Thanks,

@long-xuan-nguyen (Author)

It seems like all EVALSHA requests on a slave node are re-routed to the master node, which is why we saw HGET going through but not EVALSHA (HGET succeeded on the slave node while all requests to the new master node were simply dropped).

So I think we have narrowed the problem down to a connection to the newly elected master where requests somehow do not make it to Redis. I have topology refresh turned on and can see a new connection being spun up from the host to the new master node, working just fine (the commands are CLIENT and CLUSTER).

Any suggestions on what the next step should be?


mp911de commented Dec 6, 2016

Thanks for further investigation. I think it's important now to find out more about the flow of the redirected command. When EVALSHA gets redirected, it is re-attempted on a (very likely) different connection. Both -MOVED and -ASK redirections provide the connection point of the target node where the client should retry the command. This can be either an existing connection (because it was already used before) or a new connection.

This also means that you should be able to find the redirected command in the logs where it's retried with a different connection. Tracing these connections can be a hassle but you could introduce a command counter (command Id) in ClusterCommand for the purpose of tracking a particular command.

Something like:

class ClusterCommand {

    private static final AtomicLong counter = new AtomicLong();

    // Unique, monotonically increasing ID for tracking a command across connections.
    private final long commandId = counter.incrementAndGet();

    // ...

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder();
        sb.append(getClass().getSimpleName());
        sb.append(" [command=").append(command);
        sb.append(", commandId=").append(commandId);
        sb.append(", redirections=").append(redirections);
        sb.append(", maxRedirections=").append(maxRedirections);
        sb.append(']');
        return sb.toString();
    }
}


long-xuan-nguyen commented Dec 6, 2016

Here is the log that shows a redirect:

02 Dec 2016 13:45:43,414 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] write(ctx, ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:1/1, promise)
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received: 32 bytes, 1 queued commands
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Queue contains: 1 commands. Processing ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:1/1
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.RedisStateMachine: Decode ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:1/1
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.RedisStateMachine: Decoded ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=0, arg=0:1/1/346348897, empty stack: true
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.cluster.PooledClusterConnectionProvider: getConnection(WRITE, 10.88.137.23, 6379)
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f92cc2f, /10.160.12.233:34722 -> /10.88.137.23:6379] Command wrtten: ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:1/1
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f92cc2f, /10.160.12.233:34722 -> /10.88.137.23:6379] write() writeAndFlush command ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:1/1
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f92cc2f, /10.160.12.233:34722 -> /10.88.137.23:6379] write() done
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-78) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x17b, channel=0x1f4ddb6f, /10.160.12.233:46637 -> /10.91.214.25:6379] Received response for ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:1/1
02 Dec 2016 13:45:43,415 [DEBUG]  (lettuce-nioEventLoop-4-192) com.lambdaworks.redis.protocol.CommandHandler: [chid=0xb5, channel=0x9f92cc2f, /10.160.12.233:34722 -> /10.88.137.23:6379] write(ctx, ClusterCommand [command=AsyncCommand [type=EVALSHA, commandType=com.lambdaworks.redis.protocol.Command], executions=1, arg=0:1/1, promise)

First, the EVALSHA goes to a slave node (S2, with channelId=0x1f4ddb6f and IP 10.91.214.25). Then Lettuce gets a MOVED that points to the new master node (channelId=0x9f92cc2f and IP 10.88.137.23). The requests that go to this WRITE connection (as well as the READ connection) never seem to finish after the slave promotion. The cluster topology refresh is able to connect to the new master node and gets responses back with no issue. Some other hosts work fine as well (about half of the hosts in the fleet have this issue).

@long-xuan-nguyen (Author)

Note that I also opened an issue against Redis about not being able to run read-only Lua scripts on slave nodes: redis/redis#3665.


mp911de commented Dec 8, 2016

That's interesting. So an existing connection to the slave S2, which gets promoted to master, seems to cause the issue. The command is written to the connection, but the command does not receive a response.

Can you reproduce that behavior with redis-cli?

mp911de added the for: stackoverflow label Dec 8, 2016
@long-xuan-nguyen (Author)

I can't reproduce it with redis-cli.

Here is the sequence of events:

  • M1 - S1 - S2
  • Lettuce has 2 READ connections to S1, C1 and C2. C2 is what gets returned by PooledClusterConnectionProvider (not sure how C1 ever got created in the first place; note that C1 is not used by the cluster topology refresh task)
  • M1 went down
  • S1 got promoted to master
  • Topology refresh occurred and the new topology took effect
  • Lua script execution went to S2 and got redirected to S1, the new master
  • PooledClusterConnectionProvider now returns C1 for the WRITE intention
  • Execution succeeds for 2 EVALSHA requests
  • The next request, a ZCARD, comes in. PooledClusterConnectionProvider returns C1 for the READ intention
  • All subsequent requests on C1 never get any response back

I should also note that our Redis cluster is configured to kill idle connections after 60 seconds of inactivity. C1 had been cycling through channelActive()/channelInactive() every minute before it got the first EVALSHA request. Every time it goes into channelInactive(), the ConnectionWatchdog's previously scheduled reconnection task kicks in and reconnects immediately.

Now, I see that about 85 seconds passed between the last successful EVALSHA and the next unsuccessful ZCARD request. However, I did not see any channelInactive() event come in. My current theory is that the inactive event was somehow not received properly on the client side (dropped packet ...) or that some exception is thrown in the channel pipeline before it reaches CommandHandler.

Do you have any suggestions on how to detect when a connection goes bad and trigger a manual reconnection? I was thinking of adding a ChannelHandler that measures the gap between the last channelRead() and the last write, and if that gap exceeds a configurable period of time, closes the channel (see the sketch below). Will that work? Will it cause the connection to be completely closed and not reconnected?
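
A rough approximation of that idea using Netty's IdleStateHandler (only a sketch: it closes the channel when nothing has been read for a configurable period; the handler names and the 90-second threshold are placeholders, not anything Lettuce ships):

import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class StaleConnectionDetector extends ChannelInboundHandlerAdapter {

    public static void install(ChannelPipeline pipeline) {
        // Fire a READER_IDLE event when nothing has been read for 90 seconds.
        pipeline.addFirst("idleStateHandler", new IdleStateHandler(90, 0, 0, TimeUnit.SECONDS));
        pipeline.addAfter("idleStateHandler", "staleConnectionDetector", new StaleConnectionDetector());
    }

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent && ((IdleStateEvent) evt).state() == IdleState.READER_IDLE) {
            // Closing the channel fires channelInactive(), which should let the
            // ConnectionWatchdog schedule a reconnect (unless reconnects are suspended).
            ctx.close();
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}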

Thanks,


mp911de commented Dec 9, 2016

Thanks for the awesome, detailed analysis. Connection lifecycle signals follow the TCP packets that come with lifecycle changes. If a TCP packet with the RST or FIN bit is received by the client, the connection is reset (connection reset by peer) or closed (regular close), respectively. RST packets are sent by the server when a client sends a packet to a closed port.

Please be aware that PooledClusterConnectionProvider returns different connections for read and write intentions.

You can plug a LoggingHandler into Netty's pipeline as the first handler to enable really detailed logging of the channel activity; see the sketch below. An 85-second window is not much time to inspect what's going on, but at least the problem is reproducible.
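
A minimal sketch of what I mean (how you get hold of the Channel depends on how you hook into the connection; that part is left out here):

import io.netty.channel.Channel;
import io.netty.handler.logging.LogLevel;
import io.netty.handler.logging.LoggingHandler;

public final class PipelineTrafficLogging {

    // Adds Netty's LoggingHandler as the first handler so every inbound and
    // outbound event (including payloads) shows up in the logs at DEBUG level.
    public static void enable(Channel channel) {
        channel.pipeline().addFirst("traffic-logger", new LoggingHandler(LogLevel.DEBUG));
    }
}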

At the moment I don't have any clue besides adding further logging and inspecting what happens on the channel. Additionally, you could run tcpdump (or better, Wireshark) on the server side to inspect the traffic between the client and the newly elected master.

I will attempt to reproduce the issue on my side. I don't know why the client either doesn't get an answer or doesn't process the received data.

/cc @badboy @itamarhaber: Have you seen a similar issue where connection traffic gets stuck after slave to master promotion in Redis Cluster?


badboy commented Dec 9, 2016

I have not run Cluster for anything other than testing yet, and I never experienced such an issue there. Itamar might know more, or @antirez himself.


mp911de commented Dec 9, 2016

@long-xuan-nguyen I tried to reproduce the issue by using create-cluster start / create-cluster create and killing the master to trigger a failover. Slave-to-master promotion works quite well, but I wasn't able to reproduce the command getting stuck.

The only kind of error I could reproduce was CLUSTERDOWN The cluster is down, which self-healed after a few seconds.


mp911de commented Dec 21, 2016

Any update on this ticket?

mp911de added the status: waiting-for-feedback label Dec 21, 2016

long-xuan-nguyen commented Dec 30, 2016

Hi there,

So I turned on a little more logging to help debug this issue, and I saw it happening again yesterday. This time there was no change in topology; i.e., all nodes were performing just fine.

Here is the sequence of events that happened:

  • A channel to a Redis node was initialized
  • Writes to this channel went through successfully; everything was good for about a day and a half
  • Then there were no write requests on this channel for 60 consecutive seconds (Redis is configured to kill idle connections after 60 seconds)
  • The connection got killed by the server
  • In the log, I can see this:
29 Dec 2016 17:44:37,302 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] channelInactive()
29 Dec 2016 17:44:37,302 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] deactivating channel handler
29 Dec 2016 17:44:37,302 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] channelInactive() done
29 Dec 2016 17:44:37,302 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.ConnectionWatchdog: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] channelInactive(ChannelHandlerContext(ConnectionWatchdog#0, [id: 0xe7237817, L:/10.88.106.6:44110 ! R:/10.88.5.133:6379]))
29 Dec 2016 17:44:37,302 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.ConnectionWatchdog: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] Reconnect scheduling disabled
29 Dec 2016 17:44:37,307 [DEBUG] (lettuce-nioEventLoop-4-59) com.lambdaworks.redis.protocol.CommandHandler: [chid=0x51, channel=0xe7237817, /10.88.106.6:44110 -> /10.88.5.133:6379] channelUnregistered()

Note that "Reconnect scheduling disabled" is printed. I took a heap dump and can see that in ConnectionWatchdog, listenOnChannelInactive is true but reconnectionHandler.reconnectSuspended = false.

Also note that clientOptions.isSuspendReconnectOnProtocolFailure = false.

This indicates ClusterNodeCommandHandler.prepareClose() must have been called by PooledClusterConnectionProvider.reconfigurePartitions() previously. However, it doesn't explain why the channel is still open up until this point. Am I missing something?

FYI, I see that ConnectionWatchdog.setReconnectSuspended() does not make use of its parameter; it blindly calls reconnectionHandler.setReconnectSuspended(true). I was hoping there was some code that would call reconnectionHandler.setReconnectSuspended(false) but couldn't find any. Is there some code hidden behind reflection somewhere?


mp911de commented Dec 31, 2016

listenOnChannelInactive=true and reconnectSuspended=false is the normal operating state. After channelUnregistered() the connection is closed and no longer functional; usually it is then also cleaned up by the garbage collector. Currently I don't follow why you consider this connection open.

prepareClose is called either by stale connection cleanup (PooledClusterConnectionProvider.reconfigurePartitions()) or as part of the connection close sequence.

I fixed the parameter issue, thanks for the hint.

@long-xuan-nguyen (Author)

I am sorry; I meant listenOnChannelInactive=true and reconnectSuspended=true.

As a result, when the connection is killed on the server side, no reconnection is scheduled, causing all writes to the connection to pile up in the buffer.

@long-xuan-nguyen (Author)

Any idea how this could happen?

Thanks,


mp911de commented Jan 6, 2017

I have no clue how these things happen with the given context. At this stage, the ticket scope gets quite unclear and I don't have any idea how to continue.

@long-xuan-nguyen (Author)

Sorry if this has been confusing. I'll break it down.

The issue is that, for some reason, the ConnectionWatchdog of the CommandHandler associated with a master node no longer schedules reconnections.

Here is my setup: the Redis server side is configured to kill idle connections after 60 seconds.

A connection to a master node had been working fine for a day, with regular writes going out successfully. Then, during a low-traffic period, the connection got killed by Redis. My expectation is that the ConnectionWatchdog for that ClusterNodeCommandHandler should reconnect, but I see "Reconnect scheduling disabled" in the log. As a result, all future writes to this ClusterNodeCommandHandler end up in the buffer. I took a heap dump when this occurred and can see that ConnectionWatchdog.listenOnChannelInactive == true and ConnectionWatchdog.reconnectionHandler.reconnectSuspended == true. Based on the code, the only case where this would happen is when ClusterNodeCommandHandler.prepareClose() has been called.

I am not sure whether there is any other case where this happens, and I am unable to root-cause it. Do you have any insight into this issue, or is there a workaround? I was thinking of not even looking at isReconnectSuspended() in ConnectionWatchdog so that it always schedules the reconnection. I would appreciate your thoughts on this.

Thanks,


mp911de commented Jan 23, 2017

I found a possible leak in #443, which builds the stale connections list twice. It's possible that #443 plays into this issue.

mp911de added the status: cannot-reproduce label and removed the status: waiting-for-feedback label Feb 17, 2017

mp911de commented Dec 14, 2017

Closing this one as this issue seems resolved.

mp911de closed this as completed Dec 14, 2017