
Fix cluster inconsistent slots mappings after slot migrating and failover occur simultaneously #12336

Closed
wants to merge 4 commits into from

Conversation

cyningsun

Background

Node A (master) updates its clusterState based on a clusterMsg received from another node, Node B (master). This can produce a slot mapping conflict between Node A and Node B when:

  • Node C (master) had slot S migrated to it (SETSLOT) from Node B
  • Node B received the update from Node C. Its configEpoch then collided with that of another node, Node D (master). Node B's node ID is smaller, so Node B bumped itself to a new configEpoch, which became the maximum in the cluster.
  • Node A received a PONG from Node B. From Node A's POV, Node B's new configEpoch is recorded, but slot S is not updated because Node B no longer claims it in its slots bitmap.
  • From Node A's POV, Node C (master), which was assigned this slot, fails to claim it because its config epoch (senderConfigEpoch) is smaller than server.cluster->slots[j]->configEpoch.

As shown in the figure:

[image]
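For reference, the rejection happens in the slot-rebinding check of clusterUpdateSlotsConfigWith(). A simplified paraphrase of that check (details such as the importing-slot case omitted; see cluster.c for the real code):

```c
/* Simplified paraphrase of the slot-rebinding loop in clusterUpdateSlotsConfigWith(). */
for (int j = 0; j < CLUSTER_SLOTS; j++) {
    if (!bitmapTestBit(slots, j)) continue;            /* sender does not claim slot j */
    if (server.cluster->slots[j] == sender) continue;   /* already bound to the sender  */
    if (server.cluster->slots[j] == NULL ||
        server.cluster->slots[j]->configEpoch < senderConfigEpoch)
    {
        clusterDelSlot(j);
        clusterAddSlot(sender, j);    /* rebind slot j to the sender */
    }
    /* Otherwise the claim is ignored: in the scenario above, Node B's bumped
     * configEpoch already guards slot S in Node A's view, so Node C's claim with
     * the smaller senderConfigEpoch is rejected here. */
}
```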

How to fix it

Skip updating the sender's configEpoch and slots until the migrated slot has been claimed by its new owner in the local point of view (POV).

In the POV, a node's configEpoch guards the slots it owns. If a slot migrated away from the sender is still pending a claim by its new master, the sender's configEpoch must not be updated in isolation.

Otherwise, the slot ends up guarded by the previous master's new configEpoch, which may have been bumped after the slot was already assigned to another master.

The new master will then fail to claim the slot, because senderConfigEpoch is smaller than server.cluster->slots[j]->configEpoch.
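A minimal sketch of this idea (not the actual diff of this PR; the helper name is hypothetical) could look like the following, applied where clusterProcessPacket() updates the sender's configEpoch:

```c
/* Hypothetical helper (sketch only): does our POV still map to the sender some
 * slot that the sender itself no longer advertises in its slots bitmap? If so,
 * the slot has been migrated away and is pending a claim by its new master. */
static int senderHasPendingMigratedSlot(clusterNode *sender, unsigned char *claimed) {
    for (int j = 0; j < CLUSTER_SLOTS; j++) {
        if (server.cluster->slots[j] == sender && !bitmapTestBit(claimed, j))
            return 1;
    }
    return 0;
}

/* Sketch of the guard: adopt the sender's newer configEpoch only when no such
 * pending slot exists, so the new master's later claim is not rejected against
 * an already-bumped epoch. */
if (senderConfigEpoch > sender->configEpoch &&
    !senderHasPendingMigratedSlot(sender, hdr->myslots))
{
    sender->configEpoch = senderConfigEpoch;
}
```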

Example

Here is a cluster of 5 shards, each consisting of a master and a replica, shown below before slot migration.

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 2 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 8 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 | 1 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 |

Operations

  • Step 1: Slot 0 is migrated from 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d 10.53.52.144:6379 to d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 10.53.52.144:6378
  • Step 2: A manual failover is issued to 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372
  • Step 3: Slot 16383 is migrated from c013fefabf931b77a5488e4e0fe1e81db1eaf4aa 10.53.52.144:6375 to 0c92e0b02f839260b508a6fdf258539a26e96903 10.53.52.144:6376

Let's focus on how slot 16383 becomes inconsistent across the cluster.

Timeline

1. Failover election for 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372

failover_auth_epoch = 9

2. After Slot 0 is SETSLOT to d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 10.53.52.144:6378

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 9 | 10.53.52.144:6378 (9) | 10.53.52.144:6375 (1) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 8 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 | 1 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 |

3. After slot 16383 is SETSLOT to 0c92e0b02f839260b508a6fdf258539a26e96903 10.53.52.144:6376

After 0c92e0b02f839260b508a6fdf258539a26e96903 10.53.52.144:6376 receives a PONG sent by d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 10.53.52.144:6378, its currentEpoch is 9, so the SETSLOT bump makes its configEpoch and currentEpoch 9+1 = 10.

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 9 | 10.53.52.144:6378 (9) | 10.53.52.144:6375 (1) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 10 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 | 1 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 |

4. After the failover is authorized for 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 9 | 10.53.52.144:6378 (9) | 10.53.52.144:6375 (1) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 10 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6375 (1) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 9 | 9 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

5. After the other nodes receive a PONG sent by 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 9 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 9 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (9) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 10 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 9 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 9 | 9 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

6. After 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 receives a PONG sent by 0c92e0b02f839260b508a6fdf258539a26e96903 10.53.52.144:6376

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 9 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (9) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 10 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 8 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (9) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 9 | 10 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

7. 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 resolves the configEpoch collision and sends PONG messages

  • configEpoch collision between 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 and d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 10.53.52.144:6378
  • 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 has the smaller node ID

Slot 0 and slot 16383 ownership on the other nodes is not affected, because the myslots bitmap in the clusterMsg header sent by 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 does not contain those two slots.
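For context, the collision is resolved roughly as follows (paraphrased and simplified from clusterHandleConfigEpochCollision() in cluster.c):

```c
/* Paraphrased/simplified: called when a sender advertises the same configEpoch
 * as the local node and both are masters. */
void clusterHandleConfigEpochCollision(clusterNode *sender) {
    if (sender->configEpoch != myself->configEpoch ||
        !nodeIsMaster(sender) || !nodeIsMaster(myself)) return;
    /* Only the node with the lexicographically smaller node ID reacts. */
    if (memcmp(sender->name, myself->name, CLUSTER_NAMELEN) <= 0) return;
    /* Take a brand new, currently greatest configEpoch: this is why
     * 10.53.52.144:6372 ends up with configEpoch 11 in the table below. */
    server.cluster->currentEpoch++;
    myself->configEpoch = server.cluster->currentEpoch;
}
```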

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 11 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (11) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 11 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (11) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 11 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

8. 0c92e0b02f839260b508a6fdf258539a26e96903 10.53.52.144:6376 is reset by a CLUSTERMSG_TYPE_UPDATE message

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 11 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (11) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 11 | 10.53.52.144:6379 (5) | 10.53.52.144:6372 (11) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 11 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

9. Slot 0 ownership finishes gossiping across the cluster

| master nodeid | master node | config epoch | current epoch | slot 0 owner (configEpoch) | slot 16383 owner (configEpoch) | slave nodeid | slave node |
|---|---|---|---|---|---|---|---|
| 6ea4c2f8e7efe9180db83cf0adf1b055a557c74d | 10.53.52.144:6379 | 5 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 626af417d334896000f5e23e6305f29242945948 | 10.53.52.144:6373 |
| d69ae698cb4ec7af73e8c15eb64d6f8b6cde4f59 | 10.53.52.144:6378 | 9 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 95e75b159b3bcb7c33a8df9db93b5443413dfccb | 10.53.52.144:6377 |
| 0c92e0b02f839260b508a6fdf258539a26e96903 | 10.53.52.144:6376 | 10 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | 900e489fcf28e0648272fc6c8b61f0bd998a01a0 | 10.53.52.144:6371 |
| f567e9d0fb1ccbdfbfc6a5c505e1d84d224de0ef | 10.53.52.144:6374 | 3 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6372 (11) | f65353c097d7f7e0f44644a28ddb980c21fc37bf | 10.53.52.144:6380 |
| 2ddaf93f230facc9893bc95bf9234a3d580d0291 | 10.53.52.144:6372 | 11 | 11 | 10.53.52.144:6378 (9) | 10.53.52.144:6376 (10) | c013fefabf931b77a5488e4e0fe1e81db1eaf4aa | 10.53.52.144:6375 |

Result

The cluster now has an inconsistent slot mapping for slot 16383 between 2ddaf93f230facc9893bc95bf9234a3d580d0291 10.53.52.144:6372 and the other nodes.

@zuiderkwast
Contributor

zuiderkwast commented Jun 21, 2023

@PingXie is this covered by (or affected by) your PR? #10517

@madolson
Contributor

@zuiderkwast I don't think so, but I believe we observed this internally and someone might produce a fix soonish.

@zuiderkwast
Contributor

I don't think so, but I believe we observed this internally and someone might produce a fix soonish.

@madolson This PR is a fix to the problem. Do you have another fix in mind?

@cyningsun Are you able to reproduce the problem consistently? It is always good to have a test.

There is a test suite tests/cluster/tests/21-many-slot-migration.tcl which starts like this:

```tcl
# Tests for many simultaneous migrations.

# TODO: Test is currently disabled until it is stabilized (fixing the test
# itself or real issues in Redis).

if {false} {

source "../tests/includes/init-tests.tcl"
source "../tests/includes/utils.tcl"

# TODO: This test currently runs without replicas, as failovers (which may
# happen on lower-end CI platforms) are still not handled properly by the
# cluster during slot migration (related to #6339).
```

Maybe this test suite can be enabled with your fix. Delete the `if {false} {` (the first TODO), add replicas, and make sure some failover happens during migration (the second TODO). What do you think?

@madolson
Contributor

@madolson This PR is a fix to the problem. Do you have another fix in mind?

hehe, yes, it does fix a problem. I think the problem is more pervasive though, and can happen after the slot migration completes and the epoch is bumped. I'm still trying to wrap my head around whether they are complementary fixes and we need both, or whether the one we are looking at internally solves this and we just need one.

@cyningsun
Author

cyningsun commented Jun 23, 2023

Sorry @zuiderkwast, currently I can't reproduce this problem consistently. I wrote a demo and some scripts to simulate the entire process that happened in our production environment. It relies on several random conditions:

  • the source node's failover must start before the SETSLOT to the target node and be authorized after it
  • the source node must PONG the other nodes with the migrated slots before the target node does
  • the node ID of the source node must be smaller than that of the target node, so that a configEpoch collision happens and the source node eventually has the largest configEpoch

Using my script, the probability of the problem occurring is around 10%.

I did notice this test case. As @madolson said, only this specific issue has been fixed here; it's uncertain whether there are more general issues that could be fixed once and for all. That's why I haven't completed those two TODOs yet. If suitable, I will.

@srgsanky
Contributor

srgsanky commented Jun 23, 2023

Thanks @madolson! We observed similar behavior internally. The main difference between the scenario that @cyningsun posted and what we observed is:

  • The above scenario did a manual failover on another shard, which caused the source to bump its epoch to resolve a configEpoch collision
  • In our case, the addition of a new shard bumped the epoch on the source to resolve a configEpoch collision.

Here is an example scenario. The key detail is that the slot migration source interacts with Node A (a third observer node) before the slot migration target does.

[diagram: Not processing loss of slot ownership]

Here is the code for the above diagram Not processing loss of slot ownership.txt

The gist of the fix is to process the loss of slot ownership: when a node no longer claims ownership of a slot, we need to record that. We added an else block to https://github.com/redis/redis/blob/unstable/src/cluster.c#L2244-L2245

Will send a PR later today.
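A hedged sketch of the shape of such an else block (the bitmap name lost_slots is hypothetical, not the actual field in the PR):

```c
/* Sketch only: inside the slot loop of clusterUpdateSlotsConfigWith(). */
for (int j = 0; j < CLUSTER_SLOTS; j++) {
    if (bitmapTestBit(slots, j)) {
        /* ... existing logic: possibly rebind slot j to the sender ... */
    } else if (server.cluster->slots[j] == sender) {
        /* The sender no longer claims a slot we still attribute to it: record
         * the loss of ownership instead of silently keeping the stale mapping. */
        bitmapSetBit(server.cluster->lost_slots, j); /* hypothetical field */
    }
}
```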

@PingXie
Contributor

PingXie commented Jun 25, 2023

@PingXie is this covered by (or affected by) your PR? #10517

@madolson is right. My PR handles a SPOF during the slot migration but this is a race condition between the entire cluster learning about the new slot owner and the old owner getting a configEpoch bump, whether through consensus or consensus-less means. This race condition happens after the slot migration has been successfully completed. Apart from the scenarios mentioned by @cyningsun (manual failover) and @srgsanky (conflicting epochs on the new node), I believe an automatic failover in node 3's shard (using @srgsanky's illustration) would have the same effect.

Regarding the fix, I am thinking whether node 1 should clear the ownership for slot 1200 if it is no longer claimed by node 4, when handling the PING message from node 4 in clusterUpdateSlotsConfigWith. This would allow node 2 to have an opportunity to "reclaim" slot 1200 when its turn comes.

@srgsanky
Contributor

Regarding the fix, I am thinking whether node 1 should clear the ownership for slot 1200 if it is no longer claimed by node 4, when handling the PING message from node 4 in clusterUpdateSlotsConfigWith. This would allow node 2 to have an opportunity to "reclaim" slot 1200 when its turn comes.

@PingXie our fix is similar. There is one catch though - if node 1 clears the ownership information of the slot, clients can temporarily see errors at the end of slot migration in the common case until the target broadcasts the change in ownership.

  • CLUSTERDOWN Hash slot not served
  • CLUSTERDOWN The cluster is down

Not all clients will retry on these kinds of failures. We need to fix the uncommon race condition and we don't want to cause errors in the common case. So it might be better to redirect clients even if we are not sure about the ownership change. When the client connects to the redirected node, it might correct the client (one more redirect) or give the Hash slot not served failure. This kind of false information propagation is benign - no data is lost.

But we strictly don't want this node to tell other nodes that the source still owns the slot (when the source is no longer claiming it). If we don't prevent this, it leads to propagation of misinformation that can make the actual owner of the slot purge it, losing data. This is destructive.

We added a bitfield to the cluster state to track the slots whose owner is no longer claiming them. If a different node claims a slot that the previous owner no longer claims, we shouldn't try to "correct" that node.

One side effect of this behavior is that when a slot is deleted, this information is not propagated to all nodes. When cluster-require-full-coverage is set to yes, only the node that owned the deleted slot will report a cluster state of FAIL, whereas the rest of the nodes in the cluster will report OK. This is reasonable, as we normally don't delete slots without adding them to some other shard. This is also the behavior even without the changes in the PR.
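A hedged sketch of how the tombstone could gate the path that pushes stale ownership information back to a sender (names such as lost_slots and dirty_slot are illustrative, not the PR's exact code):

```c
/* Sketch only: when the sender claims a slot that, per our view, belongs to a
 * node with a higher configEpoch, we would normally push that owner's
 * configuration back to the sender via an UPDATE message. Skip it when the
 * supposed owner has itself stopped claiming the slot. */
if (sender && dirty_slot >= 0) {
    clusterNode *owner = server.cluster->slots[dirty_slot];
    if (owner && owner != sender &&
        owner->configEpoch > senderConfigEpoch &&
        !bitmapTestBit(server.cluster->lost_slots, dirty_slot))
    {
        clusterSendUpdate(sender->link, owner);
    }
}
```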

I have opened a PR with the proposed change #12344

@PingXie
Contributor

PingXie commented Jun 25, 2023

Makes sense, @srgsanky. I like the "tombstone" idea. Worst case, clients get redirected a second time on node 3 (client -> node 1 -> node 3 -> node 2), but it is all transient and short-lived. Once the cluster converges on the topology, everything will be back to normal.

@cyningsun
Author

@PingXie @srgsanky I'd like to add some considerations on why this PR skips the update instead of temporarily removing slots from the POV:

  • Compatibility: A new client connection established with a node while slots are temporarily removed cannot figure out which node to send requests that hash to those removed slots. Keeping the old configuration doesn't have this concern, because of MOVED redirection.
  • Cohesion: When slots migrate, the slots always change together with a configEpoch bump at the target node, but that relationship is hard to see in clusterProcessPacket when nodes update their POV. That's why this bug occurs; it involves two pieces of code in two places: L2676, which updates configEpoch directly without considering the impact on slots, and L2262, the comparison performed before updating slots in the POV.
  • Troubleshooting: The old configuration carries the slots' history, which helped me reconstruct what happened. Removing slots would clear that information.
  • Simplicity: Currently slot updates in the POV flow one way, from new masters. Introducing a tombstone and two-way updates (old masters and new masters) increases complexity.

However, there is also a negative effect:

  • It slows down agreement on the configuration a bit, because the configEpoch in the POV is not updated when skipping some PONG messages sent by old masters after configEpoch collisions.

@PingXie
Contributor

PingXie commented Jun 26, 2023

@cyningsun, skipping clusterUpdateSlotsConfigWith is a big deal that would break our efforts to improve the slot migration reliability in general (please see PR #10517). The whole idea of this tombstone bitmap is to achieve exactly what you called out above without the downsides of slowing down the topology convergence and hindering further improvements in the slot migration area. More specifically, it does not clear out the ownership information (on slot 1200) when processing node 3's PING message that has a configEpoch of 4. There is actually more context/history preserved with PR #12344 because it doesn't skip processing any cluster messages.

@zuiderkwast
Contributor

Is this PR obsolete now that #12344 is merged?

@madolson
Contributor

madolson commented Jul 7, 2023

Is this PR obsolete now that #12344 is merged?

I do think it's obsolete, but it would be great if @cyningsun can either write a test or validate it does solve their use case as well.

@cyningsun
Author

Is this PR obsolete now that #12344 is merged?

I do think it's obsolete, but it would be great if @cyningsun can either write a test or validate it does solve their use case as well.

Issue resolved too. Thank you all for your help. 👍

@madolson madolson closed this Jul 7, 2023
@madolson
Contributor

madolson commented Jul 7, 2023

@cyningsun Thank you for taking the effort to report the fix!
