While running Redis Cluster this year, we found several times that the slot mappings reported by individual nodes could become inconsistent after fixing a failed slot migration. The migration failures were sometimes caused by the machine serving Redis going down, or by the resharding script being forcibly stopped. Even after clearing the importing and migrating flags and waiting for a long time, the cluster could not repair this inconsistency automatically.
We should have kept some broken clusters from real life, but we didn't... Here I will give two ways to build an inconsistent cluster, in the hope that when we encounter this problem again we can find out exactly what happened to the broken cluster.
The versions we are using are 3.0.4 and 3.0.7.
This problem may be related to issues #3442 and #2969.
Inconsistent Cluster
How To Build It
At first we have a consistent cluster with three nodes A, B, and C. The important part here is epoch A < epoch B < epoch C. Now use `cluster setslot <slot> node <node-id>` to move slot 10922 from node B to node A. Note that we must run the command on node B first, since it has a larger epoch than node A. You will then find that even though both node A and node B think slot 10922 now belongs to node A, node C still insists that 10922 belongs to node B, and no matter how long you wait, nothing changes.
Why
In the slot table of node C, slot 10922 is associated with node B and node B's epoch, which is 2 here. After node A takes over slot 10922, it announces this to node C with its own epoch, 1, but node C rejects the update because that epoch is lower than the one it has recorded.
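The rejection can be sketched as follows. This is a simplified model of the epoch comparison, not the actual logic in `clusterUpdateSlotsConfigWith`; the class and method names are invented for illustration, and the exact comparison operator is an assumption:

```python
# Minimal sketch of the epoch rule that (we assume) makes node C stick to
# its stale view: a node accepts a slot claim only if the claimer's epoch
# is at least as large as the epoch recorded for the slot's current owner.
# Node names and epochs follow the A/B/C example above.

class NodeView:
    """One node's view of the slot table: slot -> (owner, owner_epoch)."""

    def __init__(self, slots):
        self.slots = dict(slots)

    def on_claim(self, slot, claimer, claimer_epoch):
        """Apply a gossiped claim only if its epoch wins; return True if applied."""
        _owner, owner_epoch = self.slots[slot]
        if claimer_epoch >= owner_epoch:
            self.slots[slot] = (claimer, claimer_epoch)
            return True
        return False

# Node C recorded slot 10922 as owned by B with epoch 2.
c = NodeView({10922: ("B", 2)})

# Node A, still at epoch 1, gossips that it now owns slot 10922 ...
applied = c.on_claim(10922, "A", 1)

# ... and node C rejects the claim, keeping B as the owner forever.
print(applied)            # False
print(c.slots[10922])     # ('B', 2)
```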
I'm not sure whether this is by design. It seems that Redis Cluster uses a unique epoch for each node, and the consensus is based on these rules:
The cluster configuration should never change if the epoch doesn't change.
Each epoch is bound to at most one specific cluster configuration change.
The example above violates the first rule. Well, we should never use `cluster setslot <slot> node <node-id>` casually. I think our slot inconsistency problem is mostly caused by running `setslot node` against a node with a low epoch. The second rule, however, is out of our control.
Epoch Collision
In some rare cases, the same epoch in a cluster can correspond to multiple configuration changes. Here's an example. Say node A is importing a slot from node B. If `setslot node` runs on node A after the migration, and at the same time node B is doing something that bumps its epoch, such as fixing an epoch collision or handling a `setslot node` of its own (even though it makes no sense to do that), both nodes bump their epochs to the same value. Now, if the node ID of node B is less than that of node A, node B bumps to a higher epoch and may spread it through the cluster before node A does. The end result is the inconsistent state described above.
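The timeline can be sketched as a toy simulation. The collision-resolution rule is modelled exactly as described above (the node with the smaller ID bumps again); the node IDs are made up, and `bump` is a stand-in for what `CLUSTER BUMPEPOCH` does to the current epoch:

```python
# Toy timeline of the collision described above: both nodes take the same
# epoch for two different configuration changes, then the collision fix
# hands node B the highest epoch, so B's stale claim to the slot wins.

def bump(current_epoch):
    """Take the next config epoch, as CLUSTER BUMPEPOCH would."""
    return current_epoch + 1

cluster_epoch = 2

# Node A finishes importing and runs `setslot node`, bumping its epoch;
# concurrently node B bumps for an unrelated reason: same epoch for both.
epoch_a = bump(cluster_epoch)   # 3
epoch_b = bump(cluster_epoch)   # 3
assert epoch_a == epoch_b       # collision: one epoch, two config changes

# Collision fix, per the scenario above: node B, whose (hypothetical) ID
# sorts lower, bumps again and now carries the highest epoch in the cluster.
node_id_a, node_id_b = "ffff", "aaaa"
if node_id_b < node_id_a:
    epoch_b = bump(max(epoch_a, epoch_b))

print(epoch_a, epoch_b)   # 3 4
```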
Actually, I only reproduced this by sending `cluster bumpepoch` to node B to simulate the case, and I don't think it's the main cause.
The Problem In Real Life
I don't have any corrupted cluster at hand now to analyse. Usually a broken cluster is produced by fixing slots after a resharding failure. We used our own fixing script, which clears the importing and migrating flags and restarts the migration even if only one of the two flags is set. Most of the time it gets the job done, because the gossip protocol of Redis may simply be blocked by the importing flags in `clusterUpdateSlotsConfigWith`. But we did suffer from this slot inconsistency problem several times, and it is quite hard to fix by hand. Maybe it is a result of misusing `cluster setslot node`, or something went wrong before the migration that led to the epoch collision problem.
Anyway, I think it makes no sense to allow an inconsistency to persist forever in a cluster that uses gossip. Would it be possible to fix the slot inconsistency by deleting the slot binding when the former owner declares that it no longer owns the slot?
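The proposed repair could look roughly like this. It is only a sketch of the idea, not Redis code: if the recorded owner itself disclaims the slot, the binding is dropped, so a later claim can be accepted even with a lower epoch. All names here are invented:

```python
# Sketch of the proposed repair rule: the current owner disowning a slot
# clears the binding, unsticking the lower-epoch claim from the example.

class NodeView:
    def __init__(self, slots):
        self.slots = dict(slots)   # slot -> (owner, owner_epoch)

    def on_claim(self, slot, claimer, claimer_epoch):
        owner = self.slots.get(slot)
        if owner is None or claimer_epoch >= owner[1]:
            self.slots[slot] = (claimer, claimer_epoch)
            return True
        return False

    def on_disown(self, slot, sender):
        """Proposed rule: only the recorded owner may clear the binding."""
        owner = self.slots.get(slot)
        if owner is not None and owner[0] == sender:
            del self.slots[slot]

c = NodeView({10922: ("B", 2)})
assert not c.on_claim(10922, "A", 1)   # stuck, as in the example above

c.on_disown(10922, "B")                # B declares: not mine any more
assert c.on_claim(10922, "A", 1)       # now A's lower epoch is accepted
print(c.slots[10922])                  # ('A', 1)
```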
Hi, I have the same problem on Redis Cluster 4.0.8. My cluster is configured as 19 shards (57 nodes total), with network latency of ~1.76 ms.
The problem was not reproduced by redis-trib, because redis-trib processes nodes one by one (it migrates all slots from one node, then all slots from the next).
To reproduce this bug, I wrote a script (on error, an exception is raised):
migration method: https://pastebin.com/QZLpNyv7
migration script: https://pastebin.com/htvGJU5i
log tail (at the beginning of the log, the messages are the same): https://pastebin.com/fJJmGR2T
I've also seen this error with Redis version 3.2.8. I can confirm that it is caused by resharding failures followed by attempts to manually migrate slots after said failure.