Redis cluster rebalance fails when trying to shard away from large number of nodes #4592
Comments
I have exactly the same problem.
Ping @artix75
Reproduced during removal of a single node from a 3-node cluster (no replication):

- add-node exited with code 0
- rebalance exited with code 0
- rebalance exited with code 1 after an error during the move of the final slot
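A reproduction along these lines might look like the following redis-trib invocations (the addresses and node ID are placeholders, not taken from this report):

```shell
# Add a fresh empty master to an existing cluster
# (10.0.0.x:6379 are hypothetical addresses).
redis-trib.rb add-node 10.0.0.4:6379 10.0.0.1:6379

# First rebalance spreads slots onto the new empty master;
# the second tries to drain it again by giving it weight 0
# (NODE_ID stands in for the node's cluster ID).
redis-trib.rb rebalance --use-empty-masters 10.0.0.1:6379
redis-trib.rb rebalance --weight NODE_ID=0 10.0.0.1:6379
```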
We were able to work around this issue by modifying … That said, I'm not sure how or why this works. With this change, all slots are removed from the node, it is converted to a slave, and it can safely be removed from the cluster and terminated. But if the node doesn't change from a master to a slave until it has no slots assigned, then …, which will crash if the node in question has become a slave.
redis version 3.2.10, 4.0.6
redis gem version 3.3.3
I'm seeing the rebalance command fail when moving slots away from nodes in a 60-node Redis cluster (no replication). The error is:
For some reason one of the nodes gets converted to a slave, so SETSLOT fails; I then have to remove the slave node and try again.
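For context, moving a slot between masters boils down to a CLUSTER SETSLOT handshake like the one below (slot number, hosts, and node IDs are placeholders), and SETSLOT is only accepted by masters, which is why a node unexpectedly becoming a slave breaks the move:

```shell
# Mark slot 1234 (placeholder) as moving between the two nodes.
redis-cli -h target-host -p 6379 cluster setslot 1234 importing SRC_ID
redis-cli -h source-host -p 6379 cluster setslot 1234 migrating DST_ID

# ... MIGRATE each key in the slot from source to target ...

# Finalize ownership; this step errors out if the node addressed
# has meanwhile been converted to a slave.
redis-cli -h target-host -p 6379 cluster setslot 1234 node DST_ID
```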
I'm using the redis-trib `rebalance` command to move hash slots away from nodes for the purposes of scaling down, e.g. to scale down by 3 nodes, pick 3 nodes A, B and C and set their weights to zero; the command would look like: …

I've only seen this happen with larger clusters (50+ nodes), and it seems to be more likely the more nodes you're assigning weight zero, but I have still seen it happen when removing 10 nodes from a 60-node cluster.
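To make the weight-zero scaling concrete, here is a rough sketch of the target-slot arithmetic a weighted rebalance performs (illustrative Ruby, not redis-trib.rb's actual code; the node names are hypothetical):

```ruby
# Redis Cluster always has 16384 hash slots in total.
TOTAL_SLOTS = 16_384

# Given {node_id => weight}, return {node_id => target slot count}.
# Weight-0 nodes end with no slots; rounding leftovers are handed out
# one slot at a time to weighted nodes so the totals sum to 16384.
def expected_slots(weights)
  total_w = weights.values.sum.to_f
  targets = weights.transform_values { |w| (TOTAL_SLOTS * w / total_w).floor }
  leftover = TOTAL_SLOTS - targets.values.sum
  weights.each_key do |n|
    break if leftover.zero?
    next unless weights[n] > 0
    targets[n] += 1
    leftover -= 1
  end
  targets
end

# Scaling a 6-node cluster down by 2: E and F get weight 0, so every
# slot they hold must migrate to A..D before they can be removed.
weights = { "A" => 1, "B" => 1, "C" => 1, "D" => 1, "E" => 0, "F" => 0 }
p expected_slots(weights)
```

Every slot on a weight-0 node has to be migrated away, which is why draining many nodes at once multiplies the number of moves and the chances of hitting this failure mid-rebalance.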
The workaround seems to be to try again with a smaller number of nodes set to weight zero.