standby_info interferes with cluster recovery #810
Comments
@wereHamster The fact that A cannot join the cluster is expected behavior for now, because A needs to get some metadata when switching to peer mode.
I don't know the exact sequence of actions, but I was able to reproduce this reliably on my local machine by running three etcd nodes and killing and restarting them at random. Shouldn't B first try to start in peer mode before falling back to standby? Otherwise you have a classic deadlock: A is waiting for B and B is waiting for A.
@wereHamster I started three machines and played with them a little:
Theoretically, a node should fall into standby mode only when the cluster asks it to, or when it was already in standby mode before it was killed.
I got a three-node cluster into a state where node A has standby_info with Running:true, so it always starts in standby mode, and node B thinks A and B are the only surviving cluster nodes. B starts in peer mode and waits for the other nodes to join so that it can elect a leader. But A never joins because it remains in standby mode: it waits until the cluster has a leader and logs WARNING: fail getting leader from cluster (nodeA,nodeB). If I delete standby_info on node A, it starts in peer mode and the cluster recovers.
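
For illustration, the stuck state in this report can be modeled in a few lines. This is a toy sketch in Python, not etcd's code: StandbyInfo, startup_mode, and can_elect_leader are hypothetical names. The only rules assumed are that a node whose standby_info has Running:true starts in standby mode, and that electing a leader needs a strict majority of the known cluster members running in peer mode.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class StandbyInfo:
    """Hypothetical stand-in for the persisted standby_info contents."""
    running: bool


def startup_mode(info: Optional[StandbyInfo]) -> str:
    # Assumed rule: a node starts in standby only if its persisted
    # standby_info says Running:true; otherwise it starts as a peer.
    if info is not None and info.running:
        return "standby"
    return "peer"


def can_elect_leader(modes: Dict[str, str], members: Set[str]) -> bool:
    # A leader needs votes from a strict majority of the known members,
    # and only nodes running in peer mode can vote.
    peers = sum(1 for name in members if modes.get(name) == "peer")
    return peers > len(members) // 2


# B believes the cluster consists of nodeA and nodeB only.
members = {"nodeA", "nodeB"}

# Stuck state: A has standby_info with Running:true, B starts as a peer.
stuck = {"nodeA": startup_mode(StandbyInfo(running=True)), "nodeB": "peer"}
print(can_elect_leader(stuck, members))

# After deleting standby_info on A, it starts as a peer.
recovered = {"nodeA": startup_mode(None), "nodeB": "peer"}
print(can_elect_leader(recovered, members))
```

With two known members, one peer is not a majority, so B can never elect a leader on its own, while A stays in standby waiting for that leader. Removing standby_info breaks the cycle because A then starts as a peer.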