
force-leave does not remove the failed node from the raft voters list #6856

Open
andriytk opened this issue Nov 29, 2019 · 6 comments
Labels: theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics

@andriytk

andriytk commented Nov 29, 2019

Overview of the Issue

The force-leave command does not remove the failed node from the list of raft voters when quorum has been lost. By contrast, a graceful leave in a similar scenario does not cause the loss of quorum in the first place and works fine.

Reproduction Steps

  1. Create a cluster with 2 server nodes and bootstrap it with -bootstrap-expect=1.
  2. Crash one of the nodes.
  3. Run consul force-leave <crashed-node> on the surviving node.
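
For reference, a rough sketch of the reproduction on one machine (the bind addresses, node names and data directories below are placeholders, not taken from our actual setup):

```sh
# Start two servers; the first bootstraps itself because of -bootstrap-expect=1.
# (Extra loopback addresses like 127.0.0.2 are available on Linux.)
consul agent -server -bootstrap-expect=1 -node=server-1 \
  -data-dir=/tmp/consul-1 -bind=127.0.0.1 -client=127.0.0.1 &
consul agent -server -node=server-2 -retry-join=127.0.0.1 \
  -data-dir=/tmp/consul-2 -bind=127.0.0.2 -client=127.0.0.2 &

# Once server-2 has joined, both servers appear as raft voters.
consul operator raft list-peers

# Crash server-2 (kill -9 simulates a non-graceful failure) ...
kill -9 %2

# ... and try to force it out from the surviving server.
consul force-leave server-2

# server-2 stays in the voters list, so the cluster is stuck without quorum.
consul operator raft list-peers
```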

It seems natural to expect that force-leave would remove the failed node from the raft voters so that quorum is restored (as it is with a graceful leave), but it does not, and the cluster effectively remains stuck.

(Tested on 1.5.3, 1.6.1 and 1.6.2 versions.)

@crhino crhino self-assigned this Dec 4, 2019
@crhino
Contributor

crhino commented Dec 18, 2019

Hi @andriytk. We took a look at this issue and have some questions and thoughts.

The consul leave command is able to gracefully leave both the Serf and Raft membership because we are explicitly shutting down the local agent and have definitive knowledge about its state. With consul force-leave, we are telling the rest of the cluster to mark a node as left. This is fine for the Serf membership, as it is eventually consistent, and if the node is actually alive it will rejoin the gossip. With Raft membership, however, removing a node is more final, so we want to be more cautious about when we do so.
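
To illustrate the two membership lists involved (a hypothetical cluster, not your specific one), the Serf view and the Raft view can be inspected separately:

```sh
# Serf (gossip) membership: force-leave marks the failed node as "left" here,
# and a node that is actually alive will simply rejoin.
consul members

# Raft membership: removal here is a configuration change and is treated
# much more cautiously.
consul operator raft list-peers
```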

In the scenario you provide, it seems that the end goal is to restore a working Consul cluster. Why use force-leave to do this instead of starting up a new server node? More insight into what you are attempting to do and why would be helpful.

@andriytk
Author

Hi @crhino, thanks for your reply.

You are right, the end goal is to restore a working Consul cluster. The problem is that we need to start the new server node with the same node name as the failed one, but Consul refuses it since that node name is already registered under a different node id. (The symptoms are similar to #4741.) So we need some mechanism to clean up the stale node and its id from Consul's state. It seems natural that the force-leave cmd would do this, in the same way the leave cmd does when a node leaves gracefully.
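
For what it's worth, the stale registration (old node name still bound to the old node id) is visible in the catalog; the exact output naturally differs per cluster:

```sh
# Lists every node together with its node id; the crashed server is still
# registered under its old id, which is why a replacement server with the
# same name but a new id is refused.
consul catalog nodes -detailed
```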

@crhino
Contributor

crhino commented Jan 3, 2020

@andriytk I spent some time today reproducing this and reading the issue you linked and its associated fix. I see that the issue really boils down to the fact that the cluster ends up losing raft quorum and thus cannot update its state (otherwise #5485 would work as intended). consul force-leave runs into the same problem: once quorum is lost, it cannot eventually remove the server node either, as you point out.

I see where you are coming from with force-leave changing the quorum size; let me think on this for a bit and get back to you. I'm going to dig a little into force-leave and Raft to get a better sense of what a change like this might mean.

@andriytk
Author

andriytk commented Jan 6, 2020

the issue really boils down to the fact that the cluster ends up losing raft quorum

Yes, that exactly matches what I mentioned in the description.

Thank you @crhino for looking at it.

@crhino
Contributor

crhino commented Jan 6, 2020

Just realized that in this scenario it is actually impossible for force-leave to modify the raft voters.

Modifying the raft voters equates to asking Raft to make a membership change, and membership changes must themselves be committed in the Raft log to take effect (Raft paper, Section 6). With 2 voters the quorum size is 2, so the surviving server alone cannot commit the removal entry. Thus, we must recover the cluster before making any changes to membership.

I think there is still an interesting situation here with failing to register a node with the same name after recovering the cluster, but I don't think force-leave can be the solution here.

@jsosulska jsosulska added the theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics label Jun 2, 2020
@justinabrahms

justinabrahms commented Jul 20, 2023

Found this issue because my cluster is in the same place. I misconfigured a consul instance with the address "127.0.0.1:8500". Now I have 4 raft peers, 1 of which is that broken one. The others can't elect a leader because calls to that node id keep timing out. There seems to be no mechanism to recover.

We have to have a mechanism to force that node out of raft consensus, even if it means ssh-ing to every other participating server and telling it to forget that peer.

EDIT: It seems like you can recover from this. https://justin.abrah.ms/2023-07-20-consul-leader-election-issues.html (tl;dr, use peers.json)
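
For anyone else ending up here with a quorumless cluster, a rough sketch of that peers.json recovery, following Consul's outage recovery procedure (assuming raft protocol version 3; the data-dir path, id and address values are placeholders to replace with your own servers' values):

```sh
# Stop every surviving Consul server first.
systemctl stop consul

# On each surviving server, write a peers.json that lists only the healthy
# servers (id = that server's node-id, address = its raft address, port 8300).
cat > /opt/consul/data/raft/peers.json <<'EOF'
[
  {
    "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
    "address": "10.1.0.1:8300",
    "non_voter": false
  }
]
EOF

# Restart the servers; on boot they apply this list as the new raft
# membership and then delete the file.
systemctl start consul
consul operator raft list-peers
```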
