
force-leave does not remove the failed node from the raft voters list #6856

Open
andriytk opened this issue Nov 29, 2019 · 6 comments
Labels: theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics

@andriytk

andriytk commented Nov 29, 2019

Overview of the Issue

The force-leave command does not remove the failed node from the list of raft voters when quorum has been lost. By contrast, a graceful leave in a similar scenario does not cause the loss of quorum in the first place and works fine.

Reproduction Steps

  1. Create a cluster with 2 server nodes and bootstrap it with -bootstrap-expect=1.
  2. Crash one of the nodes.
  3. Run consul force-leave <crashed-node> on the surviving node.
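
For reference, a rough sketch of the reproduction on one machine (the bind addresses, node names and data directories below are placeholders, not taken from our actual setup):

```sh
# Start two servers; the first bootstraps itself because of -bootstrap-expect=1.
# (Extra loopback addresses like 127.0.0.2 are available on Linux.)
consul agent -server -bootstrap-expect=1 -node=server-1 \
  -data-dir=/tmp/consul-1 -bind=127.0.0.1 -client=127.0.0.1 &
consul agent -server -node=server-2 -retry-join=127.0.0.1 \
  -data-dir=/tmp/consul-2 -bind=127.0.0.2 -client=127.0.0.2 &

# Once server-2 has joined, both servers appear as raft voters.
consul operator raft list-peers

# Crash server-2 (kill -9 simulates a non-graceful failure) ...
kill -9 %2

# ... and try to force it out from the surviving server.
consul force-leave server-2

# server-2 stays in the voters list, so the cluster is stuck without quorum.
consul operator raft list-peers
```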

It seems natural to expect that force-leave would remove the failed node from the raft voters so that quorum is restored (as it is with a graceful leave), but it does not, and the cluster effectively remains stuck.

(Tested on 1.5.3, 1.6.1 and 1.6.2 versions.)

@crhino crhino self-assigned this Dec 4, 2019
@crhino
Contributor

crhino commented Dec 18, 2019

Hi @andriytk. We took a look at this issue and have some questions and thoughts.

The consul leave command is able to gracefully leave both the Serf and Raft membership because we are explicitly shutting down the local agent and have definitive knowledge about its state. With consul force-leave, we are telling the rest of the cluster to mark a node as left. This is fine for the Serf membership, as it is eventually consistent, and if the node is actually alive it will rejoin the gossip. With Raft membership, however, removing a node is more final, so we want to be more cautious about when we do so.
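
To illustrate the two membership lists involved (a hypothetical cluster, not your specific one), the Serf view and the Raft view can be inspected separately:

```sh
# Serf (gossip) membership: force-leave marks the failed node as "left" here,
# and a node that is actually alive will simply rejoin.
consul members

# Raft membership: removal here is a configuration change and is treated
# much more cautiously.
consul operator raft list-peers
```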

In the scenario you provide, it seems that the end goal is to restore a working Consul cluster. Why use force-leave to do this instead of starting up a new server node? More insight into what you are attempting to do and why would be helpful.

@andriytk
Author

Hi @crhino, thanks for your reply.

You are right, the end goal is to restore a working Consul cluster. The problem is that we need to start the new server node with the same node name as the failed one, but Consul refuses it since that node name is already registered under a different node id. (The symptoms are similar to #4741.) So we need some mechanism to clean up the stale node and its id from Consul's state. It seems natural that the force-leave cmd would do this, in the same way the leave cmd does when a node leaves gracefully.
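
For what it's worth, the stale registration (old node name still bound to the old node id) is visible in the catalog; the exact output naturally differs per cluster:

```sh
# Lists every node together with its node id; the crashed server is still
# registered under its old id, which is why a replacement server with the
# same name but a new id is refused.
consul catalog nodes -detailed
```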

@crhino
Contributor

crhino commented Jan 3, 2020

@andriytk I spent some time today reproducing this and reading the issue you linked and its associated fix. I see that the issue really boils down to the fact that the cluster ends up losing raft quorum and thus cannot update its state (otherwise #5485 would work as intended). consul force-leave runs into the same problem: once quorum is lost, it cannot eventually remove the server node either, as you point out.

I see where you are coming from with force-leave changing the quorum size; let me think on this for a bit and get back to you. I'm going to dig a little into force-leave and Raft to get a better sense of what a change like this might mean.

@andriytk
Author

andriytk commented Jan 6, 2020

the issue really boils down to the fact that the cluster ends up losing raft quorum

Yes, that exactly matches what I mentioned in the description.

Thank you @crhino for looking at it.

@crhino
Contributor

crhino commented Jan 6, 2020

Just realized that in this scenario it is actually impossible for force-leave to modify the raft voters.

Modifying the raft voters equates to asking Raft to make a membership change, and membership changes must themselves be committed in the Raft log to take effect (Raft paper, Section 6). With 2 voters the quorum size is 2, so the surviving server alone cannot commit the removal entry. Thus, we must recover the cluster before making any changes to membership.

I think there is still an interesting situation here with failing to register a node with the same name after recovering the cluster, but I don't think force-leave can be the solution here.

@jsosulska jsosulska added the theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics label Jun 2, 2020
@justinabrahms

justinabrahms commented Jul 20, 2023

Found this issue because my cluster is in the same place. I misconfigured a consul instance with the address "127.0.0.1:8500". Now I have 4 raft peers, 1 of which is that broken one. The others can't elect a leader because calls to that node id keep timing out. There seems to be no mechanism to recover.

We have to have a mechanism to force that node out of raft consensus, even if it means ssh-ing to every other participating server and telling it to forget that peer.

EDIT: It seems like you can recover from this. https://justin.abrah.ms/2023-07-20-consul-leader-election-issues.html (tl;dr, use peers.json)
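
For anyone else ending up here with a quorumless cluster, a rough sketch of that peers.json recovery, following Consul's outage recovery procedure (assuming raft protocol version 3; the data-dir path, id and address values are placeholders to replace with your own servers' values):

```sh
# Stop every surviving Consul server first.
systemctl stop consul

# On each surviving server, write a peers.json that lists only the healthy
# servers (id = that server's node-id, address = its raft address, port 8300).
cat > /opt/consul/data/raft/peers.json <<'EOF'
[
  {
    "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
    "address": "10.1.0.1:8300",
    "non_voter": false
  }
]
EOF

# Restart the servers; on boot they apply this list as the new raft
# membership and then delete the file.
systemctl start consul
consul operator raft list-peers
```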
