Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upraft: deadlock during PreVote migration process #8501
Comments
This was referenced Sep 5, 2017
irfansharif
changed the title from
raft: deadlock during {CheckQuorum,PreVote} migration process
to
raft: deadlock during PreVote migration process
Sep 5, 2017
xiang90
added
the
Raft
label
Sep 5, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
xiang90
Sep 5, 2017
Contributor
@irfansharif @bdarnell @siddontang @es-chow
Someone probably needs to own the pre-voting feature, and make it stable. I do not have the bandwidth to own this for the near future. But if no one is going to own it, I will put this item on my schedule and make it stable.
|
@irfansharif @bdarnell @siddontang @es-chow Someone probably needs to own the pre-voting feature, and make it stable. I do not have the bandwidth to own this for the near future. But if no one is going to own it, I will put this item on my schedule and make it stable. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bdarnell
Sep 6, 2017
Member
I agree, and I apologize for (mostly) implementing this without getting it into a working state. We'll work on fixing this.
|
I agree, and I apologize for (mostly) implementing this without getting it into a working state. We'll work on fixing this. |
irfansharif
referenced this issue
Sep 7, 2017
Closed
etcdserver: use PreVote so majority partition's leader isn't forced to step down #8499
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
siddontang
Sep 8, 2017
Contributor
We will also do more tests to make it more stable.
/cc @javaforfun
|
We will also do more tests to make it more stable. /cc @javaforfun |
lishuai87
referenced this issue
Sep 8, 2017
Merged
raft: fix deadlock during PreVote migration process #8525
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
irfansharif
Sep 8, 2017
Contributor
Originally wrote the following on #8525 but posting here instead:
This addresses the specific deadlock observed in #8501 but given how we're looking to stabilize PreVote overall I was wondering what your thoughts were on the following proposal:
The current Pre-Vote implementation works such that a a PreCandidate sending out PreVote messages annotates Message.Term with the "next" term, i.e. the term it would call elections for should the PreVote phase pass with quorum. I tried retrofitting this implementation idea to the original TLA+ spec by @ongardie (ongardie/raft.tla) but proved to be slightly difficult to get right. I then tried however annotating Message.Term with the PreCandidate's current term and having the receiver know this on the other end (i.e. PreVotes are votes for a term in the immediate future) and things were slightly cleaner to work out. You can find my implementation here: irfansharif/raft.tla, irfansharif/raft.tla@22b0581 specifically. I have a WIP branch exploring this approach instead here for PreVotes but nothing ready as yet.
As it pertains to #8501, consider how it addresses that: Messages from higher terms always cause us to revert to the follower state. See irfansharif/raft.tla@22b0581#L520-L522 which was from the original spec (this was the issue I had before trying to retrofit into). So for the "rogue" candidate with advanced terms pre-PreVote enabling, it would just work out of the box. The message handling at the receiver too is simplified as there's less of case-by-case basis. Right now if we have a Message.Term > raft.Term if PreVote, let it fall through to be handled below. This patch introduces a negative response but by comparing logs if it's specifically PreVote. I think the alternate is slightly cleaner in this respect but I have nothing concrete at hand to present as yet.
NB: With the TLA+ spec adding PreVotes, I did not complete a full run. I tried constraining the search space and tried random searching but after a 5-ish hour run with no failures (albeit on my tiny 4-core machine) I am convinced it's OK.
As an aside, I'll be on vacation through the upcoming week (so will Ben). I will be back in school right after but I'll try submitting this by/before then.
|
Originally wrote the following on #8525 but posting here instead: This addresses the specific deadlock observed in #8501 but given how we're looking to stabilize PreVote overall I was wondering what your thoughts were on the following proposal: The current Pre-Vote implementation works such that a a As it pertains to #8501, consider how it addresses that: Messages from higher terms always cause us to revert to the follower state. See irfansharif/raft.tla@22b0581#L520-L522 which was from the original spec (this was the issue I had before trying to retrofit into). So for the "rogue" candidate with advanced terms pre-PreVote enabling, it would just work out of the box. The message handling at the receiver too is simplified as there's less of case-by-case basis. Right now if we have a NB: With the TLA+ spec adding PreVotes, I did not complete a full run. I tried constraining the search space and tried random searching but after a 5-ish hour run with no failures (albeit on my tiny 4-core machine) I am convinced it's OK. As an aside, I'll be on vacation through the upcoming week (so will Ben). I will be back in school right after but I'll try submitting this by/before then. |
heyitsanthony
assigned
xiang90
Sep 12, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bdarnell
Jan 26, 2018
Member
When I initially implemented prevote, I tried it both ways (the current implementation with MsgPreVote containing r.Term+1, and Irfan's suggestion that it contains the current term and the recipient adds one). Both solutions are awkward in places; I think that making this change would just move complexity around without being much of a net win.
In particular, I don't think making the switch in how we send pre-vote terms would simplify the fix for this deadlock. Sending the true term instead of the future term would avoid the special case when the terms only differ by one, but if there is a greater difference we'd still need a special case to handle MsgPreVote with a term in the past.
|
When I initially implemented prevote, I tried it both ways (the current implementation with In particular, I don't think making the switch in how we send pre-vote terms would simplify the fix for this deadlock. Sending the true term instead of the future term would avoid the special case when the terms only differ by one, but if there is a greater difference we'd still need a special case to handle MsgPreVote with a term in the past. |
irfansharif commentedSep 5, 2017
This was first uncovered in the context of cockroachdb/cockroach#18151. For some background we were running a migration from v1.0.x cockroachdb/cockroach binary to our v1.1-rc binary, the difference between the two branches was that the earlier revision had raft groups configured with
PreVote: falseandCheckQuorum: truewhereas the new v1.1-rc revision hadPreVote: trueandCheckQuorum: false. The "migration" process here was a rolling restart of the cluster from the old revision to the new where it was entirely likely that for a given raft group, you could have some replicas running the new revision withPreVoteenabled and some without.Here's the relevant summary of a raft group that got wedged during this migration process:

Note that of the two replicas shown, we have the first replica with a higher current term than the second replica but with a lower last index. At this point the following manifests:
Now, as for how we could get to such a state where a replica with a lower last index ends up with a higher term, remember that this was a rolling restart process where we had replicas without pre-vote enabled. Replicas at this stage, when calling for elections, advance their current term. This advanced current term is persisted (necessarily so) and when the node is then restarted with PreVote enabled, we're deadlocked as described above.
Here's a test demonstrating this:
+cc @bdarnell @xiang90.