Permalink
Browse files

Formal specification for the Raft Pre-Vote mechanism

Introduce the Raft Pre-Vote mechanism as mentioned in section 4.2.3 of
the disseration, also further expanded on in section 9.6.

---

We introduce an explicit PreCandidate stage, the happy-path control flow
is now [Follower -> PreCandidate -> Candidate -> Leader]. PreCandidates
send out Pre-Votes to first check if any current quorum of nodes would be
willing to grant it votes should it run as a Candidate in the subsequent
term. If acknowledgement is received from a majority of nodes,
PreCandidates transition to the Candidate stage and call for an
election. If the PreCandidate receives responses from nodes at a higher
term (rejected Pre-Votes), it steps down to the Follower stage at the
specified term. If the PreCandidate does not receive a majority of
pre-votes, it will retry for the same next term after timing out (if it
does not receive heartbeats/append entries in the interim). Nodes can
give out pre-votes to multiple PreCandidates in the same term,
PreCandidates subsequently compete during the Candidate election stage.

Pre-Vote requests from nodes at a higher term cause the recipient term
to step down to a Follower stage at the specified term. This is similar
to the behaviour when Vote requests from higher termed Candidates are
received.
  • Loading branch information...
irfansharif committed Sep 7, 2017
1 parent fe3ae2a commit 22b05818b6bcfe6719a708f2270a1308fecbc0fa
Showing with 87 additions and 8 deletions.
  1. +2 −0 README.md
  2. +85 −8 raft.tla
@@ -5,4 +5,6 @@ For more information, see Chapter 8 (Correctness) and Appendix B (Safety proof a

Copyright 2014 Diego Ongaro.

Copyright 2017 Irfan Sharif.

This work is licensed under the Creative Commons Attribution-4.0 International License https://creativecommons.org/licenses/by/4.0/ .
@@ -11,13 +11,14 @@ EXTENDS Naturals, FiniteSets, Sequences, TLC
CONSTANTS Server

\* Server states.
CONSTANTS Follower, Candidate, Leader
CONSTANTS Follower, PreCandidate, Candidate, Leader

\* A reserved value.
CONSTANTS Nil

\* Message types:
CONSTANTS RequestVoteRequest, RequestVoteResponse,
RequestPreVoteRequest, RequestPreVoteResponse,
AppendEntriesRequest, AppendEntriesResponse

\* Maximum number of client requests
@@ -27,7 +28,8 @@ CONSTANTS MaxClientRequests

----
\* Global variables

\* TODO(irfansharif): Allow transition from follower to candidate directly?
\* Semi-randomly i.e.


\* A bag of records representing requests and responses sent from one server
@@ -52,7 +54,7 @@ VARIABLE allLogs

\* The server's term number.
VARIABLE currentTerm
\* The server's state (Follower, Candidate, or Leader).
\* The server's state (Follower, PreCandidate, Candidate, or Leader).
VARIABLE state
\* The candidate the server voted for in its current term, or
\* Nil if it hasn't voted for any.
@@ -193,10 +195,10 @@ Restart(i) ==
/\ commitIndex' = [commitIndex EXCEPT ![i] = 0]
/\ UNCHANGED <<messages, currentTerm, votedFor, log, elections, clientRequests, committedLog, committedLogDecrease>>

\* Server i times out and starts a new election.
Timeout(i) == /\ state[i] \in {Follower, Candidate}
/\ state' = [state EXCEPT ![i] = Candidate]
/\ currentTerm' = [currentTerm EXCEPT ![i] = currentTerm[i] + 1]
\* Server i times out and enters PreCandidate phase.
Timeout(i) == /\ state[i] \in {Follower, PreCandidate, Candidate}
/\ state' = [state EXCEPT ![i] = PreCandidate]
/\ UNCHANGED currentTerm
\* Most implementations would probably just set the local vote
\* atomically, but messaging localhost for it is weaker.
/\ votedFor' = [votedFor EXCEPT ![i] = Nil]
@@ -205,6 +207,17 @@ Timeout(i) == /\ state[i] \in {Follower, Candidate}
/\ voterLog' = [voterLog EXCEPT ![i] = [j \in {} |-> <<>>]]
/\ UNCHANGED <<messages, leaderVars, logVars>>

\* PreCandidate i sends j a RequestPreVote request.
RequestPreVote(i,j) ==
/\ state[i] = PreCandidate
/\ Send([mtype |-> RequestPreVoteRequest,
mterm |-> currentTerm[i],
mlastLogTerm |-> LastTerm(log[i]),
mlastLogIndex |-> Len(log[i]),
msource |-> i,
mdest |-> j])
/\ UNCHANGED <<serverVars, votesGranted, voterLog, leaderVars, logVars, votesSent>>

\* Candidate i sends j a RequestVote request.
RequestVote(i,j) ==
/\ state[i] = Candidate
@@ -260,6 +273,21 @@ BecomeLeader(i) ==
evoterLog |-> voterLog[i]]}
/\ UNCHANGED <<messages, currentTerm, votedFor, candidateVars, logVars>>

\* PreCandidate i transitions to Candidate.
BecomeCandidate(i) ==
/\ state[i] = PreCandidate
/\ votesGranted[i] \in Quorum
/\ state' = [state EXCEPT ![i] = Candidate]
/\ currentTerm' = [currentTerm EXCEPT ![i] = currentTerm[i] + 1]
\* Most implementations would probably just set the local vote
\* atomically, but messaging localhost for it is weaker.
/\ votedFor' = [votedFor EXCEPT ![i] = Nil]
/\ votesSent' = [votesSent EXCEPT ![i] = FALSE ]
/\ votesGranted' = [votesGranted EXCEPT ![i] = {}]
/\ voterLog' = [voterLog EXCEPT ![i] = [j \in {} |-> <<>>]]
/\ UNCHANGED <<messages, leaderVars, logVars>>


\* Leader i receives a client request to add v to the log.
ClientRequest(i) ==
/\ state[i] = Leader
@@ -308,6 +336,28 @@ AdvanceCommitIndex(i) ==
\* Message handlers
\* i = recipient, j = sender, m = message

\* Server i receives a RequestPreVote request from server j with
\* m.mterm <= currentTerm[i].
HandleRequestPreVoteRequest(i, j, m) ==
LET logOk == \/ m.mlastLogTerm > LastTerm(log[i])
\/ /\ m.mlastLogTerm = LastTerm(log[i])
/\ m.mlastLogIndex >= Len(log[i])
grant == /\ m.mterm = currentTerm[i]
/\ logOk
/\ votedFor[i] \in {Nil}
IN /\ m.mterm <= currentTerm[i]
/\ Reply([mtype |-> RequestPreVoteResponse,
mterm |-> currentTerm[i],
mvoteGranted |-> grant,
\* mlog is used just for the `elections' history variable for
\* the proof. It would not exist in a real implementation.
mlog |-> log[i],
msource |-> i,
mdest |-> j],
m)
/\ UNCHANGED votedFor
/\ UNCHANGED <<state, currentTerm, candidateVars, leaderVars, logVars>>

\* Server i receives a RequestVote request from server j with
\* m.mterm <= currentTerm[i].
HandleRequestVoteRequest(i, j, m) ==
@@ -331,6 +381,24 @@ HandleRequestVoteRequest(i, j, m) ==
m)
/\ UNCHANGED <<state, currentTerm, candidateVars, leaderVars, logVars>>

\* Server i receives a RequestPreVote response from server j with
\* m.mterm = currentTerm[i].
HandleRequestPreVoteResponse(i, j, m) ==
\* This tallies votes even when the current state is not PreCandidate, but
\* they won't be looked at, so it doesn't matter.
/\ m.mterm = currentTerm[i]
/\ \/ /\ m.mvoteGranted
/\ votesGranted' = [votesGranted EXCEPT ![i] =
votesGranted[i] \cup {j}]
/\ voterLog' = [voterLog EXCEPT ![i] =
voterLog[i] @@ (j :> m.mlog)]
/\ UNCHANGED <<votesSent>>
\/ /\ ~m.mvoteGranted
/\ UNCHANGED <<votesSent, votesGranted, voterLog>>
/\ Discard(m)
/\ UNCHANGED <<serverVars, votedFor, leaderVars, logVars>>


\* Server i receives a RequestVote response from server j with
\* m.mterm = currentTerm[i].
HandleRequestVoteResponse(i, j, m) ==
@@ -373,7 +441,7 @@ HandleAppendEntriesRequest(i, j, m) ==
/\ UNCHANGED <<serverVars, logVars>>
\/ \* return to follower state
/\ m.mterm = currentTerm[i]
/\ state[i] = Candidate
/\ state[i] \in {PreCandidate, Candidate}
/\ state' = [state EXCEPT ![i] = Follower]
/\ UNCHANGED <<currentTerm, votedFor, logVars, messages>>
\/ \* accept request
@@ -452,6 +520,11 @@ Receive(m) ==
IN \* Any RPC with a newer term causes the recipient to advance
\* its term first. Responses with stale terms are ignored.
\/ UpdateTerm(i, j, m)
\/ /\ m.mtype = RequestPreVoteRequest
/\ HandleRequestPreVoteRequest(i, j, m)
\/ /\ m.mtype = RequestPreVoteResponse
/\ \/ DropStaleResponse(i, j, m)
\/ HandleRequestPreVoteResponse(i, j, m)
\/ /\ m.mtype = RequestVoteRequest
/\ HandleRequestVoteRequest(i, j, m)
\/ /\ m.mtype = RequestVoteResponse
@@ -481,8 +554,10 @@ DropMessage(m) ==
\* Defines how the variables may transition.
Next == /\ \/ \E i \in Server : Restart(i)
\/ \E i \in Server : Timeout(i)
\/ \E i, j \in Server : RequestPreVote(i, j)
\/ \E i, j \in Server : RequestVote(i, j)
\/ \E i \in Server : BecomeLeader(i)
\/ \E i \in Server : BecomeCandidate(i)
\/ \E i \in Server : ClientRequest(i)
\/ \E i \in Server : AdvanceCommitIndex(i)
\/ \E i,j \in Server : AppendEntries(i, j)
@@ -518,6 +593,8 @@ MoreThanOneLeader ==
===============================================================================

\* Changelog:
\* 2017-09-07:
\* - Formalise PreVote mechanism.
\*
\* 2014-12-02:
\* - Fix AppendEntries to only send one entry at a time, as originally

0 comments on commit 22b0581

Please sign in to comment.