Node can be in three states: Follower, Candidate, or Leader.
All change requests go to the Leader.
- Change is added as an entry to node's log. * Entry is not committed yet.
- Entry is replicated to all Followers.
- Followers confirm with Leader on writing the entry.
- Leader commits entry and applies the state change.
- Leader notifies Followers that entry is committed.
- Cluster is in consensus on state.
- Nodes all start in the Followers state
- One of the nodes becomes a Candidate.
- The Candidate will request votes from the other nodes.
- Nodes reply with their vote.
- If Candidate gets votes from a majority of notes, it becomes the Leader.
- Each node has a randomized election timeout between 150ms - 300ms.
- At the end of timeout, Follower become a Candidate.
- This starts an Election Term.
- Candidate votes for itself.
- It sends out Request Vote messages to other nodes for the specific Term ID.
- If nodes haven't voted yet for a given Term ID, it will reply to the Candidate with a vote and reset its election timeout.
- Once Candidate has a majority of the votes, it becomes the Leader.
- Leader will start sending out Append Entries messages in intervals set by the heartbeat timeout. * I believe this is called broadcast time? Ranges from 0.5ms - 20ms.
- Followers respond to each Append Entries message and resets its election timeout.
- Election Term will continue until a Follower stops receiving heartbeats (Append Entries messages). * Followers becomes a Candidate if it hasn't received a heartbeat by the time its election timeout runs out.
- If votes are split evenly, election timeout causes another node to become a Candidate.
- It sends out Request Vote message to other nodes for a new Term ID.
- Leader sends out changes via the Append Entries messages on the next heartbeat.
- Followers acknowledge the change.
- Leader commits the entry when majority of Followers acknowledges the change.
- Response is sent to the Client.
- Nodes are divided to be in separate network partitions.
- A Leader will exist in each of the partitions.
- Clients send Change Requests to both Leader.
- The Leader that cannot replicate to a majority will not be able to commit its log entry. * Only one of the Leader will be able to replicate to majority.
- Network partition becomes healed.
- One Leader will have a higher Term ID than another Leader. * The Leader that was in the partition with the majority of nodes will have the higher Term ID.
- Leader with the lower Term ID will step down to a Follower.
- Nodes in the minority partition will roll back uncommitted log entries and match the new Leader's log.
- The logs are now consistent across the cluster.
- "Raft: Understandable Distributed Consensus." The Secret Lives of Data. Accessed November 04, 2018. http://thesecretlivesofdata.com/raft/.