Raft Distributed Consensus

A node is always in one of three states: Follower, Candidate, or Leader.

Leader

All change requests go to the Leader.

Log Replication Summary

Change is added as an entry to the Leader's log. The entry is not committed yet.
Entry is replicated to all Followers.
Followers confirm to the Leader that they have written the entry.
Leader commits the entry once a majority of nodes have written it, then applies the state change.
Leader notifies Followers that entry is committed.
Cluster is in consensus on state.
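
The steps above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not a real Raft implementation: the names `Entry`, `Node`, and `replicate` are hypothetical, there is no networking, and a Follower is treated as reachable unless its name appears in `unreachable`.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    change: str
    committed: bool = False


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)


def replicate(leader, followers, change, unreachable=()):
    """Append a change on the Leader, copy it to Followers, commit on majority."""
    entry = Entry(change)
    leader.log.append(entry)          # added to the Leader's log, not committed yet
    acks = 1                          # the Leader counts its own copy
    written = []
    for follower in followers:
        if follower.name in unreachable:
            continue                  # this Follower never confirms
        copy = Entry(change)
        follower.log.append(copy)     # Follower writes the entry ...
        written.append(copy)
        acks += 1                     # ... and confirms to the Leader
    cluster_size = len(followers) + 1
    if acks > cluster_size // 2:      # a majority has written the entry
        entry.committed = True        # Leader commits
        for copy in written:
            copy.committed = True     # Leader notifies Followers of the commit
    return entry.committed
```

With a three-node cluster, `replicate(Node("A"), [Node("B"), Node("C")], "SET 5")` commits, while the same call with `unreachable=("B", "C")` leaves the entry uncommitted on the Leader.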

Leader Election Summary

All nodes start in the Follower state.
One of the nodes becomes a Candidate.
The Candidate will request votes from the other nodes.
Nodes reply with their vote.
If Candidate gets votes from a majority of nodes, it becomes the Leader.

Election Timeout

Each node has a randomized election timeout between 150 ms and 300 ms.
When its timeout expires without a message from a Leader, a Follower becomes a Candidate.
This starts an Election Term.
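
A minimal sketch of why the randomization matters: each node draws its own timeout, so one node usually times out before the rest and becomes the Candidate. The function names and the millisecond constants are illustrative assumptions taken from the range quoted above, not part of any library.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def new_election_timeout(rng=random):
    """Pick a fresh randomized election timeout in milliseconds."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


def first_candidate(timeouts):
    """Given {node_name: timeout_ms}, return the node whose timeout expires first."""
    return min(timeouts, key=timeouts.get)
```

For example, `first_candidate({"A": 212.0, "B": 158.5, "C": 290.1})` picks node `"B"`, because its timeout is the shortest.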

Election Term

Candidate votes for itself.
It sends out Request Vote messages to other nodes for the specific Term ID.
If a node hasn't voted yet in the given Term ID, it replies to the Candidate with a vote and resets its election timeout.
Once Candidate has a majority of the votes, it becomes the Leader.
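Leader starts sending Append Entries messages at intervals set by the heartbeat timeout. This interval is not the same as what the Raft paper calls broadcast time, which is the average time for a server to send RPCs in parallel to every server in the cluster and receive their responses (roughly 0.5 ms to 20 ms). The heartbeat interval only needs to be comfortably shorter than the election timeout.
Each Follower responds to every Append Entries message and resets its election timeout.
Election Term continues until a Follower stops receiving heartbeats (Append Entries messages). A Follower becomes a Candidate if it hasn't received a heartbeat by the time its election timeout runs out.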
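
The one-vote-per-term rule can be sketched as follows. `Voter` and `handle_request_vote` are hypothetical names, and `timeout_resets` merely stands in for restarting a real timer. The sketch also leaves out the check, present in full Raft, that the Candidate's log is at least as up to date as the voter's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None
    timeout_resets: int = 0

    def handle_request_vote(self, term, candidate):
        """Grant at most one vote per Term ID."""
        if term < self.current_term:
            return False                 # stale request from an older term
        if term > self.current_term:
            self.current_term = term     # newer term: the old vote no longer applies
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            self.timeout_resets += 1     # stands in for resetting the election timer
            return True
        return False                     # already voted for someone else this term
```

A node that has voted for `"A"` in term 1 refuses `"B"` in term 1, but is free to vote for `"B"` once a Request Vote for term 2 arrives.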

Split Vote Issue

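If the vote is split so that no Candidate reaches a majority, nobody becomes Leader for that term. The nodes wait out their election timeouts, and the first one to expire becomes a Candidate again.
That Candidate sends Request Vote messages to the other nodes for a new Term ID.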
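
The arithmetic behind a split vote, as a small sketch. The helper names are assumptions for illustration: a Candidate needs strictly more than half of the cluster, so a 2-2 split in a four-node cluster elects nobody.

```python
from collections import Counter


def majority(cluster_size):
    """Smallest number of votes that is more than half the cluster."""
    return cluster_size // 2 + 1


def election_winner(votes, cluster_size):
    """votes maps voter -> candidate. Return the winner, or None on a split vote."""
    if not votes:
        return None
    candidate, count = Counter(votes.values()).most_common(1)[0]
    return candidate if count >= majority(cluster_size) else None
```

Here `election_winner({"A": "A", "B": "A", "C": "C", "D": "C"}, 4)` returns `None`, which is the situation that forces a new election with a new Term ID.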

Log Replication

Leader sends out changes via the Append Entries messages on the next heartbeat.
Followers acknowledge the change.
Leader commits the entry once a majority of nodes (counting the Leader itself) have written it.
Response is sent to the Client.
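
One way a Leader could work out how far it may commit is to track, per node, the highest log index known to be written there and take the highest index that a majority shares. The sketch below assumes a hypothetical `match_index` list that includes the Leader's own last index. Full Raft additionally requires the entry at that index to come from the Leader's current term, which is omitted here.

```python
def commit_index(match_index):
    """Return the highest log index written on a majority of nodes.

    match_index holds one value per node (Leader included): the highest
    log index known to be written on that node.
    """
    ordered = sorted(match_index, reverse=True)
    needed = len(ordered) // 2 + 1       # size of a majority
    return ordered[needed - 1]           # the needed-th highest value
```

For a five-node cluster with `match_index = [5, 5, 3, 2, 1]`, three nodes have written index 3 or beyond, so entries up to index 3 can be committed.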

Network Partition Issue

Nodes are divided into separate network partitions.
Each partition can end up with its own Leader: the old Leader stays on in its partition, and the partition holding a majority of nodes can elect a new one.
Clients send Change Requests to both Leaders.
The Leader that cannot replicate to a majority will not be able to commit its log entry. Only one of the Leaders can replicate to a majority.
The network partition heals.
One Leader will have a higher Term ID than the other. The Leader elected in the partition with the majority of nodes has the higher Term ID.
Leader with the lower Term ID steps down to Follower.
Nodes in the minority partition will roll back uncommitted log entries and match the new Leader's log.
The logs are now consistent across the cluster.
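
What happens after the partition heals can be sketched with two small helpers. Both names are hypothetical, and logs are modeled as plain lists of `(term, change)` tuples. Real Raft performs the reconciliation incrementally through Append Entries consistency checks rather than by comparing whole logs.

```python
def should_step_down(my_term, seen_term):
    """A Leader that sees a higher Term ID reverts to Follower."""
    return seen_term > my_term


def reconcile(follower_log, leader_log):
    """Make a stale log match the new Leader's log.

    Returns (new_log, rolled_back), where rolled_back lists the
    uncommitted entries that were discarded.
    """
    # Find the first position where the two logs disagree.
    i = 0
    while (i < len(follower_log) and i < len(leader_log)
           and follower_log[i] == leader_log[i]):
        i += 1
    rolled_back = follower_log[i:]                 # entries that never reached a majority
    new_log = follower_log[:i] + leader_log[i:]    # shared prefix plus the Leader's tail
    return new_log, rolled_back
```

If the old Leader's log is `[(1, "SET 5"), (1, "SET 3")]` and the new Leader's log is `[(1, "SET 5"), (2, "SET 8")]`, the uncommitted `(1, "SET 3")` is rolled back and replaced by `(2, "SET 8")`.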

References

"Raft: Understandable Distributed Consensus." The Secret Lives of Data. Accessed November 04, 2018. http://thesecretlivesofdata.com/raft/.
