raft

raft is a Go library that manages a replicated log and can be used with an FSM to manage replicated state machines. It is a library for providing consensus.

The use cases for such a library are far-reaching, as replicated state machines are a key component of many distributed systems. They enable building Consistent, Partition-tolerant (CP) systems with limited fault tolerance.

Building

If you wish to build raft you'll need Go version 1.2+ installed.

Please check your installation with:

go version

Documentation

For complete documentation, see the associated Godoc.

To prevent complications with cgo, the primary backend MDBStore is in a separate repository, called raft-mdb. That is the recommended implementation for the LogStore and StableStore.

A pure Go backend using BoltDB is also available, called raft-boltdb. It can also be used as a LogStore and StableStore.
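
As a rough illustration of how these pieces fit together, the sketch below wires up a single node with a BoltDB-backed log and stable store, a file snapshot store, and the TCP transport. The server ID, paths, and address are placeholders, and the constructor signatures should be confirmed against the Godoc.

    package raftexample

    import (
        "net"
        "os"
        "time"

        "github.com/hashicorp/raft"
        raftboltdb "github.com/hashicorp/raft-boltdb"
    )

    // newRaftNode assembles a single Raft node. All paths, IDs, and addresses
    // here are illustrative placeholders; the /tmp/raft directory is assumed
    // to already exist.
    func newRaftNode(fsm raft.FSM) (*raft.Raft, error) {
        config := raft.DefaultConfig()
        config.LocalID = raft.ServerID("node1") // placeholder server ID

        // raft-boltdb serves as both the LogStore and the StableStore.
        store, err := raftboltdb.NewBoltStore("/tmp/raft/raft.db")
        if err != nil {
            return nil, err
        }

        // File-based snapshot store, keeping the two most recent snapshots.
        snapshots, err := raft.NewFileSnapshotStore("/tmp/raft", 2, os.Stderr)
        if err != nil {
            return nil, err
        }

        // TCP transport used to talk to the other servers.
        addr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:7000")
        if err != nil {
            return nil, err
        }
        transport, err := raft.NewTCPTransport("127.0.0.1:7000", addr, 3, 10*time.Second, os.Stderr)
        if err != nil {
            return nil, err
        }

        return raft.NewRaft(config, fsm, store, store, snapshots, transport)
    }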

Tagged Releases

As of September 2017, HashiCorp will start using tags for this library to clearly indicate major version updates. We recommend you vendor your application's dependency on this library.

  • v0.1.0 is the original stable version of the library that was in master and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0.

  • v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, which introduces some breaking API changes. It also versions the Raft protocol, and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility, and the sketch after this list). You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces.

    This version includes some new features as well, including non-voting servers, a new address provider abstraction in the transport layer, and more resilient snapshots.
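
For illustration only, pinning the protocol version during a mixed-version rolling upgrade looks roughly like the sketch below; which value is appropriate while older servers remain in the cluster depends on the compatibility notes in config.go.

    package raftexample

    import "github.com/hashicorp/raft"

    // upgradeSafeConfig returns a config pinned to an older protocol version.
    // The exact version to run at is governed by the compatibility table in
    // config.go; 2 is only a placeholder here.
    func upgradeSafeConfig() *raft.Config {
        config := raft.DefaultConfig()
        config.ProtocolVersion = 2
        // Once every server runs the new library, this can be raised to
        // raft.ProtocolVersionMax.
        return config
    }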

Protocol

raft is based on "Raft: In Search of an Understandable Consensus Algorithm".

A high level overview of the Raft protocol is described below, but for details please read the full Raft paper followed by the raft source. Any questions about the raft protocol should be sent to the raft-dev mailing list.

Protocol Description

Raft nodes are always in one of three states: follower, candidate, or leader. All nodes initially start out as followers. In this state, nodes can accept log entries from a leader and cast votes. If no entries are received for some time, nodes self-promote to the candidate state. In the candidate state, nodes request votes from their peers. If a candidate receives a quorum of votes, then it is promoted to a leader. The leader must accept new log entries and replicate them to all the other followers. In addition, if stale reads are not acceptable, all queries must also be performed on the leader.

Once a cluster has a leader, it is able to accept new log entries. A client can request that a leader append a new log entry, which is an opaque binary blob to Raft. The leader then writes the entry to durable storage and attempts to replicate it to a quorum of followers. Once the log entry is considered committed, it can be applied to a finite state machine. The finite state machine is application specific, and is implemented using an interface.

An obvious question relates to the unbounded nature of a replicated log. Raft provides a mechanism by which the current state is snapshotted and the log is compacted. Because of the FSM abstraction, restoring the state of the FSM must result in the same state as a replay of old logs. This allows Raft to capture the FSM state at a point in time and then remove all the logs that were used to reach that state. This is performed automatically without user intervention and prevents unbounded disk usage while also minimizing time spent replaying logs.
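
To make the FSM and snapshot abstractions concrete, here is a minimal, hypothetical FSM whose entire state is a counter of committed entries. It is a sketch of the interface shape rather than a real application, and the exact method signatures should be confirmed against the Godoc.

    package raftexample

    import (
        "fmt"
        "io"
        "sync"

        "github.com/hashicorp/raft"
    )

    // counterFSM is a toy state machine whose only state is a counter.
    type counterFSM struct {
        mu    sync.Mutex
        count uint64
    }

    // Apply is invoked once a log entry is committed. Its return value is
    // delivered to the leader-side caller that applied the entry.
    func (f *counterFSM) Apply(l *raft.Log) interface{} {
        f.mu.Lock()
        defer f.mu.Unlock()
        f.count++ // a real FSM would decode l.Data and update application state
        return f.count
    }

    // Snapshot captures a point-in-time view of the state so the log can be
    // compacted.
    func (f *counterFSM) Snapshot() (raft.FSMSnapshot, error) {
        f.mu.Lock()
        defer f.mu.Unlock()
        return &counterSnapshot{count: f.count}, nil
    }

    // Restore replaces the FSM state from a snapshot; the result must match
    // the state that replaying the original logs would have produced.
    func (f *counterFSM) Restore(r io.ReadCloser) error {
        defer r.Close()
        f.mu.Lock()
        defer f.mu.Unlock()
        _, err := fmt.Fscan(r, &f.count)
        return err
    }

    // counterSnapshot writes the captured state out to the snapshot sink.
    type counterSnapshot struct {
        count uint64
    }

    func (s *counterSnapshot) Persist(sink raft.SnapshotSink) error {
        if _, err := fmt.Fprintf(sink, "%d", s.count); err != nil {
            sink.Cancel()
            return err
        }
        return sink.Close()
    }

    func (s *counterSnapshot) Release() {}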

Lastly, there is the issue of updating the peer set when new servers join or existing servers leave. As long as a quorum of nodes is available, this is not an issue, as Raft provides mechanisms to dynamically update the peer set. If a quorum of nodes is unavailable, then this becomes a very challenging issue. For example, suppose there are only 2 peers, A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is impossible to reach quorum. This means the cluster is unable to add or remove a node, or to commit any additional log entries. This results in unavailability. At this point, manual intervention is required to remove either A or B and to restart the remaining node in bootstrap mode.
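
As an illustration, while a quorum is available, servers can be added to or removed from the configuration through the membership-change API. This is only a sketch; the IDs and address are placeholders, and "r" is assumed to be a running Raft instance on the current leader.

    package raftexample

    import "github.com/hashicorp/raft"

    // addAndRemove demonstrates a configuration change on the leader.
    func addAndRemove(r *raft.Raft) error {
        // Add a new voting server to the cluster configuration.
        if err := r.AddVoter(raft.ServerID("node4"), raft.ServerAddress("10.0.0.4:7000"), 0, 0).Error(); err != nil {
            return err
        }
        // Remove a departed server from the configuration.
        return r.RemoveServer(raft.ServerID("node2"), 0, 0).Error()
    }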

A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster of 5 can tolerate 2 node failures. In general, a cluster of N servers needs a quorum of floor(N/2) + 1 servers to make progress, so it can tolerate floor((N-1)/2) failures. The recommended configuration is to run either 3 or 5 Raft servers. This maximizes availability without greatly sacrificing performance.

In terms of performance, Raft is comparable to Paxos. Assuming stable leadership, committing a log entry requires a single round trip to half of the cluster. Thus performance is bound by disk I/O and network latency.