
Initial Raft bootstrap using multiple arbitrarily-started nodes should be supported #853

nathanleclaire opened this issue Jun 8, 2016 · 4 comments

nathanleclaire commented Jun 8, 2016

Consider:

You want to start a swarmkit cluster consisting of, say, 3 manager nodes and 7 worker nodes.

You boot the manager and worker nodes up at the same time, expecting the workers to join once the initial bootstrap and leader election are complete. You want the leader election to be arbitrary, i.e. handled by swarmkit's Raft implementation and not dictated ahead of time from on high.

In the 3 Nodes Cluster example, --join-addr specifies only one address, so it's not clear how an initial leader election is intended to be handled. (EDIT: If I understand correctly, the node where docker swarm init is run simply becomes the leader by default.)
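For reference, the join step in that example takes a single manager address, something like this (going from memory of the README, so the exact flags may be off):

$ swarmd -d /tmp/node-2 --hostname node-2 --join-addr 127.0.0.1:4242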

How will this be handled in swarmkit? Bootstraps of this nature seem to be supported by etcd and Consul, so perhaps swarmkit should support them as well?

In Consul, when starting an agent for an initial bootstrap, you pass the -bootstrap-expect N flag to delay leader election until N peers have connected. It seems that these agents then discover each other by one of them running consul join with multiple addresses:

$ consul join <Node A Address> <Node B Address> <Node C Address>
Successfully joined cluster by contacting 3 nodes.

(I have no idea whether the daemon retries the join if the IPs cannot be contacted. Retrying a set number of times seems like the prudent approach.)
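For concreteness, my understanding is that the server-side bootstrap on each of the three Consul servers looks roughly like this (flag names from the Consul agent docs; if I'm reading them right, -retry-join also addresses the retry concern above by retrying failed joins inside the agent):

$ consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul \
    -retry-join <Node A Address> -retry-join <Node B Address>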

Interestingly, they note:

Since a join operation is symmetric, it does not matter which node initiates it.

Then, for the workers:

... [when] the servers are all started and replicating to each other, all the remaining clients can be joined. Clients are much easier as they can join against any existing node.

Another example: in etcd, the static case seems to be handled by the --initial-cluster flag:

As we know the cluster members, their addresses and the size of the cluster before starting, we can use an offline bootstrap configuration by setting the initial-cluster flag.
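For reference, the static bootstrap from etcd's clustering guide looks roughly like this for one member (the other two run the same command with their own --name and peer URLs; flag names as I recall them from the etcd docs):

$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
    --listen-peer-urls http://10.0.1.10:2380 \
    --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
    --initial-cluster-state new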

There is also support for a "discovery service" and for DNS SRV records. I'm sure each has its tradeoffs, so maybe swarmkit could implement a simple solution to this problem, like static configuration (or Consul's multi-node join), to start with?
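To sketch what I mean (this is purely hypothetical; as far as I know, neither a multi-address --join-addr nor a --bootstrap-expect flag exists in swarmkit today), a static bootstrap could look something like:

$ swarmd --hostname node-1 --join-addr 10.0.1.10:4242,10.0.1.11:4242,10.0.1.12:4242 --bootstrap-expect 3

where each manager delays leader election until the expected number of peers is reachable, as Consul does.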

In summary:

  • Without such bootstrap functionality accepting multiple node addresses, will end users be expected to bear the burden of this initial distributed consensus and discovery instead of swarmkit? What's the intended workflow/solution here? How do downstream nodes discover the upstream ones?
  • Perhaps more importantly, what happens if the initial node that other nodes --join to goes away? How are downstream joining nodes intended to get the address of a new node to --join to?

@aluzzardi @vieux @dongluochen @stevvooe Thanks, I hope I am understanding the internals correctly and have written a thorough examination of the issue at play.

abronan (Contributor) commented Jun 8, 2016

Agreed. I think it's related to #342 :)

nathanleclaire (Author) commented

Oh, nice!

dongluochen (Contributor) commented

I think providing a pool of addresses is a good idea. N/2 + 1 nodes would have to show up before bootstrapping; this would prevent a network partition from starting 2 different clusters.

jarrydfillmore commented

100%!
