Support incremental cluster joins #3478

jwilder · 2015-07-27T21:54:26Z

Overview

This PR is a follow-up to #3372 and implements more of the functionality for #2966. Specifically, it allows servers to become raft peers when joining. A single-node cluster can therefore be expanded into a larger cluster that automatically uses more nodes for raft consensus, which improves availability. By default, the number of raft peers is still hard-coded to a maximum of 3, but this could be made configurable. The first three nodes added to the cluster become raft peers. The remaining nodes are data-only.
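The peer-selection rule above can be sketched as follows. This is a simplified illustration, not the code from this PR: `NodeRole`, `roleForJoin`, `assignRoles` and `maxRaftPeers` are hypothetical names, and networking and failures are ignored. It only shows that join order alone decides which nodes run raft.

```go
package main

// maxRaftPeers mirrors the default limit of 3 raft peers described above.
// The name is hypothetical.
const maxRaftPeers = 3

// NodeRole is a hypothetical type describing how a node takes part in the cluster.
type NodeRole int

const (
	RaftPeer NodeRole = iota // votes in raft consensus and stores data
	DataOnly                 // stores data but does not run raft
)

// roleForJoin returns the role of a node joining a cluster that currently
// has raftPeers raft members.
func roleForJoin(raftPeers int) NodeRole {
	if raftPeers < maxRaftPeers {
		return RaftPeer
	}
	return DataOnly
}

// assignRoles replays a sequence of joins and returns each node's role.
// Only join order matters: the first maxRaftPeers nodes become raft peers.
func assignRoles(joinOrder []string) map[string]NodeRole {
	roles := make(map[string]NodeRole, len(joinOrder))
	raftPeers := 0
	for _, node := range joinOrder {
		role := roleForJoin(raftPeers)
		if role == RaftPeer {
			raftPeers++
		}
		roles[node] = role
	}
	return roles
}
```

For four servers joining in the order A, B, C, D, this sketch makes A, B and C raft peers and D data-only.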

To add a new member to the cluster, start influxd with the -join flag. The -join flag takes a host:port value. Multiple host:port values can be specified by separating them with commas. For example:

influxd -join host1:8088,host2:8088,host3:8088

The port should be the cluster port (default 8088), not the API port (default 8086).
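A comma-separated -join value could be split and checked roughly like this. It is a hedged sketch, not influxd's actual flag handling: `parseJoin` and `looksLikeAPIPort` are hypothetical helpers, and the API-port check simply compares against the default 8086 mentioned above.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseJoin splits a -join value such as "host1:8088,host2:8088" into
// individual host:port addresses, rejecting entries without a host or port.
func parseJoin(value string) ([]string, error) {
	var addrs []string
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // tolerate stray commas
		}
		host, port, err := net.SplitHostPort(entry)
		if err != nil {
			return nil, fmt.Errorf("invalid -join entry %q: %v", entry, err)
		}
		if host == "" || port == "" {
			return nil, fmt.Errorf("invalid -join entry %q: host and port are required", entry)
		}
		addrs = append(addrs, entry)
	}
	return addrs, nil
}

// looksLikeAPIPort flags the common mistake of pointing -join at the API
// port (default 8086) instead of the cluster port (default 8088).
func looksLikeAPIPort(addr string) bool {
	_, port, err := net.SplitHostPort(addr)
	return err == nil && port == "8086"
}
```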

The existing [meta].Peers config var can also be used and is equivalent to the -join flag. If both are specified, the -join flag overrides the config value.
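The precedence rule can be expressed as a tiny function. This is an illustrative sketch only: `effectivePeers` is a hypothetical name, and the real wiring between the flag and the meta config may differ.

```go
package main

// effectivePeers returns the peer list a fresh node should try to join:
// a non-empty -join value overrides [meta].Peers from the config file.
func effectivePeers(joinFlag []string, configPeers []string) []string {
	if len(joinFlag) > 0 {
		return joinFlag
	}
	return configPeers
}
```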

If a node that has previously joined a cluster is restarted, its existing cluster state on disk is used, and any -join flag or [meta].Peers value is ignored.

Additionally, the SHOW SERVERS statement now has a raft column that indicates whether the node is running raft or not.

Not implemented

Leaving the cluster - The infrastructure is in place to remove a raft peer, but is not currently exposed. Removing a non-raft peer is not supported at this time.
Promoting/demoting raft peers - The infrastructure to promote/demote nodes to become raft peers is in place but not exposed or wired up currently.
Changing hostnames - The hostname:port used when joining the cluster can't be changed after the fact. This will be addressed in #3421 (Should update metastore and cluster if IP or hostname changes).

pauldix · 2015-07-28T15:35:46Z

overall lgtm. w00t clustering!!


There is a race when stopping servers: the meta.Store is closing, but the server has not yet signaled that it is closing, so the reporting goroutine repeatedly errors out in a fast loop during this time. This creates a lot of noise in the logs.


beckettsean · 2015-07-29T00:33:07Z

@jwilder is there a difference between the following scenarios? (assume servers are launched in alphabetical order)

B, C, & D join A

Server A: influxd -config /path/to/config
Server B: influxd -config /path/to/config -join serverA:8088
Server C: influxd -config /path/to/config -join serverA:8088
Server D: influxd -config /path/to/config -join serverA:8088

B joins A, C joins B & A, D joins C, B, & A

Server A: influxd -config /path/to/config
Server B: influxd -config /path/to/config -join serverA:8088
Server C: influxd -config /path/to/config -join serverA:8088,serverB:8088
Server D: influxd -config /path/to/config -join serverA:8088,serverB:8088,serverC:8088

Putting it another way, why would I specify multiple servers in the -join flag?

beckettsean · 2015-07-29T00:36:50Z

"The existing [meta].Peers config var can also be used and is equivalent to the -join flag. If a both are specified, the -join flag override the config value."

So what's the reason for the -join flag? If I set the peers correctly in [meta] the server does the right thing on a restart. If I just use the -join flag it only does the right thing on the first launch (or I must pass -join every time).

pauldix · 2015-07-29T00:43:19Z

@beckettsean peers and join only get used on the first startup. Generally I would prefer to not even have peers since people think it's something that gets used all the time if it's in the config.

Once it has joined the cluster, it writes a file to the local disk that contains the information. On restart it will see this file is there and use that to reconnect to the cluster.

pauldix · 2015-07-29T00:44:22Z

There's no difference in the situations you describe. The list just provides other servers to try to join if the first fails. The only thing that matters is the order in which the four systems join the cluster, because the 4th won't run the consensus protocol.

beckettsean · 2015-07-29T01:01:18Z

Thanks, Paul. I tend to agree, that if [meta] peers is a one-time thing it shouldn't be in the config at all. Maybe we can deprecate that with 0.9.3 and remove in 0.9.4?

Is the cluster file on disk human readable and/or editable? If I blow it away presumably the node forgets about the cluster on restart unless there's a -join or [meta] section involved?

Understood that the first three make up the consensus group and all subsequent nodes are not participants. It makes sense that the -join flag can specify fallback servers in case one doesn't respond, and that makes it idempotent with respect to startup timing for each process.

kfitzpatrick · 2015-07-31T20:02:42Z

It would be great to have something that could be run while the database is up, or a setting in the config file that I can change before bouncing the service. When I'm spinning up machines, I don't want to have to pass a special argument to the machine. It makes the monit scripts a pain to deal with and maintain automatically.

Also, if a new node is added, then I have to go back into all the existing nodes somehow and change everything.

Reason this matters: I would like to add the ability to add nodes dynamically to a cluster instead of having to bring them all up at the same time in our hosted environment.

pauldix · 2015-07-31T20:04:33Z

@kfitzpatrick you don't need to do anything to existing nodes. They all get notified of the new member automatically. This design allows you to bring up new nodes at any time and join them to a cluster.

kfitzpatrick · 2015-07-31T20:12:11Z

So let's see if I got this. I create node A. I bring up another node (B) and have to start it with "--join A:8088". Or, if it's not removed, add A to B's Peers in the config and bring it up and now they all know about each other.

Correct, @pauldix ?

pauldix · 2015-07-31T23:17:41Z

right, except I'm not sure what you mean by "Or, if it's not removed". I assume you meant that if you don't specify the join argument you have to list A in B's peers? If so, then you're correct

toddboom · 2015-08-01T00:10:36Z

i think he meant "if it's not removed from the config file". maybe?

jwilder added the 2 - Working label Jul 27, 2015

jwilder added 9 commits July 28, 2015 09:40

Rename raftState.openRaft to open

f5705ae

Support add new raft nodes

c93e46d

This change adds the first 3 nodes to the cluster as raft peers. Other nodes are data-only.

Add more meta store cluster tests

2938601

* Test adding new nodes that become raft peers
* Test restarting a cluster with 3 raft nodes and 3 non-raft nodes

Use config.Peers when passing -join flag

06d8ff7

Removes the two separate variables in the meta.Config. -join will now override the Peers var.

Add raft column to show servers statement

f5d86b9

Reports whether the node is part of the raft consensus cluster or not.

Exit report goroutine if server is closing

514f36c

Fix data race in WaitForDataChanged

95c98d1

Update changelog

c12b556

jwilder force-pushed the jw-cluster branch from 8a72d34 to c12b556 Compare July 28, 2015 15:41

jwilder added a commit that referenced this pull request Jul 28, 2015

Merge pull request #3478 from influxdb/jw-cluster

1536cd5

Support incremental cluster joins

jwilder merged commit 1536cd5 into master Jul 28, 2015

jwilder removed the 2 - Working label Jul 28, 2015

jwilder deleted the jw-cluster branch July 28, 2015 16:05
