
Fix/raft rejoin errors #170

Merged · 7 commits merged into ipfs-cluster:master from fix/raft-rejoin-errors on Oct 12, 2017
Conversation

@ZenGround0 (Collaborator) commented Oct 11, 2017

Addresses #112 by storing the current peers as bootstrappers when a node is shut down. This makes rejoining the cluster pain-free for the departing node (there is no chance for bad raft state to cause errors during rejoin). It also strengthens the implicit assumption that departing nodes rejoin the cluster by default, since starting the departing node as its own cluster now requires modifying the config.
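For illustration only, here is a toy Go sketch of the flow described above. Apart from the Bootstrap config key and the idea of saving peers on shutdown, the types and method bodies below are simplified stand-ins, not the real ipfs-cluster code (the actual change is in the diff further down):

```go
package main

import "fmt"

// Toy stand-ins for the real cluster and config types. Only the Bootstrap key
// and the shutdown/restart flow come from the PR description; everything else
// here is a simplification.
type Config struct {
	Bootstrap []string // multiaddresses to bootstrap from on the next start
}

type Cluster struct {
	config *Config
	peers  []string // current peer multiaddresses (peerManager.peersAddrs() in the real code)
}

// Shutdown records the current peers as bootstrappers so that a later restart
// rejoins the same cluster instead of coming up as its own one-node cluster.
func (c *Cluster) Shutdown() {
	c.config.Bootstrap = c.peers
	fmt.Println("saved bootstrap peers:", c.config.Bootstrap)
}

// Start rejoins via the saved bootstrappers when any are present.
func (c *Cluster) Start() {
	if len(c.config.Bootstrap) > 0 {
		fmt.Println("bootstrapping to:", c.config.Bootstrap)
		return
	}
	fmt.Println("starting a fresh single-node cluster")
}

func main() {
	cfg := &Config{}
	node := &Cluster{config: cfg, peers: []string{"/ip4/10.0.0.2/tcp/9096/ipfs/QmPeer"}}
	node.Shutdown()                 // peers recorded in cfg.Bootstrap
	(&Cluster{config: cfg}).Start() // a restarted node rejoins via those peers
}
```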

@hsanjuan I may be using Shadow in an unconventional way, feedback is welcome.

This PR exposed a flaw in the tests that is still a WIP. Specifically, in k8s-ipfs, restarting the cluster is currently handled by reinitializing ipfs-cluster-service before starting the daemon, which clears listeners bound to sockets left behind by the old daemon. Using the saved config between restarts means reinitializing is not an option, so the network state needs to be cleaned up elsewhere. I will comment when this is resolved and the full k8s test suite correctly preserves configs.

Edit: Upon further inspection this looks like just a timing consideration (the re-init simply gave the system more time to clean things up). The newest commit doubles the restart timing buffer to help with this.

@hsanjuan (Collaborator) left a comment

See comments.

cluster.go Outdated

@@ -423,6 +423,10 @@ func (c *Cluster) Shutdown() error {
 	} else {
 		time.Sleep(2 * time.Second)
 	}
+	/* set c.Config.Bootstrap to current peers */
+	c.config.Bootstrap = c.peerManager.peersAddrs()
+	c.config.Shadow()
@hsanjuan (Collaborator)
Hmm yeah, unconventional :)

There is a behaviour change here. Before, if the user passed a flag like --leave while leave_on_shutdown was false in the configuration, the configuration stayed false when the cluster was shut down. With this change, however, it will switch to true when this code runs.

I think the cleanest approach here is to call c.config.unshadow() before editing the Bootstrap key. In both cases you are saving right afterwards, and that is when unshadow is supposed to happen anyway. Since you have already done it, whatever you write to .Bootstrap will stay.
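To make the ordering concrete, here is a toy Go sketch of the suggestion above. It is not the real ipfs-cluster Config implementation; the shadow field, unshadow and save below are simplified placeholders that only illustrate why calling unshadow before editing Bootstrap keeps the Bootstrap edit while discarding runtime-only flag changes such as --leave:

```go
package main

import "fmt"

// Toy model of the shadow/unshadow idea, NOT the actual ipfs-cluster Config;
// it only demonstrates the order "unshadow, then edit Bootstrap, then save".
type Config struct {
	Bootstrap       []string
	LeaveOnShutdown bool

	shadow *Config // snapshot of the values as they were loaded from disk
}

// unshadow restores the on-disk values, discarding runtime changes made by
// flags such as --leave.
func (c *Config) unshadow() {
	if c.shadow != nil {
		c.LeaveOnShutdown = c.shadow.LeaveOnShutdown
		c.shadow = nil
	}
}

// save stands in for persisting the config to disk.
func (c *Config) save() {
	fmt.Printf("persisted: leave_on_shutdown=%v bootstrap=%v\n",
		c.LeaveOnShutdown, c.Bootstrap)
}

func main() {
	// Loaded from disk with leave_on_shutdown=false, then overridden by --leave.
	cfg := &Config{LeaveOnShutdown: false}
	cfg.shadow = &Config{LeaveOnShutdown: cfg.LeaveOnShutdown}
	cfg.LeaveOnShutdown = true // runtime-only flag change

	cfg.unshadow()                                     // 1. drop runtime overrides first
	cfg.Bootstrap = []string{"/ip4/10.0.0.2/tcp/9096"} // 2. then record current peers
	cfg.save()                                         // 3. Bootstrap stays, leave_on_shutdown stays false
}
```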

@hsanjuan (Collaborator)

You may have broken sharness too

@hsanjuan (Collaborator)

LGTM now, but tests should pass

@hsanjuan (Collaborator)

Maybe sharness will be fixed upon rebasing.

@ZenGround0 note that you can work directly on ipfs-cluster branches and don't need to do it on your own fork. This would also allow me to push to them more easily.

@ZenGround0 (Collaborator, Author)

@hsanjuan ack

Adding in sharness test fixes to pass tests
@ZenGround0 ZenGround0 merged commit 180807b into ipfs-cluster:master Oct 12, 2017
@ZenGround0 ZenGround0 deleted the fix/raft-rejoin-errors branch October 12, 2017 13:54
@hsanjuan hsanjuan mentioned this pull request Oct 17, 2017