Fix/raft rejoin errors #170
Conversation
…to respect saved config for tests
See comments.
cluster.go
Outdated
@@ -423,6 +423,10 @@ func (c *Cluster) Shutdown() error {
 	} else {
 		time.Sleep(2 * time.Second)
 	}
+	/* set c.Config.Bootstrap to current peers */
+	c.config.Bootstrap = c.peerManager.peersAddrs()
+	c.config.Shadow()
Hmm yeah, unconventional :)
There is a behaviour change here. Before, if the user passed a flag like --leave while leave_on_shutdown was false in the configuration, the configuration stayed false when the cluster was shut down. With this change, however, it will switch to true when this code runs.
I think the cleanest fix is to call c.config.unshadow() before editing the Bootstrap key. In both cases you are saving right afterwards, and that's when unshadow is supposed to happen anyway. Since you've done it beforehand, whatever you write to .Bootstrap will stay.
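A minimal, self-contained sketch of the ordering being suggested, assuming a hypothetical Config where the shadow copy holds the values loaded from disk and unshadow() restores them before the config is persisted; the real ipfs-cluster types and method names may differ:

package main

import "fmt"

// Hypothetical, simplified Config modelling the shadow/unshadow idea
// discussed above: the shadow copy holds the values as saved on disk,
// while the live fields may carry runtime overrides (e.g. from --leave).
type Config struct {
	Bootstrap       []string
	LeaveOnShutdown bool
	shadow          struct {
		Bootstrap       []string
		LeaveOnShutdown bool
	}
}

// unshadow restores the on-disk values into the live fields, which is
// what is expected to happen right before saving.
func (c *Config) unshadow() {
	c.Bootstrap = c.shadow.Bootstrap
	c.LeaveOnShutdown = c.shadow.LeaveOnShutdown
}

func main() {
	cfg := &Config{LeaveOnShutdown: true} // true only because --leave was passed
	cfg.shadow.LeaveOnShutdown = false    // value actually saved in the config file

	// Unshadow first, then edit Bootstrap: the saved LeaveOnShutdown (false)
	// survives, and the freshly written Bootstrap is what gets persisted.
	cfg.unshadow()
	cfg.Bootstrap = []string{"/ip4/10.0.0.2/tcp/9096/ipfs/QmHypotheticalPeer"}

	fmt.Println(cfg.Bootstrap, cfg.LeaveOnShutdown)
}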
You may have broken sharness too
LGTM now, but tests should pass
Maybe sharness will be fixed upon rebasing. @ZenGround0 note that you can work directly on ipfs-cluster branches and don't need to do it on your own fork. This would also let me push to them more easily.
@hsanjuan ack
Adding in sharness test fixes to pass tests
Addresses #112 by storing peers as bootstrappers when a node is shut down. This makes rejoining the cluster pain-free (no chance for bad raft state to error during rejoin) for the departing node. This strengthens the implicit assumption that departing nodes will rejoin a cluster by default, as starting the departing node in its own cluster will now require modifying the config.
@hsanjuan I may be using Shadow in an unconventional way; feedback is welcome.

This PR exposed a flaw in the tests that is still a WIP. Specifically, in k8s-ipfs, restarting the cluster is currently handled by reinitializing ipfs-cluster-service before starting the daemon. This clears listeners bound to sockets that are left behind by the old daemon. Using the saved config between restarts means reinitializing is not an option, so the network state needs to be cleaned up elsewhere. I will comment when this is resolved and the full k8s test suite correctly preserves configs.
edit
Upon further inspection it looks like this is just a timing consideration (the re-init simply gave the system more time to clean things up). The newest commit doubles the restart timing buffer to help with this.
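For illustration only, a small sketch of the kind of check the timing buffer papers over, assuming the leftover state is a listening socket that has not yet been released; the address, durations, and helper name are made up for this example and are not part of the PR:

package main

import (
	"fmt"
	"net"
	"time"
)

// waitForFreePort polls until addr can be bound again, or gives up after
// timeout. It models "give the system time to clean up" as an explicit
// check instead of a fixed sleep; this is not code from the PR.
func waitForFreePort(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		ln, err := net.Listen("tcp", addr)
		if err == nil {
			ln.Close()
			return nil // port released; safe to start the new daemon
		}
		time.Sleep(500 * time.Millisecond)
	}
	return fmt.Errorf("%s still in use after %s", addr, timeout)
}

func main() {
	// Hypothetical cluster port left behind by the old daemon.
	if err := waitForFreePort("127.0.0.1:9096", 10*time.Second); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ok to restart")
}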