raft: synchronous raft conf change, safeguard on join/leave raft #131
Conversation
```go
// ID returns the cluster ID
ID() uint64
// Members returns a slice of members sorted by their ID
Members() map[uint64]*Member
```
I don't think that map can be sorted.
I think this is just a confusing description. It should say "indexed by their ID".
I think it's better to return a slice here, to encourage users to use it only for iteration, and to use GetMember for access by ID. But I'm okay with a map too :)
This adds some synchronization which will definitely help with #189
Will rework this PR in multiple parts (one for the synchronous config change and better locking on add node, and the rest depending on what we decide for the safe barrier on add and remove nodes). @aaronlehmann @aluzzardi I think we should disallow removing nodes below a safe quorum number. Even if we can, this is totally not safe from an operational perspective and should be discouraged, or at least we should add a Huge Warning saying:
I'm okay with the warning, but my instinct is that it should be possible to have less than three managers. We want it to be easy to experiment with swarm on a developer scale, so a single-node setup should definitely be supported. And it's really weird if adding more managers means you can't remove them later. This constraint would make sense for production, but if someone's playing with swarm on a laptop, they should be able to add and remove managers at will.
@aaronlehmann @LK4D4 I rebased, only adding the necessary synchronous conf change and lock on Join/Leave. I think this will get rid of the race happening in #189 on Join, which I could reproduce locally but not anymore with the change.
LGTM
```go
*api.RaftNode
started bool
```
This is residue from the rebase and the hard checks; I'll remove it.
@abronan: The summary says don't merge; is that out of date?
Let me rebase, one PR was merged in the meantime :)
```go
// add them itself to its local list: grpc
// call add from the node sending the conf
// change
// TODO(aaronl): send back store snapshot after join?
```
Huh?
This was an old comment that may be outdated with your new PR; I can remove it.
Please do.
What happens if the admin removes a node from the cluster and tries to re-add it later? It looks like that wouldn't work with the code here, because the node ID can never be deleted from
@aaronlehmann Yes, a node that leaves gracefully can never be added back to the cluster with the same ID, but even now without this PR we don't handle that case. This should be done server side, to make sure that a node that wants to join again gets its ID assigned from the node submitting the conf change or the leader. Anyway, seems like the tests timed out. Not sure if the timeout is from raft or if the tests were really slow 😕
ping @abronan
```diff
 	// The leader steps down
-	if n.Config.ID == n.Leader() && n.Config.ID == conf.NodeID {
+	if n.Config.ID == n.Leader() && n.Config.ID == cc.NodeID {
 		n.Stop()
```
Is calling `n.Stop` really the right thing to do?
Turns out the call is unnecessary, will remove it.
I have some concerns about the locking strategy. It seems that only

I recommend that we move the cluster member list locking out of the
```diff
-	c.lock.Unlock()
+	defer c.lock.Unlock()
 	c.removed[id] = true
 	delete(c.members, id)
```
We should call `Close` on the grpc connection.
```diff
 	client, err := GetRaftClient(node.Addr, 0)
 	if err == nil {
-		n.Cluster.AddPeer(&Peer{RaftNode: node, Client: client})
+		err = n.cluster.AddMember(&Member{RaftNode: node, Client: client})
```
We need to be careful not to overwrite an existing member, because this will leak the grpc connection.
This shouldn't happen, because `Register` is not called if the ID already exists in the cluster (it will throw `ErrIDExists` to the caller and not call register in this case).
I think you're right.
Man, I spent several hours trying to understand the

Anyway, I finally found the problem, and it turns out to be very simple. The test is creating a new listener on the wildcard port for the restarted node, but it's not okay for the node to change its address. Before, this wasn't caught by the test because it wasn't doing anything that required consensus after restarting the node. Now, the joins are funneled through raft, so that was getting stuck. It also takes a bit longer for the state to converge after the restart. The following diff fixes the test:

```diff
diff --git a/manager/state/raft_test.go b/manager/state/raft_test.go
index ccaf1ca..9959c2d 100644
--- a/manager/state/raft_test.go
+++ b/manager/state/raft_test.go
@@ -193,7 +193,7 @@ func newJoinNode(t *testing.T, join string, opts ...NewNodeOptions) *Node {
 }

 func restartNode(t *testing.T, n *Node, join string) *Node {
-	l, err := net.Listen("tcp", "127.0.0.1:0")
+	l, err := net.Listen("tcp", n.Address)
 	require.NoError(t, err, "can't bind to raft service port")

 	s := grpc.NewServer()
@@ -709,7 +709,7 @@ func TestRaftRejoin(t *testing.T) {
 	values[1], err = proposeValue(t, nodes[1], "id2")
 	assert.NoError(t, err, "failed to propose value")

-	time.Sleep(500 * time.Millisecond)
+	time.Sleep(2 * time.Second)

 	// Nodes 1 and 2 should have the new value
 	checkValues := func(checkNodes ...*Node) {
```

However, I'm afraid the change to the
Recording a few notes to myself (not for fixing in this PR, but in followups):
@abronan: Here's a proper fix for the test that wraps
Thanks @aaronlehmann, will take care of the comments and include the fix. I guess we should open issues for your notes, as we should keep track of those somewhere and make sure we address them. Agreed on the consequent
Signed-off-by: Alexandre Beslic <alexandre.beslic@gmail.com>
…a member Signed-off-by: Alexandre Beslic <alexandre.beslic@gmail.com>
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
```diff
-type Peer struct {
+// Member represents a raft cluster member
+type Member struct {
+	l sync.RWMutex
```
`mu` :)
Where is this used?
It's a remnant of the member count check (to not remove a node if it would result in a number of nodes that might be unsafe and would lose quorum) that was part of the original PR; I should remove it.
…n a connection to self Signed-off-by: Alexandre Beslic <alexandre.beslic@gmail.com>
…efore checkValues Signed-off-by: Alexandre Beslic <alexandre.beslic@gmail.com>
LGTM
This is very WIP, don't merge it but I'll leave it around for now.
/cc @aaronlehmann
Signed-off-by: Alexandre Beslic alexandre.beslic@gmail.com