Sometimes raft deadlocks #189

LK4D4 · 2016-03-18T21:01:23Z

We saw this on CI: when creating a 3-machine cluster, the nodes sometimes can't agree on who the leader is and hang on term 2 forever:

    node 1: term 2, leader 1
    node 2: term 2, leader 1
    node 3: term 2, leader 3

It shows up on CI fairly often, I think because of limited CPU time. I also saw it on my machine when I used a smaller raft tick.
Not sure, but it may be fixed by #131.

abronan · 2016-03-24T18:50:52Z

It's just a race when adding new nodes, and it might indeed be fixed by the mechanisms introduced in #131. The problem is that etcd rejects configuration changes submitted concurrently: one passes, the other gets rejected. So node 3 tries to join while node 2 is still in the process of joining. Node 3's join gets rejected, and it ends up alone, electing itself as the leader.

aaronlehmann · 2016-04-06T00:36:17Z

I think this is fixed by the combination of #131 and #237. Okay to close?

LK4D4 mentioned this issue Mar 18, 2016: "raft: synchronous raft conf change, safeguard on join/leave raft" #131 (Merged)

abronan closed this as completed Apr 14, 2016
