raft: Set the RecentActive flag for newly added nodes #7830

Merged (2 commits) on May 5, 2017

Conversation

aaronlehmann
Contributor

I found that enabling the CheckQuorum flag led to spurious leader elections when new nodes joined. It looks like, in the window between a new node joining the cluster and that node first communicating with the leader, the quorum check can fail because the new node looks inactive. To solve this, set the RecentActive flag when nodes are first added. This gives the node a grace period to communicate before it can cause the quorum check to fail.

This seems like a decent way to solve the problem, but please feel free to suggest other approaches instead.
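
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the etcd raft code: the `peer` and `leader` types, `addNode`, `quorumActive`, and the `markActive` parameter are all hypothetical names invented for the illustration. It assumes the leader always counts itself as active and clears every flag after each check.

```go
package main

import "fmt"

// peer is a hypothetical, simplified stand-in for the leader's per-node
// tracking record. Only the activity flag is modelled.
type peer struct {
	recentActive bool
}

// leader is a toy model for illustration, not the etcd raft type.
type leader struct {
	id    uint64
	peers map[uint64]*peer
}

func newLeader(id uint64) *leader {
	return &leader{id: id, peers: map[uint64]*peer{id: &peer{}}}
}

// addNode registers a new member. markActive selects between the behaviour
// described as problematic (false) and the proposed behaviour (true).
func (l *leader) addNode(id uint64, markActive bool) {
	if _, ok := l.peers[id]; ok {
		return
	}
	l.peers[id] = &peer{recentActive: markActive}
}

// quorumActive counts the members seen since the previous check (the leader
// always counts itself), clears the flags, and reports whether the count
// reaches a majority.
func (l *leader) quorumActive() bool {
	active := 0
	for id, p := range l.peers {
		if id == l.id || p.recentActive {
			active++
		}
		p.recentActive = false
	}
	return active >= len(l.peers)/2+1
}

func main() {
	old := newLeader(1)
	old.addNode(2, false)
	fmt.Println("flag unset, first check:", old.quorumActive())

	fixed := newLeader(1)
	fixed.addNode(2, true)
	fmt.Println("flag set, first check:", fixed.quorumActive())
	fmt.Println("flag set, second check:", fixed.quorumActive())
}
```

In this model, a one-node cluster that grows to two nodes fails its next check when the flag is unset. With the flag set, the first check passes, and a later check still fails if the new node never communicates, which is the grace-period behaviour described above.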

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann
Contributor Author

Any thoughts on this proposed change? The extra leader elections when CheckQuorum is enabled are a bit of a pain point for me.

@xiang90
Contributor

xiang90 commented May 4, 2017

Can you write a test for this?

@xiang90
Contributor

xiang90 commented May 4, 2017

This change looks reasonable.

This test verifies that adding a node does not cause the leader to step
down until at least one full ElectionTick cycle elapses.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann
Contributor Author

Thanks. I've added a test.
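
The property the test is described as checking (a newly added node does not make the leader step down until at least one full election-timeout cycle has elapsed) can be sketched with a tick-driven toy model. This is not the test from the PR and does not use the etcd raft package. The `node` type, `newLeaderNode`, `addNode`, `tick`, and the timeout value of 10 are hypothetical choices for illustration, and the model assumes the quorum check runs once every `electionTimeout` ticks.

```go
package main

import "fmt"

// node is a hypothetical toy leader, not the etcd raft type.
type node struct {
	id              uint64
	isLeader        bool
	electionTimeout int
	electionElapsed int
	recentActive    map[uint64]bool // per-member activity flags
}

func newLeaderNode(id uint64, electionTimeout int) *node {
	return &node{
		id:              id,
		isLeader:        true,
		electionTimeout: electionTimeout,
		recentActive:    map[uint64]bool{id: true},
	}
}

// addNode marks the new member as recently active, modelling the proposed fix.
func (n *node) addNode(id uint64) {
	n.recentActive[id] = true
}

// tick advances the logical clock. Once per electionTimeout ticks the leader
// checks whether a majority of members was recently active, clears the
// flags, and steps down if the check fails.
func (n *node) tick() {
	if !n.isLeader {
		return
	}
	n.electionElapsed++
	if n.electionElapsed < n.electionTimeout {
		return
	}
	n.electionElapsed = 0
	active := 0
	for id, seen := range n.recentActive {
		if id == n.id || seen {
			active++
		}
		n.recentActive[id] = false
	}
	if active < len(n.recentActive)/2+1 {
		n.isLeader = false
	}
}

func main() {
	const timeout = 10 // assumed value for the sketch
	n := newLeaderNode(1, timeout)

	// Advance to just before the first quorum check, then add a node.
	for i := 0; i < timeout-1; i++ {
		n.tick()
	}
	n.addNode(2)

	// The check fires on this tick; the new node's flag lets it pass.
	n.tick()
	fmt.Println("leader right after the check:", n.isLeader)

	// A full cycle with no communication from node 2 fails the next check.
	for i := 0; i < timeout; i++ {
		n.tick()
	}
	fmt.Println("leader one cycle later:", n.isLeader)
}
```

Adding the node one tick before the check is the worst case for the grace period: without the flag the leader would step down on the very next tick, while with it the new node gets a full cycle to make contact.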

@xiang90
Contributor

xiang90 commented May 4, 2017

LGTM.

xiang90 merged commit db6f45e into etcd-io:master on May 5, 2017
@xiang90
Contributor

xiang90 commented May 5, 2017

@aaronlehmann Thanks.
