Election scheduler should not reset before publication complete #97909

Closed
DaveCTurner opened this issue Jul 24, 2023 · 4 comments · Fixed by #98354
Labels
>bug, :Distributed/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection), Team:Distributed (meta label for distributed team)

Comments

@DaveCTurner
Contributor

DaveCTurner commented Jul 24, 2023

Today we close the election scheduler when the coordinator leaves mode CANDIDATE, before even starting the publication that establishes the election winner as the cluster master. If this publication subsequently fails then we start a new election scheduler with the original, short, timeout, and do not back off. With very high numbers of master-eligible nodes this can lead to constant election clashes that never resolve. We must count such failed publications as failed election attempts for election scheduling and backoff purposes.
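
To illustrate why dropping the scheduler's history matters, here is a minimal sketch of a linear election backoff. It is a simplified model, not the Elasticsearch implementation: the class `BackoffSketch`, its methods, and the 100ms/100ms/10s values are assumptions chosen for illustration, and the random jitter a real scheduler applies within the bound is omitted.

```java
// Simplified model of an election scheduler with linear backoff.
// All names and numbers are illustrative assumptions, not Elasticsearch code.
final class BackoffSketch {
    private final long initialTimeoutMillis;
    private final long backoffMillis;
    private final long maxTimeoutMillis;
    private long attempts; // election attempts counted since the last reset

    BackoffSketch(long initialTimeoutMillis, long backoffMillis, long maxTimeoutMillis) {
        this.initialTimeoutMillis = initialTimeoutMillis;
        this.backoffMillis = backoffMillis;
        this.maxTimeoutMillis = maxTimeoutMillis;
    }

    /** Upper bound on the delay before the next election attempt; grows with each attempt. */
    long nextDelayUpperBoundMillis() {
        long thisAttempt = attempts++;
        return Math.min(maxTimeoutMillis, initialTimeoutMillis + thisAttempt * backoffMillis);
    }

    /** Models closing the scheduler and starting a new one: the attempt history is lost. */
    void reset() {
        attempts = 0;
    }

    /**
     * Runs `rounds` elections in which the node wins the vote but the first
     * publication fails, and returns the delay bound used in the final round.
     */
    static long boundAfterFailedPublications(int rounds, boolean resetBeforePublication) {
        BackoffSketch scheduler = new BackoffSketch(100, 100, 10_000);
        long bound = 0;
        for (int i = 0; i < rounds; i++) {
            bound = scheduler.nextDelayUpperBoundMillis();
            if (resetBeforePublication) {
                // Behaviour described above: history is dropped on leaving CANDIDATE,
                // so the failed publication is never counted as a failed attempt.
                scheduler.reset();
            }
            // Otherwise the failed publication counts as a failed attempt
            // and the next round backs off further.
        }
        return bound;
    }
}
```

In this model, resetting before the publication keeps the bound at the initial timeout no matter how many rounds fail, whereas counting failed publications lets the bound grow until it reaches the maximum.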

More precisely, I think a node in mode LEADER should not count an election as truly successful until we've received join votes from all nodes in the cluster (see org.elasticsearch.cluster.coordination.CoordinationState#containsJoinVoteFor) at the end of a fully-acked publication. That should be equivalent to currentTerm() == maxTermSeen at the end of a fully-acked publication, since we increase maxTermSeen on a missing join vote.
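
As a rough sketch of that condition, the check could look something like the following. The class and method names are hypothetical, node identities are modelled as plain strings, and the real `CoordinationState` API is not used here.

```java
// Hypothetical sketch of the "truly successful election" check described above.
// Names and types are assumptions for illustration only.
final class ElectionSuccessSketch {

    /** Direct form: every node in the cluster has a join vote recorded. */
    static boolean allNodesJoined(java.util.Set<String> clusterNodes, java.util.Set<String> joinVotes) {
        return joinVotes.containsAll(clusterNodes);
    }

    /**
     * Term-based form: a missing join vote is assumed to have bumped maxTermSeen,
     * so equal terms imply that no vote was missing.
     */
    static boolean termIsSettled(long currentTerm, long maxTermSeen) {
        return currentTerm == maxTermSeen;
    }

    /** The election only counts as successful once a fully-acked publication ends with settled terms. */
    static boolean electionTrulySucceeded(boolean publicationFullyAcked, long currentTerm, long maxTermSeen) {
        return publicationFullyAcked && termIsSettled(currentTerm, maxTermSeen);
    }
}
```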

I'm not sure how a node in mode FOLLOWER should detect that the cluster is completely stable. The master could broadcast another message when it decides things are stable perhaps? Or maybe it would be good enough to base it on elapsed time (i.e. if we've been FOLLOWER in the same term for 60s)?
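
The elapsed-time option might be sketched as below. This is only one possible shape for the idea: the class `FollowerStabilitySketch`, its methods, and the way time is passed in as a plain millisecond value are all assumptions, and the 60s threshold is simply the figure floated above.

```java
// Hypothetical sketch: treat a follower as stable once it has stayed FOLLOWER
// in the same term for a configured duration. Not Elasticsearch code.
final class FollowerStabilitySketch {
    private final long stableAfterMillis;
    private long term = -1;                // term in which we are currently a follower
    private long followerSinceMillis = -1; // when we became a follower in that term

    FollowerStabilitySketch(long stableAfterMillis) {
        this.stableAfterMillis = stableAfterMillis;
    }

    /** Record that this node is FOLLOWER in the given term at the given time. */
    void onFollowerCheck(long observedTerm, long nowMillis) {
        if (observedTerm != term) {
            // A new term restarts the clock.
            term = observedTerm;
            followerSinceMillis = nowMillis;
        }
    }

    /** Record that this node left FOLLOWER mode, discarding any progress. */
    void onLeftFollowerMode() {
        term = -1;
        followerSinceMillis = -1;
    }

    /** True once the node has been FOLLOWER in one term for long enough. */
    boolean looksStable(long nowMillis) {
        return followerSinceMillis >= 0 && nowMillis - followerSinceMillis >= stableAfterMillis;
    }
}
```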


Workaround

The simplest workaround is not to have so many master-eligible nodes. See these docs for more information:

However, it is good practice to limit the number of master-eligible nodes in the cluster to three. Master nodes do not scale like other node types since the cluster always elects just one of them as the master of the cluster. If there are too many master-eligible nodes then master elections may take a longer time to complete.

@DaveCTurner DaveCTurner added >bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jul 24, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Jul 24, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@weizijun

This comment was marked as off-topic.

@DaveCTurner

This comment was marked as off-topic.

@weizijun
Contributor

weizijun commented Aug 4, 2023

> @weizijun that should not be necessary if you have 3 master nodes unless you have a particularly unusual configuration, and I do not want other readers to adjust this expert setting without extremely careful consideration, so I hope you don't mind that I am going to hide your comment.

Yeah, it's an expert setting, so I hope this problem can be solved inside the engine without manual parameter optimization.

elasticsearchmachine pushed a commit that referenced this issue Aug 14, 2023
Today we close the election scheduler when the coordinator leaves mode
`CANDIDATE`, before even starting the publication that establishes the
election winner as the cluster master. If this publication subsequently
fails then we start a new election scheduler with the original, short,
timeout, and do not back off. With very high numbers of master-eligible
nodes this can lead to constant election clashes that never resolve. We
must count such failed publications as failed election attempts for
election scheduling and backoff purposes.

This commit keeps the election scheduler open until a published state is
applied, which means we continue to back off until a publication has
completed.

Closes #97909
csoulios pushed a commit to csoulios/elasticsearch that referenced this issue Aug 18, 2023