Election scheduler should not reset before publication complete #97909

DaveCTurner · 2023-07-24T20:37:13Z

Today we close the election scheduler when the coordinator leaves mode CANDIDATE, before even starting the publication that establishes the election winner as the cluster master. If this publication subsequently fails then we start a new election scheduler with the original, short, timeout, and do not back off. With very high numbers of master-eligible nodes this can lead to constant election clashes that never resolve. We must count such failed publications as failed election attempts for election scheduling and backoff purposes.
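A simplified model of the effect described above. It assumes the scheduler's upper bound on the pre-election delay starts at an initial timeout and grows linearly per failed attempt up to a cap, as the documented `cluster.election.initial_timeout`, `cluster.election.back_off_time` and `cluster.election.max_timeout` settings describe. The 100ms/100ms/10s values are assumed defaults, and `ElectionBackoff` and `delay_bounds` are hypothetical names, not Elasticsearch code:

```python
from dataclasses import dataclass


@dataclass
class ElectionBackoff:
    # Assumed values: the documented defaults of cluster.election.initial_timeout,
    # cluster.election.back_off_time and cluster.election.max_timeout.
    initial_timeout_ms: int = 100
    back_off_time_ms: int = 100
    max_timeout_ms: int = 10_000
    attempt: int = 0

    def next_max_delay_ms(self) -> int:
        """Upper bound on the random delay before the next election attempt."""
        this_attempt = self.attempt
        self.attempt += 1
        return min(self.max_timeout_ms,
                   self.initial_timeout_ms + this_attempt * self.back_off_time_ms)


def delay_bounds(rounds: int, reset_on_leaving_candidate: bool) -> list[int]:
    """Each round wins an election, leaves CANDIDATE, then fails to publish."""
    scheduler = ElectionBackoff()
    bounds = []
    for _ in range(rounds):
        bounds.append(scheduler.next_max_delay_ms())
        if reset_on_leaving_candidate:
            # Current behaviour: the scheduler is closed on leaving CANDIDATE,
            # so the next election starts again with the short initial timeout.
            scheduler = ElectionBackoff()
    return bounds


print(delay_bounds(5, reset_on_leaving_candidate=True))   # [100, 100, 100, 100, 100]
print(delay_bounds(5, reset_on_leaving_candidate=False))  # [100, 200, 300, 400, 500]
```

With the reset, every retry after a failed publication uses the same short bound, so many master-eligible nodes keep clashing; counting the failed publication as a failed attempt lets the bound grow until it reaches the cap.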

More precisely, I think a node in mode LEADER should not count an election as truly successful until we've received join votes from all nodes in the cluster (see org.elasticsearch.cluster.coordination.CoordinationState#containsJoinVoteFor) at the end of a fully-acked publication. That should be equivalent to currentTerm() == maxTermSeen at the end of a fully-acked publication, since we increase maxTermSeen on a missing join vote.
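The two success criteria proposed above can be written as predicates over a snapshot of the leader's state. This is only a sketch: `LeaderSnapshot` and its fields are hypothetical stand-ins, not the `CoordinationState` API. The claim that the two predicates agree comes from the comment above (a missing join vote raises `maxTermSeen`); the model encodes that assumption but does not prove it:

```python
from dataclasses import dataclass


@dataclass
class LeaderSnapshot:
    # Hypothetical view of a leader's state at the end of a publication;
    # these names are illustrative, not Elasticsearch's CoordinationState API.
    current_term: int
    max_term_seen: int
    cluster_nodes: frozenset
    join_votes: frozenset = frozenset()
    publication_fully_acked: bool = False


def has_join_votes_from_all(s: LeaderSnapshot) -> bool:
    """Proposed criterion: every node in the cluster has sent a join vote."""
    return s.publication_fully_acked and s.cluster_nodes <= s.join_votes


def term_is_settled(s: LeaderSnapshot) -> bool:
    """Suggested equivalent: no node has reported a term beyond ours."""
    return s.publication_fully_acked and s.current_term == s.max_term_seen


nodes = frozenset({"m1", "m2", "m3"})
stable = LeaderSnapshot(7, 7, nodes, nodes, publication_fully_acked=True)
# m3's join vote is missing, so (per the assumption above) maxTermSeen has moved on.
unsettled = LeaderSnapshot(7, 8, nodes, frozenset({"m1", "m2"}), True)
print(has_join_votes_from_all(stable), term_is_settled(stable))        # True True
print(has_join_votes_from_all(unsettled), term_is_settled(unsettled))  # False False
```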

I'm not sure how a node in mode FOLLOWER should detect that the cluster is completely stable. The master could broadcast another message when it decides things are stable perhaps? Or maybe it would be good enough to base it on elapsed time (i.e. if we've been FOLLOWER in the same term for 60s)?
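One way to picture the elapsed-time option for followers is the hypothetical tracker below. It treats the cluster as stable once the node has been FOLLOWER in the same term for 60s, the figure suggested above. The class, its methods, and the injected clock (`now_s`, in seconds, to keep it testable) are all illustrative assumptions, not existing Elasticsearch code:

```python
from typing import Optional


class FollowerStabilityTracker:
    """Hypothetical helper: reports stability once this node has been
    FOLLOWER in the same term for at least `stable_after_s` seconds."""

    def __init__(self, stable_after_s: float = 60.0):
        self.stable_after_s = stable_after_s
        self._term: Optional[int] = None
        self._since: Optional[float] = None

    def on_state(self, mode: str, term: int, now_s: float) -> None:
        if mode != "FOLLOWER":
            # Leaving FOLLOWER (e.g. becoming CANDIDATE) restarts the clock.
            self._term = self._since = None
        elif term != self._term:
            # A new term as FOLLOWER also restarts the clock.
            self._term, self._since = term, now_s

    def is_stable(self, now_s: float) -> bool:
        return self._since is not None and now_s - self._since >= self.stable_after_s
```

A heuristic like this needs no extra message from the master, at the cost of declaring stability up to 60s later than strictly necessary.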

Workaround

The simplest workaround is not to have so many master-eligible nodes. See these docs for more information:

However, it is good practice to limit the number of master-eligible nodes in the cluster to three. Master nodes do not scale like other node types since the cluster always elects just one of them as the master of the cluster. If there are too many master-eligible nodes then master elections may take a longer time to complete.


elasticsearchmachine · 2023-07-24T20:37:38Z

Pinging @elastic/es-distributed (Team:Distributed)

weizijun · 2023-08-04T07:14:13Z

@weizijun that should not be necessary if you have 3 master nodes unless you have a particularly unusual configuration, and I do not want other readers to adjust this expert setting without extremely careful consideration, so I hope you don't mind that I am going to hide your comment.

Yeah, it's an expert setting, so I hope this problem can be solved inside the engine without manual parameter tuning.

Today we close the election scheduler when the coordinator leaves mode `CANDIDATE`, before even starting the publication that establishes the election winner as the cluster master. If this publication subsequently fails then we start a new election scheduler with the original, short, timeout, and do not back off. With very high numbers of master-eligible nodes this can lead to constant election clashes that never resolve. We must count such failed publications as failed election attempts for election scheduling and backoff purposes. This commit keeps the election scheduler open until a published state is applied, which means we continue to back off until a publication has completed. Closes #97909
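The lifecycle change in this commit message can be modelled as a small state machine: the scheduler's attempt counter is created when a node becomes CANDIDATE and, after the fix, is dropped only once a published state has been applied. The class and method names below are hypothetical, not the `Coordinator`'s actual methods:

```python
from typing import Optional


class ElectionSchedulerLifecycle:
    """Hypothetical model of when the election scheduler is closed.

    close_on_leaving_candidate=True models the behaviour before the fix;
    False models the fix, which keeps the scheduler (and its attempt count)
    alive until a published cluster state has been applied.
    """

    def __init__(self, close_on_leaving_candidate: bool):
        self.close_on_leaving_candidate = close_on_leaving_candidate
        self.attempts: Optional[int] = None  # None: no scheduler is running

    def become_candidate(self) -> None:
        if self.attempts is None:
            self.attempts = 0  # start a fresh scheduler with no backoff

    def schedule_election(self) -> int:
        """Returns the attempt index that would determine the backoff."""
        this_attempt = self.attempts
        self.attempts += 1
        return this_attempt

    def leave_candidate(self) -> None:
        if self.close_on_leaving_candidate:
            self.attempts = None

    def published_state_applied(self) -> None:
        self.attempts = None  # the election really succeeded: stop backing off


def attempts_after_failed_publications(close_on_leaving_candidate: bool, rounds: int) -> list:
    c = ElectionSchedulerLifecycle(close_on_leaving_candidate)
    seen = []
    for _ in range(rounds):
        c.become_candidate()
        seen.append(c.schedule_election())
        c.leave_candidate()  # won the election, but the publication then fails,
        # so published_state_applied() is never called in this loop
    return seen


print(attempts_after_failed_publications(True, 3))   # [0, 0, 0]
print(attempts_after_failed_publications(False, 3))  # [0, 1, 2]
```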


DaveCTurner added the >bug and :Distributed/Cluster Coordination labels Jul 24, 2023

elasticsearchmachine added the Team:Distributed label Jul 24, 2023

DaveCTurner mentioned this issue Aug 4, 2023 in #98185, "The cluster cannot complete the master election, resulting in the cluster being unavailable" (Closed)


DaveCTurner mentioned this issue Aug 10, 2023 in #98354, "Improve reliability of elections with message delays" (Merged)

elasticsearchmachine closed this as completed in #98354 Aug 14, 2023
