Election scheduler should not reset before publication complete #97909
Comments
Pinging @elastic/es-distributed (Team:Distributed)
This comment was marked as off-topic.
Yeah, it's an expert setting, so I hope this problem can be solved inside the engine without manual parameter optimization.
Today we close the election scheduler when the coordinator leaves mode `CANDIDATE`, before even starting the publication that establishes the election winner as the cluster master. If this publication subsequently fails then we start a new election scheduler with the original, short, timeout, and do not back off. With very high numbers of master-eligible nodes this can lead to constant election clashes that never resolve. We must count such failed publications as failed election attempts for election scheduling and backoff purposes. This commit keeps the election scheduler open until a published state is applied, which means we continue to back off until a publication has completed. Closes #97909
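To make the difference concrete, here is a minimal sketch of the two behaviours described in this commit message. It is not the actual Elasticsearch code: `BackoffScheduler`, `upperBoundAfterFailedPublications`, and the timeout values (100ms initial, 100ms back-off, 10s cap) are hypothetical names and numbers chosen for illustration, and the linear back-off formula is an assumption about how the retry window grows.

```java
import java.util.Random;

class ElectionBackoffSketch {

    /** Hypothetical stand-in for an election scheduler: counts attempts and widens the retry window. */
    static final class BackoffScheduler {
        private final long initialTimeoutMillis;
        private final long backoffMillis;
        private final long maxTimeoutMillis;
        private final Random random;
        private long attempts = 0;

        BackoffScheduler(long initialTimeoutMillis, long backoffMillis, long maxTimeoutMillis, Random random) {
            this.initialTimeoutMillis = initialTimeoutMillis;
            this.backoffMillis = backoffMillis;
            this.maxTimeoutMillis = maxTimeoutMillis;
            this.random = random;
        }

        /** Upper bound for the next randomised delay: grows linearly with past attempts, up to a cap. */
        long upperBoundMillis() {
            return Math.min(maxTimeoutMillis, initialTimeoutMillis + backoffMillis * attempts);
        }

        /** Picks a random delay in [1, upperBound] and records the attempt. */
        long nextDelayMillis() {
            long bound = upperBoundMillis();
            attempts++;
            return 1 + (long) (random.nextDouble() * bound);
        }
    }

    /**
     * Simulates a node that repeatedly wins an election and then fails its first publication.
     * Returns the retry window (in ms) that applies to the election attempt after the last failure.
     */
    static long upperBoundAfterFailedPublications(boolean keepSchedulerOpen, int failedPublications) {
        Random random = new Random(0);
        BackoffScheduler scheduler = new BackoffScheduler(100, 100, 10_000, random);
        for (int i = 0; i < failedPublications; i++) {
            scheduler.nextDelayMillis(); // election attempt: the node wins and leaves CANDIDATE
            if (!keepSchedulerOpen) {
                // Old behaviour: the scheduler was closed on leaving CANDIDATE, so the
                // failed publication is followed by a brand-new scheduler with no history.
                scheduler = new BackoffScheduler(100, 100, 10_000, random);
            }
            // New behaviour: the scheduler stays open until a published state is applied,
            // so the failed publication counts as a failed attempt.
        }
        return scheduler.upperBoundMillis();
    }

    public static void main(String[] args) {
        System.out.println(upperBoundAfterFailedPublications(false, 5)); // 100: never backs off
        System.out.println(upperBoundAfterFailedPublications(true, 5));  // 600: backs off each time
    }
}
```

With the scheduler recreated after every failed publication, the retry window stays at its initial size however many times the publication fails. That is the condition under which many master-eligible nodes keep clashing. Keeping the scheduler open lets the window grow until it reaches the cap.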
Today we close the election scheduler when the coordinator leaves mode `CANDIDATE`, before even starting the publication that establishes the election winner as the cluster master. If this publication subsequently fails then we start a new election scheduler with the original, short, timeout, and do not back off. With very high numbers of master-eligible nodes this can lead to constant election clashes that never resolve. We must count such failed publications as failed election attempts for election scheduling and backoff purposes.
More precisely, I think a node in mode `LEADER` should not count an election as truly successful until we've received join votes from all nodes in the cluster (see `org.elasticsearch.cluster.coordination.CoordinationState#containsJoinVoteFor`) at the end of a fully-acked publication. That should be equivalent to `currentTerm() == maxTermSeen` at the end of a fully-acked publication, since we increase `maxTermSeen` on a missing join vote.
I'm not sure how a node in mode `FOLLOWER` should detect that the cluster is completely stable. The master could broadcast another message when it decides things are stable perhaps? Or maybe it would be good enough to base it on elapsed time (i.e. if we've been `FOLLOWER` in the same term for 60s)?
Workaround
The simplest workaround is not to have so many master-eligible nodes. See these docs for more information: