
The cluster cannot complete the master election, resulting in the cluster being unavailable #98185

Closed
weizijun opened this issue Aug 4, 2023 · 4 comments
Labels
>bug, needs:triage (Requires assignment of a team area label)

Comments

@weizijun (Contributor) commented on Aug 4, 2023

Elasticsearch Version

master

Installed Plugins

No response

Java Version

bundled

OS Version

all

Problem Description

When the cluster has many indices, it may become impossible to elect a master.
The problem is that the time spent on the 'selected-as-master' task plus the time spent computing the cluster state diff exceeds cluster.election.initial_timeout (default: 100ms).
As a result, after a node wins the election it times out while publishing the cluster state and moves on to the next election term. This repeats over and over, and no master is ever elected.

The default value of cluster.election.initial_timeout is 100ms, and many users are not aware of how this mechanism works. Could the default be raised to 1s or 2s? That might lengthen elections slightly, but should have little impact on users. Are there any other side effects of increasing this setting?
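To make the timing concrete, here is a rough, non-authoritative sketch of the retry schedule implied by the ElectionSchedulerFactory lines in the Logs section below (gracePeriod, maxDelayMillis, delayMillis). The publication cost used here is a made-up number, purely for illustration.

```python
import random

# Constants taken from the log lines below; the 500ms grace period applies to
# every attempt after the first (thisAttempt=0 is logged with gracePeriod=0s).
INITIAL_TIMEOUT_MS = 100   # cluster.election.initial_timeout
BACKOFF_TIME_MS = 100      # backoffTime in the logs
MAX_TIMEOUT_MS = 10_000    # maxTimeout in the logs
GRACE_PERIOD_MS = 500

def election_delay_ms(attempt: int) -> int:
    """Delay before election attempt `attempt` (0-based), mirroring
    maxDelayMillis = min(maxTimeout, initialTimeout + attempt * backoffTime)
    and delayMillis = gracePeriod + a random value up to maxDelayMillis."""
    max_delay_ms = min(MAX_TIMEOUT_MS, INITIAL_TIMEOUT_MS + attempt * BACKOFF_TIME_MS)
    grace_ms = 0 if attempt == 0 else GRACE_PERIOD_MS
    return grace_ms + random.randint(0, max_delay_ms)

# Hypothetical cost of the 'selected-as-master' task plus the cluster state
# diff on a cluster with thousands of indices.
publication_ms = 400

for attempt in range(4):
    delay = election_delay_ms(attempt)
    relation = "shorter" if delay < publication_ms else "longer"
    print(f"attempt {attempt}: competing election scheduled in ~{delay}ms "
          f"({relation} than a {publication_ms}ms publication)")
```

With the 100ms default, the first competing attempt is always scheduled well inside a 300-400ms publication window, which is the pre-emption loop described above; raising cluster.election.initial_timeout widens that window.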

Steps to Reproduce

The reproduction cluster has 16 nodes, each of which is both a data node and a master-eligible node, with 5000 empty indices whose mappings are each about 300 lines.
Kill the current master; the cluster then fails to elect a new one. A reproduction sketch follows below.
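Not part of the original report, but a minimal sketch of one way to set up such a reproduction, assuming a disposable test cluster reachable at http://localhost:9200 with security disabled; the index names and counts are placeholders, and the ~300-line mappings are omitted for brevity.

```python
import requests

ES = "http://localhost:9200"  # assumed address of any node in the test cluster

# Create many empty indices so that the cluster state, and the diffs computed
# during publication, become large.
for i in range(5000):
    requests.put(f"{ES}/test-index-{i:05d}", timeout=30).raise_for_status()

# Look up the currently elected master, then kill that node's process by hand
# (e.g. kill -9 <pid>) and watch whether the remaining nodes elect a new master.
master = requests.get(f"{ES}/_cat/master?format=json", timeout=30).json()
print("current master:", master[0]["node"])
```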

Logs (if relevant)

scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=29, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=555, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=2, maxDelayMillis=300, delayMillis=670, ElectionScheduler{attempt=3, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=10, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=631, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=0s, thisAttempt=0, maxDelayMillis=100, delayMillis=37, ElectionScheduler{attempt=1, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
scheduling scheduleNextElection{gracePeriod=500ms, thisAttempt=1, maxDelayMillis=200, delayMillis=644, ElectionScheduler{attempt=2, ElectionSchedulerFactory{initialTimeout=100ms, backoffTime=100ms, maxTimeout=10s}}}
@weizijun added the >bug and needs:triage labels on Aug 4, 2023
@DaveCTurner (Contributor)

I think this duplicates #97909 so I am closing this. It's strange: I'm pretty sure this problem has gone unnoticed for over 4 years, and then two of us have reported it within a few days of each other.

The preferred workaround is not to have so many master-eligible nodes. See these docs for more information:

> However, it is good practice to limit the number of master-eligible nodes in the cluster to three. Master nodes do not scale like other node types since the cluster always elects just one of them as the master of the cluster. If there are too many master-eligible nodes then master elections may take a longer time to complete.

@weizijun (Contributor, Author) commented on Aug 4, 2023

> The preferred workaround is not to have so many master-eligible nodes. See these docs for more information:

The case can also be reproduced with three masters; I used more masters to make it easier to reproduce.

@DaveCTurner (Contributor)

Yes, I expect this could happen even with three masters, if the cluster state is very, very large and/or there's some other performance problem making the election process unreasonably slow. We haven't seen that happen in practice, even on some of our very large clusters. Marking one of the three masters as voting-only should help too, but as a last resort you can try increasing cluster.election.initial_timeout.
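For reference, a sketch of what those two workarounds could look like as static settings in elasticsearch.yml (not taken from this thread; the role list is an example and should match your own node layout):

```yaml
# On one of the three master-eligible nodes: keep it master-eligible, but make
# it voting-only so it can vote in elections without becoming the master itself.
node.roles: [ master, voting_only ]

# Last-resort option mentioned above: give the first election attempt more time
# before competing attempts are scheduled (the default is 100ms).
cluster.election.initial_timeout: 2s
```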

@DaveCTurner (Contributor)

> Yes, I expect this could happen even with three masters, if the cluster state is very, very large and/or there's some other performance problem making the election process unreasonably slow.

FWIW I am struggling to make this kind of election collision happen repeatedly with just three masters. The cluster state size doesn't matter, because the newly-elected leader suppresses other election attempts with lightweight follower checks, sent as soon as it is elected. As long as the follower checks are not delayed then the cluster stabilises pretty reliably. You have to be extremely unlucky, over and over again, for this not to happen.

In fact even with 16 masters the follower checks seem to prevent this kind of problem rather reliably. It's definitely theoretically possible for it to take a long time to stabilise, but in practice this hardly ever happens. I wonder if there's something else wrong in your setup.
