Partial network partitioning leads to cluster unavailability. #43183
Elasticsearch version: 6.8.0
Plugins installed: 
JVM version (
OS version (
Description of the problem including expected versus actual behavior:
I have a cluster of 5 nodes (node1, node2, node3, node4, node5). Their node IDs sort in the following order (this is critical for master election, which I believe is at the core of this failure): node2 < node3 < the rest of the nodes.
When the cluster first starts, node2 is elected master, as expected. However, after a partial network partition isolates node2 from nodes 3, 4, and 5 (while node 3 is still able to communicate with all other nodes), the cluster becomes stuck in an unavailable state: node2 steps down as master and no new master is elected.
I believe the expected behavior is for node3 to become the new master, as it is connected to a majority of nodes and has the smallest ID after node2. This does not happen, however. From the logs, it seems that node3 keeps trying to join node2, which it still believes should be the master, while it does not accept the join requests from nodes 3, 4, and 5.
Steps to reproduce:
In my setup I have 1 index with the following settings:
With the 5 nodes running, once a partial network partition occurs (as described above), the system stays unavailable and does not accept index operations. I waited for more than an hour and it remained in that state.
Log files are provided for all 5 nodes.
Can you describe more precisely what "partial network partition" means in this context? Which nodes can communicate with each other? Is the partition symmetric or asymmetric?
I think this is fixed in 7.x by #32006, and we're unlikely to take any action to address this in the 6.x series. Can you reproduce it in a more recent version?
A partial network partition is one in which two groups of nodes cannot communicate with each other, while a third group of nodes can communicate with both.
I just realized that I accidentally wrote node3 instead of node1 in some places in the original post, sorry.
In my case the three groups are:
g1 members cannot communicate with g2 members, and vice versa, while g3 members can communicate with all other nodes (and all other nodes can communicate with g3). So the partition is symmetric. Hope this clarifies what I mean by partial network partitioning.
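To make the connectivity rule concrete, here is a minimal Python sketch of it. The group membership in `GROUPS` is a hypothetical assignment chosen for illustration, not the exact list from my setup, and `ID_RANK` only encodes the ordering node2 < node3 < the rest. The helper names (`can_communicate`, `visible`, `sees_quorum`) are made up for this sketch and have nothing to do with Elasticsearch internals.

```python
# Hypothetical group assignment, for illustration only.
GROUPS = {
    "g1": {"node2"},
    "g2": {"node3", "node4", "node5"},
    "g3": {"node1"},
}

NODES = sorted(n for members in GROUPS.values() for n in members)
QUORUM = len(NODES) // 2 + 1  # majority of 5 nodes is 3

# Assumed ID ordering: node2 < node3 < all other nodes (rank 2).
ID_RANK = {"node2": 0, "node3": 1}


def group_of(node):
    """Return the name of the group that contains the given node."""
    for name, members in GROUPS.items():
        if node in members:
            return name
    raise KeyError(node)


def can_communicate(a, b):
    """Symmetric rule: only links between g1 and g2 are cut."""
    return {group_of(a), group_of(b)} != {"g1", "g2"}


def visible(node):
    """All nodes (including itself) that the given node can reach."""
    return {other for other in NODES if can_communicate(node, other)}


def sees_quorum(node):
    """True if the node can reach a majority of the cluster."""
    return len(visible(node)) >= QUORUM


# Nodes that can still reach a majority, and the lowest-ID one among them.
candidates = [n for n in NODES if sees_quorum(n)]
expected_master = min(candidates, key=lambda n: ID_RANK.get(n, 2))

for n in NODES:
    print(n, sorted(visible(n)), "quorum" if sees_quorum(n) else "no quorum")
print("lowest-ID node that still sees a majority:", expected_master)
```

Under this assumed assignment, node2 can reach only itself and the g3 node, so it no longer sees a majority, while every g2 and g3 node still does. That is why I expected a new master to be electable on that side.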
Will try to reproduce this in 7.x soon and will let you know how it goes.
Ok, this is a situation that master elections in 6.x and earlier are known not to tolerate, but in which 7.x and later clusters will still be able to elect a master. It also doesn't seem very likely to occur in the wild. It's always possible to find a partition that results in unavailability (e.g. disconnect all pairs of nodes) so we must make choices about the kinds of partition that are reasonable to tolerate. Typically we choose to concentrate on the ones that occur in practice in a properly-configured network, which don't look like this.
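As a small illustration of the "disconnect all pairs of nodes" example, the following Python sketch counts how many nodes each node can reach under a given set of working links. The node names and the `reachable_count` helper are hypothetical and only serve the illustration. With no working links at all, every node sees just itself, so no majority-based election can succeed.

```python
from itertools import combinations

NODES = ["node1", "node2", "node3", "node4", "node5"]
QUORUM = len(NODES) // 2 + 1  # majority of 5 nodes is 3


def reachable_count(node, links):
    """Number of nodes reachable directly from `node`, counting itself.

    `links` is a set of frozensets, each holding the two endpoints
    of one working bidirectional link.
    """
    return 1 + sum(1 for link in links if node in link)


# Fully connected network: every pair of nodes has a working link.
full_mesh = {frozenset(pair) for pair in combinations(NODES, 2)}

# Worst-case partition: every pair of nodes is disconnected.
no_links = set()

for name, links in [("full mesh", full_mesh), ("all pairs cut", no_links)]:
    nodes_with_quorum = [n for n in NODES if reachable_count(n, links) >= QUORUM]
    print(name, "-> nodes that see a majority:", nodes_with_quorum)
```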
Closing this as there's no further action required here, but please do report back on your experiments with 7.x.