Met split-brain issue when the azure vm connection was lost #13727
Comments
Thanks for reporting. Which ES version are you using?
My Elasticsearch version is 1.3.2.
This also happened on my Elasticsearch 1.3.9 cluster.
I see. So I suspect this is #2488, which was fixed in 1.4. I suggest you upgrade (to 1.7.2) as soon as possible. Many, many things have been fixed since 1.3.2.
I'm closing this now. Please reopen if it happens again after upgrading...
Hi all,
Recently we hit a split-brain issue three times on our Elasticsearch cluster hosted on Azure VMs.
I have three nodes, and each node can act as both a data node and a master node. I have set discovery.zen.minimum_master_nodes to 2 and use the Azure discovery plugin.
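For reference, the value 2 matches the usual majority rule for this setting, (master-eligible nodes / 2) + 1. Here is a minimal sketch of that arithmetic; the function names are only illustrative and are not part of Elasticsearch:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect(visible_master_eligible: int, min_master_nodes: int) -> bool:
    """A partition may elect a master only if it sees a quorum of nodes."""
    return visible_master_eligible >= min_master_nodes


quorum = minimum_master_nodes(3)   # three master-eligible nodes, as in this cluster
print(quorum)
print(can_elect(1, quorum))        # an isolated single node
print(can_elect(2, quorum))        # two nodes that can still see each other
```

With three nodes, a single isolated node should not be able to elect itself, so by this rule alone the configuration looks correct.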
Azure has recently been performing frequent maintenance on its host machines. This causes network instability between the VMs, which in turn caused the split-brain issue in my Elasticsearch cluster.
We found that a node that has already joined one master can be forced to rejoin another master, and that is when the split-brain issue happened!
From the logs we can see that:
09-22 16:38:27, node1 lost connection.
09-22 16:45:15, node2 lost connection, node3 became master (node1 had recovered).
09-22 16:45:24, node3 lost connection, node2 became master.
09-22 16:45:43, node3 recovered and became master again!
Below are the errors and warnings from the three nodes when the split-brain issue happened:
Node1(search-prod-wus1):
[2015-09-22 16:45:15,618][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]])
[2015-09-22 16:45:20,566][WARN ][index.store ] [caps-prod-wus1] [32c3c289eef54e42be5913a63dfd280a][2] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_89i0_es090_0.doc]
[2015-09-22 16:45:21,019][WARN ][index.store ] [caps-prod-wus1] [32c3c289eef54e42be5913a63dfd280a][2] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_89i0_es090_0.doc]
[2015-09-22 16:45:24,364][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]], previous [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]}, removed {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, added {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]])
[2015-09-22 16:45:43,059][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, added {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, reason: zen-disco-receive(from master [[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]])
Node2(search-prod-wus2):
[2015-09-22 16:38:37,162][WARN ][action.index ] [caps-prod-wus2] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][3]
org.elasticsearch.transport.NodeDisconnectedException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica] disconnected
[2015-09-22 16:38:37,162][WARN ][cluster.action.shard ] [caps-prod-wus2] [32c3c289eef54e42be5913a63dfd280a][3] sending failed shard for [32c3c289eef54e42be5913a63dfd280a][3], node[vcBwbCeJTQ61Mw2aOqaflg], [R], s[STARTED], indexUUID [vLXa7OmETyGPGrI82L4SVg], reason [Failed to perform [index] on replica, message [NodeDisconnectedException[[caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica] disconnected]]]
[2015-09-22 16:38:37,162][WARN ][action.index ] [caps-prod-wus2] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][3]
org.elasticsearch.transport.SendRequestTransportException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]
[2015-09-22 16:38:37,162][WARN ][cluster.action.shard ] [caps-prod-wus2] [32c3c289eef54e42be5913a63dfd280a][3] sending failed shard for [32c3c289eef54e42be5913a63dfd280a][3], node[vcBwbCeJTQ61Mw2aOqaflg], [R], s[STARTED], indexUUID [vLXa7OmETyGPGrI82L4SVg], reason [Failed to perform [index] on replica, message [SendRequestTransportException[[caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]]; nested: NodeNotConnectedException[[caps-prod-wus1][inet[/10.3.0.5:9300]] Node not connected]; ]]
[2015-09-22 16:45:24,005][INFO ][cluster.service ] [caps-prod-wus2] removed {[caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]],}, reason: zen-disco-node_failed([caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]]), reason transport disconnected (with verified connect)
Node3(search-prod-wus3):
[2015-09-22 16:38:38,677][WARN ][action.index ] [caps-prod-wus3] Failed to perform index on remote replica [caps-prod-wus1][vcBwbCeJTQ61Mw2aOqaflg][search-prod-wbp][inet[/10.3.0.5:9300]][32c3c289eef54e42be5913a63dfd280a][2]
org.elasticsearch.transport.SendRequestTransportException: [caps-prod-wus1][inet[/10.3.0.5:9300]][index/replica]
[2015-09-22 16:45:15,156][INFO ][discovery.azure ] [caps-prod-wus3] master_left [[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]], reason [transport disconnected (with verified connect)]
[2015-09-22 16:45:15,156][INFO ][cluster.service ] [caps-prod-wus3] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]]}, removed {[caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]],}, reason: zen-disco-master_failed ([caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]])
[2015-09-22 16:45:43,126][WARN ][index.store ] [caps-prod-wus3] [0222d67c4146405497a70df65629e634][0] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_3app.fdt]
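To cross-check the timeline against the logs above, here is a minimal sketch that pulls the master changes out of the `cluster.service` lines. The regular expression only assumes the line layout visible in these logs, and `master_changes` is a hypothetical helper name. The sample lines are node1's three master-change entries, truncated after the previous master's name.

```python
import re

# Matches "[timestamp][LEVEL][cluster.service ] [node] master {new [X]..., previous [Y]..."
MASTER_CHANGE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\w+\s*\]\[cluster\.service\s*\] "
    r"\[(?P<node>[^\]]+)\] master \{new \[(?P<new>[^\]]+)\]"
    r".*?previous \[(?P<prev>[^\]]+)\]"
)


def master_changes(lines):
    """Return (timestamp, reporting node, previous master, new master) tuples."""
    events = []
    for line in lines:
        m = MASTER_CHANGE.match(line)
        if m:
            events.append((m["ts"], m["node"], m["prev"], m["new"]))
    # Timestamps share one fixed format, so sorting the strings sorts by time.
    return sorted(events)


# Abridged copies of node1's log lines (everything after the previous master's name is cut).
SAMPLE = [
    "[2015-09-22 16:45:15,618][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2]",
    "[2015-09-22 16:45:24,364][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus2][9v_FW_4AQ7KQ4fA5CLPdTg][search-prod-wus][inet[/10.3.0.4:9300]], previous [caps-prod-wus3]",
    "[2015-09-22 16:45:43,059][INFO ][cluster.service ] [caps-prod-wus1] master {new [caps-prod-wus3][qQYWMttZScaqbAfBPkV5gw][search-prod-wbu][inet[/10.3.0.6:9300]], previous [caps-prod-wus2]",
]

for ts, node, prev, new in master_changes(SAMPLE):
    print(f"{ts}  {node}: master {prev} -> {new}")
```

Running this over the full log files from all three nodes would show which node accepted which master at each moment, and that is where the rejoin described above becomes visible.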