[CI] MinimumMasterNodesIT.testMultipleNodesShutdownNonMasterNodes sporadically fails #22120

Closed
imotov opened this issue Dec 12, 2016 · 5 comments

Comments

@imotov
Contributor

imotov commented Dec 12, 2016

The testMultipleNodesShutdownNonMasterNodes test in the org.elasticsearch.cluster.MinimumMasterNodesIT suite started to fail regularly. It failed 6 times over the weekend and once today. I cannot reproduce it locally.

The failure is:

  1> [2016-12-12T09:11:56,643][INFO ][o.e.c.MinimumMasterNodesIT] [testMultipleNodesShutdownNonMasterNodes]: after test
ERROR   4.89s J1 | MinimumMasterNodesIT.testMultipleNodesShutdownNonMasterNodes <<< FAILURES!
   > Throwable #1: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
   >    at __randomizedtesting.SeedInfo.seed([4489C6FFC6B9860E:F68A1C476DB0C1A]:0)
   >    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:161)
   >    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:147)
   >    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:92)
   >    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:52)
   >    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170)
   >    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142)
   >    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:84)
   >    at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:75)
   >    at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:64)
   >    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:400)
   >    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:77)
   >    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:51)
   >    at org.elasticsearch.cluster.MinimumMasterNodesIT.testMultipleNodesShutdownNonMasterNodes(MinimumMasterNodesIT.java:264)
   >    at java.lang.Thread.run(Thread.java:745)
@imotov added the :Distributed/Discovery-Plugins (Anything related to our integration plugins with EC2, GCP and Azure) and >test (Issues or PRs that are addressing/adding tests) labels Dec 12, 2016
@javanna
Member

javanna commented Dec 12, 2016

Not sure this happens only sporadically; I saw this a couple of times today out of probably 10 runs.

ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Dec 13, 2016
…master

PR elastic#22049 changed the node update logic to never remove nodes from the cluster state when the cluster state is not published. This led to an issue electing a master (elastic#22120) based on nodes with the same transport address (but a different node id) as previous nodes. The joining nodes should take precedence over conflicting ones. Note that this only applies to the action of becoming master. If a master is established and a node joins an existing master, it will be rejected if there is another node with the same transport address.
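
A minimal Java sketch of the conflict resolution this commit describes, using a hypothetical `Node` record and `ElectionNodeResolver` class rather than the actual DiscoveryNodes builder API: while a node is becoming master, a joining node whose transport address collides with a stale entry from the previous cluster state replaces that entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.elasticsearch.cluster.node.DiscoveryNode:
// only the fields relevant to the conflict (node id + transport address) are modeled.
record Node(String id, String transportAddress) {}

class ElectionNodeResolver {

    /**
     * Builds the node set used while becoming master. Joining nodes take
     * precedence over existing (possibly dead) nodes that share the same
     * transport address but carry a different node id.
     */
    static Map<String, Node> resolve(List<Node> existingNodes, List<Node> joiningNodes) {
        // Key by transport address so address conflicts are easy to detect.
        Map<String, Node> byAddress = new HashMap<>();
        for (Node existing : existingNodes) {
            byAddress.put(existing.transportAddress(), existing);
        }
        for (Node joining : joiningNodes) {
            // A joining node with the same address but a different id replaces
            // the stale entry; this only applies while *becoming* master.
            byAddress.put(joining.transportAddress(), joining);
        }
        // Re-key by node id, which is how the cluster state refers to nodes.
        Map<String, Node> byId = new HashMap<>();
        for (Node node : byAddress.values()) {
            byId.put(node.id(), node);
        }
        return byId;
    }
}
```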
@ywelsch
Contributor

ywelsch commented Dec 13, 2016

This test failure is triggered by PR #22049, which simplifies the cluster state update logic to only change the list of disco-nodes in the cluster state when said cluster state is going to be published. This means that a node cannot just locally change its cluster state; state changes always have to go through an active master. The test failure here highlights a few issues that are not exclusively bound to the above change (but which the change makes more frequent):

  • Validating full transport connections: Pinging reuses an existing connection when one is available. However, it does not re-check on reuse whether the node on the other side is still who it thinks it is. This is an issue when the cluster state contains disco-nodes that are dead and there are new live nodes with the same transport address (but a different node id) as the dead nodes: the pinging simply reconnects to a different node.
  • Master election gets confused by dead nodes: Master election proceeds in two steps: 1) pinging to determine a master and collecting joins, 2) publishing a cluster state which sets the new master and which needs "approval" from the master-eligible nodes through the process of committing. These two steps can disagree, however, when we elect a master based on nodes with the same transport address (but a different node id) as previous nodes which are now dead. I've opened a PR (Prefer joining node with conflicting transport address when becoming master #22134) so that the live nodes take precedence over the dead nodes when completing the master-election process, and the cluster state electing the master is published and committed by the right nodes.
  • Async shard fetching is brittle in the presence of the issues above: The test actually fails because it ends up with unassigned shards. The reason is that the update logic for the cache used by async shard fetching does not properly handle the scenario where the target nodes of a shard fetch are not the nodes that the master (which issues the fetch) believes them to be. When the master starts fetching, it adds a NodeEntry to the cache, keyed by the node id. When a response is then received, the node id contained in the response object is used to resolve the cache entry. If a node has a different node id than what the master believes it to have but the same transport address, the response is simply ignored, which also prevents future fetching on this node id because the master thinks that fetching is still in progress for that node. As a result, shard fetching never completes and the shards cannot be allocated, as allocation waits for the fetching to complete. A minimal sketch of this cache behavior follows below.
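
To illustrate the last point, here is a simplified Java sketch of the node-id-keyed cache. The `NodeEntry` and `ShardFetchCache` names loosely mirror the description above and are not the real AsyncShardFetch internals: a response from a node whose id differs from the one the master expects finds no cache entry, so it is dropped and the fetch never completes.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the per-node fetch state kept by the master.
class NodeEntry {
    boolean fetching = true; // set when the fetch request is sent
    Object shardData;        // the shard metadata, once a response arrives
}

class ShardFetchCache {
    // Keyed by the node id the *master* believes it is talking to.
    private final Map<String, NodeEntry> cache = new HashMap<>();

    void startFetching(String expectedNodeId) {
        cache.put(expectedNodeId, new NodeEntry());
    }

    // The response carries the node id of whoever actually answered. If that
    // node shares the expected node's transport address but has a different id,
    // there is no matching entry: the response is dropped, the original entry
    // stays in "fetching" state forever, and the shards stay unassigned because
    // the master keeps waiting for the fetch to complete.
    void onResponse(String respondingNodeId, Object shardData) {
        NodeEntry entry = cache.get(respondingNodeId);
        if (entry == null) {
            return; // silently ignored -- the failure mode seen in this test
        }
        entry.fetching = false;
        entry.shardData = shardData;
    }

    boolean hasPendingFetches() {
        return cache.values().stream().anyMatch(e -> e.fetching);
    }
}
```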

ywelsch added a commit that referenced this issue Dec 14, 2016
…master (#22134)

PR #22049 changed the node update logic to never remove nodes from the cluster state when the cluster state is not published. This led to an issue electing a master (#22120) based on nodes with the same transport address (but a different node id) as previous nodes. The joining nodes should take precedence over conflicting ones. Note that this only applies to the action of becoming master. If a master is established and a node joins an existing master, it will be rejected if there is another node with the same transport address.
ywelsch added a commit that referenced this issue Dec 14, 2016
…master (#22134)

PR #22049 changed the node update logic to never remove nodes from the cluster state when the cluster state is not published. This led to an issue electing a master (#22120) based on nodes with the same transport address (but a different node id) as previous nodes. The joining nodes should take precedence over conflicting ones. Note that this only applies to the action of becoming master. If a master is established and a node joins an existing master, it will be rejected if there is another node with the same transport address.
bleskes added a commit that referenced this issue Dec 21, 2016
The `UnicastZenPing` shows its age and is the result of many small changes. The current state of affairs is confusing and hard to reason about. This PR cleans it up (while following the same original intentions). Highlights of the changes are:

1) Clear 3-round flow - no interleaving of scheduling.
2) The previous implementation made a best-effort attempt to wait for ongoing pings to be sent and completed. The pings were guaranteed to complete because each used the total ping duration as a timeout. This made it hard to reason about the total ping duration and the flow of the code. All of this is removed now and a ping should just complete within the given duration or not be counted (note that it was very handy for testing, but I moved the needed sync logic to the test).
3) Because of (2) the ping scheduling changed a bit, to give the last round a chance to complete. We now ping at the beginning, 1/3 and 2/3 of the duration (a sketch of this schedule follows the list).
4) To offset for (3) a bit, incoming ping requests are now added to ongoing ping collections.
5) UnicastZenPing never establishes full-blown connections (but does reuse them if they are already there). Relates to #22120
6) Discovery host providers are only used once per pinging round. Closes #21739
7) Usage of the ability to open a connection without connecting to a node ( #22194 ) and shorter connection timeouts help with connections piling up. Closes #19370
8) Beefed up the tests and sped them up.
9) Removed the light profile from production code.
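
A rough Java sketch of the three-round schedule from points 1 and 3, assuming a plain `ScheduledExecutorService` instead of Elasticsearch's thread pool; `sendPings` is a placeholder for whatever sends one ping round:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PingRoundScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Schedules the three ping rounds: immediately, at 1/3 and at 2/3 of the
     * total ping duration, so the last round still has a third of the budget
     * left to complete before results are collected.
     */
    void schedulePingRounds(long pingDurationMillis, Runnable sendPings) {
        scheduler.schedule(sendPings, 0, TimeUnit.MILLISECONDS);
        scheduler.schedule(sendPings, pingDurationMillis / 3, TimeUnit.MILLISECONDS);
        scheduler.schedule(sendPings, 2 * pingDurationMillis / 3, TimeUnit.MILLISECONDS);
        // Results are gathered once the full duration has elapsed; pings that
        // have not completed by then are simply not counted.
    }
}
```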
ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Dec 23, 2016
…master (elastic#22134)

PR elastic#22049 changed the node update logic to never remove nodes from the cluster state when the cluster state is not published. This led to an issue electing a master (elastic#22120) based on nodes with the same transport address (but a different node id) as previous nodes. The joining nodes should take precedence over conflicting ones. Note that this only applies to the action of becoming master. If a master is established and a node joins an existing master, it will be rejected if there is another node with the same transport address.
bleskes added a commit that referenced this issue Dec 23, 2016
The `UnicastZenPing` shows its age and is the result of many small changes. The current state of affairs is confusing and hard to reason about. This PR cleans it up (while following the same original intentions). Highlights of the changes are:

1) Clear 3-round flow - no interleaving of scheduling.
2) The previous implementation made a best-effort attempt to wait for ongoing pings to be sent and completed. The pings were guaranteed to complete because each used the total ping duration as a timeout. This made it hard to reason about the total ping duration and the flow of the code. All of this is removed now and a ping should just complete within the given duration or not be counted (note that it was very handy for testing, but I moved the needed sync logic to the test).
3) Because of (2) the ping scheduling changed a bit, to give the last round a chance to complete. We now ping at the beginning, 1/3 and 2/3 of the duration.
4) To offset for (3) a bit, incoming ping requests are now added to ongoing ping collections.
5) UnicastZenPing never establishes full-blown connections (but does reuse them if they are already there). Relates to #22120
6) Discovery host providers are only used once per pinging round. Closes #21739
7) Usage of the ability to open a connection without connecting to a node ( #22194 ) and shorter connection timeouts help with connections piling up. Closes #19370
8) Beefed up the tests and sped them up.
9) Removed the light profile from production code.
ywelsch added a commit that referenced this issue Dec 23, 2016
…master (#22134)

PR #22049 changed the node update logic to never remove nodes from the cluster state when the cluster state is not published. This led to an issue electing a master (#22120) based on nodes with the same transport address (but a different node id) as previous nodes. The joining nodes should take precedence over conflicting ones. Note that this only applies to the action of becoming master. If a master is established and a node joins an existing master, it will be rejected if there is another node with the same transport address.
@jasontedor
Member

Closed by #22134

@javanna
Member

javanna commented Apr 10, 2017

One more failure of this test: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=sles/814/

Should I open a separate issue for it?

@ywelsch
Contributor

ywelsch commented Apr 10, 2017

Thanks for the notification @javanna. I've pushed 12471c4 as a follow-up to a3cceb8 (see #23922), which should fix this. If more failures of this occur after this commit, feel free to open a separate issue.
