[CI] AutoFollowIT#testConflictingPatterns failing #35480

talevy · 2018-11-13T06:45:54Z

Hey all, not sure if this is a real failure or not, but it popped up on my PR test run. May be worth investigating?

CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/2600/

./gradlew :x-pack:plugin:ccr:internalClusterTest -Dtests.seed=4205CBCCCD58E520 -Dtests.class=org.elasticsearch.xpack.ccr.AutoFollowIT -Dtests.method="testConflictingPatterns" -Dtests.security.manager=true -Dtests.locale=es-PY -Dtests.timezone=Pacific/Majuro -Dcompiler.java=11 -Druntime.java=8

Stack trace:

  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [leader0][127.0.0.1:53885][indices:data/read/xpack/ccr/shard_changes[s]]
  1> Caused by: org.elasticsearch.index.shard.IndexShardClosedException: CurrentState[CLOSED] Closed
  1> 	at org.elasticsearch.index.shard.GlobalCheckpointListeners.close(GlobalCheckpointListeners.java:158) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:103) ~[elasticsearch-core-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.core.internal.io.IOUtils.close(IOUtils.java:61) ~[elasticsearch-core-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.IndexShard.close(IndexShard.java:1187) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.IndexService.closeShard(IndexService.java:436) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.IndexService.removeShard(IndexService.java:419) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.IndexService.close(IndexService.java:277) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:638) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:286) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]

elasticmachine · 2018-11-13T06:45:55Z

Pinging @elastic/es-distributed

javanna · 2018-11-21T18:15:35Z

Same test but a different failure here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-darwin-compatibility/72

16:58:09 FAILURE 30.1s J0 | AutoFollowIT.testConflictingPatterns <<< FAILURES!
16:58:09    > Throwable #1: java.lang.AssertionError: 
16:58:09    > Expected: <0L>
16:58:09    >      but: was <1L>
16:58:09    > 	at __randomizedtesting.SeedInfo.seed([D307FB40E8829BD7:C074EF21CD3507C2]:0)
16:58:09    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:58:09    > 	at org.elasticsearch.xpack.ccr.AutoFollowIT.lambda$testConflictingPatterns$6(AutoFollowIT.java:217)
16:58:09    > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:848)
16:58:09    > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:822)
16:58:09    > 	at org.elasticsearch.xpack.ccr.AutoFollowIT.testConflictingPatterns(AutoFollowIT.java:214)
16:58:09    > 	at java.lang.Thread.run(Thread.java:748)
16:58:09    > 	Suppressed: java.lang.AssertionError: 
16:58:09    > Expected: <1L>
16:58:09    >      but: was <0L>
16:58:09    > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:58:09    > 		at org.elasticsearch.xpack.ccr.AutoFollowIT.lambda$testConflictingPatterns$6(AutoFollowIT.java:216)
16:58:09    > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
16:58:09    > 		... 39 more
16:58:09    > 	Suppressed: java.lang.AssertionError: 
16:58:09    > Expected: <1L>
16:58:09    >      but: was <0L>
16:58:09    > 		at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
16:58:09    > 		at org.elasticsearch.xpack.ccr.AutoFollowIT.lambda$testConflictingPatterns$6(AutoFollowIT.java:216)
16:58:09    > 		at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
16:58:09    > 		... 39 more
16:58:09    > 	Suppressed: java.lang.AssertionError: .....................

martijnvg · 2018-11-22T08:19:52Z

So the last test failure is caused by the auto follow coordinator failing to fetch the history UUIDs when following an index. On the second attempt, the history UUIDs are available and auto following succeeds. The test just does not expect auto following to fail the first time. I will think about how to address this.

Initial failure:

java.lang.IllegalArgumentException: leader index's commit stats are missing
  1>    at org.elasticsearch.xpack.ccr.CcrLicenseChecker.lambda$fetchLeaderHistoryUUIDs$6(CcrLicenseChecker.java:258) ~[main/:?]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:48) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1119) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport$2.doRun(TcpTransport.java:1297) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:140) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:1289) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1244) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:1030) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.consumeNetworkReads(TcpTransport.java:1055) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.MockTcpTransport.readMessage(MockTcpTransport.java:166) ~[framework-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.MockTcpTransport.access$800(MockTcpTransport.java:74) ~[framework-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.MockTcpTransport$MockChannel$2.lambda$doRun$0(MockTcpTransport.java:343) ~[framework-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:106) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.MockTcpTransport$MockChannel$2.doRun(MockTcpTransport.java:343) [framework-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]

martijnvg · 2018-11-22T08:41:49Z

The leader index gets created and, at the same time, the auto follow coordinator fetches the remote cluster state. Not all primary shards have been allocated yet, because the create index request for the leader index has not returned. This causes index following to fail, because not all primary shards have a history UUID in their index shard stats.

I will work on a change that lets the auto follow coordinator hold off on auto following an index until all of its primary shards have started. This should avoid these situations.
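
For illustration, the check described above could look roughly like the sketch below. It is a simplified, self-contained model and not the actual change: `ShardCopy` and `allPrimariesStarted` are hypothetical names, and the real coordinator would presumably consult the remote cluster state it already fetches rather than a plain list.

```java
import java.util.Arrays;
import java.util.List;

public class PrimaryShardCheck {

    /** Hypothetical, simplified stand-in for one shard copy of the leader index. */
    public static final class ShardCopy {
        final int shardId;
        final boolean primary;
        final boolean started;

        public ShardCopy(int shardId, boolean primary, boolean started) {
            this.shardId = shardId;
            this.primary = primary;
            this.started = started;
        }
    }

    /**
     * Returns true only if every shard id in [0, numberOfShards) has a
     * primary copy that has started. An index that fails this check is
     * skipped and reconsidered on the next auto follow run.
     */
    public static boolean allPrimariesStarted(int numberOfShards, List<ShardCopy> copies) {
        for (int shardId = 0; shardId < numberOfShards; shardId++) {
            boolean primaryStarted = false;
            for (ShardCopy copy : copies) {
                if (copy.shardId == shardId && copy.primary && copy.started) {
                    primaryStarted = true;
                    break;
                }
            }
            if (!primaryStarted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Shard 1's primary has not started yet: do not auto follow in this run.
        List<ShardCopy> halfAllocated = Arrays.asList(
            new ShardCopy(0, true, true),
            new ShardCopy(1, true, false));
        System.out.println(allPrimariesStarted(2, halfAllocated)); // false

        // Both primaries started: the index is safe to auto follow.
        List<ShardCopy> allocated = Arrays.asList(
            new ShardCopy(0, true, true),
            new ShardCopy(1, true, true));
        System.out.println(allPrimariesStarted(2, allocated)); // true
    }
}
```

Skipping the index instead of failing means the failed-follow counter that the test asserts on is never incremented for a leader index that is merely still starting up.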

martijnvg · 2018-11-27T15:59:00Z

Another instance of this failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+multijob-darwin-compatibility/64/console

…35814) This change adds an extra check that verifies that all primary shards of an index that is about to be auto followed have started. If not all primary shards of an index have started, then the next auto follow run will try to auto follow this index again. Closes #35480

… may be removed (#35945) AutoFollowCoordinator should take into account that, after auto following an index and while recording that the leader index has been followed, the auto follow pattern may have been removed via the delete auto follow pattern API. Also fixed a bug so that, when a remote cluster connection has been removed, the auto follow coordinator does not die when it tries to get a remote client for that cluster. Closes #35480
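
A rough sketch of the "pattern may have been removed" case from the commit message above, using a plain map as a hypothetical stand-in for the auto follow metadata. `FollowedIndexTracker` and `recordFollowed` are illustrative names only and do not come from the Elasticsearch code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FollowedIndexTracker {

    /**
     * Records that a leader index has been followed under the given pattern.
     * If the pattern was deleted in the meantime, the state is returned
     * unchanged instead of throwing (assumption: silently skipping is the
     * desired behavior in that case).
     */
    public static Map<String, List<String>> recordFollowed(
            Map<String, List<String>> followedByPattern,
            String patternName,
            String leaderIndexUuid) {
        List<String> existing = followedByPattern.get(patternName);
        if (existing == null) {
            // The pattern was removed between following the index and
            // recording it; there is nothing left to update.
            return followedByPattern;
        }
        // Copy-on-write so the previous state is never mutated.
        Map<String, List<String>> updated = new HashMap<>(followedByPattern);
        List<String> uuids = new ArrayList<>(existing);
        uuids.add(leaderIndexUuid);
        updated.put(patternName, uuids);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, List<String>> state = new HashMap<>();
        state.put("pattern1", new ArrayList<>());

        // Pattern still exists: the UUID is recorded.
        System.out.println(recordFollowed(state, "pattern1", "uuid-a").get("pattern1"));

        // Pattern was deleted concurrently: state comes back unchanged.
        System.out.println(recordFollowed(state, "pattern2", "uuid-b") == state);
    }
}
```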

talevy added the >test-failure and :Distributed/CCR labels Nov 13, 2018

jasontedor assigned martijnvg Nov 21, 2018

martijnvg mentioned this issue Nov 22, 2018

[CCR] Only auto follow indices when all primary shards have started #35814

Merged

martijnvg added a commit that referenced this issue Nov 27, 2018

Muted test. This test exposes an issue inside the auto follower coordinator; it has nothing to do with the HLRC support for the put auto follow pattern API, which this test was added for. Relates to #35480

80a8c0a

martijnvg added a commit that referenced this issue Nov 27, 2018

Muted test. This test exposes an issue inside the auto follower coordinator; it has nothing to do with the HLRC support for the put auto follow pattern API, which this test was added for. Relates to #35480

fabffb8

martijnvg mentioned this issue Nov 27, 2018

[CCR] AutoFollowCoordinator should tolerate that auto follow patterns may be removed #35945

Merged

martijnvg closed this as completed in #35814 Nov 29, 2018
