
[CI] IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep failed #40018

Closed
bizybot opened this issue Mar 13, 2019 · 2 comments · Fixed by #40707
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >test-failure Triaged test failures from CI

Comments


bizybot commented Mar 13, 2019

The test IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep failed in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+multijob-unix-compatibility/os=oraclelinux-7/71/console

I could not reproduce this locally on a single run, but over multiple iterations it does reproduce sporadically.

The repro command was:

./gradlew :x-pack:plugin:ilm:unitTest \
-Dtests.seed=5254DAB622BCB27F \
-Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests \
-Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep" \
-Dtests.security.manager=true \
-Dtests.locale=sr-CS \
-Dtests.timezone=Etc/Universal \
-Dcompiler.java=11 \
-Druntime.java=8
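Since the failure only shows up sporadically, one way to chase it is to re-run the repro command in a loop until it fails (randomizedtesting also supports repeating a single method within one JVM via `-Dtests.iters=N`). A minimal sketch of such a retry helper; `run_until_failure` is a hypothetical name, not part of the build:

```shell
# Re-run a command up to $1 times, stopping at the first failure and
# reporting which iteration failed. Useful for sporadic test failures.
run_until_failure() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed all $max iterations"
    return 0
}

# Example (hypothetical invocation against the repro command above):
# run_until_failure 50 ./gradlew :x-pack:plugin:ilm:unitTest \
#     -Dtests.seed=5254DAB622BCB27F \
#     -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests \
#     -Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep"
```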
@bizybot bizybot added >test-failure Triaged test failures from CI :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Mar 13, 2019
@elasticmachine

Pinging @elastic/es-core-features


talevy commented Mar 14, 2019

From the stacktrace, the two lines that look fishy to me are:

Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
...
  2> SEVERE: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
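Thread leaks like these generally mean an executor or thread pool created during the test outlived the suite, typically because an assertion failed before teardown ran. A minimal, hypothetical sketch (not the actual test code) of the pattern that avoids it: shut the executor down in a `finally` so the leak checker finds no survivors even when the test body throws.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorTeardownDemo {
    /**
     * Runs a task on a freshly created executor and guarantees shutdown,
     * even if submitting or running the task throws. Returns true when the
     * executor terminated within the timeout (i.e. no leaked threads).
     */
    public static boolean runAndShutDown(Runnable task) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(task);
        } finally {
            executor.shutdown(); // always reached, so the worker thread cannot leak
        }
        return executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```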
failing suite stacktrace

[ant:junit4] JVM J3: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/temp/junit4-J3-20190313_214826_2911127990975592378922.syserr
Suite: org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests
  1> [2019-03-13T21:48:28,380][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep" -Dtests.security.manager=true -Dtests.locale=sr-CS -Dtests.timezone=Etc/Universal -Dcompiler.java=11 -Druntime.java=8
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/J5/temp/org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests_5254DAB622BCB27F-001
  1> [2019-03-13T21:48:28,385][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] This can result in part of the JVM being swapped out.
  2> Mar 13, 2019 9:48:37 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
  2> WARNING: Will linger awaiting termination of 3 leaked thread(s).
  2> Mar 13, 2019 9:48:42 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
  1> [2019-03-13T21:48:28,386][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
  2> SEVERE: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
  1> [2019-03-13T21:48:28,386][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] These can be adjusted by modifying /etc/security/limits.conf, for example: 
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  1> 	# allow user 'jenkins' mlockall
  1> 	jenkins soft memlock unlimited
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  1> 	jenkins hard memlock unlimited
  1> [2019-03-13T21:48:28,387][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] If you are logged in interactively, you will have to re-login for the new limits to take effect.
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  1> [2019-03-13T21:48:30,320][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToErrorStep] before test
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:30,952][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToErrorStep] after test
  1> [2019-03-13T21:48:30,986][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSamePhase] before test
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,006][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSamePhase] after test
  1> [2019-03-13T21:48:31,012][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStep] before test
  2>         at sun.misc.Unsafe.park(Native Method)
  1> [2019-03-13T21:48:31,362][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStep] after test
  1> [2019-03-13T21:48:31,368][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncActionStepClusterStateChangeIgnored] before test
  1> [2019-03-13T21:48:31,417][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncActionStepClusterStateChangeIgnored] after test
  1> [2019-03-13T21:48:31,425][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexInUnsafe] before test
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  2>    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at java.lang.Thread.sleep(Native Method)
  2>         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:574)
  1> [2019-03-13T21:48:31,432][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexInUnsafe] after test
  1> [2019-03-13T21:48:31,436][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateActionStep] before test
  2> Mar 13, 2019 9:48:42 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
  1> [2019-03-13T21:48:31,484][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateActionStep] after test
  2> INFO: Starting to interrupt leaked threads:
  1> [2019-03-13T21:48:31,489][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] before test
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,891][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] --> returning mock step
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2> Mar 13, 2019 9:48:45 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
  2> SEVERE: There are still zombie threads that couldn't be terminated:
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,904][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:31,905][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:31,937][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] after test
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  1> [2019-03-13T21:48:31,940][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] before test
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  1> [2019-03-13T21:48:31,961][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] --> state: cluster uuid: _na_
  1> version: 0
  1> state uuid: CVRS1_AvTU6Sd516p5ugkA
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  1> from_diff: false
  1> meta data version: 0
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  1>    [test]: v[2], mv[1], sv[1]
  1>       0: p_term [0], isa_ids []
  1> metadata customs:
  1>    index_lifecycle: {
  1>   "policies" : { },
  1>   "operation_mode" : "RUNNING"
  1> }   index-graveyard: IndexGraveyard[[]]
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  1> nodes: 
  1>    {node}{cI6DToX7R16S7zmeHNZk7Q}{0.0.0.0}{0.0.0.0:2}, local, master
  1> routing_table (version 0):
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  1> routing_nodes:
  1> -----node_id[node][V]
  1> ---- unassigned
  1> [2019-03-13T21:48:31,963][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] --> returning mock step
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:31,967][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] after test
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=UTC -Dcompiler.java=11 -Druntime.java=8
  1> [2019-03-13T21:48:31,971][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidCurrentStep] before test
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=UTC -Dcompiler.java=11 -Druntime.java=8
  1> [2019-03-13T21:48:31,987][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidCurrentStep] after test
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=104, maxMBSortInHeap=6.713483380503481, sim=RandomSimilarity(queryNorm=false): {}, locale=sr-CS, timezone=Etc/Universal
  1> [2019-03-13T21:48:31,992][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepInvalidPolicySetting] before test
  2> NOTE: Linux 4.1.12-112.16.7.el7uek.x86_64 amd64/Oracle Corporation 1.8.0_202 (64-bit)/cpus=16,threads=3,free=452897456,total=514850816
  1> [2019-03-13T21:48:32,008][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepInvalidPolicySetting] after test
  2> NOTE: All tests run in this JVM: [IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:32,012][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testSuccessfulValidatedMoveClusterStateToNextStep] before test
  1> [2019-03-13T21:48:32,027][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testSuccessfulValidatedMoveClusterStateToNextStep] after test
  1> [2019-03-13T21:48:32,031][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNoNextStep] before test
  1> [2019-03-13T21:48:32,058][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNoNextStep] after test
  1> [2019-03-13T21:48:32,062][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncWaitStepClusterStateChangeIgnored] before test
  1> [2019-03-13T21:48:32,068][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncWaitStepClusterStateChangeIgnored] after test
  1> [2019-03-13T21:48:32,071][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testIsReadyToTransition] before test
  1> [2019-03-13T21:48:32,083][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testIsReadyToTransition] after test
  1> [2019-03-13T21:48:32,086][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepIndexNotFound] before test
  1> [2019-03-13T21:48:32,093][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepIndexNotFound] after test
  1> [2019-03-13T21:48:32,097][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] before test
  1> [2019-03-13T21:48:32,103][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] --> state: cluster uuid: _na_
  1> version: 0
  1> state uuid: L4aIgnW8Skevzwn-5U90Ow
  1> from_diff: false
  1> meta data version: 0
  1>    [test]: v[2], mv[1], sv[1]
  1>       0: p_term [0], isa_ids []
  1> metadata customs:
  1>    index_lifecycle: {
  1>   "policies" : { },
  1>   "operation_mode" : "RUNNING"
  1> }   index-graveyard: IndexGraveyard[[]]
  1> nodes: 
  1>    {node}{QpeJujksT6SEQZXzf7h5jg}{0.0.0.0}{0.0.0.0:4}, local, master
  1> routing_table (version 0):
  1> routing_nodes:
  1> -----node_id[node][V]
  1> ---- unassigned
  1> [2019-03-13T21:48:32,105][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] --> returning mock step
  1> [2019-03-13T21:48:32,107][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:32,115][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:32,117][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:37,127][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] after test
FAILURE 5.04s J5 | IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected latch to be counted down after 5s, but was not
   > Expected: is <true>
   >      but: was <false>
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F:EB19404E8EAA535F]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.awaitLatch(ElasticsearchAssertions.java:711)
   > 	at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep(IndexLifecycleRunnerTests.java:397)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:37,137][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedNotOnError] before test
  1> [2019-03-13T21:48:37,140][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedNotOnError] after test
  1> [2019-03-13T21:48:37,142][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSameAction] before test
  1> [2019-03-13T21:48:37,145][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSameAction] after test
  1> [2019-03-13T21:48:37,147][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyWithIndexingComplete] before test
  1> [2019-03-13T21:48:37,149][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyWithIndexingComplete] after test
  1> [2019-03-13T21:48:37,151][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStep] before test
  1> [2019-03-13T21:48:37,340][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStep] after test
  1> [2019-03-13T21:48:37,343][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyThatDoesntExist] before test
  1> [2019-03-13T21:48:37,348][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyThatDoesntExist] after test
  1> [2019-03-13T21:48:37,350][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStepKey] before test
  1> [2019-03-13T21:48:37,353][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStepKey] after test
  1> [2019-03-13T21:48:37,355][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepWithUnknownStep] before test
  1> [2019-03-13T21:48:37,358][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepWithUnknownStep] after test
  1> [2019-03-13T21:48:37,360][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepWithoutPolicy] before test
  1> [2019-03-13T21:48:37,363][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepWithoutPolicy] after test
  1> [2019-03-13T21:48:37,365][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexNoCurrentPolicy] before test
  1> [2019-03-13T21:48:37,367][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexNoCurrentPolicy] after test
  1> [2019-03-13T21:48:37,369][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStep] before test
  1> [2019-03-13T21:48:37,373][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStep] after test
  1> [2019-03-13T21:48:37,375][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidNextStep] before test
  1> [2019-03-13T21:48:37,377][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidNextStep] after test
  1> [2019-03-13T21:48:37,380][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] before test
  1> [2019-03-13T21:48:37,382][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] --> retrieving step {"phase":"phase","action":"action","name":"ERROR"}
  1> [2019-03-13T21:48:37,382][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] after test
  1> [2019-03-13T21:48:37,384][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testAddStepInfoToClusterState] before test
  1> [2019-03-13T21:48:37,387][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testAddStepInfoToClusterState] after test
  1> [2019-03-13T21:48:37,389][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunAsyncActionDoesNotRun] before test
  1> [2019-03-13T21:48:37,394][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunAsyncActionDoesNotRun] after test
  1> [2019-03-13T21:48:37,396][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndex] before test
  1> [2019-03-13T21:48:37,398][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndex] after test
  1> [2019-03-13T21:48:37,400][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexDoesntExist] before test
  1> [2019-03-13T21:48:37,402][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexDoesntExist] after test
  1> [2019-03-13T21:48:37,404][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateWaitStep] before test
  1> [2019-03-13T21:48:37,406][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateWaitStep] after test
  1> [2019-03-13T21:48:37,408][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyTerminalPolicyStep] before test
  1> [2019-03-13T21:48:37,409][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyTerminalPolicyStep] after test
ERROR   0.00s J5 | IndexLifecycleRunnerTests (suite) <<< FAILURES!
   > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
   >    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at java.lang.Thread.sleep(Native Method)
   >         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:574)
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F]:0)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   >    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F]:0)
Completed [15/16] on J5 in 18.37s, 34 tests, 1 failure, 2 errors <<< FAILURES!

Suite: org.elasticsearch.xpack.indexlifecycle.PolicyStepsRegistryTests
Completed [16/16] on J5 in 0.01s, 10 tests, 10 skipped

[ant:junit4] JVM J5: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/temp/junit4-J5-20190313_214826_29116867414333902381993.syserr

> Task :x-pack:plugin:ilm:unitTest FAILED

FAILURE: Build failed with an exception.

Tests with failures:
  - org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep
  - org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests (suite)

* What went wrong:
Execution failed for task ':x-pack:plugin:ilm:unitTest'.
> There were test failures: 16 suites, 104 tests, 2 suite-level errors, 1 failure, 10 ignored [seed: 5254DAB622BCB27F]

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output. Run with --scan to get full insights.

@dakrone dakrone self-assigned this Apr 1, 2019
dakrone added a commit to dakrone/elasticsearch that referenced this issue Apr 1, 2019
Previously we only set the latch countdown with `nextStep.setLatch` after the
cluster state change latch had already been counted down. However, execution
could already have started by then, causing the latch to be missed when the
`MockAsyncActionStep` is executed.

This moves the latch setting to be before the call to
`runPolicyAfterStateChange`, which means it is always available when the
`MockAsyncActionStep` is executed.

I was able to reproduce the failure roughly once every 30-40 runs before this
change. With this change, the test passes across 2000+ runs.

Resolves elastic#40018
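The race the commit describes can be sketched outside Elasticsearch. Here `AsyncStep` is a hypothetical stand-in for `MockAsyncActionStep`: it counts the latch down only if one was installed before it ran. Triggering execution first and calling `setLatch` afterwards (the buggy ordering) can silently miss the countdown; installing the latch before triggering (what the patch does) cannot.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for MockAsyncActionStep.
class AsyncStep {
    private volatile CountDownLatch latch;

    void setLatch(CountDownLatch latch) { this.latch = latch; }

    void performAction() {
        CountDownLatch l = latch;
        if (l != null) {
            l.countDown(); // only happens if setLatch ran before we did
        }
    }
}

public class LatchRaceDemo {
    /** Fixed ordering: install the latch, then trigger async execution. */
    static boolean fixedOrderingCountsDown() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            AsyncStep step = new AsyncStep();
            CountDownLatch latch = new CountDownLatch(1);
            step.setLatch(latch);                 // latch visible first...
            executor.submit(step::performAction); // ...then the step runs
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Buggy ordering: trigger first, set the latch after. If the executor
        // wins the race, performAction sees a null latch, the countdown is
        // missed, and a later awaitLatch(...) times out after 5s, exactly as
        // in the failure above.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AsyncStep step = new AsyncStep();
        CountDownLatch latch = new CountDownLatch(1);
        executor.submit(step::performAction);
        step.setLatch(latch); // may happen after performAction already ran
        executor.shutdown();

        System.out.println("fixed ordering counted down: " + fixedOrderingCountsDown());
    }
}
```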
dakrone added commits that referenced this issue Apr 2, 2019
…40707)
(same message as above)
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
…lastic#40707)
(same message as above)