
[CI] IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep failed #40018

Closed
bizybot opened this issue Mar 13, 2019 · 2 comments · Fixed by #40707
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >test-failure Triaged test failures from CI

Comments


bizybot commented Mar 13, 2019

The test IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep failed in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+multijob-unix-compatibility/os=oraclelinux-7/71/console

I could not reproduce this locally on a single run, but over multiple iterations it does reproduce sporadically.

The repro command was:

./gradlew :x-pack:plugin:ilm:unitTest \
-Dtests.seed=5254DAB622BCB27F \
-Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests \
-Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep" \
-Dtests.security.manager=true \
-Dtests.locale=sr-CS \
-Dtests.timezone=Etc/Universal \
-Dcompiler.java=11 \
-Druntime.java=8
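Since the failure only shows up sporadically, one way to chase it is to re-run the repro command in a loop until it fails (randomizedtesting also supports repeating a single method within one JVM via `-Dtests.iters=N`). A minimal sketch of such a retry helper; `run_until_failure` is a hypothetical name, not part of the build:

```shell
# Re-run a command up to $1 times, stopping at the first failure and
# reporting which iteration failed. Useful for sporadic test failures.
run_until_failure() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "passed all $max iterations"
    return 0
}

# Example (hypothetical invocation against the repro command above):
# run_until_failure 50 ./gradlew :x-pack:plugin:ilm:unitTest \
#     -Dtests.seed=5254DAB622BCB27F \
#     -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests \
#     -Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep"
```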
@bizybot bizybot added >test-failure Triaged test failures from CI :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Mar 13, 2019
@elasticmachine

Pinging @elastic/es-core-features


talevy commented Mar 14, 2019

From the stacktrace, the two lines that look fishy to me are:

Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
...
  2> SEVERE: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
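Thread leaks like these generally mean an executor or thread pool created during the test outlived the suite, typically because an assertion failed before teardown ran. A minimal, hypothetical sketch (not the actual test code) of the pattern that avoids it: shut the executor down in a `finally` so the leak checker finds no survivors even when the test body throws.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorTeardownDemo {
    /**
     * Runs a task on a freshly created executor and guarantees shutdown,
     * even if submitting or running the task throws. Returns true when the
     * executor terminated within the timeout (i.e. no leaked threads).
     */
    public static boolean runAndShutDown(Runnable task) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(task);
        } finally {
            executor.shutdown(); // always reached, so the worker thread cannot leak
        }
        return executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```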
failing suite stacktrace

[ant:junit4] JVM J3: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/temp/junit4-J3-20190313_214826_2911127990975592378922.syserr
Suite: org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests
  1> [2019-03-13T21:48:28,380][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.method="testRunStateChangePolicyWithAsyncActionNextStep" -Dtests.security.manager=true -Dtests.locale=sr-CS -Dtests.timezone=Etc/Universal -Dcompiler.java=11 -Druntime.java=8
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/J5/temp/org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests_5254DAB622BCB27F-001
  1> [2019-03-13T21:48:28,385][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] This can result in part of the JVM being swapped out.
  2> Mar 13, 2019 9:48:37 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
  2> WARNING: Will linger awaiting termination of 3 leaked thread(s).
  2> Mar 13, 2019 9:48:42 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
  1> [2019-03-13T21:48:28,386][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
  2> SEVERE: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
  1> [2019-03-13T21:48:28,386][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] These can be adjusted by modifying /etc/security/limits.conf, for example: 
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  1> 	# allow user 'jenkins' mlockall
  1> 	jenkins soft memlock unlimited
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  1> 	jenkins hard memlock unlimited
  1> [2019-03-13T21:48:28,387][WARN ][o.e.b.JNANatives         ] [[SUITE-IndexLifecycleRunnerTests-seed#[5254DAB622BCB27F]]] If you are logged in interactively, you will have to re-login for the new limits to take effect.
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  1> [2019-03-13T21:48:30,320][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToErrorStep] before test
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:30,952][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToErrorStep] after test
  1> [2019-03-13T21:48:30,986][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSamePhase] before test
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,006][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSamePhase] after test
  1> [2019-03-13T21:48:31,012][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStep] before test
  2>         at sun.misc.Unsafe.park(Native Method)
  1> [2019-03-13T21:48:31,362][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStep] after test
  1> [2019-03-13T21:48:31,368][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncActionStepClusterStateChangeIgnored] before test
  1> [2019-03-13T21:48:31,417][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncActionStepClusterStateChangeIgnored] after test
  1> [2019-03-13T21:48:31,425][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexInUnsafe] before test
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  2>    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at java.lang.Thread.sleep(Native Method)
  2>         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:574)
  1> [2019-03-13T21:48:31,432][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexInUnsafe] after test
  1> [2019-03-13T21:48:31,436][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateActionStep] before test
  2> Mar 13, 2019 9:48:42 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
  1> [2019-03-13T21:48:31,484][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateActionStep] after test
  2> INFO: Starting to interrupt leaked threads:
  1> [2019-03-13T21:48:31,489][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] before test
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,891][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] --> returning mock step
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2> Mar 13, 2019 9:48:45 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
  2> SEVERE: There are still zombie threads that couldn't be terminated:
  2>    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:31,904][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:31,905][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:31,937][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNextStep] after test
  2>    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
  2>         at sun.misc.Unsafe.park(Native Method)
  1> [2019-03-13T21:48:31,940][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] before test
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  1> [2019-03-13T21:48:31,961][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] --> state: cluster uuid: _na_
  1> version: 0
  1> state uuid: CVRS1_AvTU6Sd516p5ugkA
  2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  1> from_diff: false
  1> meta data version: 0
  2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
  1>    [test]: v[2], mv[1], sv[1]
  1>       0: p_term [0], isa_ids []
  1> metadata customs:
  1>    index_lifecycle: {
  1>   "policies" : { },
  1>   "operation_mode" : "RUNNING"
  1> }   index-graveyard: IndexGraveyard[[]]
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  1> nodes: 
  1>    {node}{cI6DToX7R16S7zmeHNZk7Q}{0.0.0.0}{0.0.0.0:2}, local, master
  1> routing_table (version 0):
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  1> routing_nodes:
  1> -----node_id[node][V]
  1> ---- unassigned
  1> [2019-03-13T21:48:31,963][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] --> returning mock step
  2>         at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:31,967][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPeriodicStep] after test
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=UTC -Dcompiler.java=11 -Druntime.java=8
  1> [2019-03-13T21:48:31,971][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidCurrentStep] before test
  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ilm:unitTest -Dtests.seed=5254DAB622BCB27F -Dtests.class=org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=UTC -Dcompiler.java=11 -Druntime.java=8
  1> [2019-03-13T21:48:31,987][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidCurrentStep] after test
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=104, maxMBSortInHeap=6.713483380503481, sim=RandomSimilarity(queryNorm=false): {}, locale=sr-CS, timezone=Etc/Universal
  1> [2019-03-13T21:48:31,992][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepInvalidPolicySetting] before test
  2> NOTE: Linux 4.1.12-112.16.7.el7uek.x86_64 amd64/Oracle Corporation 1.8.0_202 (64-bit)/cpus=16,threads=3,free=452897456,total=514850816
  1> [2019-03-13T21:48:32,008][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepInvalidPolicySetting] after test
  2> NOTE: All tests run in this JVM: [IndexLifecycleRunnerTests]
  1> [2019-03-13T21:48:32,012][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testSuccessfulValidatedMoveClusterStateToNextStep] before test
  1> [2019-03-13T21:48:32,027][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testSuccessfulValidatedMoveClusterStateToNextStep] after test
  1> [2019-03-13T21:48:32,031][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNoNextStep] before test
  1> [2019-03-13T21:48:32,058][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithNoNextStep] after test
  1> [2019-03-13T21:48:32,062][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncWaitStepClusterStateChangeIgnored] before test
  1> [2019-03-13T21:48:32,068][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyAsyncWaitStepClusterStateChangeIgnored] after test
  1> [2019-03-13T21:48:32,071][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testIsReadyToTransition] before test
  1> [2019-03-13T21:48:32,083][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testIsReadyToTransition] after test
  1> [2019-03-13T21:48:32,086][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepIndexNotFound] before test
  1> [2019-03-13T21:48:32,093][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepIndexNotFound] after test
  1> [2019-03-13T21:48:32,097][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] before test
  1> [2019-03-13T21:48:32,103][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] --> state: cluster uuid: _na_
  1> version: 0
  1> state uuid: L4aIgnW8Skevzwn-5U90Ow
  1> from_diff: false
  1> meta data version: 0
  1>    [test]: v[2], mv[1], sv[1]
  1>       0: p_term [0], isa_ids []
  1> metadata customs:
  1>    index_lifecycle: {
  1>   "policies" : { },
  1>   "operation_mode" : "RUNNING"
  1> }   index-graveyard: IndexGraveyard[[]]
  1> nodes: 
  1>    {node}{QpeJujksT6SEQZXzf7h5jg}{0.0.0.0}{0.0.0.0:4}, local, master
  1> routing_table (version 0):
  1> routing_nodes:
  1> -----node_id[node][V]
  1> ---- unassigned
  1> [2019-03-13T21:48:32,105][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] --> returning mock step
  1> [2019-03-13T21:48:32,107][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:32,115][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:32,117][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [test] --> returning mock step
  1> [2019-03-13T21:48:37,127][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunStateChangePolicyWithAsyncActionNextStep] after test
FAILURE 5.04s J5 | IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected latch to be counted down after 5s, but was not
   > Expected: is <true>
   >      but: was <false>
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F:EB19404E8EAA535F]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.awaitLatch(ElasticsearchAssertions.java:711)
   > 	at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep(IndexLifecycleRunnerTests.java:397)
   > 	at java.lang.Thread.run(Thread.java:748)
  1> [2019-03-13T21:48:37,137][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedNotOnError] before test
  1> [2019-03-13T21:48:37,140][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedNotOnError] after test
  1> [2019-03-13T21:48:37,142][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSameAction] before test
  1> [2019-03-13T21:48:37,145][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStepSameAction] after test
  1> [2019-03-13T21:48:37,147][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyWithIndexingComplete] before test
  1> [2019-03-13T21:48:37,149][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyWithIndexingComplete] after test
  1> [2019-03-13T21:48:37,151][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStep] before test
  1> [2019-03-13T21:48:37,340][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStep] after test
  1> [2019-03-13T21:48:37,343][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyThatDoesntExist] before test
  1> [2019-03-13T21:48:37,348][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyThatDoesntExist] after test
  1> [2019-03-13T21:48:37,350][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStepKey] before test
  1> [2019-03-13T21:48:37,353][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testGetCurrentStepKey] after test
  1> [2019-03-13T21:48:37,355][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepWithUnknownStep] before test
  1> [2019-03-13T21:48:37,358][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToFailedStepWithUnknownStep] after test
  1> [2019-03-13T21:48:37,360][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepWithoutPolicy] before test
  1> [2019-03-13T21:48:37,363][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepWithoutPolicy] after test
  1> [2019-03-13T21:48:37,365][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexNoCurrentPolicy] before test
  1> [2019-03-13T21:48:37,367][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexNoCurrentPolicy] after test
  1> [2019-03-13T21:48:37,369][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStep] before test
  1> [2019-03-13T21:48:37,373][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testMoveClusterStateToNextStep] after test
  1> [2019-03-13T21:48:37,375][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidNextStep] before test
  1> [2019-03-13T21:48:37,377][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testValidatedMoveClusterStateToNextStepInvalidNextStep] after test
  1> [2019-03-13T21:48:37,380][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] before test
  1> [2019-03-13T21:48:37,382][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] --> retrieving step {"phase":"phase","action":"action","name":"ERROR"}
  1> [2019-03-13T21:48:37,382][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyErrorStep] after test
  1> [2019-03-13T21:48:37,384][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testAddStepInfoToClusterState] before test
  1> [2019-03-13T21:48:37,387][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testAddStepInfoToClusterState] after test
  1> [2019-03-13T21:48:37,389][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunAsyncActionDoesNotRun] before test
  1> [2019-03-13T21:48:37,394][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunAsyncActionDoesNotRun] after test
  1> [2019-03-13T21:48:37,396][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndex] before test
  1> [2019-03-13T21:48:37,398][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndex] after test
  1> [2019-03-13T21:48:37,400][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexDoesntExist] before test
  1> [2019-03-13T21:48:37,402][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRemovePolicyForIndexIndexDoesntExist] after test
  1> [2019-03-13T21:48:37,404][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateWaitStep] before test
  1> [2019-03-13T21:48:37,406][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyClusterStateWaitStep] after test
  1> [2019-03-13T21:48:37,408][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyTerminalPolicyStep] before test
  1> [2019-03-13T21:48:37,409][INFO ][o.e.x.i.IndexLifecycleRunnerTests] [testRunPolicyTerminalPolicyStep] after test
ERROR   0.00s J5 | IndexLifecycleRunnerTests (suite) <<< FAILURES!
   > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 3 threads leaked from SUITE scope at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests: 
   >    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    3) Thread[id=48, name=elasticsearch[name][[timer]], state=TIMED_WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at java.lang.Thread.sleep(Native Method)
   >         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:574)
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F]:0)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   >    1) Thread[id=50, name=elasticsearch[test][masterService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   >    2) Thread[id=49, name=elasticsearch[test][clusterApplierService#updateTask][T#1], state=WAITING, group=TGRP-IndexLifecycleRunnerTests]
   >         at sun.misc.Unsafe.park(Native Method)
   >         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   >         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
   >         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >         at java.lang.Thread.run(Thread.java:748)
   > 	at __randomizedtesting.SeedInfo.seed([5254DAB622BCB27F]:0)
Completed [15/16] on J5 in 18.37s, 34 tests, 1 failure, 2 errors <<< FAILURES!

Suite: org.elasticsearch.xpack.indexlifecycle.PolicyStepsRegistryTests
Completed [16/16] on J5 in 0.01s, 10 tests, 10 skipped

[ant:junit4] JVM J5: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+6.7+multijob-unix-compatibility/os/oraclelinux-7/x-pack/plugin/ilm/build/testrun/unitTest/temp/junit4-J5-20190313_214826_29116867414333902381993.syserr

> Task :x-pack:plugin:ilm:unitTest FAILED

FAILURE: Build failed with an exception.

Tests with failures:
  - org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests.testRunStateChangePolicyWithAsyncActionNextStep
  - org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunnerTests (suite)

* What went wrong:
Execution failed for task ':x-pack:plugin:ilm:unitTest'.
> There were test failures: 16 suites, 104 tests, 2 suite-level errors, 1 failure, 10 ignored [seed: 5254DAB622BCB27F]

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output. Run with --scan to get full insights.

@dakrone dakrone self-assigned this Apr 1, 2019
dakrone added a commit to dakrone/elasticsearch that referenced this issue Apr 1, 2019
Previously we only set the latch countdown with `nextStep.setLatch` after the
cluster state change latch had already been counted down. However, execution
could already have started by then, causing the latch to be missed when the
`MockAsyncActionStep` is executed.

This moves the latch setting to be before the call to
`runPolicyAfterStateChange`, which means it is always available when the
`MockAsyncActionStep` is executed.

I was able to reproduce the failure roughly once every 30-40 runs before this
change. With this change, the test passes across 2000+ runs.

Resolves elastic#40018
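The race the commit describes can be sketched outside Elasticsearch. Here `AsyncStep` is a hypothetical stand-in for `MockAsyncActionStep`: it counts the latch down only if one was installed before it ran. Triggering execution first and calling `setLatch` afterwards (the buggy ordering) can silently miss the countdown; installing the latch before triggering (what the patch does) cannot.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for MockAsyncActionStep.
class AsyncStep {
    private volatile CountDownLatch latch;

    void setLatch(CountDownLatch latch) { this.latch = latch; }

    void performAction() {
        CountDownLatch l = latch;
        if (l != null) {
            l.countDown(); // only happens if setLatch ran before we did
        }
    }
}

public class LatchRaceDemo {
    /** Fixed ordering: install the latch, then trigger async execution. */
    static boolean fixedOrderingCountsDown() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            AsyncStep step = new AsyncStep();
            CountDownLatch latch = new CountDownLatch(1);
            step.setLatch(latch);                 // latch visible first...
            executor.submit(step::performAction); // ...then the step runs
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Buggy ordering: trigger first, set the latch after. If the executor
        // wins the race, performAction sees a null latch, the countdown is
        // missed, and a later awaitLatch(...) times out after 5s, exactly as
        // in the failure above.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AsyncStep step = new AsyncStep();
        CountDownLatch latch = new CountDownLatch(1);
        executor.submit(step::performAction);
        step.setLatch(latch); // may happen after performAction already ran
        executor.shutdown();

        System.out.println("fixed ordering counted down: " + fixedOrderingCountsDown());
    }
}
```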
dakrone added commits that referenced this issue Apr 2, 2019
…40707)
(same message as above)
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
…lastic#40707)
(same message as above)