HistoryTemplateSearchInputMappingsTests failure on master with RejectedExecutionException #47749

Closed
dakrone opened this issue Oct 8, 2019 · 2 comments · Fixed by #48658
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >test-failure Triaged test failures from CI

Comments

@dakrone
Member

dakrone commented Oct 8, 2019

The failure looks like:

 2> Oct 08, 2019 2:38:57 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_sm1][snapshot][T#2],5,TGRP-HistoryTemplateSearchInputMappingsTests]
  2> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@71681ccd[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@305516f6[Wrapped task = org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule@346a8424]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@5829a5fb[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
  2> 	at __randomizedtesting.SeedInfo.seed([B75BDA1771E40CE4]:0)
  2> 	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
  2> 	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
  2> 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340)
  2> 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562)
  2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.scheduleNextRun(SchedulerEngine.java:222)
  2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.<init>(SchedulerEngine.java:196)
  2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine.add(SchedulerEngine.java:147)
  2> 	at org.elasticsearch.xpack.slm.SnapshotRetentionService.rescheduleRetentionJob(SnapshotRetentionService.java:88)
  2> 	at org.elasticsearch.xpack.slm.SnapshotRetentionService.onMaster(SnapshotRetentionService.java:73)
  2> 	at org.elasticsearch.cluster.service.ClusterApplierService$OnMasterRunnable.run(ClusterApplierService.java:644)
  2> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699)
  2> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  2> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  2> 	at java.base/java.lang.Thread.run(Thread.java:834)

  2> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=168, name=elasticsearch[node_sm1][snapshot][T#2], state=RUNNABLE, group=TGRP-HistoryTemplateSearchInputMappingsTests]

        Caused by:
        java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@71681ccd[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@305516f6[Wrapped task = org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule@346a8424]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@5829a5fb[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

I was not able to reproduce this with:

REPRODUCE WITH: ./gradlew ':x-pack:plugin:watcher:test' --tests "org.elasticsearch.xpack.watcher.history.HistoryTemplateSearchInputMappingsTests" -Dtests.seed=B75BDA1771E40CE4 -Dtests.security.manager=true -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Dcompiler.java=12 -Druntime.java=11

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-2/8367/console
https://gradle-enterprise.elastic.co/s/bsbxzot5q5dfy

@dakrone dakrone added >test-failure Triaged test failures from CI :Data Management/Watcher labels Oct 8, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Watcher)

@romseygeek
Contributor

ChainIntegrationTests failed with the same error:

2> Oct 18, 2019 11:34:05 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
--
2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_sm1][snapshot][T#1],5,TGRP-ChainIntegrationTests]
2> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2fce700a[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2fa46546[Wrapped task = org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule@6aa6eed6]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@5b864592[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
2> 	at __randomizedtesting.SeedInfo.seed([9CD142BD98DB2226]:0)
2> 	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
2> 	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
2> 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340)
2> 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562)
2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.scheduleNextRun(SchedulerEngine.java:222)
2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine$ActiveSchedule.<init>(SchedulerEngine.java:196)
2> 	at org.elasticsearch.xpack.core.scheduler.SchedulerEngine.add(SchedulerEngine.java:147)
2> 	at org.elasticsearch.xpack.slm.SnapshotRetentionService.rescheduleRetentionJob(SnapshotRetentionService.java:88)
2> 	at org.elasticsearch.xpack.slm.SnapshotRetentionService.onMaster(SnapshotRetentionService.java:73)
2> 	at org.elasticsearch.cluster.service.ClusterApplierService$OnMasterRunnable.run(ClusterApplierService.java:644)
2> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:699)
2> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
2> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
2> 	at java.base/java.lang.Thread.run(Thread.java:834)

It seems to be something to do with the way the SnapshotRetentionService is being shut down?

Build scan is here: https://gradle-enterprise.elastic.co/s/5z5xvauuwaqxg/console-log?task=:x-pack:plugin:watcher:test
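
The shutdown hypothesis above matches the `[Terminated, pool size = 0, ...]` state in the trace. As a minimal sketch of that failure mode using only the JDK (the class and method names are hypothetical, and this is not the Elasticsearch `SchedulerEngine` code), scheduling on a `ScheduledThreadPoolExecutor` that has already been shut down is rejected by the default `AbortPolicy`:

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only mirrors the shape of the failure in the trace.
public class RejectedAfterShutdownDemo {

    /** Returns true if scheduling on an already shut-down executor is rejected. */
    static boolean scheduleAfterShutdownIsRejected() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Assumption: this stands in for the scheduler being stopped during node shutdown.
        executor.shutdown();
        try {
            // Assumption: this stands in for a late onMaster() callback scheduling a job.
            executor.schedule(() -> {}, 1, TimeUnit.SECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy rejects tasks submitted after shutdown.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + scheduleAfterShutdownIsRejected());
    }
}
```

If nothing catches the exception on the calling thread, it surfaces as an uncaught exception, which is what the randomized test runner reports here.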

@romseygeek romseygeek added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Oct 18, 2019
dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 29, 2019
This adds a guard for the SLM lifecycle and retention service that
prevents new jobs from being scheduled once the service has been
stopped. Previously, if the node was shut down, the service would be
stopped, but a cluster state update or local master election could
still attempt to schedule a job. This could lead to an uncaught
`RejectedExecutionException`.

Resolves elastic#47749
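
The guard described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the actual change: `GuardedRetentionScheduler`, `maybeSchedule`, and the `running` flag are hypothetical names, and a plain JDK `ScheduledThreadPoolExecutor` stands in for the real scheduler.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a service that must not schedule jobs after it is stopped.
public class GuardedRetentionScheduler implements AutoCloseable {
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
    private boolean running = true; // guarded by "this"

    /** Schedules the job unless the service has been stopped; returns whether it was scheduled. */
    public synchronized boolean maybeSchedule(Runnable job, long delayMillis) {
        if (running == false) {
            // Service stopped: skip instead of submitting to a terminated executor.
            return false;
        }
        executor.schedule(job, delayMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Stops the service; later maybeSchedule calls become no-ops. */
    @Override
    public synchronized void close() {
        if (running) {
            running = false;
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        GuardedRetentionScheduler scheduler = new GuardedRetentionScheduler();
        System.out.println(scheduler.maybeSchedule(() -> {}, 60_000));
        scheduler.close();
        // A late callback (for example, a master election) after close is ignored.
        System.out.println(scheduler.maybeSchedule(() -> {}, 60_000));
    }
}
```

Both methods are synchronized so that a stop cannot slip in between the check and the call to `schedule`; a lock-free flag alone would still leave a small window for the rejection.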
dakrone added a commit that referenced this issue Oct 30, 2019
* Don't schedule SLM jobs when services have been stopped (#48658)

This adds a guard for the SLM lifecycle and retention service that
prevents new jobs from being scheduled once the service has been
stopped. Previously, if the node was shut down, the service would be
stopped, but a cluster state update or local master election could
still attempt to schedule a job. This could lead to an uncaught
`RejectedExecutionException`.

Resolves #47749

* Fix test for backport