Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bubble-up exceptions from scheduler #38317

Merged

Conversation

Projects
None yet
4 participants
@henningandersen
Copy link
Contributor

commented Feb 4, 2019

Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of #38014

Primary review target is Scheduler.SafeScheduledThreadPoolExecutor.afterExecute

Bubble-up exceptions from scheduler
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of #38014
@elasticmachine

This comment has been minimized.

Copy link

commented Feb 4, 2019

henningandersen added some commits Feb 4, 2019

NPE fix in GlobalCheckPointListeners
Fixed NPE that caused CCR tests (IndexFollowingIT and likely others)
to fail.
Fix ThreadPool.scheduleUnlessShuttingDown
scheduleUnlessShuttingDown could bubble rejected exception to uncaught
exception handler when not using SAME executor. Now ignore rejected
exception if executor is shutdown.
Fix ThreadPool.scheduleUnlessShuttingDown
scheduleUnlessShuttingDown could bubble rejected exception to uncaught
exception handler when not using SAME executor. Now ignore rejected
exception if executor is shutdown.
@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 4, 2019

@elasticmachine please run elasticsearch-ci/2

@henningandersen henningandersen requested a review from ywelsch Feb 4, 2019

@ywelsch
Copy link
Contributor

left a comment

I've left one question.

@@ -354,8 +354,11 @@ public ScheduledCancellable schedule(Runnable command, TimeValue delay, String e
}

public void scheduleUnlessShuttingDown(TimeValue delay, String executor, Runnable command) {
if (!Names.SAME.equals(executor)) {
command = new ThreadedRunnableAllowShutdown(command, executor(executor));

This comment has been minimized.

Copy link
@ywelsch

ywelsch Feb 4, 2019

Contributor

what's the reason for this change? It's unrelated to this PR, no?

This comment has been minimized.

Copy link
@henningandersen

henningandersen Feb 4, 2019

Author Contributor

It is related to this PR. If you schedule something on a non-SAME executor and then shutdown the threadpool afterwards (before the delay is passed), the execute on the executor will fail, causing an exception. This exception is thrown on the scheduler thread. So far this caused no issues, since it was ignored (and recently logged). But with the changes in this PR, the exception was bubbled to the uncaught exception handler causing tests to fail.

Above fixes that such that scheduleUnlessShuttingDown allows both the schedule call itself to pass and the subsequent execute on the executor.

This comment has been minimized.

Copy link
@ywelsch

ywelsch Feb 4, 2019

Contributor

Will the regular .schedule method not suffer from the same issue?

This comment has been minimized.

Copy link
@henningandersen

henningandersen Feb 4, 2019

Author Contributor

Yes, it will potentially, though none of the CI runs ran into it.

I guess we could argue that before the warning/exception changes, this would go unnoticed for both schedule and scheduleUnlessShuttingDown and therefore we should silently ignore this in both cases?

This comment has been minimized.

Copy link
@henningandersen

henningandersen Feb 5, 2019

Author Contributor

I think the shutdown procedure is relevant to this question. ThreadPool.terminate does a regular shutdown first on both scheduler and pool executors. This allows any scheduled tasks on the ScheduledThreadPoolExecutor to run to completion but does not allow executing them on the pool executor.

So scheduling jobs using SAME executor with delay until after shutdown will complete, using any other executor will not.

Given that we have both methods, I think the right solution is to make sure that tasks scheduled to run within the terminate timeout will run. I think this belongs in a follow-up PR. Two options:

  1. During terminate, we can do awaitTermination of the scheduler before calling shutdown on the rest of the executors.
  2. If a scheduled job fails to call execute on the executor, we simply call it in current thread (ie. one of the scheduler threads).

This comment has been minimized.

Copy link
@henningandersen

henningandersen Feb 5, 2019

Author Contributor

Discussed this on another channel. Conclusion is to silently ignore scheduled tasks that fail to re-execute on the target executor if the target executor is shutdown (ie. like it was before all our changes).

scheduleUnlessShuttingDown was primarily added to avoid having to catch and handle the rejected exception everywhere. Down the road we should likely merge the two methods into one after analyzing all usages.

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 4, 2019

Thanks @ywelsch, have responded to your question above.

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/2

ThreadPool.schedule disregard shutdown
Like scheduleUnlessShuttingDown, we want to silently ignore exceptions
thrown during execute on target executor. Once ThreadPool.terminate()
has done shutdown of thread pools, all scheduled tasks not using SAME
executor will not run.

@henningandersen henningandersen requested a review from ywelsch Feb 5, 2019

@ywelsch

ywelsch approved these changes Feb 5, 2019

Copy link
Contributor

left a comment

LGTM

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/default-distro

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/default-distro

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/1

1 similar comment
@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/1

@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/packaging-sample

1 similar comment
@henningandersen

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@elasticmachine run elasticsearch-ci/packaging-sample

henningandersen added a commit that referenced this pull request Feb 5, 2019

Bubble-up exceptions from scheduler (#38441)
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

Fixed NPE in GlobalCheckPointListeners that caused CCR tests
(IndexFollowingIT and likely others) to fail.

This is a continuation of #38014

Backports #38317

@henningandersen henningandersen merged commit 20c66c5 into elastic:master Feb 5, 2019

7 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci/1 Build finished.
Details
elasticsearch-ci/2 Build finished.
Details
elasticsearch-ci/default-distro Build finished.
Details
elasticsearch-ci/docbldesx Build finished.
Details
elasticsearch-ci/oss-distro-docs Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details

@colings86 colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 11, 2019

Merge branch 'master' into retention-lease-ccr
* master:
  Add an authentication cache for API keys (elastic#38469)
  Fix exit code in certutil packaging test (elastic#38393)
  Enable logs for intermittent test failure (elastic#38426)
  Disable BWC to backport recovering retention leases (elastic#38477)
  Enable bwc tests now that elastic#38443 is backported. (elastic#38462)
  Fix Master Failover and DataNode Leave Blocking Snapshot (elastic#38460)
  Recover retention leases during peer recovery (elastic#38435)
  Set update mappings mater node timeout to 30 min (elastic#38439)
  Assert job is not null in FullClusterRestartIT (elastic#38218)
  Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235) (elastic#38463)
  SQL: Fix esType for DATETIME/DATE and INTERVALS (elastic#38179)
  Handle deprecation header-AbstractUpgradeTestCase (elastic#38396)
  XPack: core/ccr/Security-cli migration to java-time (elastic#38415)
  Disable bwc tests for elastic#38443 (elastic#38456)
  Bubble-up exceptions from scheduler (elastic#38317)
  Re-enable TasksClientDocumentationIT.testCancelTasks (elastic#38234)
  Allow custom authorization with an authorization engine  (elastic#38358)
  CRUDDocumentationIT fix documentation references
  Remove support for internal versioning for concurrency control (elastic#38254)

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Feb 12, 2019

Bubble-up exceptions from scheduler (elastic#38317)
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of elastic#38014

Fixed NPE that caused CCR tests (IndexFollowingIT and likely others)
to fail.

schedule could bubble rejected exception to uncaught exception
handler when not using SAME executor if thread pool is terminated.
Now ignore rejected exception silently if executor is shutdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.