-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Ignore exceptions while opening job after SIGTERM to JVM #75850
Merged
elasticsearchmachine
merged 2 commits into
elastic:master
from
droberts195:ignore_exceptions_when_node_dying
Jul 30, 2021
Merged
[ML] Ignore exceptions while opening job after SIGTERM to JVM #75850
elasticsearchmachine
merged 2 commits into
elastic:master
from
droberts195:ignore_exceptions_when_node_dying
Jul 30, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We observed that some jobs failed during a rolling upgrade in Elastic Cloud. This happened because steps of the job open sequence failed with exceptions after core Elasticsearch services shut down in response to the SIGTERM. This change makes the persistent task executor for anomaly detection jobs ignore exceptions received after the JVM has received a shutdown signal, for example a SIGTERM. By doing nothing in response to such exceptions the persistent task remains in cluster state and will get assigned to a different node after the current node leaves the cluster.
Pinging @elastic/ml-core (Team:ML) |
benwtrent
approved these changes
Jul 29, 2021
Comment on lines
278
to
280
if (autodetectProcessManager.isNodeDying() == false) { | ||
hasRunningDatafeedTask(jobTask.getJobId(), hasRunningDatafeedTaskListener); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would like there to be a predicate before hasRunningDatafeedTaskListener
is even created. Similar to the FAILED
check.
if (autodetectProcessManager.isNodeDying()) {
return;
}
But that is not a huge deal, it just reads a little strange.
elasticsearchmachine
pushed a commit
to elasticsearchmachine/elasticsearch
that referenced
this pull request
Jul 30, 2021
…c#75850) * [ML] Ignore exceptions while opening job after SIGTERM to JVM We observed that some jobs failed during a rolling upgrade in Elastic Cloud. This happened because steps of the job open sequence failed with exceptions after core Elasticsearch services shut down in response to the SIGTERM. This change makes the persistent task executor for anomaly detection jobs ignore exceptions received after the JVM has received a shutdown signal, for example a SIGTERM. By doing nothing in response to such exceptions the persistent task remains in cluster state and will get assigned to a different node after the current node leaves the cluster. * Address review comment
💚 Backport successful
|
elasticsearchmachine
added a commit
that referenced
this pull request
Jul 30, 2021
#75872) * [ML] Ignore exceptions while opening job after SIGTERM to JVM We observed that some jobs failed during a rolling upgrade in Elastic Cloud. This happened because steps of the job open sequence failed with exceptions after core Elasticsearch services shut down in response to the SIGTERM. This change makes the persistent task executor for anomaly detection jobs ignore exceptions received after the JVM has received a shutdown signal, for example a SIGTERM. By doing nothing in response to such exceptions the persistent task remains in cluster state and will get assigned to a different node after the current node leaves the cluster. * Address review comment Co-authored-by: David Roberts <dave.roberts@elastic.co>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
auto-backport
Automatically create backport pull requests when merged
auto-merge
Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
>bug
:ml
Machine learning
Team:ML
Meta label for the ML team
v7.15.0
v8.0.0-alpha1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We observed that some jobs failed during a rolling upgrade
in Elastic Cloud. This happened because steps of the job
open sequence failed with exceptions after core Elasticsearch
services shut down in response to the SIGTERM.
This change makes the persistent task executor for anomaly
detection jobs ignore exceptions received after the JVM has
received a shutdown signal, for example a SIGTERM. By doing
nothing in response to such exceptions the persistent task
remains in cluster state and will get assigned to a different
node after the current node leaves the cluster.