Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task Manager] Increment task attempts when they fail during markTaskAsRunning #88669

Merged

Conversation

gmmorris
Copy link
Contributor

@gmmorris gmmorris commented Jan 19, 2021

Summary

Addresses 1 of 3 problems identified in #87874

When something causes an exception in TaskRunner.markTaskAsRunning() its execution fails, but this happens before we update the SO, which means that this failure does not count towards the attempts on the task. Task Manager will continue to try running this task for ever.

This PR increments the attempts when a failure occurs during TaskRunner.markTaskAsRunning() to ensure such a task doesn't continue to run to infinity.
Note that this fix will not affect scheduled tasks, as they are designed to ignore their attempts and run for ever. In such a case this task will continue to consume Task Manager resources until canceled, but these failures will be logged and could be identified when needed.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@gmmorris gmmorris requested a review from a team as a code owner January 19, 2021 12:22
@gmmorris gmmorris added Feature:Task Manager release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.12.0 v8.0.0 labels Jan 19, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmmorris
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmmorris gmmorris merged commit c89f1f1 into elastic:master Jan 21, 2021
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jan 21, 2021
…skAsRunning (elastic#88669)

When something causes an exception in `TaskRunner.markTaskAsRunning()` its execution fails, but this happens before we update the SO, which means that this failure does not count towards the `attempts` on the task. Task Manager will continue to try running this task for ever.

This PR increments the `attempts` when a failure occurs during `TaskRunner.markTaskAsRunning()` to ensure such a task doesn't continue to run to infinity.
Note that this fix will not affect `scheduled` tasks, as they are designed to _ignore_ their `attempts` and run for ever. In such a case this task will continue to consume Task Manager resources until canceled, but these failures will be logged and could be identified when needed.
gmmorris added a commit that referenced this pull request Jan 21, 2021
…skAsRunning (#88669) (#88970)

When something causes an exception in `TaskRunner.markTaskAsRunning()` its execution fails, but this happens before we update the SO, which means that this failure does not count towards the `attempts` on the task. Task Manager will continue to try running this task for ever.

This PR increments the `attempts` when a failure occurs during `TaskRunner.markTaskAsRunning()` to ensure such a task doesn't continue to run to infinity.
Note that this fix will not affect `scheduled` tasks, as they are designed to _ignore_ their `attempts` and run for ever. In such a case this task will continue to consume Task Manager resources until canceled, but these failures will be logged and could be identified when needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Task Manager release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants