Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task Manager] cancel expired tasks as part of the available workers check #88483

Merged
merged 5 commits into from Jan 20, 2021

Conversation

gmmorris
Copy link
Contributor

@gmmorris gmmorris commented Jan 15, 2021

Summary

Addresses 1 of 3 problems identified in #87874

When a task expires it continues to reside in the queue until TaskPool.cancelExpiredTasks() is called. We call this in TaskPool.run(), but run won't get called if there is no capacity, as we gate the poller on TaskPool.availableWorkers() and that means that if you have as many expired tasks as you have workers - your poller will continually restart but the queue will remain full and that Task Manager is then in capable of taking on any more work. This is what caused [Task Poller Monitor]: Observable Monitor: Hung Observable...

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@gmmorris gmmorris requested a review from a team as a code owner January 15, 2021 17:34
@gmmorris gmmorris marked this pull request as ready for review January 15, 2021 17:34
@gmmorris gmmorris added 8.0.0 Feature:Task Manager release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.12.0 labels Jan 15, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@gmmorris gmmorris changed the title Tm/cancel expired tasks on hung tm [Task Manager] cancel expired tasks as part of the available workers check Jan 15, 2021
Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* master: (33 commits)
  [Security Solution][Case] Fix patch cases integration test with alerts (elastic#88311)
  [Security Solutions][Detection Engine] Removes duplicate API calls (elastic#88420)
  Fix log msg (elastic#88370)
  [Test] Add tag cloud visualization to dashboard in functional test for reporting (elastic#87600)
  removing kibana-core-ui from codeowners (elastic#88111)
  [Alerting] Migrate Event Log plugin to TS project references (elastic#81557)
  [Maps] fix zooming while drawing shape filter logs errors in console (elastic#88413)
  Porting fixes 1 (elastic#88477)
  [APM] Explicitly set environment for cross-service links (elastic#87481)
  chore(NA): remove mocha junit ci integrations (elastic#88129)
  [APM] Only display relevant sections for rum agent in service overview (elastic#88410)
  [Enterprise Search] Automatically mock shared logic files (elastic#88494)
  [APM] Disable Create custom link button on Transaction details page for read-only users
  [Docs] clean-up vega map reference documenation (elastic#88487)
  [Security Solution] Fix Timeline event details layout (elastic#88377)
  Change DELETE to POST for _bulk_delete to avoid incompatibility issues (elastic#87914)
  [Monitoring] Change cloud messaging on no data page (elastic#88375)
  [Uptime] clear ping state when PingList component in unmounted (elastic#88321)
  [APM] Consistent terminology for latency and throughput (elastic#88452)
  fix copy (elastic#88481)
  ...
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmmorris
Copy link
Contributor Author

@elasticmachine merge upstream

1 similar comment
@gmmorris
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gmmorris gmmorris merged commit 4878554 into elastic:master Jan 20, 2021
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jan 20, 2021
…check (elastic#88483)

When a task expires it continues to reside in the queue until `TaskPool.cancelExpiredTasks()` is called. We call this in `TaskPool.run()`, but `run` won't get called if there is no capacity, as we gate the poller on `TaskPool.availableWorkers()` and that means that if you have as many expired tasks as you have workers - your poller will continually restart but the queue will remain full and that Task Manager is then in capable of taking on any more work. This is what caused `[Task Poller Monitor]: Observable Monitor: Hung Observable...`
gmmorris added a commit that referenced this pull request Jan 20, 2021
…check (#88483) (#88874)

When a task expires it continues to reside in the queue until `TaskPool.cancelExpiredTasks()` is called. We call this in `TaskPool.run()`, but `run` won't get called if there is no capacity, as we gate the poller on `TaskPool.availableWorkers()` and that means that if you have as many expired tasks as you have workers - your poller will continually restart but the queue will remain full and that Task Manager is then in capable of taking on any more work. This is what caused `[Task Poller Monitor]: Observable Monitor: Hung Observable...`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Task Manager release_note:fix Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants