Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Response Ops][Task Manager] Integration test for switching between task claim strategies #186419

Merged
merged 4 commits into from
Jun 21, 2024

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Jun 18, 2024

Resolves #184941

Summary

Adds integration test to verify that restarting Kibana with a different task claim strategy does not break anything and tasks are claimed as expected.

},
{
name: 'plugins.taskManager.metrics-subscribe-debugger',
level: 'warn',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make the tests a little less noisy

@ymao1 ymao1 self-assigned this Jun 18, 2024
@ymao1 ymao1 added release_note:skip Skip the PR/issue when compiling release notes Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.15.0 labels Jun 18, 2024
@ymao1 ymao1 marked this pull request as ready for review June 18, 2024 20:35
@ymao1 ymao1 requested a review from a team as a code owner June 18, 2024 20:35
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

We'll eventually want some more elaborate tests, I think, but they may be painful.

Basically we want to have some tasks in running state when we shutdown the first time, should be easy to arrange with a long-running task. Waiting for it to pick it up to run again - 5m? ouch :-). Likewise, we want some claimed-but-not-yet-running tasks when we shutdown, but I have no idea how we can do that with just Kibana. We may need to write some task docs to the TM index directly, before starting up the second one.

I'm not terribly concerned about these now, but I think we'll need the "claimed-but-not-run" test if we ever go right from idle -> running and skipping marking the tasks as claimed. Something for the future ...

@ymao1
Copy link
Contributor Author

ymao1 commented Jun 18, 2024

Hmm...should I try to add some of those here? Or create follow-up issues for them? I'm not sure I understand the concern about things breaking for claimed but not yet run tasks when switching between task claimers since both claimers use the same RunningOrClaimingTaskWithExpiredRetryAt clause in the query. Is the worry that after 5m the task won't get returned in the query for some reason?

@pmuellr
Copy link
Member

pmuellr commented Jun 20, 2024

should I try to add some of those here? Or create follow-up issues for them?

It might be worth making sure we can re-claim a task that is running when we shutdown, after the 5m wait, if we can cut down the 5m wait somehow (I'm thinking it's not configurable?).

I'm not sure I understand the concern about things breaking for claimed but not yet run tasks when switching between task claimers since both claimers use the same RunningOrClaimingTaskWithExpiredRetryAt clause in the query. Is the worry that after 5m the task won't get returned in the query for some reason?

This one we can defer - and I don't think we'll need an issue for it. Mike has been talking about skipping the marking of docs as "claimed"; going straight from "idle" to "running", to try to cut down on # of updates. When we do this, I want to make sure we still LOOK for "claimed" tasks, which we might forget. Should be obvious when we get there ...

@mikecote
Copy link
Contributor

This one we can defer - and I don't think we'll need an issue for it. Mike has been talking about skipping the marking of docs as "claimed"; going straight from "idle" to "running", to try to cut down on # of updates. When we do this, I want to make sure we still LOOK for "claimed" tasks, which we might forget. Should be obvious when we get there ...

Yeah, and my thinking is we'll still look for expired retryAt from tasks in claiming status when we claim tasks, so it should just work ™️ . 😁

@ymao1
Copy link
Contributor Author

ymao1 commented Jun 20, 2024

Added a test for switching between claimers with tasks that are still running. It adds a delay in the task run function and checks that the task still exists and is running when the second Kibana instance starts up. Then checks that the task is (eventually) claimed again.

f95ab04

@ymao1
Copy link
Contributor Author

ymao1 commented Jun 20, 2024

@elasticmachine merge upstream

@ymao1
Copy link
Contributor Author

ymao1 commented Jun 21, 2024

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #4 / ManualRuleRunModal should render confirmation button disabled if invalid time range has been selected
  • [job] [logs] Jest Tests #4 / ManualRuleRunModal should render confirmation button disabled if selected end date is in future
  • [job] [logs] Jest Tests #4 / ManualRuleRunModal should render confirmation button disabled if selected start date is more than 90 days in the past

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 6faadda into elastic:main Jun 21, 2024
41 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jun 21, 2024
@ymao1 ymao1 deleted the tm-mget-switch branch June 21, 2024 13:31
bhapas pushed a commit to bhapas/kibana that referenced this pull request Jun 24, 2024
…ask claim strategies (elastic#186419)

Resolves elastic#184941

## Summary

Adds integration test to verify that restarting Kibana with a different
task claim strategy does not break anything and tasks are claimed as
expected.

---------

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ResponseOps][Task Manager][mget Claimer] ensure we can switch between claimers after a restart
6 participants