[Alerting] Add cancel to alert and action tasks #64148
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
Adding a comment here: in observability, we have a problem where when a composite agg query has a ton of pages to page through, it can run for a very very long time (days, sometimes) inside of a rule execution. In these situations, the user doesn't really have a good way out of it besides restarting Kibana, which would kill all of their currently running rules. Disabling the specific rule is also not enough, because the existing execution would still be running. We could check in the executor to see if the rule has been disabled since the start of the execution run (if that context was accessible), but we figure that in many if not most cases, the user doesn't want disabling the rule to have this effect. In other words, it seems like the only way to give users a way out of these situations is to give them control over cancelling individually running tasks. But we're open to any other ideas for how to solve this problem (outside of how to prevent it, which we are also looking into separately). |
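To make the idea of cancelling an individual execution concrete, here is a minimal sketch of cooperative cancellation inside a paging loop. Everything in it (`CancellationToken`, `FetchPage`, `collectAllPages`) is a hypothetical name invented for illustration, not a Kibana or Elasticsearch API, and the fetch is synchronous only to keep the example short. The point is just that the loop checks a flag between pages, so a cancel request takes effect without restarting anything.

```typescript
// Hypothetical sketch: none of these names are Kibana APIs.
interface Page<T> {
  items: T[];
  afterKey?: string; // undefined when there are no more pages
}

type FetchPage<T> = (afterKey?: string) => Page<T>;

// A flag that something outside the executor (e.g. a "cancel" action) can flip.
class CancellationToken {
  private cancelled = false;

  cancel(): void {
    this.cancelled = true;
  }

  get isCancelled(): boolean {
    return this.cancelled;
  }
}

interface PagingResult<T> {
  items: T[];
  pagesFetched: number;
  cancelled: boolean;
}

// Pages through all results, but stops before the next page once cancelled.
function collectAllPages<T>(
  fetchPage: FetchPage<T>,
  token: CancellationToken
): PagingResult<T> {
  const items: T[] = [];
  let pagesFetched = 0;
  let afterKey: string | undefined = undefined;

  do {
    if (token.isCancelled) {
      return { items, pagesFetched, cancelled: true };
    }
    const page = fetchPage(afterKey);
    items.push(...page.items);
    pagesFetched += 1;
    afterKey = page.afterKey;
  } while (afterKey !== undefined);

  return { items, pagesFetched, cancelled: false };
}
```

In this sketch a cancel request only stops the loop at the next page boundary; a page request that is already in flight would still need its own abort mechanism.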
@jasonrhodes, I'm thinking there may be a story about leveraging task manager timeouts. The task manager already has this concept and calls the One thing also worth revisiting is how timeouts are handled for recurring tasks. They equal whichever is greater between 5m or the schedule (ex: Thoughts? |
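As a rough illustration of the "whichever is greater between 5m or the schedule" rule mentioned above, the sketch below computes an effective timeout for a recurring task. The helper names (`parseIntervalMs`, `recurringTaskTimeoutMs`) and the `<number><s|m|h|d>` interval format are assumptions made for this example; this is not the task manager's actual code.

```typescript
// Hypothetical helpers for illustration only; not task manager code.
const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000; // the 5m floor described in the comment

// Parses intervals such as "30s", "10m", "1h", "1d" (assumed format).
function parseIntervalMs(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (match === null) {
    throw new Error(`Invalid interval: ${interval}`);
  }
  const value = Number(match[1]);
  switch (match[2]) {
    case "s":
      return value * 1000;
    case "m":
      return value * 60 * 1000;
    case "h":
      return value * 60 * 60 * 1000;
    default:
      return value * 24 * 60 * 60 * 1000; // "d"
  }
}

// The timeout is the larger of the 5m default and the schedule interval.
function recurringTaskTimeoutMs(scheduleInterval: string): number {
  return Math.max(DEFAULT_TIMEOUT_MS, parseIntervalMs(scheduleInterval));
}
```

Under this rule a rule scheduled every `1m` would time out after 5 minutes, while one scheduled every `1h` would be allowed a full hour, which is why a multi-day execution would already exceed either bound.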
I think it's difficult because it's hard to know the difference between "a long query that is taking a while because it's running against frozen indices or cold tier storage and it's fine, we planned for this" vs. "a query that is taking way too long and the user wants to make it stop". That's what makes it seem like automated heuristics are going to be tough here. But maybe some kind of default timeout that can be adjusted for users who expect to be making very long queries would help? |
Thanks for the feedback, @jasonrhodes! It is indeed hard to distinguish between expected and unexpected query times based on those cases. From the sounds of it, research is needed to determine what approach should be taken to solve this problem. Regarding capacity, the alerting team won't be able to look into this soon, but if you feel the change is best done at the platform level, we are open to someone from the O11y team doing the research and coming up with a proposal and implementation. |
@YulNaumenko sounds like it :) |
Follow up from #64075 (comment)
We should consider:
- Adding a `cancel` implementation to Alert and Action tasks.
- Making the `cancel` implementation a required field on task definitions.
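One way to picture `cancel` as a required field is sketched below. The `TaskRunner`, `TaskDefinition`, and `TaskRegistry` shapes are hypothetical and simplified (synchronous, no task state); they are not the real task manager types. The sketch only shows the two places the requirement could be enforced: in the type, and in a runtime guard for callers that bypass the type checker.

```typescript
// Hypothetical shapes for illustration; not the Kibana task manager API.
interface TaskRunner {
  run(): void;
  cancel(): void; // required here, rather than optional
}

interface TaskDefinition {
  type: string;
  createTaskRunner(): TaskRunner;
}

class TaskRegistry {
  private readonly definitions = new Map<string, TaskDefinition>();

  register(definition: TaskDefinition): void {
    if (this.definitions.has(definition.type)) {
      throw new Error(`Task type "${definition.type}" is already registered`);
    }
    this.definitions.set(definition.type, definition);
  }

  createRunner(type: string): TaskRunner {
    const definition = this.definitions.get(type);
    if (definition === undefined) {
      throw new Error(`Unknown task type "${type}"`);
    }
    const runner = definition.createTaskRunner();
    // Runtime guard for plain-JavaScript callers or unsafe casts.
    if (typeof runner.cancel !== "function") {
      throw new Error(`Task type "${type}" does not implement cancel()`);
    }
    return runner;
  }
}
```

With the field required in the interface, an Alert or Action task definition that omits `cancel` would fail to compile, and the guard in `createRunner` catches the same omission at runtime.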