Failed tasks and action type execution param objects remain as saved objects forever #55340
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
An additional complication for actions is that they can return a retry status, which should cause them to get retried. I don't remember what the logic is to stop doing retries, TM must have some max retry count or something. Anyhoo, we'll need to figure out how to have TM delete the Another thought is that - I think when we added |
This could have a UI to allow the user to manage this as a feature. |
Yes, per Bill's comment in the sync this week: "not deleting these is a feature" :-) We'd need to build a UI to show them, delete them, etc. Sounds like work ... |
Following this PR, these saved objects no longer remain forever; they are cleaned up. Any objection? |
I spoke to @mikecote and we feel this issue can remain open as it raises the more fundamental question of why these tasks remain as a failed task at all rather than closed at the time? |
Adding this issue to the 7.16/8.0 candidates, as resolving it would help the core team address some debt early (#106991). |
Action failures are logged already in the action/execute event log documents (see below); that doc includes the error message, and also includes the id, typeId, and namespace (name would be good to add; but more generally, we should grow a kibana/x-pack/plugins/actions/server/lib/action_executor.ts Lines 233 to 253 in 9edcf9e
You mean, "enables a human to notice them and then manually do something" - or for code that does this, somehow. I could imagine a UI where you could see failed connector executions, and then retry them. That what you're thinking? I'm thinking if all we need is the params for the connector - and probably the config, but not the secrets - it would be nice to put these in the event log (for failures only!) in an object enabled: false field (would be some new ECS field), so we could add them, but they wouldn't impact indexing, and hopefully not impact event log size (hoping there aren't a lot of these). |
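The event-log idea in the comment above could look roughly like the sketch below. Everything named here is an assumption made for illustration: the `failure_details` field name, the `Connector` shape, and `buildFailureEvent` are hypothetical and are not part of the event log or the actions plugin. The only thing taken as given is the Elasticsearch mapping behavior that an object field with `enabled: false` is kept in `_source` but is not parsed or indexed.

```typescript
// Hypothetical connector shape, for illustration only.
interface Connector {
  id: string;
  typeId: string;
  config: Record<string, unknown>;
  secrets: Record<string, unknown>;
}

// Mapping fragment for a hypothetical field. With `enabled: false`,
// Elasticsearch stores the object in _source but does not index its contents.
const failureDetailsMapping = {
  failure_details: { type: 'object', enabled: false },
} as const;

// Build an event document for a failed execution.
// Params and config are included; secrets are deliberately left out.
function buildFailureEvent(
  connector: Connector,
  params: Record<string, unknown>,
  errorMessage: string
) {
  return {
    event: { action: 'execute', outcome: 'failure' },
    error: { message: errorMessage },
    failure_details: {
      connector_id: connector.id,
      connector_type_id: connector.typeId,
      config: connector.config,
      params,
    },
  };
}
```

Writing this only for failures keeps the extra payload limited to the documents where a later "retry" would need it.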
@gmmorris ^^ regarding the last two paragraphs (in reply to your comment) |
I don't know TBH... but yeah I was thinking in that general direction... perhaps a "retry" button next to each failure that tries to refire the action? 🤷 |
Do we need this in the event log, if we can just query the saved objects for the same information? Or is there a potential issue accessing this in the event of an issue? |
I synced a bit with @mikecote about this issue and I think we have a potential path forward.
Task manager will persist any and all failed tasks and this behavior is not something we necessarily want to change now. However, we do not want to persist failed
To avoid needing to change any behavior in task manager directly, we can leverage the fact that task manager will remove all successful non-recurring tasks and simply pass off all action failures as a "success" to task manager. Of course, we still want to preserve existing behavior as much as possible (which primarily includes error/warning logs to the Kibana server log as well as failed entries in the event log) and this change will do that.
One point of note is retry logic. We currently support retry-able actions, but the action type itself needs to specify this and none of the built-in action types do this today. This behavior should be preserved as well, and we need to ensure that once the number of attempts for an action task reaches or exceeds the max attempts, we still do not store the failed task forever. I did a PoC of this in this PR by modifying our existing email action type to support retry.
Flows
Current
Suggested change
|
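A minimal sketch of the approach proposed in the comment above, under stated assumptions: `ActionResult`, `TaskOutcome`, and `runActionTask` are hypothetical names invented for this example and do not reflect the actual task manager or actions plugin code. The sketch only illustrates the decision logic: log every failure, ask for a retry while attempts remain, and otherwise report "success" so that a non-recurring task is removed instead of lingering as failed.

```typescript
// Hypothetical result of executing an action; not the real plugin types.
type ActionResult =
  | { status: 'ok' }
  | { status: 'error'; message: string; retry: boolean };

// What the (hypothetical) task runner is told about the run.
interface TaskOutcome {
  state: 'success' | 'retry';
  logged: string[];
}

function runActionTask(
  execute: () => ActionResult,
  attempts: number, // attempts made so far, including this one
  maxAttempts: number,
  log: string[] = []
): TaskOutcome {
  const result = execute();
  if (result.status === 'ok') {
    return { state: 'success', logged: log };
  }

  // The failure stays visible in the logs regardless of what the task runner sees.
  log.push(`action failed (attempt ${attempts}/${maxAttempts}): ${result.message}`);

  // Retry only if the action asked for it and attempts remain.
  if (result.retry && attempts < maxAttempts) {
    return { state: 'retry', logged: log };
  }

  // Not retryable, or attempts exhausted: report success so the task is cleaned up.
  return { state: 'success', logged: log };
}
```

One consequence of this design, raised later in the thread, is that failed actions would count as successful in any statistics derived from the task outcome.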
ya, that ^^^ works for me! One slight change - add the space and action name to the error message. It's difficult for a user to go from connector id (or rule id) to the actual object in the UI - much easier with the name and space :-) |
This would mean that failed actions appear in the Task Manager health stats as successful.... Can we live with that? |
@mikecote @chrisronline and I discussed this synchronously. We agreed that:
Does this summary sound right @chrisronline @mikecote ? |
LGTM 👍 |
LGTM! |
When a task in task manager fails and will no longer be retried, it gets a status of `failed` and remains a saved object forever.
When an action type is executed via task manager (e.g. from alerting) and it fails, the same applies to its task in task manager and also to the `action_task_params` object.