Is the alerting framework the solution for ETL? #92197
Comments
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
I also wondered why we allow alerts not to have any actions. 🤔
Thanks for posting this issue Mike, it's something I've been a little worried about for a while. It doesn't make sense for long-running ETL tasks to keep notifications for detections in the ETLed data from firing, for example, but that could easily happen today.
I think that's a separate question tbh. I don't think preview is quite enough for that if you want to leave a few detections running in parallel for a while and compare them later.
I just thought I'd add some context that might be missing for people less familiar with Task Manager. 😄
I recently documented how these things work, and what the default scale of TM is. From the docs:
This means a long-running ETL task might take up a slot for an extended period of time. We can always push these default numbers higher, but that comes at a cost. Scaling a system is always hard... doing so in a general-purpose manner that is adaptive is even harder.
Tasks are scheduled for a certain time and it's only after they exceed that time that they are picked up by one of the Kibana instances. We could in theory pick tasks up preemptively, but there's a lot of complexity to that:
At first glance these are simple problems, but in fact they are complex problems due to the distributed nature of Kibana and the requirements at play here. Another thing worth understanding is that Task Manager prioritises system stability over schedule accuracy. We have discussed ideas such as reactively scaling vertically when there are resources available, but we haven't moved ahead with that kind of work yet. As I said before: addressing these complexities is of course possible, we just need to agree that it's the priority over other things.
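To make the slot and scheduling behaviour described above a bit more concrete, here is a rough, self-contained sketch. It is a toy model, not Task Manager's actual code: `simulate`, `Task`, `RunRecord`, and the worker count and poll interval used below are all hypothetical and chosen only for illustration. It shows a poller that claims only tasks whose scheduled time has passed, and only while it has free slots, so two long-running ETL tasks can push back a short alert check.

```typescript
// Toy model only: all names and numbers here are illustrative assumptions,
// not Kibana's Task Manager API or its default configuration.
interface Task {
  id: string;
  runAt: number; // scheduled time, in ms from the start of the simulation
  duration: number; // how long the task holds a worker slot, in ms
}

interface RunRecord {
  id: string;
  startedAt: number;
  delay: number; // startedAt - runAt, i.e. how late the task started
}

function simulate(tasks: Task[], maxWorkers: number, pollInterval: number): RunRecord[] {
  const pending = [...tasks].sort((a, b) => a.runAt - b.runAt);
  const busyUntil: number[] = []; // end times of tasks currently holding a slot
  const records: RunRecord[] = [];
  let now = 0;

  while (pending.length > 0) {
    // Release slots whose tasks have finished by this poll.
    for (let i = busyUntil.length - 1; i >= 0; i--) {
      if (busyUntil[i] <= now) busyUntil.splice(i, 1);
    }
    // Claim only overdue tasks, and only up to the free capacity.
    while (busyUntil.length < maxWorkers && pending.length > 0 && pending[0].runAt <= now) {
      const task = pending.shift()!;
      busyUntil.push(now + task.duration);
      records.push({ id: task.id, startedAt: now, delay: now - task.runAt });
    }
    now += pollInterval;
  }
  return records;
}

// Two one-minute ETL tasks fill both slots; the alert check has to wait for one to finish.
const records = simulate(
  [
    { id: 'etl-a', runAt: 0, duration: 60_000 },
    { id: 'etl-b', runAt: 0, duration: 60_000 },
    { id: 'alert-check', runAt: 3_000, duration: 1_000 },
  ],
  2, // hypothetical number of worker slots
  3_000 // hypothetical poll interval in ms
);
console.log(records);
```

In this model the alert check starts 57 seconds late because capacity is never exceeded, which is the "stability over schedule accuracy" trade-off in miniature. Raising the slot count to 3 lets it start on time, at the cost of more concurrent work per instance.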
These are all related, I believe.
It does occur to me that if solutions need ETL, they can use Task Manager for it directly (assuming we do some more work to improve scalability), rather than by going via the alerting framework.
As we documented here:
but...
I think we can improve this by working with the ES team to address some of the limitations and rethinking some of our task ownership strategies. I've been playing around with coordination methods between Kibana instances (leader elected via a SO or long-running ownership of tasks, for example) that would allow us to reduce this kind of coordination. This is a large, complex problem that would require us to think about a wide range of concerns around scheduling, workload balancing etc.
There are a few aspects to this beyond the technical (such as the UX around that), but from a scaling standpoint, the main concern here is the constant firing of actions and how this might clog up Task Manager.
Closing due to lack of activity.
I've noticed a few places using the alerting framework as a solution to do ETL (Maps, Security Detections, etc.). I created this issue to discuss where we draw the line with the Kibana Alerting framework.
Some alert types leverage the alerting framework to do some data processing for other alerts to use as inputs (see below).
The question is, is our framework the place to do ETL? If so, should we architect something to do ETL (example below)?
Some of the issues we have today: