Add ability to require that the index used in a rule exists #133035

Open
mcpate opened this issue May 26, 2022 · 4 comments
Labels
Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

mcpate commented May 26, 2022

Describe the feature:
It would be nice to have the ability (a checkbox or the like) to say that a particular index must exist, or the rule should "fire".

Describe a specific use case for the feature:
We have several indices with ILM policies that cause a daily or monthly rollover. We would like to have rules that use these indices and verify that they don't contain specific data over a given time period (e.g. check that, over the last two hours, we don't have any documents in index X that match query Y). This might work correctly when the rule is first set up, but if index rollover later fails (as specified by the ILM policy) and no new index is created, then when the rule runs the index won't exist and the query in the rule won't find any hits. As far as I know, the rule will still run without reporting any problem. It would be nice if the "missing expected index" caused the rule to alert.
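
To make the request concrete, here is a minimal sketch of the behavior being asked for. Everything in it is hypothetical: `RuleCheckInput`, `requireIndex`, and `evaluateRule` are illustrative names, not part of Kibana's alerting framework, and the existence check is assumed to have been done before the query runs.

```typescript
// Hypothetical outcome of one rule run; these names are illustrative only.
type RuleOutcome = "ok" | "matched" | "missing-index";

interface RuleCheckInput {
  indexExists: boolean; // result of an existence check made before querying
  hitCount: number; // documents matching query Y in the time window
  requireIndex: boolean; // the proposed "index must exist" checkbox
}

function evaluateRule(input: RuleCheckInput): RuleOutcome {
  if (!input.indexExists) {
    // Without the option, a missing index looks like an empty result.
    return input.requireIndex ? "missing-index" : "ok";
  }
  return input.hitCount > 0 ? "matched" : "ok";
}
```

With `requireIndex` off, a missing index is indistinguishable from "no matching documents", which is the gap described above.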

@mcpate mcpate added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label May 26, 2022
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@gmmorris gmmorris added the Feature:Alerting/RuleTypes Issues related to specific Alerting Rules Types label May 26, 2022
@gmmorris
Contributor

Tagged as RuleType as @mcpate is referring here to the implementations of the ES Query and Index Threshold rule types.

That said, I do wonder if this sort of thing is better served as a dedicated Stack Monitoring rule that alerts when ILM rollover fails 🤔.

@pmuellr
Member

pmuellr commented May 26, 2022

Is the idea that this is a new rule type, or a new option on all/some index-based rules? Mentioning ILM in the description confuses me, because even if ILM fails a roll-over, there's still going to be an index that it's writing to, so I don't understand how a rule based on the ILM'd alias would ever not see an index. In any case, adding ILM to the story here seems to be adding complexity (in my mind anyway), but I can definitely imagine scenarios of changing rule behavior based on a non-ILM story of querying an index that doesn't exist. So just wondering if we can re-cast the story like that.

Or, as Gidi mentioned, some kind of ILM-specific stack monitoring rule.

We'd need to figure out what to do if the "index" being queried is actually a pattern or alias, and if "some" of the indices specified in the pattern/alias don't exist, does this "fire"? Or would it have to be "all" of them? Probably all.
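
The "some" versus "all" question can be sketched as a small policy function. This is only an illustration under stated assumptions: the caller already knows the concrete index names it expects (a bare wildcard pattern would not provide that), and `MissingPolicy`, `shouldFireForMissing`, and the `indexExists` callback are hypothetical names rather than existing Kibana or Elasticsearch APIs.

```typescript
// "any": fire when at least one expected index is missing.
// "all": fire only when every expected index is missing.
type MissingPolicy = "any" | "all";

function shouldFireForMissing(
  expected: readonly string[],
  indexExists: (name: string) => boolean, // hypothetical existence check
  policy: MissingPolicy
): boolean {
  if (expected.length === 0) {
    return false; // nothing was expected, so nothing can be missing
  }
  const missing = expected.filter((name) => !indexExists(name));
  return policy === "all"
    ? missing.length === expected.length
    : missing.length > 0;
}
```

Under the "all" policy suggested above, a rule over two expected indices would stay quiet as long as one of them still exists.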

If we added this as an "option" for existing rule types, I think we'd actually want a new "free" action group (like our "free" RECOVERED group now), so that you could target THAT aspect of the rule run vs what the rule typically does - allowing you to change where the notification happens and what the messaging is. This is similar to our thoughts a long time ago about a "NO DATA" action group that we could provide in the framework that all rules could use if they wanted.

In fact, I wonder if "NO DATA" would suffice in this case - again, it doesn't exist as a general thing yet, but there are issues open on it. Metric threshold already supports its own flavor of "NO DATA", as an option in the rule definition, but does not support a specific action group for it, so not quite sure how that works. Presumably it sets a context variable indicating this condition.
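
As a rough sketch of the separate action group idea, the rule run could pick a group per outcome so that notifications for a missing index are routed and worded independently. The group ids, `RunSummary`, and the message templates below are assumptions made for illustration; a framework-level "no data" group does not exist yet, as noted above.

```typescript
// Hypothetical action group ids; "no data" is not an existing framework group.
type ActionGroupId = "query matched" | "recovered" | "no data";

interface RunSummary {
  indicesResolved: number; // concrete indices the rule's target resolved to
  hitCount: number; // documents matched during this run
  wasActive: boolean; // whether the alert was active on the previous run
}

function selectActionGroup(run: RunSummary): ActionGroupId | null {
  if (run.indicesResolved === 0) return "no data";
  if (run.hitCount > 0) return "query matched";
  return run.wasActive ? "recovered" : null;
}

// Each group carries its own message, so the wording can differ per condition.
const messages: Record<ActionGroupId, string> = {
  "query matched": "Rule matched documents",
  recovered: "Rule recovered",
  "no data": "Rule found no index to query",
};

function notificationFor(run: RunSummary): string | null {
  const group = selectActionGroup(run);
  return group === null ? null : messages[group];
}
```

Keeping the "no data" decision in its own group, rather than in a context variable, is what would let users attach different connectors to it.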

@EricDavisX
Contributor

I logged a related issue: #143315. I won't close either of them out just now. My issue suggests a slightly simpler path (though it should still be validated that it makes full sense): I submit that we throw an error if the index is found not to exist at execution time. It would help triage, and doesn't require 'options'. My concern was helping triage, not supporting any other feature needs this issue may be about.
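
A minimal sketch of that simpler path, assuming the rule executor can ask whether an index exists before it queries. `IndexNotFoundError`, `assertIndexExists`, and the `indexExists` callback are hypothetical names used for illustration, not existing Kibana code.

```typescript
// Hypothetical error type raised when the rule's index is absent at run time.
class IndexNotFoundError extends Error {
  constructor(public readonly index: string) {
    super(`Rule execution failed: index "${index}" does not exist`);
    this.name = "IndexNotFoundError";
  }
}

// Throws instead of letting the query silently return zero hits.
function assertIndexExists(
  index: string,
  indexExists: (name: string) => boolean // hypothetical existence check
): void {
  if (!indexExists(index)) {
    throw new IndexNotFoundError(index);
  }
}
```

Because the failure surfaces as an execution error rather than an alert, it needs no new rule option, which matches the triage-only goal of this suggestion.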


5 participants