
[Discussion] CASE on top of IF in alert definitions? #3025

Closed
RichiH opened this Issue Aug 7, 2017 · 5 comments


RichiH commented Aug 7, 2017

Because we run a diverse environment with different SLAs towards customers, we need to duplicate quite a lot of alert rules to account for different severities, different escalation paths, or even for ignoring some alerts entirely.

A CASE stanza that let us define the matching rules and then, in a second step, decide which labels to attach, finished by a catch-all for anything we didn't match, would save quite a bit of effort.

I do think there should be a mandatory catch-all to make sure people don't accidentally leave alerts unhandled.

@brian-brazil argues that this "can be handled with configuration management"

My argument would be that this is an orthogonal issue. It would be easier to generate distinct definitions from config management, and the amount of config text would not matter if it were autogenerated and never looked at. But in reality, humans verify the alert rules on a regular basis, and a truly heterogeneous environment will always require a few one-offs; both are better handled by humans, who appreciate not needing to read, or write, autogenerated text.


beorn7 commented Aug 7, 2017

We have just solved a similar-looking use-case at SoundCloud with stock PromQL. We have defined some "static" rules that contain both the labels to attach and the label we use to decide whether those labels should be attached, e.g.

severity_path_service:alert_threshold:perc1m{path="/some/endpoint", service="foo", severity="warning"} = 0.5

In the above example, /some/endpoint is an endpoint belonging to service foo. If more than 0.5% of the requests to that endpoint fail, then a warning should be directed to the owners of service foo. (We route alerts by service.) The actual alerting expression joins ON path. A catch-all for paths not specified in any static rule is implemented as a separate rule.
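A minimal sketch of what the alerting side of this approach could look like, written in the same Prometheus 1.x rule syntax as the static rule above. The input series `path:request_errors:perc1m` (assumed to have exactly one series per `path`, holding the failure percentage) is hypothetical, as are the catch-all's 1% threshold and its `service="unassigned"` label; the actual SoundCloud rules are not shown in this thread.

```
# Hypothetical input: path:request_errors:perc1m has exactly one series per
# path label, holding the percentage of failed requests over 1m.

# Join on(path) against the static threshold rules. group_right makes the
# threshold series the "many" side, so one path may carry both a warning and
# a critical threshold. Each result keeps the threshold series' labels
# (path, service, severity) and the error percentage from the left-hand side.
ALERT PathErrorRateHigh
  IF path:request_errors:perc1m
       > on(path) group_right
     severity_path_service:alert_threshold:perc1m

# Catch-all for paths that have no static threshold rule at all.
# The 1% default and both labels below are placeholders.
ALERT PathErrorRateHighDefault
  IF path:request_errors:perc1m > 1
       unless on(path) severity_path_service:alert_threshold:perc1m
  LABELS { severity = "warning", service = "unassigned" }
```

Because `>` binds more tightly than `unless`, the catch-all first filters paths above the default and then drops every path that has its own threshold. In Prometheus 2.x rule files, the same expressions would go into the `expr` field of YAML alerting rules.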

The whole thing is a bit arcane, but maintenance is easy for our users (just modify/add/remove static rules), and the use-case is probably sufficiently rare/advanced to justify a bit of PromQL magic.


brian-brazil commented Aug 7, 2017

That use is something I ensured was supported when the group_left logic was changed. It's presently only documented in a unit test: https://github.com/prometheus/prometheus/blob/master/promql/testdata/operators.test#L325-L335
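A hedged sketch of a `group_left` variant of this pattern, again in 1.x rule syntax, for the case where each path has at most one threshold: `group_left(severity, service)` copies those two labels from the matching threshold series onto every error-rate series on the left-hand side. The per-instance input series `path_instance:request_errors:perc1m` and the alert name are hypothetical.

```
# Hypothetical input: path_instance:request_errors:perc1m may have several
# series per path (e.g. one per instance). Each is matched on(path) against
# at most one threshold series; group_left(severity, service) copies those
# labels onto the result, which keeps the left-hand side's value and labels.
ALERT PathInstanceErrorRateHigh
  IF path_instance:request_errors:perc1m
       > on(path) group_left(severity, service)
     severity_path_service:alert_threshold:perc1m
```

This many-to-one form fails at evaluation time if two threshold series share the same `path` (for example a warning and a critical threshold), because the right-hand side must be unique per match group.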


RichiH commented Aug 14, 2017

Adding @bastischubert

Interesting; will test this internally.


brian-brazil commented Aug 21, 2017

Given that this is already possible in a reasonable way via PromQL, that it would mean offering more than one way to do things (we already have IF), and that more complex cases will require configuration management and/or complex PromQL anyway, I don't think this is something we should ever add.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
