Alerting: Firing/Notification Severity (Critical / Warning / Info) #6553

torkelo · 2016-11-11T18:01:45Z

When an alert rule is firing (in state Alerting), should there be different severity states as well?

By Severity I mean: Critical, Warn, Info, etc

How should severity specified?
Per alert rule or per condition / threshold?

This is very from from being worked on but is a placeholder issue for feature request & discussions of this nature.

roscoe57 · 2016-11-12T15:06:51Z

I know you are trying to keep it simple, but I would for sure like say for disk space free % prefer multi stage with varying priorities
eg free spare < 20% info (alrrt but optional notification), <10% warning & notify, <5% critical & notify
my other use case is temperature > info/warn/severe

calind · 2016-11-15T09:02:57Z

Severities for alerts is something one expects, and when there we'll be multiple alerts per graph, people will start working around this missing feature by sending emails to different email address for example.

My take is that the severities should be per condition, (eg. the same from CPU Load alert you should configure the conditions for Warning/Critical/No Data). Also the No Data condition should not fire an alert when transitioning from initial state to no data state (eg. when a new server is added).

Btw. thanks for Grafana!

haron · 2016-11-15T09:39:39Z

There should be Critical and Warning levels specified per condition. It helps users prioritize events. Also, alerting is a feature for power user, there is no need to keep it that simple.

elvarb · 2016-11-19T00:06:36Z

There was a talk I saw a while back from the etsy team or the stack exchange team that brought up a very good point:

All alerts should be actionable.

Which basically meant do away with warning alerts because when you get many of them you will consider them noise and they will be ignored.

For example free disk space. Common practice is to warn on 10% free and critical on 5% free. Regardless of the value it requires someone to take action, if it's not done from a warning message then it's done by the critical message.

What you have is just one alert state, the one that makes you take action.

What is still needed is an escalation path.

If an alert is not acknowledged or fixed within some period of time then send a second alert to a different group or channel.

Disk space is at 10% free is the first alert and requires an action, free up some space, add more space, modify the alert, something is done handle it.

If it's not fixed after 3 hour for example alerts could be sent to a second engineer, a team lead or the stakeholders.

utkarshcmu · 2016-11-19T00:10:28Z

@elvarb I think escalation policies can set in pagerduty(obviously one has to pay for their subscription) which is integrated with Grafana!

elvarb · 2016-11-19T00:17:10Z

True, but that is only one tool of many.

But in a sense I agree with that method, to have a dedicated alert handling tool, alerta.io for example.

The groups I always see in that picture.

metric collectors
queues
parsing
storing
visualizing
alert triggers
alert handling
issue tracking

In a perfect world

pdf · 2017-01-14T05:00:43Z

Rather than explicitly adding a severity property to alerts, consider instead allowing multiple alerts per panel (this is more generally useful, too).

To handle severity (and various other scenarios), you might add a colour property to the alerts for annotations, and implement variables that can be referenced in notifications/annotations.

Perhaps the template variables would be best configured globally within the top level Alerting configuration - a user could create a template var called 'severity' (with a default value?) which would then be available for population from each alert, and which could be referenced by notifiers.

The notifiers would expose their output content as templates, allowing interpolation of the template variables. Exposing the output templates would be useful in general for users to be able to customise notifications.

Including a severity var by default might be a good way to close off this issue, and provide a default example for the functionality.

magnuspwhite · 2017-01-18T14:18:06Z

Multiple alerts per panel would be a fantastic addition. Figuring out how to display the alert lines would be interesting.

Also having alert severity levels would be extremely useful. A common example would be disk usage with a critical state at 80% then an alert at 90%. Having different alerting channels and custom notification messages for each alert level would be required.

yannispanousis · 2017-02-08T15:57:40Z

I was super excited to get started with Grafana. Best tool I've worked with in so long in so many ways. I was excited to find a way to model the capacity (0-100%) of multiple different services (of the same type) within just one graph, instead of having to create new a new graph per service.

However I was disappointed to see 1) the single alert per graph limit and 2) the alert did not re-trigger when the metric value changed, provided that the overall alert state had not changed.

E.g. It might be desirable to manage ~10-20 states / series within one graph. And (sorry if I'm waffling but I got super excited with this tool) I think that's what #6041 is about. I think that'd be a great addition.

SilentGlasses · 2018-12-26T16:47:19Z

Any word on the progress of adding Criticality to alerting?

MichaelMitchellM · 2019-08-14T18:53:45Z

Any update on this?

yemble · 2019-09-25T22:15:10Z

Somewhat related PR: #19425

Lets a PagerDuty notification channel specify the severity instead of just hardcoding critical.

jpmcb · 2019-12-24T17:37:04Z

Any update on this?

We would like to specify the severity of an alert, specifically, we'd like it to be included in the dashboard JSON schema.

We saw that this PR #19425 merged functionality for PagerDuty but we'd like this to be included for all alerts.

yemble · 2019-12-24T17:57:04Z

That PR puts the severity attribute on the notification channel. You could create a different channel for each severity: "Notify critical" "Notify warning" etc and that channel selectio would be in the dashboard json (I think).

Agree that's a bit of a hack - a severity in the actual alert would be a bit nicer so the dashboard config isn't dependant on the channel config.

jpmcb · 2019-12-24T18:12:05Z

Ah I see, yes, we could use the Alerting API to create those various channels and then assign the "notifications" section of the alert json to that channel UID.

Agreed, abit of a work around, but would be great to see this feature come directly to the dashboard directly.

aviadbi1 · 2020-01-30T18:43:02Z

Any update that is not related to pagerduty?

siegenthalerroger · 2021-06-18T11:32:56Z

I think this is still relevant even with the addition of the next-gen alerts. With the new expressions however there seems to be a very easy way of achieving the required result (feel free to tell me this is already possible).

It'd be great to be able to add a label based on the evaluated metrics/expressions.

Example:
A: A promQL query that results in some timeseries
B: A reduce expression (let's say average)
C: A math expression that is evaluating whether $B is above/below a certain threshold.

Now if C could somehow add a label depending on some math output this whole issue would be solved as the notifications can already be bound to specific label values.

m-wack · 2022-10-14T15:40:39Z

Is there any update on this topic regarding NG-Alerts?

For example I want to build a RAM Check. Currently I need three checks which do:

A: A promQL query that results in some timeseries
B: A reduce expression (let's say average)
C: A math expression that is evaluating whether $B is above/below a certain threshold.

Where I then change the threshold of C to lets say 85 for Rule 1, 90 for Rule 2 and 95 for Rule 3.

Then these Rules have a label called severity to push to a webhook, which is warning for Rule 1, major for Rule 2 and critical for Rule 3.

Ideally there would be a way to either set my Label text dynamically based on the value of $B or to be able to have multiple expressions in $C which then set a label depending on what matches.

Is this still not possible and needs multiple rules?

hajdukda · 2022-11-17T08:17:32Z

Thats how currently it works in DataDog. You have "severity thresholds" on top of priority specific Monitor gets.

m-wack · 2022-12-09T15:34:22Z

For anyone still having this issue, this is actually working, thanks to a member of the slack community (https://grafana.slack.com/archives/C0Y4TLW74/p1662366010165239?thread_ts=1662121247.062299&cid=C0Y4TLW74).

'You can create a custom label severity with value like {{ if and (gt $values.B0.Value 5.0) (lt $values.B0.Value 7.5) }}critical{{ else }}warning{{ end }} adjusting the conditions to your needs.'

I tried and it did work, so i can dynamically assign a severity label

grobinson-grafana · 2023-10-03T19:45:05Z

Hi! 👋

Just to add to what @m-wack said, this is preferred approach to adding severity in Grafana Managed Alerts for the time being. I appreciate adding this for each alert is quite laborious though. Perhaps we could look into providing something in the UI that that made this easier if there is enough demand for it.

Another approach you can use is to have two alerts with different severity labels. For example, in Prometheus/Mimir:

groups:
- name: example
  rules:
  - alert: HighLatency
    expr: histogram_quantile(rate(http_request_duration_seconds_bucket[5m]), 0.95) > 0.5
    for: 5m
    labels:
      severity: high
  - alert: HighLatency
    expr: histogram_quantile(rate(http_request_duration_seconds_bucket[5m]), 0.95) > 0.1
    for: 5m
    labels:
      severity: low

This also works in Grafana Managed Alerts, but with the exception that each alert must have a different name. For example, HighLatencyLowSeverity and HighLatencyHighSeverity.

torkelo added type/feature-request type/discussion Issue to start a discussion labels Nov 11, 2016

torkelo mentioned this issue Nov 11, 2016

Alerting Query to allow multiple level triggering #6551

Closed

1 task

This was referenced Jan 14, 2017

[Feature request] Email templating in Alert engine #7121

Open

Alerting: Support per series state change tracking for queries that return multiple series #6041

Closed

antjori mentioned this issue Feb 14, 2017

[Feature request] Send notification when threshold is reached #7555

Closed

This was referenced Aug 23, 2017

Alerting support for queries using template variables #6557

Closed

[Feature request] Multiple alerts per graph #7832

Closed

pdf mentioned this issue Oct 24, 2017

Rich alert messages in notifications #5848

Closed

torkelo mentioned this issue Dec 12, 2018

[Feature Request] Multiple thresholds for alert notifications #14470

Closed

marefr mentioned this issue Oct 2, 2019

Alerting: Add configurable severity support for PagerDuty notifier #19425

Merged

marefr added area/alerting Grafana Alerting area/alerting/notifications Issues when sending alert notifications labels Jan 30, 2020

bobheadxi mentioned this issue Jun 24, 2020

monitoring: granular alerts notifications with Alertmanager sourcegraph/sourcegraph#11452

Closed

armandgrillet mentioned this issue Feb 28, 2024

Enable Warning thresholds for Grafana Alerts #82928

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alerting: Firing/Notification Severity (Critical / Warning / Info) #6553

Alerting: Firing/Notification Severity (Critical / Warning / Info) #6553

torkelo commented Nov 11, 2016 •

edited

roscoe57 commented Nov 12, 2016

calind commented Nov 15, 2016

haron commented Nov 15, 2016

elvarb commented Nov 19, 2016

utkarshcmu commented Nov 19, 2016

elvarb commented Nov 19, 2016

pdf commented Jan 14, 2017 •

edited

magnuspwhite commented Jan 18, 2017

yannispanousis commented Feb 8, 2017

SilentGlasses commented Dec 26, 2018

MichaelMitchellM commented Aug 14, 2019

yemble commented Sep 25, 2019

jpmcb commented Dec 24, 2019

yemble commented Dec 24, 2019

jpmcb commented Dec 24, 2019

aviadbi1 commented Jan 30, 2020

siegenthalerroger commented Jun 18, 2021

m-wack commented Oct 14, 2022

hajdukda commented Nov 17, 2022

m-wack commented Dec 9, 2022

grobinson-grafana commented Oct 3, 2023

Alerting: Firing/Notification Severity (Critical / Warning / Info) #6553

Alerting: Firing/Notification Severity (Critical / Warning / Info) #6553

Comments

torkelo commented Nov 11, 2016 • edited

roscoe57 commented Nov 12, 2016

calind commented Nov 15, 2016

haron commented Nov 15, 2016

elvarb commented Nov 19, 2016

utkarshcmu commented Nov 19, 2016

elvarb commented Nov 19, 2016

pdf commented Jan 14, 2017 • edited

magnuspwhite commented Jan 18, 2017

yannispanousis commented Feb 8, 2017

SilentGlasses commented Dec 26, 2018

MichaelMitchellM commented Aug 14, 2019

yemble commented Sep 25, 2019

jpmcb commented Dec 24, 2019

yemble commented Dec 24, 2019

jpmcb commented Dec 24, 2019

aviadbi1 commented Jan 30, 2020

siegenthalerroger commented Jun 18, 2021

m-wack commented Oct 14, 2022

hajdukda commented Nov 17, 2022

m-wack commented Dec 9, 2022

grobinson-grafana commented Oct 3, 2023

torkelo commented Nov 11, 2016 •

edited

pdf commented Jan 14, 2017 •

edited