
Is there a way to let rules support levels? #1989

Closed
songjiayang opened this Issue Sep 14, 2016 · 5 comments


songjiayang commented Sep 14, 2016

I am a big fan of Prometheus and I want to use Prometheus rules and the alerting system, but I find that the rules don't support different levels.

Example with load1:

If an instance's load1 is between 20 and 30, alert as error; if it is bigger than 30, alert as critical. The resolved alert should only be sent once load1 < 20.

Can I make this work with some configuration? Can anybody help me?

grobie (Member) commented Sep 14, 2016

The common way to handle this use case is to define two alerts with different thresholds and labels. You can use the same alertname in both rules, which makes it easier to create silences and inhibition rules in Alertmanager.

You can use configuration management systems to script / automate the creation of such similar rules.

ALERT NodeHighLoad
  IF node_load1 > 20
  FOR 2m
  LABELS {
    severity = "warning"
  }
  ANNOTATIONS {
    summary = "...",
    description = "...",
  }

ALERT NodeHighLoad
  IF node_load1 > 30
  FOR 2m
  LABELS {
    severity = "critical"
  }
  ANNOTATIONS {
    summary = "...",
    description = "...",
  }
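(For readers on Prometheus 2.x, which replaced this syntax: a minimal sketch of the same pair of rules in the YAML rule-file format; the group name is an assumption.)

groups:
- name: node-alerts            # group name is an assumption
  rules:
  - alert: NodeHighLoad
    expr: node_load1 > 20
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "..."
  - alert: NodeHighLoad
    expr: node_load1 > 30
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "..."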

Alerting on high load is usually very noisy and will quickly lead to alerting fatigue. It's better to alert on symptoms than causes; I recommend reading this document for more information on this topic: https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit
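As a sketch of what a symptom-based alert might look like, in the same 1.x rule syntax used above; the metric http_requests_total with a status label and the 5% threshold are assumptions, not part of this thread:

ALERT HighErrorRate
  IF sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
       / sum(rate(http_requests_total[5m])) by (job) > 0.05
  FOR 10m
  LABELS {
    severity = "critical"
  }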

songjiayang (Author) commented Sep 14, 2016

Thanks @grobie.

Here are my rules:

  ALERT InstanceLoad
    IF node_load1{job="node"} > 20 and node_load1{job="node"} <= 30
    FOR 10s
    LABELS {
      event_id = "E2.1.3",
      type = "server",
      subtype = "load",
      resource="{{$labels.instance}}/load",
      level = "Error",
      threshold = "20",
      value = "{{$value}}",
      resolved_threshold = "20",
      instance="{{$labels.instance}}"
    }

  ALERT InstanceLoad
    IF node_load1{job="node"} > 30
    FOR 10s
    LABELS {
      event_id = "E2.1.3",
      type = "server",
      subtype = "load",
      resource="{{$labels.instance}}/load",
      level = "Critical",
      threshold = "30",
      value = "{{$value}}",
      resolved_threshold = "20",
      instance="{{$labels.instance}}"
    }

With this rule config, I can get error and critical alerts with the alerting value, but no real resolved info.

What I tried:

The resolved_threshold label is meant to be compared with the value; if the value < resolved_threshold, the alert is really resolved.

But with Prometheus's resolution logic, a firing alert that no longer exists is considered resolved, so I can't get the value at the moment it is resolved.


I use a webhook to process the alert JSON data and check for real resolution in my own app.
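A minimal sketch of such a webhook receiver in the Alertmanager configuration; the receiver name and URL are assumptions, and send_resolved makes Alertmanager also post resolved notifications:

route:
  receiver: load-webhook                  # receiver name is an assumption
receivers:
- name: load-webhook
  webhook_configs:
  - url: http://my-app.example/alerts     # URL of my own app is an assumption
    send_resolved: true                   # also notify when alerts resolve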

aecolley commented Sep 18, 2016

@songjiayang If you want to look up alerts which have cleared, use the ALERTS pseudo-metric, for example ALERTS{alertname="InstanceLoad",alertstate="firing"}[1h].

If you want the values when the alerts clear, you can use the fact that ALERTS becomes zero when an alert clears. Query (node_load1{job="node"} and on(instance, job) ALERTS{alertname="InstanceLoad",alertstate="firing",job="node"} == 0)[1h].
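A sketch of capturing that value continuously with a recording rule, in the 1.x rule syntax used elsewhere in this thread; the rule name is an assumption:

instance:node_load1:at_resolve =
  node_load1{job="node"}
    and on(instance, job)
  ALERTS{alertname="InstanceLoad", alertstate="firing", job="node"} == 0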

songjiayang (Author) commented Sep 19, 2016

Thanks @aecolley, I will try.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019