[Question/Bug] Multi Alerts(some kind) recover alert bug. #950

Closed

regardfs opened this issue Aug 16, 2017 · 6 comments

@regardfs

Hi:
I have a host CPU alert rule in alert.rules like this:

ALERT HostCPUUsage
IF (100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)) > 2
FOR 2m
LABELS {
severity="critical"
}
ANNOTATIONS {
summary = "{{$labels.instance}}: High CPU usage detected",
description = "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})",
}

and I ran into a problem: when more than two hosts have this kind of issue, it triggers an alert like

[FIRING:10] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

but when one host recovers from the abnormal state, it sends a message like

[RESOLVED] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

So I think this does not reflect the real scenario; it should only print the message for the specific host that recovered.

My question is: am I missing something, or does Alertmanager not support this?

Thanks a ton!

@mxinden
Member

mxinden commented Aug 21, 2017

@regardfs I do not understand your question. In case this is a usage question, please reopen it in https://groups.google.com/forum/#!forum/prometheus-users. In case you think you found a bug in Alertmanager or if you want to report a missing feature, please add more details to your question, e.g.:

  • Where are the above blobs copied from?
  • What is your Prometheus and Alertmanager config?

@regardfs
Author

@mxinden, hi, I just want separate alerts in Slack when several fire at the same time...
You can see in my blob that summary and description combine all alerts of the same type.

  • Where are the above blobs copied from?
    I use Slack to receive alerts; this is the Slack alert message.
  • What is your Prometheus and Alertmanager config?
    prometheus.yml imports the rules files:

rule_files:
- "alert-rules/zixin-alert.rules"
- "alert-rules/host-alert.rules"
- "alert-rules/rabbitmq-alert.rules"

alertmanager.yml

global:
resolve_timeout: 15s
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- send_resolved: true
channel: '#alert'
api_url: 'slack-api url'
text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

@mxinden
Member

mxinden commented Aug 21, 2017

@regardfs So if I understand you correctly, you want a separate notification per alert that was sent by Prometheus, right?

Can you post a properly formatted Alertmanager.yaml?
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code-and-syntax-highlighting

@regardfs
Author

regardfs commented Aug 22, 2017

@mxinden, you got it, that is just what I want: a separate notification per alert!
Sorry for pasting the config YAML file in the wrong format...

config.yaml

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

/etc/alertmanager/templates/alertText.tmpl

{{ define "slack_summary" }}
{{ range .Alerts }}{{ .Annotations.summary }}
{{ end }}
{{ end }}

{{ define "slack_description" }}
{{ range .Alerts }}{{ .Annotations.description }}
{{ end }}
{{ end }}

{{ define "slack.text" }}summary: {{ template "slack_summary" . }}description: {{ template "slack_description" . }}{{ end }}

@mxinden
Member

mxinden commented Aug 23, 2017

@regardfs That is rather surprising as you don't have any alert grouping configured. Could you try to add group_by: [instance] to your route config?

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
    group_by: [instance]
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'
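
With group_by: [instance], alerts whose instance labels differ end up in separate groups, so a resolved notification only mentions the host that actually recovered. If the goal is strictly one notification per individual alert, grouping by the alert name as well is one possible variant (a sketch, not taken from this thread):

route:
    receiver: 'slack'
    group_by: [alertname, instance]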

@regardfs
Author

@mxinden
Thanks a ton, I will try it ASAP. I will close this issue now.
