[Question/Bug] Multi Alerts(some kind) recover alert bug. #950

Closed

regardfs opened this issue Aug 16, 2017 · 6 comments

@regardfs

Hi:
I have a host CPU alert rule in alert.rules like this:

ALERT HostCPUUsage
IF (100 - (avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100)) > 2
FOR 2m
LABELS {
severity="critical"
}
ANNOTATIONS {
summary = "{{$labels.instance}}: High CPU usage detected",
description = "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})",
}

and I ran into a problem: when more than two hosts have this kind of issue, it triggers an alert like

[FIRING:10] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

but when one host recovers from the abnormal state, it sends a message like

[RESOLVED] HostCPUUsage (my-project critical)
summary:
10.10.0.86:9100: High CPU usage detected\n10.10.0.142:9100: High CPU usage detected\n10.10.0.241:9100: High CPU usage detected\n10.10.0.143:9100: High CPU usage detected\n10.10.0.92:9100: High CPU usage detected\n10.10.0.141:9100: High CPU usage detected\n10.10.0.20:9100: High CPU usage detected\n10.10.0.10:9100: High CPU usage detected\n10.10.0.201:9100: High CPU usage detected\n10.10.0.215:9100: High CPU usage detected\n
description:
10.0.0.86:9100: CPU usage is above 80% (current value Show more…

So I think this does not reflect the real scenario; it should only print the message for the specific host that recovered.

My question is: am I missing something, or does Alertmanager not support this?

Thanks a ton!

@mxinden
Member

mxinden commented Aug 21, 2017

@regardfs I do not understand your question. In case this is a usage question, please reopen it in https://groups.google.com/forum/#!forum/prometheus-users. In case you think you found a bug in Alertmanager or if you want to report a missing feature, please add more details to your question, e.g.:

  • Where are the above blobs copied from?
  • What is your Prometheus and Alertmanager config?

@regardfs
Author

@mxinden, hi, I just want separate alerts in Slack when several fire at the same time...
You can see in my blob that summary and description combine all alerts of the same type.

  • Where are the above blobs copied from?
    I use Slack to receive alerts; this is the Slack alert message.
  • What is your Prometheus and Alertmanager config?
    prometheus.yml imports the rules files:

rule_files:
- "alert-rules/zixin-alert.rules"
- "alert-rules/host-alert.rules"
- "alert-rules/rabbitmq-alert.rules"

alertmanager.yml

global:
resolve_timeout: 15s
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- send_resolved: true
channel: '#alert'
api_url: 'slack-api url'
text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

@mxinden
Member

mxinden commented Aug 21, 2017

@regardfs So if I understand you correctly, you want a separate notification per alert that was sent by Prometheus, right?

Can you post a properly formatted Alertmanager.yaml?
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#code-and-syntax-highlighting

@regardfs
Author

regardfs commented Aug 22, 2017

@mxinden, you got it, that is just what I want: a separate notification per alert!
Sorry for pasting the config YAML file in the wrong format...

config.yaml

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'

/etc/alertmanager/templates/alertText.tmpl

{{ define "slack_summary" }}
{{ range .Alerts }}{{ .Annotations.summary }}
{{ end }}
{{ end }}

{{ define "slack_description" }}
{{ range .Alerts }}{{ .Annotations.description }}
{{ end }}
{{ end }}

{{ define "slack.text" }}summary: {{ template "slack_summary" . }}description: {{ template "slack_description" . }}{{ end }}

@mxinden
Member

mxinden commented Aug 23, 2017

@regardfs That is rather surprising as you don't have any alert grouping configured. Could you try to add group_by: [instance] to your route config?

global:
    resolve_timeout: 15s
route:
    receiver: 'slack'
    group_by: [instance]
receivers:
    - name: 'slack'
      slack_configs:
          - send_resolved: true
            channel: '#alert'
            api_url: 'https://hooks.slack.com/services/OOXXOOXX'
            text: '{{ template "slack.myorg.text" . }}'
templates:
- '/etc/alertmanager/templates/alertText.tmpl'
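
With group_by: [instance], alerts whose instance labels differ end up in separate groups, so a resolved notification only mentions the host that actually recovered. If the goal is strictly one notification per individual alert, grouping by the alert name as well is one possible variant (a sketch, not taken from this thread):

route:
    receiver: 'slack'
    group_by: [alertname, instance]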

@regardfs
Author

@mxinden
Thanks a ton, I will try it ASAP. I will close this issue now.
