Notification lag between Firing and Resolved. #1306

Closed
smd1000 opened this issue Mar 29, 2018 · 1 comment

smd1000 commented Mar 29, 2018

What did you do?
Manually fired an alert using the following script.

#!/bin/bash

# Generates an alert manually.

name=$RANDOM
url='localhost:9093/api/v1/alerts'

echo "firing up alert $name" 

# change url o
curl -XPOST $url -d "[{ 
	\"status\": \"firing\",
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\":\"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

echo ""

echo "press enter to resolve alert"
read

echo "sending resolve"
curl -XPOST $url -d "[{ 
	\"status\": \"resolved\",
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\":\"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

echo ""

What did you expect to see?
After hitting enter, the alert should be resolved.

What did you see instead? Under which circumstances?
Running the script, the alert fires almost immediately. I see the event in amtool and in a Slack integration.

Instead, after resolving the alert there is always a 5-minute lag before Alertmanager updates with the resolved status. amtool shows the alert as still open for 5 minutes, and I don't receive a resolved message in Slack until after 5 minutes.
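
For completeness, a minimal sketch of how that state can be checked from the command line while waiting (the URL and flag are assumed to match the local setup above):

# Lists the alerts this Alertmanager currently considers active; the test
# alert should drop out of this list once Alertmanager treats it as resolved.
amtool --alertmanager.url=http://localhost:9093 alert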

Environment

  • System information:
Linux 4.9.58-18.51.amzn1.x86_64 x86_64
  • Alertmanager version:
alertmanager, version 0.14.0 (branch: HEAD, revision: 30af4d051b37ce817ea7e35b56c57a0e2ec9dbb0)
  build user:       root@37b6a49ebba9
  build date:       20180213-08:16:42
  go version:       go1.9.2
  • Prometheus version:
prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98628a0463dddc90528220c94de5032d1a0)
  build user:       root@615b82cb36b6
  build date:       20171108-07:11:59
  go version:       go1.9.2
  • Alertmanager configuration file:
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: '<redacted>'
  smtp_from: '<redacted>'
  smtp_auth_username: '<redacted>'
  smtp_auth_password: '<redacted>'

  slack_api_url: 'https://hooks.slack.com/services/<redacted>'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 15s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 30s

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h

  # A default receiver
  receiver: 'v_prod_alerts' #fallback

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.
  #
  # Critical alert flow
  routes:
    - match:
        severity: 'critical'
      continue: true
      receiver: 'opsgenie'
    - match:
        severity: 'critical'
      continue: true
      receiver: 'v_prod_alerts'
    - match:
        severity: 'critical'
      receiver: 'devops-mail'

# Inhibition rules allow muting a set of alerts when another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname, cluster and service are the same.
  equal: ['alertname', 'cluster', 'service']


receivers:
- name: 'devops-mail'
  email_configs:
  - to: '<redacted>'

- name: 'devops-pager'
  email_configs:
  - to: '<redacted>'

- name: 'v_prod_alerts'
  slack_configs:
  - channel: 'v_prod_alerts'
    send_resolved: true

- name: 'opsgenie'
  opsgenie_configs:
  - api_key: '<redacted>'
    teams: 'devops_team'
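
One detail worth noting about the global block above: it does not set resolve_timeout, so the default of 5m applies, which lines up with the delay being reported. A quick way to double-check the configuration the running instance is actually using (endpoint path assumed from the v1 API):

# Sketch: dump the running instance's status, which includes the effective
# configuration; check whether resolve_timeout is set or left at its 5m default.
curl -s http://localhost:9093/api/v1/status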

simonpasquier (Member) commented

The second curl command doesn't send a resolved alert, see https://prometheus.io/docs/alerting/clients/ for a description of the protocol.

Also, it makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.
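
For reference, a minimal sketch of what a resolving request could look like under the protocol linked above. The /api/v1/alerts payload has no "status" field; an alert is marked resolved by re-sending the identical label set with an endsAt timestamp that is not in the future (field names taken from the clients documentation; the date invocation is an assumption for GNU date):

#!/bin/bash
# Sketch: resolve a previously fired alert by re-posting the same labels with
# an explicit "endsAt" in RFC3339 format. The "status" field used in the
# original script is not part of the payload and is ignored.

name="$1"                                 # the same $name used when firing
url='localhost:9093/api/v1/alerts'
now=$(date -u +"%Y-%m-%dT%H:%M:%SZ")      # current time, RFC3339 / UTC

curl -XPOST $url -d "[{
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\": \"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"endsAt\": \"$now\",
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

Without an explicit endsAt, Alertmanager only considers the alert resolved once the global resolve_timeout elapses, which defaults to 5m and would account for the consistent 5-minute delay described above.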
