Notification lag between Firing and Resolved. #1306

Closed
smd1000 opened this issue Mar 29, 2018 · 1 comment

smd1000 commented Mar 29, 2018

What did you do?
Manually fired an alert using the following script.

#!/bin/bash

# Generates an alert manually.

name=$RANDOM
url='localhost:9093/api/v1/alerts'

echo "firing up alert $name" 

# change url o
curl -XPOST $url -d "[{ 
	\"status\": \"firing\",
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\":\"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

echo ""

echo "press enter to resolve alert"
read

echo "sending resolve"
curl -XPOST $url -d "[{ 
	\"status\": \"resolved\",
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\":\"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

echo ""

What did you expect to see?
After hitting enter, the alert should be resolved.

What did you see instead? Under which circumstances?
Running the script, the alert fires almost immediately. I see the event in amtool and in a Slack integration.

Instead, after resolving the alert there is always a 5-minute lag before Alertmanager updates with the resolved status. amtool shows the alert as still open for 5 minutes, and I don't receive a resolved message in Slack until after 5 minutes.
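
For completeness, a minimal sketch of how that state can be checked from the command line while waiting (the URL and flag are assumed to match the local setup above):

# Lists the alerts this Alertmanager currently considers active; the test
# alert should drop out of this list once Alertmanager treats it as resolved.
amtool --alertmanager.url=http://localhost:9093 alert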

Environment

  • System information:
Linux 4.9.58-18.51.amzn1.x86_64 x86_64
  • Alertmanager version:
alertmanager, version 0.14.0 (branch: HEAD, revision: 30af4d051b37ce817ea7e35b56c57a0e2ec9dbb0)
  build user:       root@37b6a49ebba9
  build date:       20180213-08:16:42
  go version:       go1.9.2
  • Prometheus version:
prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98628a0463dddc90528220c94de5032d1a0)
  build user:       root@615b82cb36b6
  build date:       20171108-07:11:59
  go version:       go1.9.2
  • Alertmanager configuration file:
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: '<redacted>'
  smtp_from: '<redacted>'
  smtp_auth_username: '<redacted>'
  smtp_auth_password: '<redacted>'

  slack_api_url: 'https://hooks.slack.com/services/<redacted>'

# The root route on which each incoming alert enters.
route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster', 'service']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This way ensures that you get multiple alerts for the same group that start
  # firing shortly after another are batched together on the first
  # notification.
  group_wait: 15s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 30s

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them.
  repeat_interval: 3h

  # A default receiver
  receiver: 'v_prod_alerts' #fallback

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.
  #
  # Critical alert flow
  routes:
    - match:
        severity: 'critical'
      continue: true
      receiver: 'opsgenie'
    - match:
        severity: 'critical'
      continue: true
      receiver: 'v_prod_alerts'
    - match:
        severity: 'critical'
      receiver: 'devops-mail'

# Inhibition rules allow muting a set of alerts when another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname, cluster and service are the same.
  equal: ['alertname', 'cluster', 'service']


receivers:
- name: 'devops-mail'
  email_configs:
  - to: '<redacted>'

- name: 'devops-pager'
  email_configs:
  - to: '<redacted>'

- name: 'v_prod_alerts'
  slack_configs:
  - channel: 'v_prod_alerts'
    send_resolved: true

- name: 'opsgenie'
  opsgenie_configs:
  - api_key: '<redacted>'
    teams: 'devops_team'
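
One detail worth noting about the global block above: it does not set resolve_timeout, so the default of 5m applies, which lines up with the delay being reported. A quick way to double-check the configuration the running instance is actually using (endpoint path assumed from the v1 API):

# Sketch: dump the running instance's status, which includes the effective
# configuration; check whether resolve_timeout is set or left at its 5m default.
curl -s http://localhost:9093/api/v1/status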

simonpasquier (Member) commented

The second curl command doesn't send a resolved alert, see https://prometheus.io/docs/alerting/clients/ for a description of the protocol.

Also, it makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.
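
For reference, a minimal sketch of what a resolving request could look like under the protocol linked above. The /api/v1/alerts payload has no "status" field; an alert is marked resolved by re-sending the identical label set with an endsAt timestamp that is not in the future (field names taken from the clients documentation; the date invocation is an assumption for GNU date):

#!/bin/bash
# Sketch: resolve a previously fired alert by re-posting the same labels with
# an explicit "endsAt" in RFC3339 format. The "status" field used in the
# original script is not part of the payload and is ignored.

name="$1"                                 # the same $name used when firing
url='localhost:9093/api/v1/alerts'
now=$(date -u +"%Y-%m-%dT%H:%M:%SZ")      # current time, RFC3339 / UTC

curl -XPOST $url -d "[{
	\"labels\": {
		\"alertname\": \"$name\",
		\"service\": \"my-service\",
		\"severity\": \"warning\",
		\"instance\": \"$name.example.net\"
	},
	\"annotations\": {
		\"summary\": \"High latency is high!\"
	},
	\"endsAt\": \"$now\",
	\"generatorURL\": \"http://prometheus.int.example.net/<generating_expression>\"
}]"

Without an explicit endsAt, Alertmanager only considers the alert resolved once the global resolve_timeout elapses, which defaults to 5m and would account for the consistent 5-minute delay described above.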
