Description
What did you do?
I have a Docker Swarm environment with Prometheus and Alertmanager containers along with a Node webhook endpoint. For robustness testing of the setup, I did the following steps.
- I created an alert rule in Prometheus and the alert fires; no problems there.
- Later I scaled the Alertmanager service to 0 (docker service scale alertmanager=0) to simulate Alertmanager being down. I started getting POST errors in the Prometheus logs, which is expected since my repeat interval is 2 minutes and there is no Alertmanager to send the POSTs to.
- Then I sent the resolving metrics to Prometheus, resulting in the alert being resolved.
- Within 15 minutes of the alert being resolved I scaled Alertmanager back up to 1 instance (docker service scale alertmanager=1), as I know that Prometheus keeps sending the resolved alert for 15 minutes (the commands are summarized below).
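For reference, the scaling steps above boil down to roughly these commands (the service name is assumed to be plain alertmanager; in the real stack it may carry a stack prefix):

docker service scale alertmanager=0    # simulate Alertmanager being down
# ... send the resolving metrics to Prometheus so the alert becomes resolved ...
docker service scale alertmanager=1    # bring Alertmanager back within ~15 minutes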
What did you expect to see?
As soon as Alertmanager is scaled back up, it receives the resolved alert from Prometheus (seen in the Alertmanager logs) and sends the resolved alert payload to the webhook.
What did you see instead?
As soon as Alertmanager is scaled back up, it receives the resolved alert from Prometheus but does not send the resolved notification to the webhook. (The Alertmanager debug logs show that it is indeed receiving the resolved alert from Prometheus.)
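In case it helps, the alert's presence inside Alertmanager can also be checked through its v2 API while the webhook stays silent (sketch only; the host/port mapping for the Alertmanager container is an assumption here):

# list the alerts this Alertmanager instance currently knows about
curl -s http://localhost:9093/api/v2/alerts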
Environment
Docker Swarm
System information:
Linux 5.4.0-72-generic x86_64
Alertmanager version:
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03e)
build user: root@dee35927357f
build date: 20200617-08:54:02
go version: go1.14.4
Prometheus version:
prometheus, version 2.24.1 (branch: HEAD, revision: e4487274853c587717006eeda8804e597d120340)
build user: root@0b5231a0de0f
build date: 20210120-00:09:36
go version: go1.15.6
platform: linux/amd64
Alertmanager configuration file:
global:
route:
  group_by: ['asset_id']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 2m
  receiver: node
receivers:
  - name: 'node'
    webhook_configs:
      - url: 'http://alarming:7070/v1/alarms/promwebhook'
        send_resolved: true
Logs:
Alertmanager logs after scaling it back up:
root@beta:/opt/fms/fms-deployment# docker logs -f fcda7f206792
Starting Alert Manager...
level=info ts=2021-05-05T11:32:37.212Z caller=main.go:216 msg="Starting Alertmanager" version="(version=0.21.0, branch=HEAD, revision=4c6c03ebfe21009c546e4d1e9b92c371d67c021d)"
level=info ts=2021-05-05T11:32:37.212Z caller=main.go:217 build_context="(go=go1.14.4, user=root@dee35927357f, date=20200617-08:54:02)"
level=debug ts=2021-05-05T11:32:37.225Z caller=cluster.go:149 component=cluster msg="resolved peers to following addresses" peers=
level=info ts=2021-05-05T11:32:37.228Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=172.18.0.16 port=9094
level=debug ts=2021-05-05T11:32:37.230Z caller=delegate.go:230 component=cluster received=NotifyJoin node=01F4Y4T77C3B78PJBF2KPMT5E9 addr=172.18.0.16:9094
level=debug ts=2021-05-05T11:32:37.230Z caller=cluster.go:233 component=cluster msg="joined cluster" peers=0
level=info ts=2021-05-05T11:32:37.290Z caller=cluster.go:623 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=debug ts=2021-05-05T11:32:37.330Z caller=main.go:355 externalURL=http://fcda7f206792:9093
level=info ts=2021-05-05T11:32:37.330Z caller=coordinator.go:119 component=configuration msg="Loading configuration file" file=/etc/alertmanager/alertmanager.yml
level=info ts=2021-05-05T11:32:37.331Z caller=coordinator.go:131 component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/alertmanager.yml
level=debug ts=2021-05-05T11:32:37.341Z caller=main.go:465 routePrefix=/
level=info ts=2021-05-05T11:32:37.342Z caller=main.go:485 msg=Listening address=:9093
level=info ts=2021-05-05T11:32:39.294Z caller=cluster.go:648 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000093739s
level=debug ts=2021-05-05T11:32:41.294Z caller=cluster.go:645 component=cluster msg="gossip looks settled" elapsed=4.000263439s
level=debug ts=2021-05-05T11:32:43.294Z caller=cluster.go:645 component=cluster msg="gossip looks settled" elapsed=6.000679981s
level=debug ts=2021-05-05T11:32:45.295Z caller=cluster.go:645 component=cluster msg="gossip looks settled" elapsed=8.00111269s
level=info ts=2021-05-05T11:32:47.295Z caller=cluster.go:640 component=cluster msg="gossip settled; proceeding" elapsed=10.001293499s
level=debug ts=2021-05-05T11:33:22.857Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:33:22.866Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:34:52.818Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:34:52.820Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:36:22.817Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:36:22.818Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:37:52.818Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:37:52.818Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:39:22.821Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:39:22.821Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:40:52.814Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:40:52.815Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:42:22.814Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:42:22.815Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:43:52.817Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:43:52.818Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:45:22.814Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:45:22.818Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:46:52.816Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=olm_minor_alert[8238848][resolved]
level=debug ts=2021-05-05T11:46:52.817Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}:{asset_id="1510"}" msg=flushing alerts=[olm_minor_alert[8238848][resolved]]
level=debug ts=2021-05-05T11:47:37.290Z caller=silence.go:350 component=silences msg="Running maintenance"
level=debug ts=2021-05-05T11:47:37.291Z caller=nflog.go:336 component=nflog msg="Running maintenance"
level=debug ts=2021-05-05T11:47:37.292Z caller=silence.go:352 component=silences msg="Maintenance done" duration=1.700643ms size=0
level=debug ts=2021-05-05T11:47:37.293Z caller=nflog.go:338 component=nflog msg="Maintenance done" duration=1.436057ms size=0