kube SD errors leading to loss of connectivity to alertmanager #5345
Comments
Could you try to reproduce with
I have been trying to reproduce this for some time but was not able to. I suspect there may have been an issue with the cluster itself. Closing this, since I cannot keep debug logs enabled and am no longer actively investigating.
vsliouniaev closed this Apr 18, 2019
vsliouniaev commented Mar 12, 2019
Bug Report
What did you do?
Deployed Prometheus and Alertmanager on Kubernetes.
Deleted the alertmanager pods, which were re-created.
What did you expect to see?
Prometheus should reconnect to the new alertmanager pods through service discovery and send alerts to the new instances.
What did you see instead? Under which circumstances?
Prometheus fails to discover new pods and continues trying to send alerts to the old alertmanager instances.
I have tried deleting the pods a few more times and the behaviour does not reoccur.
Some googling led me to an investigation done for projectcalico/typha#227 that appears to be quite similar.
Environment
docker
docker prometheus:v2.7.1
docker alertmanager:v0.16.1
Prometheus continuously repeats the message about sending alerts, but will sometimes also log the watch failure. It does not recover from this state and has to be restarted.
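For anyone trying to confirm the same stuck state, one rough way to check is to compare the alertmanagers Prometheus currently reports as active with the addresses of the pods that actually exist. The sketch below is only an illustration, not something taken from this report: it assumes the response body of Prometheus's `/api/v1/alertmanagers` HTTP API endpoint has already been fetched, and that the current pod addresses have been collected separately (for example from the Endpoints object of the alertmanager service). The function names and the sample addresses are hypothetical.

```python
import json
from typing import Set
from urllib.parse import urlsplit


def active_alertmanager_hosts(payload: str) -> Set[str]:
    """Extract host:port pairs from a /api/v1/alertmanagers response body."""
    data = json.loads(payload)["data"]
    return {urlsplit(am["url"]).netloc for am in data.get("activeAlertmanagers", [])}


def stale_hosts(active: Set[str], current_pods: Set[str]) -> Set[str]:
    """Hosts Prometheus still sends alerts to that match no running pod."""
    return active - current_pods


# Hypothetical sample data: Prometheus still lists 10.0.0.5, which was deleted.
sample = json.dumps({
    "status": "success",
    "data": {
        "activeAlertmanagers": [
            {"url": "http://10.0.0.5:9093/api/v1/alerts"},
            {"url": "http://10.0.0.6:9093/api/v1/alerts"},
        ],
        "droppedAlertmanagers": [],
    },
})
current = {"10.0.0.6:9093", "10.0.0.7:9093"}  # assumed current pod addresses

stale = stale_hosts(active_alertmanager_hosts(sample), current)
```

A non-empty `stale` set that persists well after the pods were re-created would be consistent with the behaviour described above, though it does not by itself show why discovery stopped updating.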