alerts not being fired to any receiver #1681
So the Alertmanager is out of file descriptors. It'd be interesting to find out what the FDs are used for:
Could you run and gist (with sensitive info removed) the following commands on your AM host:
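The exact commands requested here are not preserved in this copy of the thread. As an independent, minimal sketch of the same idea (finding out what a process's FDs are used for), the following assumes a Linux host with `/proc` mounted. The helper names `classify_fd_target` and `count_fds` are hypothetical and not part of Alertmanager or any tool mentioned in the thread.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:["):
        return "socket"
    if target.startswith("pipe:["):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"  # e.g. epoll or eventfd handles
    return "file"  # regular files, devices, directories


def count_fds(pid: int) -> Counter:
    """Count the open file descriptors of `pid` by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed between listdir and readlink
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling `count_fds(pid)` with the Alertmanager PID (as the same user or as root) gives a quick breakdown. A count dominated by `socket` entries would match what is described later in this thread.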
Thanks for the swift response! Also, AM and Prometheus run on the same box, if that makes any difference.
Interesting, I'm wondering about all those open sockets that netstat doesn't show. Can you try
They seem to be in CLOSE_WAIT.
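For anyone wanting to check socket states without netstat, here is a rough sketch that tallies TCP states from the text of `/proc/net/tcp` (or `/proc/net/tcp6`). It assumes the usual Linux layout, in which the fourth whitespace-separated column is the hexadecimal state code. The function name `count_tcp_states` is made up for this example.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Tally connection states from the contents of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the affected host this could be fed with `open("/proc/net/tcp").read()`. Note that the table is system-wide, so it also counts sockets belonging to other processes such as Prometheus.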
Those are only 13 connections though. The issue is with the many other sockets that lsof shows, but for which it says "can't identify protocol".
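One way to approximate that distinction by hand is to compare the socket inodes a process holds against the inodes listed in the kernel's `/proc/net` tables. Sockets held by the process but missing from those tables are roughly the ones lsof cannot attribute to a protocol. This is only a sketch under that assumption. The helper names are hypothetical, and the inode column index used here applies to the `tcp`, `tcp6`, `udp` and `udp6` tables, not to `/proc/net/unix`, which has a different layout.

```python
import re

_SOCKET_RE = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(fd_target):
    """Return the inode from a 'socket:[N]' fd target, or None otherwise."""
    match = _SOCKET_RE.match(fd_target)
    return int(match.group(1)) if match else None


def inodes_in_proc_net(table_text):
    """Collect socket inodes from a /proc/net/{tcp,tcp6,udp,udp6} table."""
    inodes = set()
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 9 and fields[9].isdigit():
            inodes.add(int(fields[9]))  # tenth column holds the inode
    return inodes


def unlisted_sockets(fd_targets, table_texts):
    """Socket inodes held by a process that appear in none of the tables."""
    known = set()
    for text in table_texts:
        known |= inodes_in_proc_net(text)
    held = {socket_inode(target) for target in fd_targets} - {None}
    return held - known
```

Here `fd_targets` would be the `readlink` results for each entry in `/proc/<pid>/fd`, and `table_texts` the contents of the relevant `/proc/net` files. A large result set would line up with the many sockets netstat does not show.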
No, they're still there, but not shown by netstat.
That's strange. http://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat suggests it could be half-open connections, but I'm not sure where they could come from in Alertmanager. Other things that could be interesting:
Also, if you restart Alertmanager, how fast do those many sockets reappear? Gradually over time or all at once?
goroutine dump:
strace files:
After restarting AM, all of those sockets return as soon as Prometheus tries to "fire" active alerts.
Also just noticed some
It seems the issue was a corrupt AM DB. I removed it and restarted the service, and everything is back to normal.
Thanks for keeping us updated. Closing this as DB issues are covered elsewhere.
fabxc closed this on Jun 1, 2016
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Dnile commented May 27, 2016 (edited)
`ERRO[2942] Error sending 7 alerts: context deadline exceeded source=notifier.go:188` appears in the logs whenever alerts should be fired.
prometheus version:
prometheus, version 0.19.1 (branch: master, revision: 500a494)
build user: root@dfc6307dc40d
build date: 20160526-01:42:25
go version: go1.6.2
alertmanager version:
alertmanager, version 0.1.1 (branch: release-0.1, revision: 0e541bf)
build user: root@8c44a0677215
build date: 20160323-10:10:18
go version: go1.5.3
logs:
https://gist.github.com/Dnile/e299f7ca20c8f77aa4d0c92a2158c8a2