Prometheus hangs after interrupt #214
Comments
ghost assigned matttproud May 3, 2013
ghost assigned bernerdschaefer May 23, 2013
I think both this and #31 are effectively fixed now, though the Gist you proposed to me in person a few weeks back about lifecycle management would probably fix this. Are you OK if I assign this to you, since you seem to have made good headway on the design?
typingduck commented Mar 7, 2014
+1. I'm finding that I have to kill Prometheus manually these days, as runit cannot do so.
@typingduck The shutdown has actually been working reliably for a long while now (I fixed it), so I'll close this ticket. What you're seeing with runit is probably just that it times out while Prometheus is still flushing its in-memory state to disk. If you tail the log, you'll see that at some point it says "Done flushing" and then exits. Maybe there's a way to increase runit timeouts, but in that case it's not a Prometheus bug.
juliusv closed this Mar 7, 2014
typingduck commented Mar 7, 2014
Do you have an estimate of how long the timeout should be? I increased the runit timeout to 20 seconds, but that didn't work.
@typingduck You can look at your Prometheus log file to see the time elapsed between the lines "Flushing samples to disk..." and "Done flushing.". It seems to take approximately 2 minutes in your case, so you might want to set the timeout to 5 minutes or so.
typingduck commented Mar 7, 2014
tks.
lock bot commented Mar 25, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
bernerdschaefer commented May 3, 2013
Here's a transcript with a goroutine dump: https://gist.github.com/bernerdschaefer/5508760
It appears to be a problem with the ordering of calls to levigo -- see this gist which shows that the following two cases produce indefinitely blocked goroutines:
Because of the way we capture interrupts in main.go, if the graceful shutdown is blocked by something like this, further interrupts are simply ignored, and the process needs to be forcibly killed in some way.
Maybe this is related to #31?