While ingestion is suspended, neither the status page is accessible, nor is a clean shutdown possible. #1319
Comments
grobie added the bug label Jan 15, 2016
And the server would not even shut down properly. It seems to me that some parts of the server were completely hosed, among them either signal handling or the parts that would shut down targets and query them for the status page... Very difficult to debug. In general, I think we have to organize the throttling / suspension of sample ingestion more cleanly.
I just saw the "no clean shutdown" issue again, after ingestion was suspended. Something breaks when that happens.
beorn7 changed the title from "Status page unavailable while server is in a degraded state" to "While ingestion is suspended, neither the status page is accessible, nor is a clean shutdown possible." Jan 16, 2016
The theory is now: as long as any target has suspended ingestion, neither shutdown nor the status page works. We probably want one big red "suspension" switch that is flipped once we reach the limit of chunks to be persisted, and then flipped back once we are at 90% of that.
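To make the proposed switch concrete, here is a minimal Go sketch of a single server-wide switch with hysteresis: it trips at the limit of chunks waiting for persistence and resets at 90% of that limit. The names `suspensionSwitch`, `newSuspensionSwitch`, and `Update` are hypothetical and not taken from the Prometheus code base; how the number of waiting chunks is obtained is also assumed.

```go
package main

import "sync"

// suspensionSwitch is a hypothetical server-wide ingestion switch.
// It is not part of Prometheus; it only illustrates the hysteresis idea.
type suspensionSwitch struct {
	mu        sync.Mutex
	limit     int  // maximum number of chunks waiting for persistence
	suspended bool // true while ingestion is suspended
}

func newSuspensionSwitch(limit int) *suspensionSwitch {
	return &suspensionSwitch{limit: limit}
}

// Update takes the current number of chunks waiting for persistence and
// reports whether ingestion is suspended afterwards.
func (s *suspensionSwitch) Update(chunksToPersist int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	resumeAt := s.limit * 9 / 10 // 90% of the limit
	switch {
	case !s.suspended && chunksToPersist >= s.limit:
		s.suspended = true // limit reached: flip the switch
	case s.suspended && chunksToPersist <= resumeAt:
		s.suspended = false // backlog drained to 90%: flip it back
	}
	return s.suspended
}
```

The gap between the two thresholds keeps the switch from flapping when the backlog hovers around the limit.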
This could go into (or needs to be coordinated with) #1064
The status page issues probably come from the target manager being stuck while holding a lock, e.g. waiting for old scrapers to terminate, which in turn are stuck because their samples aren't ingested. At which step does shutdown get stuck? Or does sending
The theory is that SIGTERM triggers as usual, but the target manager can only shut down once all targets are out of suspended ingestion (which might never happen). This theory is not completely proven, though. But I think we should have a central switch "stop ingestion" anyway, which would make many things much cleaner. It can be gated by an atomic variable to not cause too much lock contention.
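A central "stop ingestion" switch gated by an atomic variable could look roughly like the Go sketch below. Everything here is an assumption for illustration: `ingestionGate`, `Append`, and `errIngestionStopped` are hypothetical names, and rejecting samples instead of blocking the scraper is one possible design, not something this thread has settled on. `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// errIngestionStopped is a hypothetical error returned while the gate is closed.
var errIngestionStopped = errors.New("ingestion stopped")

// ingestionGate is a hypothetical central switch shared by all scrapers.
// Checking it is a single atomic load, so no lock is contended.
type ingestionGate struct {
	stopped atomic.Bool
}

// Stop closes the gate; Resume opens it again.
func (g *ingestionGate) Stop()   { g.stopped.Store(true) }
func (g *ingestionGate) Resume() { g.stopped.Store(false) }

// Append passes the sample to store unless ingestion is stopped.
// It never waits on the gate (assumed design: reject rather than block),
// so a scraper calling it can still be terminated promptly.
func (g *ingestionGate) Append(sample float64, store func(float64)) error {
	if g.stopped.Load() {
		return errIngestionStopped
	}
	store(sample)
	return nil
}
```

Because a scraper is never parked inside `Append` in this sketch, neither the target manager nor a shutdown would have to wait for a suspension to end.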
This was referenced Jan 25, 2016
beorn7 closed this in #1354 Feb 1, 2016
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
grobie commented Jan 15, 2016
We just found one of our servers struggling to keep up with persistence, to the point that ingestion got suspended. The exact circumstances are still not clear and @beorn7 is investigating.
Unexpectedly (to me at least), the status page would not load during the whole time, while the query interface was still available. It seems the target manager got into an undefined state as well.