Log successful target manager shutdown #2715
Comments
Were there more log lines after that line? Looking at the last line here, it was probably just the checkpointing that took so long.
I think the "Done checkpointing" line was the last line logged before the container was terminated.
How big is the given Prometheus instance (targets, samples/sec, #series)? As @grobie said, that's most likely the checkpoint, which can take a long time for large instances. The target manager just waits for pending scrapes on shutdown, which can't really take longer than the maximum scrape timeout.
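For illustration, here is a minimal Go sketch of the behaviour described above; the type and function names are made up and are not Prometheus's actual code. The point is that Stop only has to wait for the scrape currently in flight, which is itself bounded by the scrape timeout.

```go
// Hypothetical sketch: Stop signals the scrape loops and then waits only for
// the scrape that is currently in flight, which cannot run longer than the
// configured scrape timeout.
package main

import (
	"fmt"
	"sync"
	"time"
)

type targetManager struct {
	quit chan struct{}
	wg   sync.WaitGroup
}

func newTargetManager() *targetManager {
	return &targetManager{quit: make(chan struct{})}
}

// doScrape stands in for a single scrape, which never runs longer than the
// scrape timeout.
func doScrape(scrapeTimeout time.Duration) {
	time.Sleep(scrapeTimeout)
}

// Run starts one scrape loop.
func (m *targetManager) Run(scrapeTimeout time.Duration) {
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		for {
			select {
			case <-m.quit:
				return
			default:
			}
			doScrape(scrapeTimeout)
		}
	}()
}

// Stop blocks until in-flight scrapes have finished, so it cannot take much
// longer than one scrape timeout.
func (m *targetManager) Stop() {
	close(m.quit)
	m.wg.Wait()
}

func main() {
	m := newTargetManager()
	m.Run(2 * time.Second)

	start := time.Now()
	m.Stop()
	fmt.Printf("target manager stopped after %v\n", time.Since(start))
}
```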
Shouldn't we cancel all running scrapes?
@grobie ah yes, that wasn't worded well. We do cancel in-flight scrapes, but since we don't want partial inserts of scraped data, we wait for all samples currently being written to the storage to be appended. Prometheus 2.0's storage has transactions, so we could just roll back there. But in practice writes are fast enough in either version for it not to matter.
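To illustrate the transaction point, a hypothetical sketch (the appender interface below is simplified for illustration, not the real storage API): a scrape's samples are staged in a per-scrape batch, and a cancellation mid-scrape rolls the whole batch back so nothing partial is inserted.

```go
// Hypothetical sketch: each scrape writes into a per-scrape transaction and
// commits only if the scrape was not cancelled part-way through.
package main

import (
	"context"
	"errors"
	"fmt"
)

// appender is a simplified per-scrape write transaction.
type appender interface {
	Add(metric string, value float64) error
	Commit() error
	Rollback() error
}

type memAppender struct {
	pending map[string]float64
	store   map[string]float64
}

func (a *memAppender) Add(metric string, value float64) error {
	a.pending[metric] = value
	return nil
}

func (a *memAppender) Commit() error {
	for m, v := range a.pending {
		a.store[m] = v
	}
	a.pending = map[string]float64{}
	return nil
}

func (a *memAppender) Rollback() error {
	a.pending = map[string]float64{}
	return nil
}

// writeScrape appends one scrape's samples and drops the whole batch if the
// context is cancelled before it is complete.
func writeScrape(ctx context.Context, app appender, samples map[string]float64) error {
	for m, v := range samples {
		if err := ctx.Err(); err != nil {
			app.Rollback() // shutdown arrived mid-scrape: no partial insert
			return err
		}
		if err := app.Add(m, v); err != nil {
			app.Rollback()
			return err
		}
	}
	return app.Commit()
}

func main() {
	store := map[string]float64{}
	app := &memAppender{pending: map[string]float64{}, store: store}

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate shutdown arriving during the write

	err := writeScrape(ctx, app, map[string]float64{"up": 1})
	if errors.Is(err, context.Canceled) {
		fmt.Printf("scrape dropped, store has %d series\n", len(store))
	}
}
```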
Unlike (almost) all other managers and handlers, the target manager doesn't log once it has been successfully stopped. It should print a confirmation line when it finishes stopping. That would have prevented the need to open this issue.
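A rough sketch of what such logging could look like; the component names and the stopAll helper below are hypothetical, not the actual Prometheus shutdown code.

```go
// Hypothetical sketch: bracket each component's shutdown with log lines so a
// slow phase can be attributed from the logs alone.
package main

import (
	"log"
	"time"
)

type component struct{ name string }

// Stop stands in for the component's real shutdown work.
func (c component) Stop() { time.Sleep(200 * time.Millisecond) }

func stopAll(components []component) {
	for _, c := range components {
		log.Printf("Stopping %s...", c.name)
		start := time.Now()
		c.Stop()
		// The line requested in this issue: confirm the stop and how long it took.
		log.Printf("%s stopped in %v.", c.name, time.Since(start))
	}
}

func main() {
	stopAll([]component{
		{name: "target manager"},
		{name: "rule manager"},
		{name: "local storage"},
	})
}
```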
grobie changed the title from "Shutdown took over 20 minutes, 13 minutes to stop target manager" to "Log successful target manager shutdown" on May 13, 2017.
@fabxc This instance has 204 targets, about 6k metrics/second, and roughly 198,000 local storage metric series. The instance was overloaded when I was shutting it down, because a target was accidentally exposing over 100,000 metrics. I see I misread the log output; it wasn't the target manager stopping that took so long.
gouthamve added a commit to gouthamve/prometheus that referenced this issue on Jul 6, 2017.
gouthamve added a commit to gouthamve/prometheus that referenced this issue on Jul 6, 2017.
fabxc closed this in #2907 on Jul 6, 2017.
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
svend commented May 12, 2017
What did you do?
Shut down prometheus.
What did you expect to see?
A timely shutdown.
What did you see instead? Under which circumstances?
The shutdown took longer than 20 minutes, which is our pod shutdown grace period in Kubernetes, so the pod was killed. The logs showed 13 minutes for "Stopping target manager...".
Environment
Kubernetes 1.5.6.
prometheus.yaml
Shutdown log: