Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upPrometheus configuration reloads slowly #4301
Comments
This comment has been minimized.
This comment has been minimized.
|
Do you have rule groups taking 2.5 minutes to evaluate? |
This comment has been minimized.
This comment has been minimized.
|
I have a single rule group with 5 recording rules and 12 alerting rules. Is there a way that I could hook into Prometheus to get a gauge of how long it is taking to evaluate that rule group? |
This comment has been minimized.
This comment has been minimized.
|
If you look at the Rules Status page it'll tell you. |
This comment has been minimized.
This comment has been minimized.
|
@lyonssp probably a terrible idea, but if nothing else works you can background it. there were some ideas how to speed up the reloads, but at first glance it looked like it will cause many race conditions. |
This comment has been minimized.
This comment has been minimized.
|
@brian-brazil my rules group is reporting taking 440ms to evaluate |
This comment has been minimized.
This comment has been minimized.
|
@krasi-georgiev In a continuous deployment context, the problem is that the deployment job would appear to have succeeded in the event of a failure to reload |
This comment has been minimized.
This comment has been minimized.
|
There's something odd going on then. How are you measuring the 2.5 minutes, and can you share logs? |
This comment has been minimized.
This comment has been minimized.
|
@brian-brazil I'm not measuring strictly -- more eyeballing it. I can guarantee the time is more than 90s though, as that is my failure threshold. I'll get you some details about the process and some logs as well. |
This comment has been minimized.
This comment has been minimized.
|
First thing's first -- here are some stats on timing reloads with a local curl request:
I didn't execute those curls consecutively -- I left a bit of time between executions. It definitely highlights some inconsistency in the behavior. |
This comment has been minimized.
This comment has been minimized.
|
Here are some logs from the server where I executed the above reloads b9ed65e46f5d12625ea54a0ef356eb25baf49f2e259fdb0c9f2f9ebb8206605d-json.log |
This comment has been minimized.
This comment has been minimized.
maurorappa
commented
Jul 12, 2018
|
EC2 discovery error and relative timeout, from the logs: |
This comment has been minimized.
This comment has been minimized.
tejaswiniVadlamudi
commented
Aug 21, 2018
|
Hi, Thanks, |
This comment has been minimized.
This comment has been minimized.
|
I am working on a PR that will speed up the reloading quite a bit. The problem is that stopping scraping loops for running target is now executed in serial and I am updating the code to stop these in parallel. Will link the pr when ready. |
simonpasquier
referenced this issue
Aug 21, 2018
Open
Data loss in grafana dashboard on prometheus safe reload #4519
krasi-georgiev
referenced this issue
Aug 22, 2018
Merged
Fix for the slow updates of targets changes #4526
krasi-georgiev
closed this
in
#4526
Sep 26, 2018
This comment has been minimized.
This comment has been minimized.
lock
bot
commented
Mar 25, 2019
|
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
lyonssp commentedJun 21, 2018
•
edited
Bug Report
I'm reloading Prometheus configurations through a continuous deployment process using ansible. I'm finding that configurations take such a long time to reload that what seem like sensible timeout thresholds are consistently passed by the reload process.
What are some factors that could be causing the significant reload times?
What did you do?
curl -X POST localhost:9090/-/reloadWhat did you expect to see?
Expected Prometheus to reload my configuration in < 90s
What did you see instead? Under which circumstances?
Prometheus seems to take ~2.5 minutes to reload the configuration.
Environment
Prometheus is deployed using docker with the configuration mounted from the file system to
/etc/prometheus/prometheus.yml. I am loading a new file to the local filesystem and almost immediately triggering a reload of the prometheus configuration via lifecycle APIs.
Containers aren't rebooted frequently.
System information:
Linux 4.4.0-1057-aws x86_64
Prometheus version:
prometheus, version 2.2.1
build date: 20180314-14:15:45
go version: go1.10
No relevant logs available