High CPU Usage with Recording Rules and ZK Patch #2481
After testing this a bit more: the condition is triggered not by adding recording rules, but by sending SIGHUP to Prometheus to reload its rules.
Were you even doing a SIGHUP with the Prometheus 1.2.1 setup? If I am not mistaken, reloading did not even work there; at least for us, combining ZK service discovery with reloads always ended in crashes. Also, did this happen after a single reload or only after multiple ones? Maybe we are missing a necessary cleanup of old watchers or something like that.
Yeah, later testing reveals that this affects 1.2.1 as well. Maybe not as badly, as reloads are not uncommon. But yes, some cleanup is not working here.
I am able to reproduce the issue with Prometheus 1.6.0. Observations after a SIGHUP:
Edit: After a couple of further reloads, the server crashed with:
StephanErb referenced this issue on Apr 30, 2017: Fix reload of ZooKeeper service discovery config #2669 (merged)
juliusv closed this in #2669 on May 2, 2017.
lock bot commented on Mar 23, 2019:
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
jjneely commented on Mar 7, 2017
What did you do?
I grabbed Prometheus 1.5.2, applied the ZK patch from PR #2470, and built and tested it. Next, I had a team try the new version; that team monitors a sizable list of Aurora (ZooKeeper discovery) jobs and makes heavy use of recording rules. We saw an immediate jump in CPU usage compared to the prior Prometheus version (1.2.1): it started consuming all the CPU resources we had allocated to the Aurora job running it.
What did you expect to see?
As the configuration was unchanged, we expected CPU usage similar to prior versions.
What did you see instead? Under which circumstances?
CPU usage appeared normal when I attempted to reproduce the issue with just the prometheus.yml config file copied over to my test environment: it stayed under 2 cores. However, when I added the recording rules to my test environment, CPU usage spiked to 17 cores on my test hardware, and this bare-metal server now runs at a load of 17+. I got out the profiling tools to take a look and produced the following SVG:
Why would recording rules cause more time to be spent in ZookeeperDiscovery?
Environment
My test hardware runs Ubuntu Trusty with kernel 3.13.0-85-generic and has 64 GB of RAM and 24 cores.
Prometheus was compiled with Go 1.7.4.
System information:
Linux 3.13.0-85-generic x86_64
Prometheus version:
prometheus, version 1.5.2 (branch: foo, revision: 1.5.2-1+JENKINS~trusty.20170306212211)
build user: jjneely@42lines.net
build date: 2017-03-06T21:22:40Z
go version: go1.7.4
Prometheus configuration file:
prometheus-test.txt
prometheus-test.rules.txt