CPU usage skyrocketed after update from 2.3 to 2.6 #5162
Comments
I see that you installed from the Debian package. Can you try the same configuration with the binary from this archive instead?
I switched to
but still see a huge increase in CPU usage; after the restart it's about 50% instead of 5%.
Could you get a CPU profile of 2.6.0?
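For readers who have not captured one before: Prometheus is a Go binary and serves the standard Go pprof endpoints under /debug/pprof on its web port. The following is a minimal sketch of fetching a CPU profile, assuming the server is reachable over plain HTTP; the helper names, the example address, and the output path are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cpu_profile_url(base_url: str, seconds: int = 30) -> str:
    """Build the URL of the Go pprof CPU profile endpoint."""
    query = urlencode({"seconds": seconds})
    return f"{base_url.rstrip('/')}/debug/pprof/profile?{query}"


def fetch_cpu_profile(base_url: str, out_path: str, seconds: int = 30) -> None:
    """Download a CPU profile to out_path.

    The request blocks for roughly `seconds` while the server samples,
    so the client timeout is set a bit longer than the sampling window.
    """
    url = cpu_profile_url(base_url, seconds)
    with urlopen(url, timeout=seconds + 30) as resp:
        with open(out_path, "wb") as fh:
            fh.write(resp.read())


# Hypothetical usage (address and file name are assumptions):
# fetch_cpu_profile("http://localhost:9090", "prometheus-cpu.pb.gz")
```

The resulting file can then be inspected with `go tool pprof` or attached to the issue.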
pprof.prometheus.samples.cpu.003.pb.gz
Hmm, it looks like most of the load comes from the remote read endpoint.
Yep, what do you have hitting that?
These instances are part of a Thanos HA setup, so that should be the query nodes. But besides the version bump of the Prometheus servers, everything else was untouched. The number of queries has not increased. Any idea which additional metrics or profiling data I should look at to find the culprit?
I don't see any obvious culprits in the change history. Do the remote read metrics show an increase in traffic?
OK, very strange ... about 30 minutes ago the GC graph dropped, and so did the CPU usage.
The prometheus_api_remote_read_queries metric was not available in the Prometheus 2.3 setup. Since restarting with 2.6, the value was constantly around 10, and it dropped to 0 at the same time the CPU and GC graphs dropped.
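To watch that metric outside a dashboard, it can be read through the Prometheus HTTP API instant-query endpoint (/api/v1/query). Below is a rough sketch that builds the query URL and extracts the sample values from a response body; the helper names are hypothetical, and the sample response in the usage note is made up for illustration, not taken from these servers.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def vector_values(response_body: str) -> list:
    """Return the sample values of an instant-vector query response.

    Each result entry carries "value": [timestamp, "<value as string>"].
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]
```

Fed a response such as `{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1548950000,"10"]}]}}`, `vector_values` returns the single value 10.0.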
Sounds like something on the Thanos side then. @bwplotka
I'm not sure how this could be on the Thanos side if the Thanos version and load did not change but Prometheus resource consumption did. What I think changed on the Prometheus side between those versions, as far as remote read is concerned, are the remote read limits and the concurrency limits.
Maybe that affected performance? For example, you might be constantly hitting one of the limits (because it is set very low, like 10 by default). Or maybe it's just the raw performance cost of those checks? I would increase this limit to at least 100, as Thanos makes remote read your main read path from Prometheus, and see how that improves things.
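To illustrate what a concurrency limit does in general, here is a conceptual sketch of a gate that admits at most `limit` requests at a time. This is not Prometheus's implementation: the class and function names are hypothetical, and the sketch counts rejected requests only to keep it deterministic, whereas a real server may queue callers until a slot frees up.

```python
import threading


class Gate:
    """Admit at most `limit` concurrent holders (illustrative only)."""

    def __init__(self, limit: int) -> None:
        self._sem = threading.BoundedSemaphore(limit)

    def try_start(self) -> bool:
        """Take a slot if one is free; never blocks."""
        return self._sem.acquire(blocking=False)

    def done(self) -> None:
        """Give the slot back once the request has finished."""
        self._sem.release()


def admitted(limit: int, in_flight: int) -> int:
    """Count how many of `in_flight` simultaneous requests get a slot."""
    gate = Gate(limit)
    return sum(1 for _ in range(in_flight) if gate.try_start())
```

With a limit of 10, only 10 of 25 simultaneous requests get a slot; with a limit of 100, all 25 do. In Prometheus the corresponding setting is the `--storage.remote.read-concurrent-limit` command-line flag.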
I increased the read-concurrent-limit and will keep an eye on the servers to see whether the CPU load stays low or rises again.
The load is high again and it's coming from the Thanos sidecar. I'll try to dig deeper into why it's happening periodically.
OK, I think this can be closed. After digging backwards through the complete metric stack, it seems to have been triggered by a query over a large time window with a 10s refresh rate, which was coincidentally started around the time the update occurred. Nevertheless, I did a few tests in our staging environment and can reproducibly see higher load on Prometheus when comparing 2.3 to 2.6. When displaying a node exporter dashboard with a 10s refresh, I see around 28% CPU usage with Prometheus 2.3; after updating to 2.6 I end up at nearly 40%.
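To put the staging numbers in relative terms, a quick calculation, treating "nearly 40%" as exactly 40 for the sake of the arithmetic (the function name is hypothetical):

```python
def relative_increase(before: float, after: float) -> float:
    """Return the increase from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# 28% CPU on 2.3 versus roughly 40% on 2.6 in the staging test.
print(f"{relative_increase(28, 40):.1f}% more CPU")  # prints "42.9% more CPU"
```

So the staging difference is roughly 43% more CPU for the same dashboard load, far smaller than the jump first seen in production.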
Closing it then, feel free to re-open if you need to. Can you compare memory usage too? It could be that the garbage collection is more aggressive given that it isn't the same Go runtime version between 2.3 and 2.6.
simonpasquier closed this on Feb 1, 2019
Memory usage was also higher after switching to 2.6. What I also see on the node dashboard is much more network traffic after the update while the dashboard queries are running. I will try to look into that.

rsommer commented Jan 31, 2019
Bug Report
Today I updated from Prometheus 2.3 to Prometheus 2.6. After installation, CPU usage skyrocketed on both nodes (HA setup).
What did you expect to see?
Nearly the same CPU usage as with 2.3
What did you see instead? Under which circumstances?
CPU usage increased from 5% to 70%
Environment
System information:
Linux 4.9.0-8-amd64 x86_64
Prometheus version:
prometheus, version 2.6.0+ds (branch: debian/sid, revision: 2.6.0+ds-1)
build user: pkg-go-maintainers@lists.alioth.debian.org
build date: 20181219-15:52:20
go version: go1.10.4
Prometheus configuration file:
There are currently 642 targets configured via file_sd.
