High Resource Utilisation on Prometheus #4740
Comments
spirrello commented Oct 15, 2018
Are you doing any top-N dashboards? I was previously, and this was tanking our Prometheus instance, causing it to hit its memory limits.
vaibhavkhurana2018 commented
Thanks @spirrello for the response. I think you are referring to topk; please correct me if I am wrong. Yes, we are using it, but only in 3 queries, so in my opinion it should not have that much of an effect. Any other suggestions that might be effective?
spirrello commented Oct 16, 2018
Yes, topk can drive memory utilization heavily; it tapered off once I killed those queries. That's the only thing I've seen drive heavy memory utilization in my environment.
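For context, a top-N panel typically evaluates something like the sketch below (the metric and label names are the usual cAdvisor ones, used here purely for illustration). Because topk is recomputed at every step of the dashboard's time range, it has to pull in every matching series at every step, which is what makes such panels expensive even though only N results are displayed:

```
# Hypothetical "top 10 pods by CPU" panel query. topk keeps only 10 results,
# but computing them still requires rates over *all* matching series
# at every evaluation step across the dashboard's time range.
topk(10, sum by (pod) (rate(container_cpu_usage_seconds_total[5m])))
```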
simonpasquier commented
From your dashboards, your Prometheus holds almost 2M series in the head. A rough estimate is that every series needs 8kB of memory, which in your case comes to 16GB, and you need some extra room for handling queries on top of that. As for how to deal with the situation: get a machine with more RAM, increase your sampling interval, or reduce the number of collected metrics.
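As a quick way to verify those numbers (a sketch, assuming the server scrapes itself, as is standard), the head series count is exposed as a metric, and the 8kB-per-series rule of thumb can be applied to it directly:

```
# Series currently held in the TSDB head block.
prometheus_tsdb_head_series

# Back-of-the-envelope memory estimate in bytes at ~8kB per head series:
# 2,000,000 series * 8,192 bytes ≈ 16GiB, before query overhead.
prometheus_tsdb_head_series * 8192
```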
vaibhavkhurana2018 commented
Have removed the panels using topk. @simonpasquier, thanks for the response; will run the profiler and share the results.
simonpasquier added the kind/more-info-needed label Oct 18, 2018
simonpasquier commented
@vaibhavkhurana2018 any news?
vaibhavkhurana2018 commented
@simonpasquier Apologies for the late reply. I didn't get anything specific out of the profiler, as the app went down every time the profiler was run. One observation: CPU spikes whenever a query is made from the Grafana dashboard, which leads to increased memory utilisation and finally the app becoming unresponsive. As a workaround, I have segregated Prometheus into separate instances for the different applications. I'd like to know what system spec is recommended for this kind of infrastructure, because as things stand I expect to hit the same issue again in the near future.
simonpasquier commented
Unfortunately there's no general formula to assess the amount of memory that Prometheus will use, as it depends on too many factors. When you reach the limits of a single process, the right thing to do is indeed to split targets across multiple Prometheus servers. Note that some TSDB improvements are in development which should eventually reduce memory usage a bit (especially when compacting data). I'm closing the issue for now; feel free to reopen if it happens again.
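A minimal sketch of such a split, with purely illustrative job and namespace names: each Prometheus server keeps only the targets for its own application, so the head series (and therefore the memory) are divided between them.

```yaml
# prometheus-app-a.yml -- this server scrapes only application A's pods.
# A second server would use the same pattern with `regex: app-b`.
scrape_configs:
  - job_name: app-a
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        regex: app-a        # keep only targets from this namespace
        action: keep
```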
vaibhavkhurana2018 commented Oct 15, 2018 (edited)
Proposal
Use case. Why is this important?
The Prometheus app is crashing at regular intervals.
Bug Report
What did you do?
I'm running Prometheus inside a Kubernetes cluster, but the Prometheus app keeps crashing as its resource utilisation exceeds the node's capacity.
Prometheus deployment config:
What did you see instead? Under which circumstances?
The app is crashing at regular intervals. From the Grafana dashboard, I can see that memory and CPU utilisation rise to the point that the node goes OOM.
NOTE: The Prometheus pod is the only pod running on the node.
Environment
System information:
v1.8.1+coreos.0
It's an EC2 instance (r4.xlarge) with 4 vCPUs and 30.5 GiB of RAM.
Prometheus version:
v2.4.2, docker image: prom/prometheus:v2.4.2
Prometheus configuration file:
Kindly suggest any solution to limit the utilisation, or point out if I am doing anything wrong in the setup.
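As far as I know, Prometheus 2.x has no flag that hard-caps its memory, but a common mitigation (a sketch, assuming a standard Kubernetes Deployment; the values are illustrative, not recommendations) is to set container resource limits below the node's capacity, so an OOM kill hits the pod rather than the whole node, and to bound the stored data via retention:

```yaml
# Illustrative container spec for the Prometheus pod; values are examples only.
containers:
  - name: prometheus
    image: prom/prometheus:v2.4.2
    args:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention=15d   # shorter retention bounds stored data
    resources:
      requests:
        memory: 16Gi   # sized from the ~2M-series / ~16GB estimate above
      limits:
        memory: 26Gi   # below the node's 30.5 GiB, so the pod is OOM-killed
                       # and restarted instead of taking the node down
```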