Prometheus takes too much resource (RAM, disk) after 1 day running in a small kubernetes cluster #2141
Comments
Update: Up to now, there are 2,700,000 series. It seems that
Can you make sure that you don't have label-value pairs with variable content? Adding, for example, a UUID per request as a label can make the number of time series explode, since every new metric-name/label-value combination creates a new time series.
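For illustration only (not from this thread), one way to strip such a label at scrape time is a `metric_relabel_configs` rule that blanks it out; the job name and the `request_id` label below are hypothetical:

```yaml
scrape_configs:
  - job_name: 'example-app'          # hypothetical job name
    scrape_interval: 60s
    metric_relabel_configs:
      # Blank out a hypothetical per-request UUID label. A label with an
      # empty value is treated as absent, so series stop fanning out per
      # request while the metric name and remaining labels are kept.
      - source_labels: [request_id]
        regex: '.+'
        target_label: request_id
        replacement: ''
```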
@brancz Thanks for your response. I don't think I have that kind of metric. Every metric is scraped from etcd and Kubernetes.
@ntquyen The observed behavior might be completely normal. Note the following:
@beorn7 Thanks for your explanation, it's much clearer to me now! In our k8s cluster, services get re-deployed almost all the time (we have our own pod auto-scheduler: pods come up when new messages come in and are stopped when done processing). We can't help but produce a lot of new time series, even when I drop half of the exposed metrics. I reduced the
RAM usage is now 16GB. It looks like Prometheus is trying to store every series in memory. What I expect for
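As a side note, Prometheus 1.x keeps recently used sample chunks in memory and is tuned through flags rather than a hard memory limit. A rough sketch of the relevant container arguments in a Kubernetes Deployment, with illustrative values that are not taken from this issue:

```yaml
# Sketch of Prometheus 1.x local-storage flags in a container spec.
# The values are assumptions for illustration, not the reporter's settings.
args:
  - "-config.file=/etc/prometheus/prometheus.yml"
  - "-storage.local.retention=24h"               # keep samples for one day
  - "-storage.local.memory-chunks=524288"        # cap on ~1KiB chunks held in RAM
  - "-storage.local.max-chunks-to-persist=262144"
```

Lowering `-storage.local.memory-chunks` trades RAM for more chunk eviction and disk I/O; with millions of active series it shifts the pressure rather than removing it.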
Just found issue #455, which explains more about the memory usage; we may switch to this one then.
ntquyen closed this Nov 3, 2016
You need a lot of RAM to deal with millions of time series. There is no way around that.
lock bot commented Mar 24, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
ntquyen commented Nov 1, 2016
What did you do?
I'm running Prometheus inside a Kubernetes cluster of ~20 VMs. There are normally ~200-250 containers/pods running in the cluster.
Prometheus deployment config:
The `scrape_interval` is 60s and `storage.local.retention` is 24h, so the local storage should stay small. No node-exporter is running, and in the config file (see below) I tried to ignore most of the metrics.
What did you see instead? Under which circumstances?
Prometheus's storage takes 12GB after 1 day, which is huge; every query is very slow and sometimes the container gets OOM-killed. When recovering from the OOM, the logs say `2141026 series loaded.`
Checking the series counts by running `topk(100, count by (__name__, job)({__name__=~".+"}))`, the largest metric is 120k series, and there is no way they can all add up to 2M series. Is there something wrong in my configuration?
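For reference, ignoring metrics at scrape time is usually done with a `metric_relabel_configs` drop rule; a minimal sketch with an illustrative metric-name regex, not the author's actual configuration:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'       # illustrative job name
    metric_relabel_configs:
      # Discard scraped series whose metric name matches the regex,
      # before they are written to local storage.
      - source_labels: [__name__]
        regex: 'etcd_debugging_.*|apiserver_request_latencies_bucket'
        action: drop
```

Dropped series still have to be scraped and parsed; the rule only saves the storage and memory they would otherwise occupy.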
Environment
System information:
Prometheus version:
v1.1.1
Prometheus configuration file: