Prometheus data never stop growing even with a tiny retention time #2431
Comments
What disk size are we talking about here? Note that there is a certain base footprint for the LevelDB indices, the checkpointing, the directory structure, and such.
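To see where that base footprint goes, a per-directory breakdown of the data dir is usually the quickest check (a minimal sketch, assuming the data directory is /prometheus, i.e. whatever -storage.local.path points at):

```
# Size of each sub-directory (LevelDB indices, checkpoint, chunk data), then the total
du -sh /prometheus/*
du -sh /prometheus
```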
For now, Prometheus disk usage has reached 690MB total and it's been running for 3 days. Here is the last "Completion message" I got from the Prometheus logs:
Even if we reach a 12h cycle, wouldn't the data size decrease after 3 days? Regards,
Is it still growing, or is 690MB the steady state? With 100k series, I would expect disk usage to not go beyond 500MiB or so... Could you check how the disk usage is distributed over sub-directories in your data dir? And please double-check that the flag value for retention is really set (via http://my.prometheus:9090/flags).
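The flags page can also be checked from a shell (a sketch, using the server address mentioned above):

```
# Confirm the retention value the server actually parsed at startup
curl -s http://my.prometheus:9090/flags | grep -i retention
```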
Disk usage in the data dir:
I'll tell you tomorrow whether the disk usage has increased; for now, it is still around 690MB.
prometheus_local_storage_chunk_ops_total{type="drop"} returns this:
I double-checked the flags, and
Is there a way to know how many series my Prometheus collects? Many thanks,
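One way to see whether the retention sweep is actually dropping anything over time (a sketch, assuming the server address used above and the standard v1 query API):

```
# Rate of chunk drops over the last hour; a value stuck at 0 would suggest
# nothing is being removed despite the 1h retention setting
curl -sG 'http://my.prometheus:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_local_storage_chunk_ops_total{type="drop"}[1h])'
```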
Hi, I checked the Prometheus disk consumption this morning and we are now at 713MB. Regards,
You could just count the files in your data directory. Let's say you have 100k files there. Each is at least 1kiB in size. That's already 100MiB. About the same size for the checkpoint. And then LevelDB will have a certain baseline. Plus the overhead as explained above. Not entirely unreasonable to end up with something around 1GiB. The value of prometheus_local_storage_memory_series gives you the number of series currently held in memory.
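A quick way to get that file count (a sketch, assuming the Prometheus 1.x data directory lives at /prometheus; LevelDB index files and the checkpoint are counted too, so this only approximates the series count):

```
# Roughly one chunk file per series in 1.x local storage
find /prometheus -type f | wc -l
```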
Hi, sorry for the late answer. I have 14970 series in my storage. As we speak, the storage size has increased to 870MB. I gave 2 CPUs and 3GB of RAM to the pod, and both resources are used at close to 100% most of the time. Regards,
I don't know how I can assist you any further with troubleshooting. Ever-growing disk space usage doesn't seem to be a problem with Prometheus in general. We need more evidence to find anything that might be wrong with Prometheus here.
What can I provide you with?
Ideally, a PR to fix the bug. :-> Otherwise, a smoking gun that lets us know at a lower level what's going wrong, and that something is going wrong in the first place. Disk, memory, and CPU usage might all be reasonable in your setup. We need something like "this goroutine spins on the CPU where it shouldn't", or "this file is still on disk but it shouldn't be", or "this data structure is allocated in memory but it shouldn't be". Go has a lot of troubleshooting tooling for that, but I have no capacity to teach you how to use it.
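For what it's worth, Prometheus exposes the standard Go profiling endpoints, so grabbing a profile doesn't require instrumenting anything (a sketch, assuming the /debug/pprof handlers are reachable on the server address used earlier):

```
# Interactive heap profile of the running server
go tool pprof http://my.prometheus:9090/debug/pprof/heap

# Full goroutine dump, useful for spotting goroutines that never finish
curl -s 'http://my.prometheus:9090/debug/pprof/goroutine?debug=2' > goroutines.txt
```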
Ok, thx. I'm gonna go learn a few things about Go troubleshooting and come back with more data. Thanks for your time, it's really appreciated. Regards,
brian-brazil closed this Mar 27, 2017
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
astraios commented Feb 15, 2017
What did you do?
I'm using Prometheus to scrape metrics from my Kubernetes cluster. It's a small setup: one master and 5 nodes, with around 40 containers spread across the cluster.
I configured a small retention time, 1h.
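For reference, the retention is set via the startup flags; a rough sketch of the relevant container args (assuming the Prometheus 1.x flag names, with illustrative paths; see the deployment file at the end of this report):

```
# Relevant flags passed to the prometheus container (1h retention)
prometheus \
  -config.file=/etc/prometheus/prometheus.yml \
  -storage.local.path=/prometheus \
  -storage.local.retention=1h
```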
What did you expect to see?
It should work without filling my disk.
What did you see instead? Under which circumstances?
Even with a small retention period, the datastore never stops growing until /var is full and my Linux host crashes.
If I dive into the Docker volume, I can see that the folder labelpair_to_fingerprints never stops growing, and the Prometheus deployment has been running for days.
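One way to make that growth concrete is to watch the index directory alongside the total data dir size (a sketch, assuming the volume is mounted at /prometheus inside the container):

```
# Re-check the LevelDB index directory and the total data dir every minute
watch -n 60 du -s /prometheus/labelpair_to_fingerprints /prometheus
```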
Environment
Kubernetes v1.5.2
Prometheus version:
v1.3.1
Alertmanager version:
insert output of alertmanager -version here (if relevant to the issue)
Prometheus deployment file: