Prometheus 2.2.1 "out of memory" when starting TSDB #4047
Comments
@Kirchen99 It'd probably help if you included information about the container's configured memory limit, and what the load on the Prometheus server was (query for rate(prometheus_tsdb_head_samples_appended_total[5m])).
rate(prometheus_tsdb_head_samples_appended_total[5m]) is 8137.149152542373
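For anyone gauging load in a similar situation, a minimal sketch of self-monitoring queries that expose ingestion rate and memory pressure; these are Prometheus's own metrics exposed on its /metrics endpoint, and the 5m window is just an example:

```
# Samples appended to the head block per second (ingestion rate).
rate(prometheus_tsdb_head_samples_appended_total[5m])

# Series currently held in memory in the head block.
prometheus_tsdb_head_series

# Resident memory of the Prometheus process itself.
process_resident_memory_bytes
```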
dobesv commented Apr 6, 2018
I'm having a similar problem here. I have set the memory limit in Kubernetes to 12GB and the Prometheus processes just run up to that limit and get killed by the kernel. Strangely, I don't get the same stack trace as above, but I'd love more information about how to track down the cause, or whether > 12GB is normal memory usage for Prometheus with my configuration, in which case I need to figure out how to reduce memory consumption.
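For reference, a limit like the 12GB one described above would normally be set on the container spec; a minimal sketch, assuming a hypothetical pod template (image tag and values are illustrative, not taken from this thread):

```yaml
# Illustrative only: container resources for a Prometheus pod with a 12GB limit.
# Exceeding the limit gets the process OOM-killed by the kernel, as described above.
containers:
  - name: prometheus
    image: prom/prometheus:v2.2.1
    resources:
      requests:
        memory: 8Gi
      limits:
        memory: 12Gi
```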
You have a lot of churn, so this usage is as expected.
dobesv commented Apr 6, 2018
What do you mean by churn? How can I reduce that?
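"Churn" here means series being created and dropped at a high rate (for example from constantly changing label values), which keeps a large number of series alive in the in-memory head block. A hedged sketch of self-monitoring queries that expose it; the 5m window is arbitrary:

```
# Rate at which new series are created in the head block;
# a persistently high value indicates label churn.
rate(prometheus_tsdb_head_series_created_total[5m])

# Rate at which series are removed from the head by garbage collection.
rate(prometheus_tsdb_head_series_removed_total[5m])
```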
dobesv commented Apr 6, 2018
Is there maybe a way I could merge time series together to use less memory?
@dobesv Your questions are unrelated to the original issue; I'd suggest you take this to the prometheus-users mailing list.
Is it possible that there is already a lot of data in storage, and Prometheus tries to load all of it into memory at once when it starts?
dobesv commented Apr 10, 2018
It normally doesn't, but it does keep data in memory for any time series that was recorded in the last few hours. It turns out that HAProxy was exporting millions of time series, so I added rules to Prometheus to drop the ones I am not using. That helped a lot; so far I've cut memory needs in half. I'm still looking to see what other time series I can eliminate.
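The "rules to drop" described above would typically be metric_relabel_configs on the scrape job; a minimal sketch, assuming a hypothetical haproxy job and an illustrative metric-name regex (which metrics are safe to drop depends on what you actually query):

```yaml
scrape_configs:
  - job_name: haproxy
    static_configs:
      - targets: ['haproxy-exporter:9101']   # hypothetical target
    metric_relabel_configs:
      # Drop high-cardinality per-server metrics that are never queried.
      # The regex is purely illustrative.
      - source_labels: [__name__]
        regex: 'haproxy_server_(http_responses_total|bytes_in_total|bytes_out_total)'
        action: drop
```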
dobesv commented Apr 10, 2018
There's some useful information in a reply in this Google Groups thread: https://groups.google.com/forum/#!topic/prometheus-users/XjBfxBaRRbU
andrejmaya commented May 16, 2018
Hi, has anybody found a solution to the OOM issue during startup when Prometheus compacts a lot of data? I am running Prometheus v2.2.1 in OpenShift with a 10GB memory limit and 27GB of data on disk. Prometheus consumes about 99% of the 10GB. I added the … Where do I find any documentation on how to reduce memory consumption or the "head chunks" loaded into memory?
jurgenweber commented Jun 7, 2018
In the end I just deleted the 'wal' directory; Prometheus would finally start and life would go on. Thankfully not too much data was lost.
Had a quick look at the code, and the WAL's size depends on the block ranges, which are based on the min/max block duration flags. You can try playing with these and see if you find something that works better in your case, but I think the defaults are battle-tested, which makes me think that your environment can't handle the amount of metrics you are trying to process.

The way I understand it is that reducing the block range sizes will use less memory at compaction (each block is loaded into memory at compaction), but will put more stress on your disk, and querying would also become slower.

There should be more useful info in the users and devs groups, so I would be interested to read more on the subject if you find anything interesting: https://groups.google.com/forum/#!forum/prometheus-developers
It's still not clear what went on here, but with it not occurring anymore it's hard to debug. If it pops up again, please let us know. Changing the block durations is not recommended; those flags only exist for internal load testing.
brian-brazil closed this Jun 13, 2018
brian-brazil added kind/bug and component/local storage labels Jun 13, 2018
estahn commented Nov 6, 2018
@brian-brazil We have this issue again. The container is getting OOM-killed upon start. You can see it reaching its 30GB limit and then going down due to the OOM kill. Logs:
estahn referenced this issue Nov 7, 2018: Crash recovery OOM kills prometheus-server container #4833 (closed)
danielmotaleite commented Dec 4, 2018
I had wal/checkpoint.000029 at 159GB and Prometheus crashed every time. Another Prometheus (replica) had a 3GB wal/checkpoint file. I'm using Prometheus 2.5.0.




Kirchen99 commented Apr 5, 2018
What did you do?
Starting Prometheus in a container
What did you expect to see?
Normal behavior
What did you see instead? Under which circumstances?
fatal error: runtime: out of memory
However, after I ran "docker-compose down -v" to delete the volume, it worked again.
I got the same issue when I ran Prometheus in Kubernetes with a StatefulSet. Prometheus hung at "Starting TSDB...". After I deleted the PersistentVolume, Prometheus came up as usual.
Environment
Linux 4.9.32-15.41.amzn1.x86_64 x86_64
Prometheus version:
2.2.1
Prometheus configuration file: