Prometheus Crash Recovery Consumes Excessive Amount of Memory #4609
Comments
violenti
commented
Sep 19, 2018
Hi, I have the same problem with Prometheus 2.3.2, but in Kubernetes. I set up a federated cluster, and the primary node consumed 5 GB of RAM. Is this performance normal? The federation scrape configuration is:
dswarbrick
commented
Nov 6, 2018
I'm also running into this problem quite frequently, with Prometheus 2.4.3. We have some moderately large instances, with about 2.8M series and about 45k samples/sec. They are VMs, so it's relatively easy to add more memory to them, but one is already spec'd with 24 GB, and I'm getting a little nervous about how much higher it's going to go. The RAM usage settles down after about 20 minutes, but you have to get over that (very steep) hill first.
Can you try out 2.6.0 and see if it's better? There have been a number of performance improvements.
viberan
commented
Dec 20, 2018
Same issue with 2.6.0.
hectorhuertas
commented
Feb 14, 2019
We are seeing the same issue in 2.7.1. We have two Prometheus replicas in Kubernetes with the same configuration and around 1M series. Replica 1 uses around 8Gi, while replica 2 gets killed by Kubernetes when it reaches its 20Gi limit. Two hours after raising the limit and letting it start, memory is down to 10Gi.
violenti
commented
Feb 18, 2019
Yes, I now run 2.6.1 and the performance is better.
PeterZaitsev commented Sep 14, 2018
As of Prometheus 2.3.2, crash recovery can be excessively memory-intensive, leading to a situation where a normally running system is unable to ever recover after an abnormal reboot.
How to reproduce:
Run Prometheus with a high ingest rate, consuming 60% of memory.
Kill the Prometheus process with kill -9.
If auto-restart is configured, Prometheus may enter a crash loop: it runs out of memory during crash recovery and restarts again.
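For anyone trying to script the abrupt-termination step, here is a minimal Python sketch of what kill -9 does: it sends SIGKILL, which the process cannot catch, so it gets no chance to shut down cleanly. The child process below is only a stand-in that sleeps; it is not Prometheus, and the helper names `hard_kill` and `demo` are made up for illustration. This assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def hard_kill(proc: subprocess.Popen) -> int:
    """Send SIGKILL (the signal behind `kill -9`) and return the exit status."""
    os.kill(proc.pid, signal.SIGKILL)  # cannot be caught: no clean shutdown
    return proc.wait()


def demo() -> int:
    # Hypothetical stand-in child; a real test would target the Prometheus PID.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    return hard_kill(child)
```

On POSIX, `subprocess` reports a process terminated by a signal as the negative signal number, so a supervisor can tell a hard kill apart from a normal exit.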
The only way I found to recover from such a situation is to restart Prometheus with all targets disabled, wait for recovery to complete, and then perform a normal restart with all targets. This also confirms that the issue is related to crash recovery.
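The "wait for recovery to complete" part of this workaround could be scripted roughly as below. This is only a sketch under assumptions: it polls the Prometheus 2.x readiness endpoint `/-/ready` and treats an HTTP 200 as "recovery finished", which I have not verified against 2.3.2 specifically. The base URL, the helper names `is_ready` and `wait_until_ready`, and the retry parameters are placeholders.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/-/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/-/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status such as 503.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int, delay: float) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

A possible use, with an assumed local address: `wait_until_ready(lambda: is_ready("http://localhost:9090"), attempts=120, delay=10)`, followed by the normal restart with all targets once it returns True.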
Sorry for not having an exact reproducible example.