Crash recovery uses too much memory compared to target-heap-size #3038
Comments
Please check out #2139. Bottom line: the sometimes excessive RAM usage during crash recovery comes from LevelDB, whose memory usage blows up while re-indexing everything (as crash recovery requires). LevelDB won't be used anymore in Prometheus 2.x, so there is very little incentive to fix those issues. Sorry for that…
beorn7 closed this Aug 8, 2017
@beorn7 Thanks for answering so quickly. It makes sense to make the best use of your time and not address this. I'll manage in the meantime. Looking forward to version 2. FYI, I run Prometheus in an HA setup on GKE using preemptibles, so the pods get relocated frequently (at least every 24 hours), which makes this startup process pretty crucial.
ahume commented Nov 22, 2017
@JorritSalverda I'd be really interested to know more about your configuration for HA Prometheus on GKE. Is there anything you can share? Or any advice?
@ahume Please don't ask unrelated questions on issues.
In version 2 this is no longer an issue, so I think we can close this ticket if everyone's okay with that.
This issue has been closed long ago.
lock bot commented Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
JorritSalverda commented Aug 8, 2017
What did you do?
I restarted Prometheus with the default heap size and the 50% memory headroom suggested by the documentation, using this flag:
And the following values for the Kubernetes pod:
What did you expect to see?
The memory usage to stay within reasonable limits so the Prometheus pod can recover.
What did you see instead? Under which circumstances?
When Prometheus starts and runs crash recovery, I get errors like the following, and Kubernetes kills the pod with OOMKilled as the reason:
This even happens at a memory usage of 4x the heap size (8192Mi in my case).
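As a rough illustration of the sizing arithmetic in this report, the sketch below compares the 50% headroom rule with the limit at which the pod was still OOM-killed. The 2048Mi heap target is an assumption inferred from 8192Mi being described as 4x the heap size, and the helper name `memory_limit_mib` is hypothetical, not part of Prometheus or Kubernetes.

```python
def memory_limit_mib(target_heap_mib: int, headroom: float = 0.5) -> int:
    """Return a pod memory limit (MiB): the heap target plus fractional headroom."""
    return int(target_heap_mib * (1 + headroom))


# Assumption: heap target inferred from the report (8192Mi is stated to be 4x the heap size).
target_heap_mib = 8192 // 4

# Limit implied by the documented 50% headroom rule.
recommended_limit_mib = memory_limit_mib(target_heap_mib)

# Limit at which crash recovery was still OOM-killed, per the report.
observed_limit_mib = 8192

print(f"heap target:           {target_heap_mib}Mi")
print(f"limit at 50% headroom: {recommended_limit_mib}Mi")
print(f"limit still OOMKilled: {observed_limit_mib}Mi "
      f"({observed_limit_mib / target_heap_mib:.0f}x the heap target)")
```

Under these assumptions, the headroom rule suggests a limit of 1.5x the heap target, while crash recovery exceeded even 4x.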
Environment
The official Docker container prom/prometheus:v1.7.1 running on GKE version 1.7.2 with a 200Gi pd-ssd persistent volume.
The default config as shown at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml, with the only difference being:
When the pod finally starts after a number of restarts, it logs the following, which might give an indication as to how many time ranges, metrics, etc. are processed by recovery.