Crash recovery uses too much memory compared to target-heap-size #3038
Comments
Please check out #2139. Bottom line: the sometimes excessive RAM usage during crash recovery comes from LevelDB, which blows up while re-indexing everything (as crash recovery requires). LevelDB won't be used anymore in Prometheus 2.x, so there is very little incentive to fix those issues. Sorry for that…
@beorn7 Thanks for answering so quickly. It makes sense to make the best use of your time and not address this. I'll manage in the meantime. Looking forward to version 2. FYI, I run Prometheus in an HA setup on GKE using preemptibles, so the pods get relocated frequently, at least every 24 hours, which makes this startup process pretty crucial.
@JorritSalverda I'd be really interested to know more about your configuration for HA Prometheus on GKE. Is there anything you can share? Or any advice?
@ahume Please don't ask unrelated questions on issues.
In version 2 this is no longer an issue, so I think we can close this ticket if everyone's okay with that.
This issue has been closed long ago. |
What did you do?
I restarted Prometheus with the default heap size and the 50% memory headroom suggested by the documentation, using this flag:
And the following values for the Kubernetes pod:
What did you expect to see?
The memory usage to stay within reasonable limits so the Prometheus pod can recover.
What did you see instead? Under which circumstances?
During startup, while crash recovery is running, I get errors like the following, and Kubernetes kills the pod with OOMKilled as the reason:
This happens even at a memory usage of 4x the heap size (8192Mi in my case).
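For reference, the arithmetic behind these numbers can be sketched as follows. This is only an illustration, not part of Prometheus: the function name is hypothetical, and the 2 GiB heap size is an assumption inferred from the figure above (8192Mi being 4x the heap size).

```python
import math

MIB = 1024 * 1024

# Assumption: heap size of 2 GiB, inferred from "4x the heap size (8192Mi)".
ASSUMED_HEAP_BYTES = 2 * 1024 * MIB


def pod_memory_limit_mi(target_heap_bytes: int, headroom: float) -> int:
    """Return a pod memory limit in Mi: the heap size plus fractional headroom.

    headroom=0.5 means 50% on top of the heap size; headroom=3.0 means 4x.
    Hypothetical helper for illustration only.
    """
    if target_heap_bytes <= 0:
        raise ValueError("target_heap_bytes must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    return math.ceil(target_heap_bytes * (1 + headroom) / MIB)


if __name__ == "__main__":
    # Limit with the documented 50% headroom.
    print(pod_memory_limit_mi(ASSUMED_HEAP_BYTES, 0.5), "Mi")
    # Limit at 4x the heap size, where the pod was still OOMKilled.
    print(pod_memory_limit_mi(ASSUMED_HEAP_BYTES, 3.0), "Mi")
```

Under that assumption, the 50% headroom corresponds to a 3072Mi limit, while crash recovery still ran out of memory at 8192Mi.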
Environment
The official Docker container prom/prometheus:v1.7.1, running on GKE version 1.7.2 with a 200Gi pd-ssd persistent volume.
The default config as shown at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml, with the only difference being:
When the pod finally starts after a number of restarts, it logs the following, which might give an indication as to how many time ranges, metrics, etc. are processed by the recovery.