Prometheus Release 2.2.0 Memory Leak? #4164
Comments
I think I've seen a memory profile like that before. Can you just try removing the scrape configs with "role: node" and see if it disappears? (I've seen leaks somehow related to node objects being deserialized, but never found the reason for it.)
Sorry, I meant the whole block of those scrape configs that contain the "role: node" key/value pair.
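For reference, a minimal sketch of the kind of scrape config block meant here; the job name, TLS settings, and relabeling are illustrative, not taken from the reporter's configuration:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'   # illustrative job name
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node                 # the "role: node" key/value pair in question
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
```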
@cofyc do you have any idea what might be causing this leak?
@krasi-georgiev I've never encountered a memory leak with Prometheus 2.2.x.
@cofyc Looks like its memory also just keeps growing.
The heap profile is too large to upload.
In my case the graph of the container usage looks similar, but I think we're just seeing mmap'd memory being freed. However, heap usage (as reported by the Go runtime) seems stable and not ever-growing. Also, the process has been running for >24h, so besides Kubernetes not reporting an OOMKill to me, the process itself is also confirming that. Could you try these queries and validate this against your setup?
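A sketch of the kind of comparison being described; the exact queries and label names are assumptions and will vary by setup:

```
# Memory usage of the Prometheus container as seen by the cgroup
# (includes page cache / mmap'd data, so it tends to look much larger than the heap):
container_memory_usage_bytes{container_name="prometheus"}

# Resident memory of the Prometheus process itself:
process_resident_memory_bytes{job="prometheus"}

# Heap in use as reported by the Go runtime
# (should stay roughly flat over time if there is no heap leak):
go_memstats_heap_inuse_bytes{job="prometheus"}
```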
@brancz
The screenshot doesn't seem to show anything useful. @piaoyu did you check the logs to see if Prometheus really gets OOM killed?
@krasi-georgiev I checked the logs; it really gets OOM killed.
grantstephens commented May 28, 2018
I've also experienced the same problem, and it stopped when I removed the k8s cluster from the config.
I'm looking into this and trying to reproduce the issue in my cluster.
@piaoyu From output of
@grantstephens Could you provide the output of
grantstephens commented May 28, 2018
Let me know if you need any more information.
djsly commented Jun 1, 2018
Hello, we are also affected on three different clusters; we are getting the pod OOM killed every few hours.
@djsly it would be helpful if you could share:
For everyone in this issue: I just realised that if you are testing the binary from the release page, it doesn't include the refactoring, so make sure you build a new one from master or the 2.2 branch: https://github.com/prometheus/prometheus/tree/release-2.2. I can build the binary for anyone willing to test it.
@cofyc quay.io/prometheus/prometheus:v2.2.1 does not seem to have the OOM problem.
@piaoyu did you read my comment above?
v2.3.0 is out, so try that.
@krasi-georgiev OK, I will try quay.io/prometheus/prometheus:v2.3.0.
brian-brazil added the kind/more-info-needed label on Jun 13, 2018
dfredell commented Jun 18, 2018
My Prometheus was sucking up a ton of memory and disk, more than I expected. I had storage.tsdb.retention set to 10 days, and gave the Prometheus Docker container 50G of disk and 4G of memory. It wasn't happy. I tried upgrading to 2.3.0 and I think that made things even worse.
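A minimal sketch of the kind of deployment described above, assuming Docker Compose; the retention and memory limit match the numbers mentioned, everything else is illustrative:

```yaml
version: "2.4"
services:
  prometheus:
    image: prom/prometheus:v2.3.0
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention=10d   # retention flag as spelled in the 2.2/2.3 series
    mem_limit: 4g                      # the 4G cap mentioned above
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
volumes:
  prometheus-data:
```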
@krasi-georgiev Seems like 2.3 fixed the problem, based on a 12-hour memory watch.
Nice, closing this as resolved! Thanks for the teamwork.
piaoyu commented May 15, 2018 (edited)
Bug Report
What did you do?
Nothing.
What did you expect to see?
Using memory and releasing it.
What did you see instead? Under which circumstances?
Using memory and not releasing it.
Environment