prometheus:v2.0.0-alpha.2 - OOM Killed when Limit is set (when only Request is set, memory usage keeps growing) #3005
Comments
vnandha changed the title from "Prometheus 2.x - OOM Killed when Limit is set (when only Request is set, memory usage keeps growing)" to "prometheus:v2.0.0-alpha.2 - OOM Killed when Limit is set (when only Request is set, memory usage keeps growing)" on Jul 30, 2017
This is indeed interesting. I'm not sure what the cgroup primitives are for requests/limits, but it seems like only the limit causes the kernel to evict pages.
Hi, also, could you try the new retention setting? Having said that, with that retention you need a crazy amount of metrics for it to hit the memory usage you are seeing. Can you elaborate on how many instances and metrics you are monitoring?
gouthamve added the dev-2.0 label on Aug 1, 2017
What was the querying load? The query engine can still cause significant memory usage that needs to be addressed independently in the future.
Requests only influence the OOM score adjustment, so they make the container more or less likely to get killed if the node as a whole runs out of memory. The limit makes it so that the OOM killer is invoked on a cgroup when it uses too much memory, even if the node still has memory free.
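To make that distinction concrete, here is a minimal sketch of how a memory request and limit are declared for a container, using the Kubernetes Go API types (`k8s.io/api/core/v1`); the `16Gi`/`20Gi` values are illustrative, not taken from this issue:

```go
// Minimal sketch: declaring a memory request and a memory limit for a container.
// The 16Gi/20Gi values are illustrative, not taken from this issue.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	res := corev1.ResourceRequirements{
		// The request only feeds scheduling and the OOM score adjustment:
		// it makes the container a less likely victim when the node as a
		// whole runs out of memory.
		Requests: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("16Gi"),
		},
		// The limit becomes the hard cgroup memory limit for the container:
		// exceeding it invokes the OOM killer for that cgroup even if the
		// node still has free memory.
		Limits: corev1.ResourceList{
			corev1.ResourceMemory: resource.MustParse("20Gi"),
		},
	}

	req := res.Requests[corev1.ResourceMemory]
	lim := res.Limits[corev1.ResourceMemory]
	fmt.Printf("request=%s limit=%s\n", req.String(), lim.String())
}
```

Only the `Limits` entry translates into a cgroup limit; with only `Requests` set, nothing stops the container from growing until the node itself comes under memory pressure.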
According to this doc, unless I'm reading it wrong, mmap'd pages do count as long as they are mapped in, so you do need to account for the active chunks when setting memory limits.
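For what it's worth, here is a minimal sketch (assuming cgroup v1, as on the 3.10 kernel reported in this issue) that dumps what the kernel actually charges to the container's memory cgroup; mapped file pages appear in this accounting while they are mapped in, next to the limit the OOM killer enforces:

```go
// Minimal sketch: print the memory accounting the kernel keeps for this
// container's cgroup. Assumes cgroup v1 paths (as on the 3.10 kernel reported
// in this issue); run it inside the container.
package main

import (
	"fmt"
	"os"
)

func main() {
	files := []string{
		"/sys/fs/cgroup/memory/memory.limit_in_bytes", // limit enforced by the OOM killer for this cgroup
		"/sys/fs/cgroup/memory/memory.usage_in_bytes", // currently charged usage, including mapped file pages
		"/sys/fs/cgroup/memory/memory.stat",           // breakdown: rss, cache, mapped_file, ...
	}
	for _, f := range files {
		b, err := os.ReadFile(f)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", f, err)
			continue
		}
		fmt.Printf("--- %s ---\n%s", f, b)
	}
}
```

These files are also the closest thing a containerized process has to "seeing" the memory actually available to it, as opposed to the node-wide numbers in /proc/meminfo.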
Yes, but will the mmap'ed memory be evicted as the container nears the limit? For example, if the node has 50GB and we set a limit of 20GB, we would be in trouble if the container sees 50GB available but the OOM killer is triggered at 20GB and kills the container. We actually more or less mmap the whole data directory (!!!!), so I am not sure if any kind of limit setting would be a feasible option here.
How would a container "see" available memory?
The point is that the operating system should evict those mmap'd pages under memory pressure.
Just for my understanding, did you set memory request or memory limit to achieve this behavior @gouthamve?
This is the memory `limit`, sorry. The request is just `6Gi`. I did not change the request from before, so it hit `17Gi` of usage with the `6Gi` request.
Thanks,
Goutham.
That looks promising, thanks for trying it out.
This cluster currently runs ~220 nodes and ~1500 pods. Currently deployed targets are:
a) kube-state-metrics, which runs as a single pod; a scrape of /metrics takes about 20s and returns roughly 67K lines (excluding comments).
b) kubelets.
When I was running 1.7 we used to ingest about 20K metrics per second. Is there a similar metric I can get from 2.x (I used …)? I would be happy to share any metrics you are looking for.
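Not an authoritative answer, but one counter 2.x exposes for this is `prometheus_tsdb_head_samples_appended_total` (the exact name may differ between 2.0 pre-releases). A minimal Go sketch querying the ingestion rate over the HTTP API, assuming the server is reachable at localhost:9090:

```go
// Minimal sketch: ask a Prometheus 2.x server for its own sample ingestion rate.
// Assumptions: the server listens on localhost:9090 and exposes the
// prometheus_tsdb_head_samples_appended_total counter (name may vary in early alphas).
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	q := url.QueryEscape(`rate(prometheus_tsdb_head_samples_appended_total[5m])`)
	resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Raw JSON response: {"status":"success","data":{"resultType":"vector",...}}
	fmt.Println(string(body))
}
```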
Thanks, this is all very interesting. I take it the increase in appended samples/second comes from some change in configuration (scrape rate, number of targets?) since your 1.7 setup?
Since the beginning, our scrape interval has been 60s (we knew it would be a lot of data with 30s). Targets pretty much remain the same. Edit: it is also possible that our pod count increased compared to a month ago.
@vnandha any updates on this? From all we could reproduce, the mmap usage should not cause any OOMing.
@fabxc I am not seeing this anymore.
fabxc closed this on Sep 14, 2017
Sorry, in fact we are still seeing it OOM killed. We are upgrading to beta4 and will let you know if this occurs there.
fengye87 commented on Jan 2, 2018
@vnandha any updates on this? I'm also seeing Prometheus getting OOM killed constantly after running for several hours. Have you found any solution?
lock bot commented on Mar 23, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |



vnandha commented on Jul 30, 2017 (edited)
What did you do?
I am running Prometheus 2.x on Kubernetes.
What did you expect to see?
Prometheus should operate within the allocated memory (like `storage.local.target-heap-size` in 1.6+). I had a discussion on coreos/prometheus-operator#480, where it was mentioned that Prometheus 2.x uses mmap and that memory will be evicted by the kernel automatically.
What did you see instead? Under which circumstances?
I observed two issues:
1. With only `.spec.resources.requests.memory` set, Prometheus keeps using the available memory on the system; memory usage reached 57G RSS and 81G virtual, but the requested memory is just 16G.
2. When `.spec.resources.limits.memory` is set, it gets OOM killed.
Environment
Kubernetes 1.6.1
Linux 3.10.0-514.16.1.el7.x86_64 x86_64
Image: quay.io/prometheus/prometheus:v2.0.0-alpha.2
What is the recommended way to manage memory in Prometheus 2.x?