
prometheus:v2.0.0-alpha.2 - OOM Killed when Limit is set ( When only Request is set, memory usage keep growing ) #3005

Closed
vnandha opened this Issue Jul 30, 2017 · 23 comments

@vnandha commented Jul 30, 2017

What did you do?
I am running Prometheus 2.x on Kubernetes.

What did you expect to see?
Prometheus should operate within the allocated memory (like storage.local.target-heap-size in 1.6+).
I had a discussion on coreos/prometheus-operator#480, where it was mentioned that Prometheus 2.x uses mmap and its memory will be evicted by the kernel automatically:

@vnandha I talked to @lucab and he mentioned that mmap'd memory is one of the first things that gets evicted by the kernel when hitting the requested amount of memory (requested as in the fields in Kubernetes).

What did you see instead? Under which circumstances?

I observed two issues:

  1. When I set only .spec.resources.requests.memory, Prometheus keeps using all available memory on the system; the output below shows memory usage of 57G RSS and 81G virtual:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      
18356 root      20   0 81.436g 0.057t  37268 S 115.1 46.3  26780:41 prometheus

But the requested memory is just 16Gi:

Containers:
  prometheus:
    Container ID:       docker://e00d9d307bf2fd396e73c914f28ef71c79b340623eff23336c512fa531126d79
    Image:              quay.io/prometheus/prometheus:v2.0.0-alpha.2
    Image ID:           docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
    Port:               9090/TCP
    Args:
      -config.file=/etc/prometheus/config/prometheus.yaml
      -storage.local.path=/var/prometheus/data
      -storage.tsdb.no-lockfile
      -storage.tsdb.retention=72h
      -web.route-prefix=/
    State:              Running
      Started:          Sun, 16 Jul 2017 05:43:10 +0000
    Ready:              True
    Restart Count:      0
    Requests:
      cpu:              8
      memory:           16Gi
    Liveness:           http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10

  2. When .spec.resources.limits is set, it gets OOM killed (see the resource sketch after this output):
Containers:
  prometheus:
    Container ID:       docker://2323744304b72d7b657f737937f400408cf41ba1658d101ee643d0ea44057648
    Image:              quay.io/prometheus/prometheus:v2.0.0-alpha.2
    Image ID:           docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
    Port:               9090/TCP
    Args:  
      -config.file=/etc/prometheus/config/prometheus.yaml
      -storage.local.path=/var/prometheus/data
      -storage.tsdb.no-lockfile
      -storage.tsdb.retention=72h
      -web.route-prefix=/
    State:              Running
      Started:          Sun, 30 Jul 2017 02:08:01 +0000
    Last State:         Terminated
      Reason:           OOMKilled
      Exit Code:        137
      Started:          Mon, 01 Jan 0001 00:00:00 +0000
      Finished:         Sun, 30 Jul 2017 02:07:40 +0000
    Ready:              False
    Restart Count:      19
    Limits:
      cpu:      16
      memory:   32Gi
    Requests:
      cpu:              16
      memory:           32Gi
    Liveness:           http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10
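For reference, a minimal sketch of the two resource configurations being compared; the underlying manifests are not shown in this issue, so the values below are simply reconstructed from the kubectl describe output above:

# Case 1: request only - there is no hard cap, so RSS can grow far beyond 16Gi
resources:
  requests:
    cpu: "8"
    memory: 16Gi

# Case 2: request and limit both set to 32Gi - the kernel OOM-kills the
# container as soon as its memory cgroup hits the 32Gi limit
resources:
  requests:
    cpu: "16"
    memory: 32Gi
  limits:
    cpu: "16"
    memory: 32Gi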

Environment
Kubernetes 1.6.1

  • System information:

Linux 3.10.0-514.16.1.el7.x86_64 x86_64

  • Prometheus version:
    Image: quay.io/prometheus/prometheus:v2.0.0-alpha.2

What is the recommended way to manage memory in Prometheus 2.x?

@vnandha vnandha changed the title Prometheus 2.x - OOM Killed when Limit is set ( When only Request is set, memory usage keep growing ) prometheus:v2.0.0-alpha.2 - OOM Killed when Limit is set ( When only Request is set, memory usage keep growing ) Jul 30, 2017

@brancz (Member) commented Jul 31, 2017

This is indeed interesting. I'm not sure what the cgroup primitives are for requests/limits, but it seems like only the limit causes the kernel to evict mmap'd chunks when it gets closer to it. /cc @fabxc

@gouthamve (Member) commented Aug 1, 2017

Hi, could you also try the new beta.0? We fixed some memory leaks in that release.

Having said that, with that retention you would need a crazy amount of metrics to hit the memory usage you are seeing. Can you elaborate on how many instances and metrics you are monitoring?

@gouthamve gouthamve added the dev-2.0 label Aug 1, 2017

@fabxc (Member) commented Aug 1, 2017

What was the querying load? The query engine can still cause significant memory usage, which needs to be addressed independently in the future.

@matthiasr (Contributor) commented Aug 1, 2017

Requests only influence the OOM score adjustment, so they make the container more or less likely to be killed if the node as a whole runs out of memory. The limit causes the OOM killer to be invoked on the cgroup when it uses too much memory, even if the node as a whole still has memory available.
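To make that concrete, a minimal annotated sketch of a pod resources stanza (the values are illustrative, not taken from this deployment):

resources:
  requests:
    memory: 16Gi   # used for scheduling and to derive the container's oom_score_adj;
                   # not enforced, so actual usage may grow well past it
  limits:
    memory: 32Gi   # written to the container's memory cgroup limit; exceeding it
                   # invokes the OOM killer for that cgroup even if the node has free memory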

@matthiasr (Contributor) commented Aug 1, 2017

According to this doc, unless I'm reading it wrong, mmap'd pages do count as long as they are mapped in, so you do need to account for the active chunks when setting memory limits.

@gouthamve (Member) commented Aug 1, 2017

Yes, but will the mmap'd memory be evicted as the container nears its limit?

For example, if the node has 50GB and we set a limit of 20GB, we would be in trouble if the container sees 50GB available but the OOM killer is triggered at 20GB and kills the container.

We actually more or less mmap the whole data directory (!!!!), so I am not sure if any kind of limit setting would be a feasible option here.

@matthiasr (Contributor) commented Aug 1, 2017

@brancz (Member) commented Aug 1, 2017

The point is that the operating system should evict mmap'd memory when the process nears some limit; I expected this to be the memory request in Kubernetes. This might still be right, as @fabxc mentioned, since it also highly depends on the memory used by query load, which the operating system cannot evict: it's up to the Go runtime (and/or the efficiency of the query engine) to release that memory.

@gouthamve (Member) commented Aug 1, 2017

I just ran a test: limits are honored and mmap'd memory is evicted, so no OOM killing happens:

[screenshot: screen shot 2017-08-01 at 6 36 03 pm]

Prometheus hit 17G before I set a limit; I then set a 10G limit and restarted. So we can see that setting limits is safe. I now think this issue is really about the insane memory usage itself.
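One way to watch how close the container gets to such a limit is sketched below as a Prometheus 2.0 alerting rule; the cAdvisor metric and label names are assumptions (they vary between Kubernetes versions), and the 32Gi limit is hard-coded to match the setup in this issue:

groups:
  - name: prometheus-memory
    rules:
      - alert: PrometheusNearMemoryLimit
        # container_memory_working_set_bytes is exposed by cAdvisor via the kubelet;
        # the label may be container_name or container depending on the k8s version
        expr: container_memory_working_set_bytes{container_name="prometheus"} > 0.9 * 32 * 1024^3
        for: 15m
        annotations:
          summary: Prometheus container is using more than 90% of its 32Gi memory limit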

@brancz (Member) commented Aug 1, 2017

Just for my understanding, did you set memory request or memory limit to achieve this behavior @gouthamve ?

@gouthamve (Member) commented Aug 1, 2017

@vnandha (Author) commented Aug 1, 2017

I have started running beta.2; so far the stats look like this (request and limit are both set to 32G):

prometheus-k8s-0 2/2 Running 0 2d

No restarts so far either. I will confirm once the retention period has passed (1 more day).

[screenshot: screen shot 2017-08-01 at 7 21 44 am]

@fabxc (Member) commented Aug 1, 2017

That looks promising, thanks for trying it out.
Can you share roughly what the monitoring workload is for this instance? That would be very interesting.

@vnandha (Author) commented Aug 1, 2017

This cluster currently runs ~220 nodes and ~1500 pods.

Currently deployed targets are

a) kube-state-metrics, which runs as a single pod; its /metrics takes about 20s and contains roughly 67K lines (excluding comments).
We have a problem with this scrape: it often times out.

[screenshot: screen shot 2017-08-01 at 7 44 26 am]

b) kubelets
c) node-exporter
d) alert-managers (3 pods)
e) kubernetes api server (3)

When I was running 1.7, we used to ingest around 20K samples per second.

Is there a similar metric I can get from 2.x? (I used prometheus_local_storage_ingested_samples_total on 1.7.)

I would be happy to share any metrics you are looking for.

@fabxc (Member) commented Aug 1, 2017

tsdb_samples_appended_total is the equivalent metric for 2.0.
A sum(scrape_samples_scraped) would also be interesting; it should give the number of active time series. In k8s 1.7 there's a bug, though, by which cAdvisor exposes varying amounts of metrics, so a min/max over time would be helpful there.
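If useful, both of these could also be recorded continuously. A minimal sketch of a 2.0 rule file, using the metric name given above for the alpha/beta builds (it may differ in later releases):

groups:
  - name: ingestion
    rules:
      # approximate number of active time series across all targets
      - record: scrape_samples_scraped:sum
        expr: sum(scrape_samples_scraped)
      # ingestion rate; the 2.0 counterpart of prometheus_local_storage_ingested_samples_total
      - record: prometheus:tsdb_samples_appended:rate2m
        expr: rate(tsdb_samples_appended_total[2m])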

@vnandha (Author) commented Aug 1, 2017

sum(scrape_samples_scraped) = 2584701
rate(tsdb_samples_appended_total[2m]) = ~45k

@fabxc (Member) commented Aug 1, 2017

Thanks, this is all very interesting. I take it the increase in appended samples per second comes from some change in configuration (scrape rate, number of targets?) since your 1.7 setup?

@vnandha (Author) commented Aug 1, 2017

Our scrape interval has been 60s since the beginning (we knew 30s would produce a lot of data). The targets remain pretty much the same.

edit: it's also possible that our pod count has increased compared to a month ago.

@fabxc (Member) commented Sep 14, 2017

@vnandha any updates on this? From everything we could reproduce, the mmap usage should not cause any OOMing.

@vnandha (Author) commented Sep 14, 2017

@fabxc I am not seeing this anymore.

@fabxc fabxc closed this Sep 14, 2017

@vnandha (Author) commented Sep 19, 2017

Sorry, in fact we are still seeing OOM kills. We are upgrading to beta.4 and will let you know if it occurs there as well.

[26029.245332] prometheus cpuset=docker-1c01715e9f4afe1c55f7d72fbf23eb79a59f43a0690f9c57b07b6f28d4880b77.scope mems_allowed=0-1
[26029.379655] CPU: 0 PID: 54724 Comm: prometheus Not tainted 3.10.0-514.26.2.el7.20170707.8.x86_64 #1
[26029.494223] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.0.2 03/15/2016
[26029.583854]  ffff88101f369f60 00000000d1fe1aab ffff88061b72bcc0 ffffffff81687063
[26029.673042]  ffff88061b72bd50 ffffffff8168200e ffff881f9425db80 0000000000000001
[26029.762311]  0000000000000000 0000000000000000 0000000000000046 ffffffff81184886
[26029.851401] Call Trace:
[26029.880650]  [<ffffffff81687063>] dump_stack+0x19/0x1b
[26029.942166]  [<ffffffff8168200e>] dump_header+0x8e/0x225
[26030.005753]  [<ffffffff81184886>] ? find_lock_task_mm+0x56/0xc0
[26030.076609]  [<ffffffff81184d3e>] oom_kill_process+0x24e/0x3c0
[26030.146437]  [<ffffffff811847dd>] ? oom_unkillable_task+0xcd/0x120
[26030.220406]  [<ffffffff81093c0e>] ? has_capability_noaudit+0x1e/0x30
[26030.296463]  [<ffffffff811f38d1>] mem_cgroup_oom_synchronize+0x551/0x580
[26030.376673]  [<ffffffff811f2d20>] ? mem_cgroup_charge_common+0xc0/0xc0
[26030.454808]  [<ffffffff811855c4>] pagefault_out_of_memory+0x14/0x90
[26030.529823]  [<ffffffff8167fe7a>] mm_fault_error+0x68/0x12b
[26030.596521]  [<ffffffff81692e45>] __do_page_fault+0x395/0x450
[26030.665326]  [<ffffffff81692f35>] do_page_fault+0x35/0x90
[26030.730002]  [<ffffffff8168f148>] page_fault+0x28/0x30
[26030.791525] Task in /kubepods.slice/kubepods-podb3134c5c_9d04_11e7_b0bf_6805ca39f78c.slice/docker-1c01715e9f4afe1c55f7d72fbf23eb79a59f43a0690f9c57b07b6f28d4880b77.scope killed as a result of limit of /kubepods.slice/kubepods-podb3134c5c_9d04_11e7_b0bf_6805ca39f78c.slice/docker-1c01715e9f4afe1c55f7d72fbf23eb79a59f43a0690f9c57b07b6f28d4880b77.scope
[26031.159046] memory: usage 33554432kB, limit 33554432kB, failcnt 1519459
[26031.238299] memory+swap: usage 33554432kB, limit 67108864kB, failcnt 0
[26031.316468] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[26031.388422] Memory cgroup stats for /kubepods.slice/kubepods-podb3134c5c_9d04_11e7_b0bf_6805ca39f78c.slice/docker-1c01715e9f4afe1c55f7d72fbf23eb79a59f43a0690f9c57b07b6f28d4880b77.scope: cache:104KB rss:33554328KB rss_huge:6207488KB mapped_file:44KB swap:0KB inactive_anon:0KB active_anon:33554328KB inactive_file:24KB active_file:76KB unevictable:0KB
[26031.759568] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[26031.853582] [54604]    99 54604  8420209  8388542   16443        0          -998 prometheus
[26031.953717] Memory cgroup out of memory: Kill process 54911 (prometheus) score 4 or sacrifice child
[26032.062028] Killed process 54604 (prometheus) total-vm:33680836kB, anon-rss:33554168kB, file-rss:0kB, shmem-rss:0kB
@fengye87 commented Jan 2, 2018

@vnandha any updates on this? I'm also seeing constant OOM kills after running Prometheus for several hours. Have you found any solution?
