kernel: prometheus invoked oom-killer #3470

Closed
tuner23 opened this Issue Nov 14, 2017 · 4 comments

tuner23 commented Nov 14, 2017

Hi,

We want to compute the daily averages for the previous day to keep as long-term metrics.
To do this, a script iterates over every metric, calculates the average, and stores the result.
E.g. avg_over_time(container_cpu_system_seconds_total[1d])

About half the time (~50%), this ends in an out-of-memory error and the Prometheus server has to be restarted.

prometheus, version 2.0.0 (branch: HEAD, revision: 0a74f98)
build user: root@615b82cb36b6
build date: 20171108-07:11:59
go version: go1.9.2

Mem: 32 GB

  • System information:
    Linux 3.10.0-693.5.2.el7.x86_64 x86_64

Nov 14 10:33:55 prometheus-test1 kernel: prometheus invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Nov 14 10:33:55 prometheus-test1 kernel: prometheus cpuset=/ mems_allowed=0
Nov 14 10:33:55 prometheus-test1 kernel: CPU: 3 PID: 33947 Comm: prometheus Not tainted 3.10.0-693.5.2.el7.x86_64 #1
Nov 14 10:33:55 prometheus-test1 kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.9.1-5.el7_3.3 04/01/2014
Nov 14 10:33:56 prometheus-test1 kernel: ffff880810b42f70 00000000bb3b807e ffff8801d621fa48 ffffffff816a3e51
Nov 14 10:33:56 prometheus-test1 kernel: ffff8801d621fad8 ffffffff8169f246 ffff8801d621fae0 ffffffff812b7d1b
Nov 14 10:33:56 prometheus-test1 kernel: 0000000000000001 ffff8801d621fa80 ffffffff00000202 fffeefff00000000
Nov 14 10:33:56 prometheus-test1 kernel: Call Trace:
Nov 14 10:33:56 prometheus-test1 kernel: [] dump_stack+0x19/0x1b
Nov 14 10:33:56 prometheus-test1 kernel: [] dump_header+0x90/0x229
Nov 14 10:33:56 prometheus-test1 kernel: [] ? cred_has_capability+0x6b/0x120
Nov 14 10:33:56 prometheus-test1 kernel: [] oom_kill_process+0x254/0x3d0
Nov 14 10:33:56 prometheus-test1 kernel: [] ? selinux_capable+0x2e/0x40
Nov 14 10:33:56 prometheus-test1 kernel: [] out_of_memory+0x4b6/0x4f0
Nov 14 10:33:56 prometheus-test1 kernel: [] __alloc_pages_slowpath+0x5d6/0x724
Nov 14 10:33:56 prometheus-test1 kernel: [] __alloc_pages_nodemask+0x405/0x420
Nov 14 10:33:56 prometheus-test1 kernel: [] alloc_pages_vma+0xb5/0x200
Nov 14 10:33:56 prometheus-test1 kernel: [] handle_mm_fault+0xb60/0xfa0
Nov 14 10:33:56 prometheus-test1 kernel: [] __do_page_fault+0x154/0x450
Nov 14 10:33:56 prometheus-test1 kernel: [] ? __switch_to+0x15a/0x510
Nov 14 10:33:56 prometheus-test1 kernel: [] trace_do_page_fault+0x56/0x150
Nov 14 10:33:56 prometheus-test1 kernel: [] do_async_page_fault+0x1a/0xd0
Nov 14 10:33:56 prometheus-test1 kernel: [] async_page_fault+0x28/0x30
Nov 14 10:33:56 prometheus-test1 kernel: Mem-Info:
Nov 14 10:33:56 prometheus-test1 kernel: active_anon:7474438 inactive_anon:571513 isolated_anon:0#012 active_file:65 inactive_file:0 isolated_file:0#012 unevictable:0 dirty:0 writeback:20 unstable:0#012 slab_reclaimable:10868 slab_unreclaimable:11884#012 mapped:630 shmem:637 pagetables:20398 bounce:0#012 free:49974 free_pcp:4 free_cma:0
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 14 10:33:56 prometheus-test1 kernel: lowmem_reserve[]: 0 2813 31988 31988
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 DMA32 free:122600kB min:5940kB low:7424kB high:8908kB active_anon:2187164kB inactive_anon:553716kB active_file:100kB inactive_file:248kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129192kB managed:2883184kB mlocked:0kB dirty:0kB writeback:0kB mapped:588kB shmem:416kB slab_reclaimable:3484kB slab_unreclaimable:5100kB kernel_stack:336kB pagetables:6928kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:602 all_unreclaimable? yes
Nov 14 10:33:56 prometheus-test1 kernel: lowmem_reserve[]: 0 0 29174 29174
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 Normal free:61404kB min:61608kB low:77008kB high:92412kB active_anon:27710588kB inactive_anon:1732336kB active_file:160kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29874648kB mlocked:0kB dirty:0kB writeback:80kB mapped:1932kB shmem:2132kB slab_reclaimable:39988kB slab_unreclaimable:42420kB kernel_stack:4528kB pagetables:74664kB unstable:0kB bounce:0kB free_pcp:16kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:283 all_unreclaimable? yes
Nov 14 10:33:56 prometheus-test1 kernel: lowmem_reserve[]: 0 0 0 0
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 DMA32: 103*4kB (UEM) 52*8kB (UEM) 39*16kB (UEM) 87*32kB (UEM) 65*64kB (UEM) 40*128kB (UE) 20*256kB (UEM) 111*512kB (EM) 46*1024kB (EM) 0*2048kB 0*4096kB = 122572kB
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 Normal: 309*4kB (UEM) 345*8kB (UEM) 302*16kB (UEM) 311*32kB (UEM) 253*64kB (UEM) 133*128kB (UEM) 32*256kB (UE) 1*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 61724kB
Nov 14 10:33:56 prometheus-test1 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 14 10:33:56 prometheus-test1 kernel: 81648 total pagecache pages
Nov 14 10:33:56 prometheus-test1 kernel: 80927 pages in swap cache
Nov 14 10:33:56 prometheus-test1 kernel: Swap cache stats: add 1413729, delete 1332802, find 314026/357699
Nov 14 10:33:56 prometheus-test1 kernel: Free swap = 0kB
Nov 14 10:33:56 prometheus-test1 kernel: Total swap = 2097148kB
Nov 14 10:33:56 prometheus-test1 kernel: 8388472 pages RAM
Nov 14 10:33:56 prometheus-test1 kernel: 0 pages HighMem/MovableOnly
Nov 14 10:33:56 prometheus-test1 kernel: 195037 pages reserved
Nov 14 10:33:56 prometheus-test1 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
...
Nov 14 10:33:57 prometheus-test1 kernel: Out of memory: Kill process 33939 (prometheus) score 931 or sacrifice child
Nov 14 10:33:57 prometheus-test1 kernel: Killed process 33939 (prometheus) total-vm:48335636kB, anon-rss:30603900kB, file-rss:0kB, shmem-rss:0kB
Nov 14 10:33:57 prometheus-test1 systemd: prometheus.service: main process exited, code=killed, status=9/KILL
Nov 14 10:33:58 prometheus-test1 systemd: Unit prometheus.service entered failed state.
Nov 14 10:33:58 prometheus-test1 systemd: prometheus.service failed.
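
To put the log above in perspective, the page counts and kB figures can be converted to GiB. This is only a reading aid, assuming the usual 4 KiB page size on x86_64 (the per-zone kB values in the log are consistent with that) and that the kernel's "kB" means KiB. The helper names are made up for this sketch.

```python
# Convert figures from the OOM report into GiB.
# Assumption: 4 KiB pages (x86_64 default); kernel "kB" values are KiB.
PAGE_KIB = 4
KIB_PER_GIB = 1024 * 1024


def pages_to_gib(pages: int) -> float:
    """Convert a page count from the Mem-Info section to GiB."""
    return pages * PAGE_KIB / KIB_PER_GIB


def kib_to_gib(kib: int) -> float:
    """Convert a kB figure from the log to GiB."""
    return kib / KIB_PER_GIB


total_ram_gib = pages_to_gib(8388472)       # "8388472 pages RAM"
active_anon_gib = pages_to_gib(7474438)     # "active_anon:7474438"
prometheus_rss_gib = kib_to_gib(30603900)   # "anon-rss:30603900kB"

print(f"total RAM:      {total_ram_gib:.2f} GiB")
print(f"active anon:    {active_anon_gib:.2f} GiB")
print(f"prometheus RSS: {prometheus_rss_gib:.2f} GiB")
```

With swap also exhausted ("Free swap = 0kB"), the Prometheus process alone held roughly 29 GiB of anonymous memory on a machine with about 32 GiB of RAM.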

Member

simonpasquier commented Nov 14, 2017

Prometheus uses that much memory because it loads all container_cpu_system_seconds_total series over 1 day, which is a lot. Either you need a machine with more RAM, or you should write a post-processing tool rather than running this query directly on Prometheus. Also, I'm not sure why you're computing the average over a counter...
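
A minimal sketch of what such a post-processing step could look like, assuming the raw samples of one series have already been fetched elsewhere (for example in small time windows) as `(timestamp, value)` pairs. Because the metric is a counter, the sketch derives an average per-second rate instead of averaging the raw values. The function names are hypothetical, and this is a simplification, not Prometheus's own `rate()` algorithm, which also extrapolates to the window boundaries.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase of a counter, treating any drop as a reset to zero."""
    total = 0.0
    prev = None
    for _, value in samples:
        if prev is not None:
            if value >= prev:
                total += value - prev
            else:
                # Counter reset: it restarted from 0, so the new value
                # itself is the increase since the reset.
                total += value
        prev = value
    return total


def average_rate(samples: Sequence[Sample]) -> float:
    """Average per-second rate between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return 0.0
    return counter_increase(samples) / elapsed


# Example with one counter reset between t=60 and t=120.
example = [(0.0, 10.0), (60.0, 40.0), (120.0, 5.0), (180.0, 25.0)]
print(counter_increase(example), average_rate(example))
```

Processing one series at a time outside the server keeps memory bounded by the size of a single series rather than by every series of the metric over a full day.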

Author

tuner23 commented Nov 17, 2017

Thanks, you're right.
After scaling down to hourly averages and some blacklisting, everything works fine :-)
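
If a daily figure is still needed, hourly averages can be combined afterwards. A sketch under the assumption that each hour's average is stored together with its sample count (for instance from `avg_over_time(...[1h])` and `count_over_time(...[1h])`); weighting by count keeps the result equal to the average over all samples even when some hours have gaps. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def combine_averages(hourly: Sequence[Tuple[float, int]]) -> Optional[float]:
    """Combine (average, sample_count) pairs into one overall average.

    Returns None when there are no samples at all.
    """
    total_count = sum(count for _, count in hourly)
    if total_count == 0:
        return None
    weighted_sum = sum(avg * count for avg, count in hourly)
    return weighted_sum / total_count


# Two hours with different sample counts: (2.0 * 10 + 4.0 * 30) / 40
print(combine_averages([(2.0, 10), (4.0, 30)]))
```

A plain mean of the hourly values would only match this when every hour contains the same number of samples.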

Member

simonpasquier commented Nov 17, 2017

Can you close the issue then please?

tuner23 closed this Nov 17, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
