Prometheus Memory Issues #4945

Open
muralikanagala opened this Issue Dec 3, 2018 · 1 comment

muralikanagala commented Dec 3, 2018

Bug Report

What did you do?
No changes in the environment.

What did you expect to see?
Prometheus running stable

What did you see instead? Under which circumstances?
Prometheus eating up all the memory and getting restarted.
This is happening on both servers. I added memory to one of the servers (now at 32 GB),
but it did not help. I have tried deleting the WAL directory and restarting, but the issue shows up again within a few minutes or hours. I also added the config options below (storage.tsdb), but they did not fix the problem.

[Service]
User=prometheus
Restart=on-failure
RestartSec=5
LimitNOFILE=50000
ExecStart=/prometheus/prometheus \
  --config.file=/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention=150d \
  --web.external-url=http://servername:9090/ \
  --storage.tsdb.min-block-duration=30m \
  --storage.tsdb.max-block-duration=1d \
  --web.enable-lifecycle \
  --web.enable-admin-api \
  --rules.alert.resend-delay=1m \
  --log.level=debug
ExecReload=/bin/curl -X POST http://localhost:9090/-/reload

Environment

2 CentOS servers running Prometheus and Grafana, collecting the same data from a bunch of Windows machines running wmi_exporter, RabbitMQ, etc. A total of ~500 targets and 600,000 time series.
  • System information:
Linux 3.10.0-862.14.4.el7.x86_64 x86_64
  • Prometheus version:
prometheus, version 2.5.0 (branch: HEAD, revision: 67dc912ac8b24f94a1fc478f352d25179c94ab9b)
  build user:       root@578ab108d0b9
  build date:       20181106-11:40:44
  go version:       go1.11.1
  • Alertmanager version:
alertmanager, version 0.15.0-rc.3 (branch: HEAD, revision: 5e86f61bd73c6325d6049ab3dbcb468ede26dfe0)
  build user:       root@0d28fc42e4ec
  build date:       20180618-10:26:10
  go version:       go1.10.3
  • Prometheus configuration file:
    NA

  • Alertmanager configuration file:
    NA

  • Logs:

Dec 03 11:24:58  prometheus[23468]: level=info ts=2018-12-03T16:24:58.651049636Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1543525200000 maxt=1543527000000 ulid=01CXGPX97QGG73NMBDBE836Y19
Dec 03 11:24:58  prometheus[23468]: level=info ts=2018-12-03T16:24:58.651399611Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1543519800000 maxt=1543525200000 ulid=01CXGPY24SRA82NCZG6J90QZAY
Dec 03 11:26:09  prometheus[23468]: fatal error: runtime: out of memory
Dec 03 11:26:09  prometheus[23468]: runtime stack:
Dec 03 11:26:09  prometheus[23468]: runtime.throw(0x1ada480, 0x16)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/panic.go:608 +0x72
Dec 03 11:26:09  prometheus[23468]: runtime.sysMap(0xc36c000000, 0x4000000, 0x2db8578)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mem_linux.go:156 +0xc7
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).sysAlloc(0x2d9ed20, 0x4000000, 0x2d9f1e8, 0x7f70bdd15000)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/malloc.go:619 +0x1c7
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).grow(0x2d9ed20, 0x1, 0x0)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mheap.go:920 +0x42
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).allocSpanLocked(0x2d9ed20, 0x1, 0x2db8588, 0x400)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mheap.go:848 +0x337
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).alloc_m(0x2d9ed20, 0x1, 0x23, 0x7f70bdcbcfff)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mheap.go:692 +0x119
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).alloc.func1()
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mheap.go:759 +0x4c
Dec 03 11:26:09  prometheus[23468]: runtime.(*mheap).alloc(0x2d9ed20, 0x1, 0x7f70bd010023, 0x7f70bd9fcef0)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mheap.go:758 +0x8a
Dec 03 11:26:09  prometheus[23468]: runtime.(*mcentral).grow(0x2da0918, 0x0)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mcentral.go:232 +0x94
Dec 03 11:26:09  prometheus[23468]: runtime.(*mcentral).cacheSpan(0x2da0918, 0x7f70bd9fcef0)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mcentral.go:106 +0x2f8
Dec 03 11:26:09  prometheus[23468]: runtime.(*mcache).refill(0x7ff6a85a8000, 0xc0006a6123)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/mcache.go:122 +0x95
Dec 03 11:26:09  prometheus[23468]: runtime.(*mcache).nextFree.func1()
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/malloc.go:749 +0x32
Dec 03 11:26:09  prometheus[23468]: runtime.systemstack(0x0)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
Dec 03 11:26:09  prometheus[23468]: runtime.mstart()
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/proc.go:1229
Dec 03 11:26:09  prometheus[23468]: goroutine 215 [running]:
Dec 03 11:26:09  prometheus[23468]: runtime.systemstack_switch()
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc000695b50 sp=0xc000695b48 pc=0x459d60
Dec 03 11:26:09  prometheus[23468]: runtime.(*mcache).nextFree(0x7ff6a85a8000, 0x23, 0xc36be34000, 0x7f70bd9fadb0, 0xc000695c20)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/malloc.go:748 +0xb6 fp=0xc000695ba8 sp=0xc000695b50 pc=0x40b686
Dec 03 11:26:09  prometheus[23468]: runtime.mallocgc(0x100, 0x0, 0x0, 0xc36bf67f00)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/malloc.go:903 +0x793 fp=0xc000695c48 sp=0xc000695ba8 pc=0x40bfd3
Dec 03 11:26:09  prometheus[23468]: runtime.growslice(0x1742d80, 0xc335282c00, 0x80, 0x80, 0x81, 0xc36bf67f00, 0x80, 0x100)
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/slice.go:197 +0x219 fp=0xc000695cb0 sp=0xc000695c48 pc=0x443039
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeByte(...)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/bstream.go:103
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBits(0xc33527d9c0, 0x5fce839ebc0, 0x2e)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/bstream.go:112 +0x156 fp=0xc000695d20 sp=0xc000695cb0 pc=0x1445326
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).writeVDelta(0xc335263860, 0x41a546cada8ad8f3)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go:207 +0x227 fp=0xc000695da8 sp=0xc000695d20 pc=0x1446be7
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).Append(0xc335263860, 0x1676164d14f, 0x41a546cada8ad8f3)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go:175 +0x390 fp=0xc000695e58 sp=0xc000695da8 pc=0x14464d0
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).append(0xc1ff590000, 0x1676164d14f, 0x41a546cada8ad8f3, 0x1)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:1455 +0x125 fp=0xc000695ea0 sp=0xc000695e58 pc=0x147ef75
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).processWALSamples(0xc00015a370, 0x1676160e7c0, 0x1, 0x4, 0xc1e63d59e0, 0xc1e63d5a40, 0xec)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:257 +0x281 fp=0xc000695f58 sp=0xc000695ea0 pc=0x1477d81
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).loadWAL.func1(0xc00015a370, 0xc2132c18f8, 0x4, 0xc2132c1900, 0xc2132c1910, 0x1, 0xc1e63d59e0, 0xc1e63d5a40)
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:324 +0x63 fp=0xc000695fa0 sp=0xc000695f58 pc=0x1492db3
Dec 03 11:26:09  prometheus[23468]: runtime.goexit()
Dec 03 11:26:09  prometheus[23468]: /usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000695fa8 sp=0xc000695fa0 pc=0x45bcc1
Dec 03 11:26:09  prometheus[23468]: created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).loadWAL
Dec 03 11:26:09  prometheus[23468]: /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:323 +0x1c7
Dec 03 11:26:09  prometheus[23468]: goroutine 1 [chan receive, 1 minutes]:
Dec 03 11:26:09  prometheus[23468]: github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run(0xc00066be18, 0xc000123af0, 0x8)
Dec 03 11:26:10  systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 03 11:26:10  systemd[1]: Unit prometheus.service entered failed state.
Dec 03 11:26:10  systemd[1]: prometheus.service failed.
Dec 03 11:26:15  systemd[1]: prometheus.service holdoff time over, scheduling restart.
Dec 03 11:26:15  systemd[1]: Started Prometheus Server.
Dec 03 11:26:15  systemd[1]: Starting Prometheus Server...
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.133029748Z caller=main.go:244 msg="Starting Prometheus" version="(version=2.5.0, branch=HEAD, revision=67dc912ac8b24f94a1fc478f352d25179c94ab9b)"
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.133509894Z caller=main.go:245 build_context="(go=go1.11.1, user=root@578ab108d0b9, date=20181106-11:40:44)"
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.133538695Z caller=main.go:246 host_details="(Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64  (none))"
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.133560358Z caller=main.go:247 fd_limits="(soft=50000, hard=50000)"
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.133577565Z caller=main.go:248 vm_limits="(soft=unlimited, hard=unlimited)"
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.134370002Z caller=main.go:562 msg="Starting TSDB ..."
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.134462491Z caller=web.go:399 component=web msg="Start listening for connections" address=0.0.0.0:9090
Dec 03 11:26:15  prometheus[23483]: level=info ts=2018-12-03T16:26:15.136096655Z caller=repair.go:35 component=tsdb msg="found healthy block" mint=1527400800000 maxt=1532649600000 ulid=01CKCWHRZHF7VSF2S036BBEVN6
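
A side note on the "found healthy block" lines in the log above: mint and maxt are millisecond Unix timestamps, so their difference is the time range each block covers. The Python sketch below is not part of the original report, and the names BLOCK_RE, block_spans, and SAMPLE are illustrative only. It does that arithmetic for the three block lines quoted in this issue.

```python
import re

# Matches the mint/maxt/ulid fields of a TSDB "found healthy block" log line.
BLOCK_RE = re.compile(r'msg="found healthy block" mint=(\d+) maxt=(\d+) ulid=(\w+)')


def block_spans(log_text):
    """Return a list of (ulid, covered range in minutes), one per block line."""
    spans = []
    for line in log_text.splitlines():
        match = BLOCK_RE.search(line)
        if match:
            mint = int(match.group(1))  # block start, ms since the Unix epoch
            maxt = int(match.group(2))  # block end, ms since the Unix epoch
            spans.append((match.group(3), (maxt - mint) / 60000))
    return spans


# The three block lines quoted in this issue, trimmed to the relevant fields.
SAMPLE = """
caller=repair.go:35 component=tsdb msg="found healthy block" mint=1543525200000 maxt=1543527000000 ulid=01CXGPX97QGG73NMBDBE836Y19
caller=repair.go:35 component=tsdb msg="found healthy block" mint=1543519800000 maxt=1543525200000 ulid=01CXGPY24SRA82NCZG6J90QZAY
caller=repair.go:35 component=tsdb msg="found healthy block" mint=1527400800000 maxt=1532649600000 ulid=01CKCWHRZHF7VSF2S036BBEVN6
"""

for ulid, minutes in block_spans(SAMPLE):
    print(f"{ulid}: {minutes:g} min")
```

For these three lines the ranges come out as 30 min, 90 min, and 87480 min (about 60.75 days). The first matches the --storage.tsdb.min-block-duration=30m flag. The last is far longer than --storage.tsdb.max-block-duration=1d, and the log alone does not say why.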

simonpasquier (Member) commented Jan 17, 2019

How much RAM do you have on this machine? Please also attach the output of the promtool debug all ... command. In general, you shouldn't have to change storage.tsdb.min-block-duration.
