
Prometheus 2.2.1 "out of memory" when starting TSDB #4047

Closed
Kirchen99 opened this Issue Apr 5, 2018 · 16 comments


Kirchen99 commented Apr 5, 2018

What did you do?
Started Prometheus in a container.
What did you expect to see?
Normal startup behavior.
What did you see instead? Under which circumstances?
fatal error: runtime: out of memory

However, after I ran "docker-compose down -v" to delete the volume, it worked again.

I hit the same issue when running Prometheus in Kubernetes as a StatefulSet: Prometheus hung at "Starting TSDB ...". After I deleted the PersistentVolume, Prometheus came up as usual.

Environment

  • System information:

Linux 4.9.32-15.41.amzn1.x86_64 x86_64

  • Prometheus version:

    2.2.1

  • Prometheus configuration file:

global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 1m

alerting:
  alertmanagers:
  - path_prefix: devops-eval-prometheus1-app1/
  - static_configs:
    - targets:
      - alertmanager:9093
    scheme: http
    timeout: 10s

rule_files:
- /srv/rules/*.yml
scrape_configs:
- job_name: nodecadvisor
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - cadvisor:8080

- job_name: consul
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  consul_sd_configs:
  - server: asds-dev-elastic01-es-master01:8500
    datacenter: asds-dev-elastic01
    services: ['node-exporter','cadvisor']
  relabel_configs:
    - source_labels: ['__meta_consul_node']
      regex:         (.+)
      target_label:  instance
      replacement:   '${1}'
      action: replace
    - source_labels: ['__meta_consul_address']
      regex:         (.+)
      target_label:  ip
      replacement:   '${1}'
      action: replace
    - source_labels: ['__meta_consul_service_id']
      regex:         (.+)
      target_label:  service_id
      replacement:   '${1}'
      action: replace
    - source_labels: ['__meta_consul_tags']
      separator: ","
      regex:         ',(.+),(.+),'
      target_label:  version
      replacement:   '${2}'
      action: replace
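For context on the last relabel rule above: Prometheus fully anchors a relabel regex (it is compiled as `^(?:...)$`), so `,(.+),(.+),` must match the whole comma-wrapped `__meta_consul_tags` value, and the replacement `${2}` keeps the second capture group. A minimal Go sketch of that substitution (the tag string `,production,v1.2.3,` is a made-up example, not taken from this deployment):

```go
package main

import (
	"fmt"
	"regexp"
)

// relabelVersion mimics the "version" relabel rule: the configured regex
// `,(.+),(.+),` is anchored the way Prometheus anchors it, and the
// replacement `${2}` keeps only the second capture group.
func relabelVersion(consulTags string) string {
	re := regexp.MustCompile(`^(?:,(.+),(.+),)$`)
	if !re.MatchString(consulTags) {
		return "" // no match: the target label would be left unset
	}
	return re.ReplaceAllString(consulTags, "${2}")
}

func main() {
	// __meta_consul_tags is the node's tag list joined and wrapped with commas.
	fmt.Println(relabelVersion(",production,v1.2.3,"))
}
```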
  • Logs:
prometheus_1       | level=info ts=2018-04-05T09:39:28.840531407Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc)"
prometheus_1       | level=info ts=2018-04-05T09:39:28.840586103Z caller=main.go:221 build_context="(go=go1.10, user=root@149e5b3f0829, date=20180314-14:15:45)"
prometheus_1       | level=info ts=2018-04-05T09:39:28.840612693Z caller=main.go:222 host_details="(Linux 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64 958d383a8236 (none))"
prometheus_1       | level=info ts=2018-04-05T09:39:28.840636264Z caller=main.go:223 fd_limits="(soft=1024, hard=4096)"
prometheus_1       | level=info ts=2018-04-05T09:39:28.844252402Z caller=main.go:504 msg="Starting TSDB ..."
prometheus_1       | level=info ts=2018-04-05T09:39:28.845984174Z caller=web.go:382 component=web msg="Start listening for connections" address=0.0.0.0:9090
prometheus_1       | fatal error: runtime: out of memory
prometheus_1       |
prometheus_1       | runtime stack:
prometheus_1       | runtime.throw(0x1af4202, 0x16)
prometheus_1       |    /usr/local/go/src/runtime/panic.go:619 +0x81
prometheus_1       | runtime.sysMap(0xc5e4040000, 0x100000, 0xc420179d00, 0x28a6138)
prometheus_1       |    /usr/local/go/src/runtime/mem_linux.go:216 +0x20a
prometheus_1       | runtime.(*mheap).sysAlloc(0x288c9c0, 0x100000, 0x7fc60e202db0)
prometheus_1       |    /usr/local/go/src/runtime/malloc.go:470 +0xd4
prometheus_1       | runtime.(*mheap).grow(0x288c9c0, 0x1, 0x0)
prometheus_1       |    /usr/local/go/src/runtime/mheap.go:907 +0x60
prometheus_1       | runtime.(*mheap).allocSpanLocked(0x288c9c0, 0x1, 0x28a6148, 0x7fc60e202db0)
prometheus_1       |    /usr/local/go/src/runtime/mheap.go:820 +0x301
prometheus_1       | runtime.(*mheap).alloc_m(0x288c9c0, 0x1, 0xc42004003f, 0x7fc60e202db0)
prometheus_1       |    /usr/local/go/src/runtime/mheap.go:686 +0x118
prometheus_1       | runtime.(*mheap).alloc.func1()
prometheus_1       |    /usr/local/go/src/runtime/mheap.go:753 +0x4d
prometheus_1       | runtime.(*mheap).alloc(0x288c9c0, 0x1, 0x7fc60e01003f, 0x7fc60e202db0)
prometheus_1       |    /usr/local/go/src/runtime/mheap.go:752 +0x8a
prometheus_1       | runtime.(*mcentral).grow(0x288ecd0, 0x0)
prometheus_1       |    /usr/local/go/src/runtime/mcentral.go:232 +0x94
prometheus_1       | runtime.(*mcentral).cacheSpan(0x288ecd0, 0x7fc60e202db0)
prometheus_1       |    /usr/local/go/src/runtime/mcentral.go:106 +0x2e4
prometheus_1       | runtime.(*mcache).refill(0x7fc9f1a336c8, 0xc42004653f)
prometheus_1       |    /usr/local/go/src/runtime/mcache.go:123 +0x9c
prometheus_1       | runtime.(*mcache).nextFree.func1()
prometheus_1       |    /usr/local/go/src/runtime/malloc.go:556 +0x32
prometheus_1       | runtime.systemstack(0x0)
prometheus_1       |    /usr/local/go/src/runtime/asm_amd64.s:409 +0x79
prometheus_1       | runtime.mstart()
prometheus_1       |    /usr/local/go/src/runtime/proc.go:1170
prometheus_1       |
prometheus_1       | goroutine 170 [running]:
prometheus_1       | runtime.systemstack_switch()
prometheus_1       |    /usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc424108c00 sp=0xc424108bf8 pc=0x457ee0
prometheus_1       | runtime.(*mcache).nextFree(0x7fc9f1a336c8, 0xc424108c3f, 0x441f58, 0xc5e403fe01, 0x1ff)
prometheus_1       |    /usr/local/go/src/runtime/malloc.go:555 +0xa9 fp=0xc424108c58 sp=0xc424108c00 pc=0x4101f9
prometheus_1       | runtime.mallocgc(0x400, 0x0, 0x200, 0x400)
prometheus_1       |    /usr/local/go/src/runtime/malloc.go:710 +0x79f fp=0xc424108cf8 sp=0xc424108c58 pc=0x410b4f
prometheus_1       | runtime.growslice(0x1744e80, 0xc5e2775600, 0x200, 0x200, 0x201, 0xc5e403f400, 0x3f383a8000000002, 0x400)
prometheus_1       |    /usr/local/go/src/runtime/slice.go:172 +0x21d fp=0xc424108d60 sp=0xc424108cf8 pc=0x441f0d
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBit(...)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/bstream.go:79
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBits(0xc5e1c9d9c0, 0x0, 0x1)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/bstream.go:118 +0x2de fp=0xc424108dd0 sp=0xc424108d60 pc=0x147b6ce
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).Append(0xc5e1c9ec60, 0x162793587f3, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go:165 +0x54f fp=0xc424108e80 sp=0xc424108dd0 pc=0x147c8ef
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).append(0xc423498e70, 0x162793587f3, 0x0, 0x700001)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:1221 +0x126 fp=0xc424108ec0 sp=0xc424108e80 pc=0x14a8a86
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).processWALSamples(0xc4201a8d20, 0x16267544a00, 0x1, 0x2, 0xc42044a300, 0xc42044a360, 0x60e57)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:219 +0x16f fp=0xc424108f58 sp=0xc424108ec0 pc=0x14a33bf
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL.func1(0xc4201a8d20, 0x16267544a00, 0x2, 0xc4216fd3e8, 0xc4216fd3f0, 0x1, 0xc42044a300, 0xc42044a360)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:256 +0x60 fp=0xc424108fa0 sp=0xc424108f58 pc=0x14b91f0
prometheus_1       | runtime.goexit()
prometheus_1       |    /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc424108fa8 sp=0xc424108fa0 pc=0x45aa01
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:255 +0x1e8
prometheus_1       |
prometheus_1       | goroutine 1 [chan receive, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run(0xc420adbc88, 0xc42050c6f0, 0x8)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:43 +0xec
prometheus_1       | main.main()
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:581 +0x5237
prometheus_1       |
prometheus_1       | goroutine 19 [syscall, 5 minutes]:
prometheus_1       | os/signal.signal_recv(0x0)
prometheus_1       |    /usr/local/go/src/runtime/sigqueue.go:139 +0xa6
prometheus_1       | os/signal.loop()
prometheus_1       |    /usr/local/go/src/os/signal/signal_unix.go:22 +0x22
prometheus_1       | created by os/signal.init.0
prometheus_1       |    /usr/local/go/src/os/signal/signal_unix.go:28 +0x41
prometheus_1       |
prometheus_1       | goroutine 4 [chan receive, 1 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x2883600)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/golang/glog/glog.go:879 +0x8b
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/golang/glog.init.0
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/golang/glog/glog.go:410 +0x203
prometheus_1       |
prometheus_1       | goroutine 46 [chan receive (nil chan), 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/prompb.RegisterAdminHandlerFromEndpoint.func1.1(0x7fc9f19dc150, 0xc420094020, 0xc4200b1380, 0x1ae4f89, 0xc)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/prompb/rpc.pb.gw.go:85 +0x4c
prometheus_1       | created by github.com/prometheus/prometheus/prompb.RegisterAdminHandlerFromEndpoint.func1
prometheus_1       |    /go/src/github.com/prometheus/prometheus/prompb/rpc.pb.gw.go:84 +0x19b
prometheus_1       |
prometheus_1       | goroutine 141 [select, 5 minutes, locked to thread]:
prometheus_1       | runtime.gopark(0x1b6e6d0, 0x0, 0x1ade841, 0x6, 0x18, 0x1)
prometheus_1       |    /usr/local/go/src/runtime/proc.go:291 +0x11a
prometheus_1       | runtime.selectgo(0xc420573f50, 0xc420088900)
prometheus_1       |    /usr/local/go/src/runtime/select.go:392 +0xe50
prometheus_1       | runtime.ensureSigM.func1()
prometheus_1       |    /usr/local/go/src/runtime/signal_unix.go:549 +0x1f4
prometheus_1       | runtime.goexit()
prometheus_1       |    /usr/local/go/src/runtime/asm_amd64.s:2361 +0x1
prometheus_1       |
prometheus_1       | goroutine 142 [select, 5 minutes]:
prometheus_1       | main.main.func6(0xc420573818, 0xc420573838)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:372 +0x121
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc4200c2380, 0xc42050c670)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8


prometheus_1       | github.com/prometheus/prometheus/discovery.(*Manager).Run(0xc42041c620, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/discovery/manager.go:93 +0x50
prometheus_1       | main.main.func8(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:393 +0x40
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42045eca0, 0xc42045ecc0)


prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 144 [chan receive, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/discovery.(*Manager).Run(0xc42041c690, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/discovery/manager.go:93 +0x50
prometheus_1       | main.main.func10(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:406 +0x40
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42045ed20, 0xc42045ed60)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 145 [chan receive, 5 minutes]:
prometheus_1       | main.main.func12(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:423 +0x5e
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42036fc80, 0xc42045ed80)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 146 [chan receive, 5 minutes]:
prometheus_1       | main.main.func14(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:445 +0x8f
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42012a840, 0xc42050c6b0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 147 [select, 5 minutes]:
prometheus_1       | main.main.func16(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:475 +0x113
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42012a8a0, 0xc42050c6d0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 148 [runnable]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).entry(0xc42021f3b0, 0x1c37580, 0xc4203972e0, 0xfe1b027, 0x0, 0x0, 0xc42009be00, 0xc42044a418, 0xc420b63a00)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:1087 +0x122
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).next(0xc42021f3b0, 0xc420b63ac0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:1032 +0xee
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).Read(0xc42021f3b0, 0xc42047e830, 0xc4217081a0, 0xc4217081c0, 0xc42044a360, 0xc420b63be8)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:926 +0x184
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*repairingWALReader).Read(0xc421708180, 0xc42047e830, 0xc4217081a0, 0xc4217081c0, 0x2, 0xc4216fd3e8)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:261 +0x5f
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL(0xc4201a8d20, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:308 +0x321
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.Open(0x7ffcfaba6f62, 0xb, 0x1c37240, 0xc4205264e0, 0x1c475c0, 0xc42009c6c0, 0xc420526510, 0xc42043a500, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/db.go:239 +0x52d
prometheus_1       | github.com/prometheus/prometheus/storage/tsdb.Open(0x7ffcfaba6f62, 0xb, 0x1c37240, 0xc4205264e0, 0x1c475c0, 0xc42009c6c0, 0xc42015ad88, 0x0, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/storage/tsdb/tsdb.go:143 +0x293
prometheus_1       | main.main.func18(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:505 +0x1f6
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc4200c2400, 0xc42036fe00)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 149 [select, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/web.(*Handler).Run(0xc42014dd00, 0x1c50d80, 0xc42032c480, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:470 +0xe6b
prometheus_1       | main.main.func20(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:533 +0x40
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42045eda0, 0xc42050c6e0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 150 [chan receive, 5 minutes]:
prometheus_1       | main.main.func22(0x1, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:552 +0x4e
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42045edc0, 0xc42045ee20)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 151 [chan receive, 5 minutes]:
prometheus_1       | main.main.func24(0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/cmd/prometheus/main.go:570 +0x5e
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run.func1(0xc42012a900, 0xc42036ff20, 0xc42050c6f0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:38 +0x27
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group.(*Group).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/oklog/oklog/pkg/group/group.go:37 +0xa8
prometheus_1       |
prometheus_1       | goroutine 167 [select]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*SegmentWAL).run(0xc4202e2280, 0x2540be400)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:704 +0x36e
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.OpenSegmentWAL
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:244 +0x776
prometheus_1       |
prometheus_1       | goroutine 45 [select, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc.(*addrConn).transportMonitor(0xc4200b1ba0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/clientconn.go:908 +0x1c0
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc.(*ClientConn).resetAddrConn.func1(0xc4200b1ba0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/clientconn.go:637 +0x1af
prometheus_1       | created by github.com/prometheus/prometheus/vendor/google.golang.org/grpc.(*ClientConn).resetAddrConn
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/clientconn.go:628 +0x6d8
prometheus_1       |
prometheus_1       | goroutine 239 [chan receive, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.muxListener.Accept(...)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/cmux.go:184
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*muxListener).Accept(0xc4203d65c0, 0xc420094020, 0x180fc40, 0x28679d0, 0x1a98bc0)
prometheus_1       |    <autogenerated>:1 +0x5b
prometheus_1       | net/http.(*Server).Serve(0xc4204c6340, 0x1c4f480, 0xc4203d65c0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/net/http/server.go:2770 +0x1a5
prometheus_1       | github.com/prometheus/prometheus/web.(*Handler).Run.func5(0xc4204c6340, 0x1c4f480, 0xc4203d65c0, 0xc42014dd00)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:455 +0x43
prometheus_1       | created by github.com/prometheus/prometheus/web.(*Handler).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:454 +0xced
prometheus_1       |
prometheus_1       | goroutine 240 [chan receive, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.muxListener.Accept(...)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/cmux.go:184
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*muxListener).Accept(0xc4203d63a0, 0x1b6a478, 0xc420394240, 0x1c4f480, 0xc4203d63a0)
prometheus_1       |    <autogenerated>:1 +0x5b
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc.(*Server).Serve(0xc420394240, 0x1c4f480, 0xc4203d63a0, 0x0, 0x0)


prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:460 +0x43
prometheus_1       | created by github.com/prometheus/prometheus/web.(*Handler).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:459 +0xd39
prometheus_1       |
prometheus_1       | goroutine 241 [IO wait, 5 minutes]:


prometheus_1       | internal/poll.(*pollDesc).wait(0xc42047c298, 0x72, 0xc42009d200, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9b
prometheus_1       | internal/poll.(*pollDesc).waitRead(0xc42047c298, 0xffffffffffffff00, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
prometheus_1       | internal/poll.(*FD).Accept(0xc42047c280, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_unix.go:372 +0x1a8
prometheus_1       | net.(*netFD).accept(0xc42047c280, 0x1c59ea0, 0xc420312058, 0x1adc177)
prometheus_1       |    /usr/local/go/src/net/fd_unix.go:238 +0x42
prometheus_1       | net.(*TCPListener).accept(0xc420096018, 0xc420312000, 0xc42091ee98, 0x1)
prometheus_1       |    /usr/local/go/src/net/tcpsock_posix.go:136 +0x2e
prometheus_1       | net.(*TCPListener).Accept(0xc420096018, 0xc42091ee98, 0x1955540, 0x1955540, 0xc420dcd290)
prometheus_1       |    /usr/local/go/src/net/tcpsock.go:259 +0x49
prometheus_1       | github.com/prometheus/prometheus/vendor/golang.org/x/net/netutil.(*limitListener).Accept(0xc4203d61e0, 0x4346f4, 0xc42091eee8, 0x4571a0, 0xc42091ef28)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/net/netutil/listen.go:30 +0x53
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/mwitkow/go-conntrack.(*connTrackListener).Accept(0xc4203d6360, 0x1b678a0, 0xc42009d040, 0x1c5d9e0, 0xc420dcd320)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/mwitkow/go-conntrack/listener_wrapper.go:86 +0x37
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*cMux).Serve(0xc42009d040, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/cmux.go:124 +0x88
prometheus_1       | github.com/prometheus/prometheus/web.(*Handler).Run.func7(0xc420312180, 0x1c45ec0, 0xc42009d040)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:467 +0x31
prometheus_1       | created by github.com/prometheus/prometheus/web.(*Handler).Run
prometheus_1       |    /go/src/github.com/prometheus/prometheus/web/web.go:466 +0xd8e
prometheus_1       |
prometheus_1       | goroutine 243 [IO wait, 5 minutes]:
prometheus_1       | internal/poll.runtime_pollWait(0x7fc9f19d7e30, 0x72, 0xc420921b88)
prometheus_1       |    /usr/local/go/src/runtime/netpoll.go:173 +0x57
prometheus_1       | internal/poll.(*pollDesc).wait(0xc42047c718, 0x72, 0xffffffffffffff00, 0x1c3be60, 0x27b7638)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9b
prometheus_1       | internal/poll.(*pollDesc).waitRead(0xc42047c718, 0xc420e7c000, 0x8000, 0x8000)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
prometheus_1       | internal/poll.(*FD).Read(0xc42047c700, 0xc420e7c000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17d
prometheus_1       | net.(*netFD).Read(0xc42047c700, 0xc420e7c000, 0x8000, 0x8000, 0x60, 0x0, 0xc420057740)
prometheus_1       |    /usr/local/go/src/net/fd_unix.go:202 +0x4f
prometheus_1       | net.(*conn).Read(0xc420096cd8, 0xc420e7c000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/net/net.go:176 +0x6a
prometheus_1       | bufio.(*Reader).Read(0xc42007e120, 0xc420648118, 0x9, 0x9, 0xc42017afb0, 0xc420057548, 0x411959)
prometheus_1       |    /usr/local/go/src/bufio/bufio.go:216 +0x238
prometheus_1       | io.ReadAtLeast(0x1c36960, 0xc42007e120, 0xc420648118, 0x9, 0x9, 0x9, 0x1c278da, 0xc4200b1ba0, 0xc4200575d8)
prometheus_1       |    /usr/local/go/src/io/io.go:309 +0x86
prometheus_1       | io.ReadFull(0x1c36960, 0xc42007e120, 0xc420648118, 0x9, 0x9, 0x403f2c, 0xc4202e4000, 0x4)
prometheus_1       |    /usr/local/go/src/io/io.go:327 +0x58
prometheus_1       | github.com/prometheus/prometheus/vendor/golang.org/x/net/http2.readFrameHeader(0xc420648118, 0x9, 0x9, 0x1c36960, 0xc42007e120, 0x0, 0xc400000000, 0x0, 0x2)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/net/http2/frame.go:237 +0x7b
prometheus_1       | github.com/prometheus/prometheus/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc4206480e0, 0xc42009d1c0, 0xc420057768, 0xc4203122a0, 0xc4200577a0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/net/http2/frame.go:492 +0xa4
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport.(*framer).readFrame(0xc420dcd110, 0xc4200577a0, 0x0, 0xc4203122a0, 0x53d70d)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport/http_util.go:608 +0x2f
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport.(*http2Client).reader(0xc4202e4900)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport/http2_client.go:1080 +0x47
prometheus_1       | created by github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport.newHTTP2Client
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport/http2_client.go:267 +0xb6c
prometheus_1       |
prometheus_1       | goroutine 244 [select, 5 minutes]:
prometheus_1       | github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport.(*http2Client).controller(0xc4202e4900)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport/http2_client.go:1168 +0x122
prometheus_1       | created by github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport.newHTTP2Client
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/google.golang.org/grpc/transport/http2_client.go:297 +0xca2
prometheus_1       |
prometheus_1       | goroutine 245 [IO wait, 5 minutes]:
prometheus_1       | internal/poll.runtime_pollWait(0x7fc9f19d7d60, 0x72, 0xc420e9e940)
prometheus_1       |    /usr/local/go/src/runtime/netpoll.go:173 +0x57
prometheus_1       | internal/poll.(*pollDesc).wait(0xc42047d818, 0x72, 0xffffffffffffff00, 0x1c3be60, 0x27b7638)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9b
prometheus_1       | internal/poll.(*pollDesc).waitRead(0xc42047d818, 0xc420648200, 0x9, 0x9)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
prometheus_1       | internal/poll.(*FD).Read(0xc42047d800, 0xc4206482d8, 0x9, 0x9, 0x0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17d
prometheus_1       | net.(*netFD).Read(0xc42047d800, 0xc4206482d8, 0x9, 0x9, 0x4, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/net/fd_unix.go:202 +0x4f
prometheus_1       | net.(*conn).Read(0xc420096ce0, 0xc4206482d8, 0x9, 0x9, 0x0, 0x0, 0x0)
prometheus_1       |    /usr/local/go/src/net/net.go:176 +0x6a
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*bufferedReader).Read(0xc4201a8898, 0xc4206482d8, 0x9, 0x9, 0x100000000000000, 0x0, 0x10)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/buffer.go:42 +0x120
prometheus_1       | io.ReadAtLeast(0x1c37180, 0xc4201a8898, 0xc4206482d8, 0x9, 0x9, 0x9, 0x410ef8, 0x10, 0x1964300)
prometheus_1       |    /usr/local/go/src/io/io.go:309 +0x86
prometheus_1       | io.ReadFull(0x1c37180, 0xc4201a8898, 0xc4206482d8, 0x9, 0x9, 0x32b5e23d542da301, 0xefff100000004, 0x7)
prometheus_1       |    /usr/local/go/src/io/io.go:327 +0x58
prometheus_1       | github.com/prometheus/prometheus/vendor/golang.org/x/net/http2.readFrameHeader(0xc4206482d8, 0x9, 0x9, 0x1c37180, 0xc4201a8898, 0x0, 0x0, 0xc420095ca0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/net/http2/frame.go:237 +0x7b
prometheus_1       | github.com/prometheus/prometheus/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc4206482a0, 0x1c3eb80, 0xc420095ca0, 0x0, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/golang.org/x/net/http2/frame.go:492 +0xa4
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.matchHTTP2Field(0x1c37180, 0xc4201a8898, 0x1ae59d9, 0xc, 0x1aeaa9a, 0x10, 0x7fc9f19f94b8)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/matchers.go:145 +0x140
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.HTTP2HeaderField.func1(0x1c37180, 0xc4201a8898, 0xc4200576a0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/matchers.go:111 +0x59
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*cMux).serve(0xc42009d040, 0x1c5d9e0, 0xc420dcd320, 0xc420312060, 0xc420095940)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/cmux.go:143 +0x1f3
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux.(*cMux).Serve
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/cockroachdb/cmux/cmux.go:133 +0x15d
prometheus_1       |
prometheus_1       | goroutine 169 [runnable]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*bstream).writeBits(0xc5e1c63180, 0x4e20, 0x11)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/bstream.go:108 +0x349
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.(*xorAppender).Append(0xc5e1c4af30, 0x1627935af03, 0x0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go:166 +0x576
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*memSeries).append(0xc4234d40b0, 0x1627935af03, 0x0, 0x1)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:1221 +0x126
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).processWALSamples(0xc4201a8d20, 0x16267544a00, 0x0, 0x2, 0xc42044a240, 0xc42044a300, 0x60b58)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:219 +0x16f
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL.func1(0xc4201a8d20, 0x16267544a00, 0x2, 0xc4216fd3e8, 0xc4216fd3f0, 0x0, 0xc42044a240, 0xc42044a300)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:256 +0x60
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*Head).ReadWAL
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go:255 +0x1e8
prometheus_1       |
prometheus_1       | goroutine 171 [runnable]:
prometheus_1       | github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).Read.func1(0xc420312960, 0xc42044a3c0, 0xc42047e830, 0xc4217081e0, 0xc4217081a0, 0xc421708200, 0xc4217081c0, 0xc421708220, 0xc42021f3b0)
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:901 +0x7f
prometheus_1       | created by github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).Read
prometheus_1       |    /go/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go:898 +0x173
@juliusv (Member) commented Apr 5, 2018

@Kirchen99 It'd probably help if you included information about the container's configured memory limit, and what the load on the Prometheus server was (query for prometheus_tsdb_head_series and rate(prometheus_tsdb_head_samples_appended_total[5m]) on a server that still works). Maybe the head block is just too large to fit into the container's memory limit?
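For reference, those two instant queries can be run against the HTTP API of a server that still works; a minimal sketch, assuming the default listen address localhost:9090:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROM = "http://localhost:9090"  # assumed default listen address

def instant_query_url(expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{PROM}/api/v1/query?{urlencode({'query': expr})}"

for expr in ("prometheus_tsdb_head_series",
             "rate(prometheus_tsdb_head_samples_appended_total[5m])"):
    url = instant_query_url(expr)
    print(url)
    # Uncomment against a live server:
    # print(json.loads(urlopen(url).read())["data"]["result"])
```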

@Kirchen99 (Author) commented Apr 5, 2018

rate(prometheus_tsdb_head_samples_appended_total[5m]) is 8137.149152542373
prometheus_tsdb_head_series is 62122
I have 8GB of memory.
Another thing I discovered is that each restart spends longer and longer at "Starting TSDB...". At the beginning it was 1 minute, then 2 minutes, 3 minutes, 5 minutes, 15 minutes...
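A rough back-of-envelope check of those numbers (the bytes-per-series figure below is a loose assumption, not a measured constant):

```python
# Estimate head-block memory from the reported series count.
head_series = 62_122         # prometheus_tsdb_head_series
bytes_per_series = 8 * 1024  # assumed rough per-series overhead in the head

est_gib = head_series * bytes_per_series / 2**30
print(f"~{est_gib:.2f} GiB for the head block")
```

Even with a generous per-series figure this is well under 8GB, which suggests the steady-state head size is not the problem and points instead at WAL replay during startup.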

@dobesv commented Apr 6, 2018

I'm having a similar problem here. I have set the memory limit in Kubernetes to 12GB, and the Prometheus processes just run up to that limit and get killed by the kernel. Strangely, I don't get the same stack trace as above, but I'd love to have more information about how to track down the cause, or whether >12GB is normal memory usage for Prometheus with my configuration, in which case I need to figure out how to reduce memory consumption.

[screenshots: memory usage graphs]

@brian-brazil (Member) commented Apr 6, 2018

You have a lot of churn, so this usage is as expected.

@dobesv commented Apr 6, 2018

What do you mean by churn? How can I reduce that?

@dobesv commented Apr 6, 2018

Is there maybe a way I could merge time series together to use less memory?

@brian-brazil (Member) commented Apr 6, 2018

@dobesv Your questions are unrelated to the original issue, I'd suggest you take this to the prometheus-users mailing list.

@Kirchen99 (Author) commented Apr 10, 2018

Is it possible that there is already a lot of data in storage, and Prometheus tries to load all of it into memory at once when it starts?

@dobesv commented Apr 10, 2018

It normally doesn't. But it does keep in memory any time series that was recorded in the last few hours. It turned out that HAProxy was exporting millions of time series, so I added rules to Prometheus to drop the ones I am not using. That helped a lot; so far I've cut memory needs in half. I'm still looking to see what other time series I can eliminate.
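For anyone looking for the same trick, a minimal sketch of such a drop rule (the job name, target, and metric regex are placeholders, not the actual config from this setup):

```yaml
scrape_configs:
- job_name: haproxy
  static_configs:
  - targets: ['haproxy-exporter:9101']
  metric_relabel_configs:
  # Drop high-cardinality series you never query (names are examples)
  - source_labels: [__name__]
    regex: 'haproxy_server_http_responses_total|haproxy_server_bytes_.*'
    action: drop
```

metric_relabel_configs is applied after the scrape, so dropped series are never written to the head block or the WAL.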

@dobesv commented Apr 10, 2018

There's some useful information in a reply in this thread in google groups: https://groups.google.com/forum/#!topic/prometheus-users/XjBfxBaRRbU

@andrejmaya commented May 16, 2018

Hi, has anybody found a solution to the OOM issue during startup, when Prometheus compacts a lot of data?

I am running Prometheus v2.2.1 in OpenShift with a 10GB memory limit and 27GB of data in the /data directory. During startup, these entries appear in the log:

level=info ts=2018-05-16T08:45:33.61776039Z caller=compact.go:393 component=tsdb msg="compact blocks" count=1 mint=1526443200000 maxt=1526450400000
level=info ts=2018-05-16T08:45:41.121433336Z caller=head.go:348 component=tsdb msg="head GC completed" duration=599.959505ms
level=info ts=2018-05-16T08:45:42.739168664Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=1.616577116s
level=info ts=2018-05-16T08:45:44.987052493Z caller=compact.go:393 component=tsdb msg="compact blocks" count=3 mint=1526342400000 maxt=1526364000000
level=info ts=2018-05-16T08:47:15.503732025Z caller=compact.go:393 component=tsdb msg="compact blocks" count=3 mint=1526364000000 maxt=1526385600000
level=info ts=2018-05-16T09:00:00.238214945Z caller=compact.go:393 component=tsdb msg="compact blocks" count=1 mint=1526450400000 maxt=1526457600000
level=info ts=2018-05-16T09:00:19.690017824Z caller=head.go:348 component=tsdb msg="head GC completed" duration=1.870823274s
level=info ts=2018-05-16T09:00:24.602969974Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=4.912877555s
level=info ts=2018-05-16T09:00:26.021513094Z caller=compact.go:393 component=tsdb msg="compact blocks" count=3 mint=1526428800000 maxt=1526450400000

Prometheus consumes about 99% of the 10GB. I added the --storage.tsdb.max-block-duration=6h to the command args and my retention is --storage.tsdb.retention=168h.

Where do I find any documentation on how to reduce memory consumption or the "head chunks" loaded into the memory?

@dobesv @brian-brazil @juliusv

@jurgenweber commented Jun 7, 2018

In the end I just deleted the 'wal' directory; Prometheus would finally start and life would go on. Thankfully not too much data was lost.
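If you have to resort to the same workaround, a small sketch of it (the data directory path is an assumption; adjust it to your --storage.tsdb.path, and stop Prometheus first):

```python
import os
import shutil

def remove_wal(data_dir: str) -> bool:
    """Delete the WAL under data_dir; returns True if anything was removed.
    Samples not yet compacted into a persistent block are lost."""
    wal_dir = os.path.join(data_dir, "wal")
    if os.path.isdir(wal_dir):
        shutil.rmtree(wal_dir)
        return True
    return False

# Hypothetical path (the default in the official Docker image).
remove_wal("/prometheus")
```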

@krasi-georgiev (Member) commented Jun 7, 2018

I had a quick look at the code, and the WAL's size depends on the block ranges, which are based on storage.tsdb.max-block-duration and storage.tsdb.min-block-duration.

You can try playing with these and see if you find something that works better in your case, but the defaults are battle-tested, which makes me think that your environment can't handle the amount of metrics you are trying to process.

The way I understand it, reducing the block range sizes will use less memory at compaction (each block is loaded into memory at compaction), but it will put more stress on your disk, and querying would also become slower.

There should be more useful info in the users and developers groups, so I would be interested to read more on the subject if you find anything interesting.

https://groups.google.com/forum/#!forum/prometheus-developers
https://groups.google.com/forum/#!forum/prometheus-users

@brian-brazil (Member) commented Jun 13, 2018

It's still not clear what went on here, but with it not occurring anymore it's hard to debug. If it pops up again, please let us know.

Changing the block durations is not recommended, those flags only exist for internal loadtesting.

@estahn commented Nov 6, 2018

@brian-brazil We have this issue again. The container is getting OOM killed upon start.

[screenshot: container memory usage graph]

You can see it reaching its 30GB limit and then going down due to an OOM kill.

Logs:

level=info ts=2018-11-06T23:54:07.953888964Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1539388800000 maxt=1539410400000 ulid=01CSPAG4D1S2PP200F5S3GJ88A
level=info ts=2018-11-06T23:54:07.954345953Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1539410400000 maxt=1539432000000 ulid=01CSPZ38NCPGFATD9TC6BYVDQK
level=info ts=2018-11-06T23:54:07.954769258Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1539432000000 maxt=1539453600000 ulid=01CSQKPF927H8YWZV020XCS58A
level=info ts=2018-11-06T23:54:07.955252927Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1539453600000 maxt=1539475200000 ulid=01CSR89QY9KFJZ5Q71CJY4N9QR
level=info ts=2018-11-06T23:54:07.95575467Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1539475200000 maxt=1539496800000 ulid=01CSRWWXW5HBBJT2G1EBW1WC02
...

Configuration:
[screenshot: Prometheus container configuration]

@danielmotaleite commented Dec 4, 2018

I had a wal/checkpoint.000029 of 159GB, and Prometheus crashed every time. I removed the file and everything works fine.

Another Prometheus (replica) had a 3GB wal/checkpoint file. I suspect that every time Prometheus crashed (probably the first time due to an OOM from a huge query), the file grew, until it was huge and impossible to load.

I'm using Prometheus 2.5.0.
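Before deleting anything, it helps to see which WAL entry has ballooned; a small sketch that lists the size of each segment and checkpoint (the data directory is an assumed path):

```python
import os

def dir_size(path: str) -> int:
    """Total size in bytes of all files under path (path may also be a file)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

wal_dir = "/prometheus/wal"  # assumed; adjust to your --storage.tsdb.path
if os.path.isdir(wal_dir):
    for entry in sorted(os.listdir(wal_dir)):
        size = dir_size(os.path.join(wal_dir, entry))
        print(f"{entry}: {size / 2**20:.1f} MiB")
```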
