Prometheus version 2.0 memory leak? #3685
Comments
Mrcortez34 commented Feb 7, 2018
Mrcortez34 commented Feb 7, 2018

Here are my scrape configs:

apiVersion: v1
Mrcortez34 commented Feb 8, 2018

@saady Would you mind sharing your Targets query?
dimgsg9 commented Feb 8, 2018

Running Prometheus as a systemd service inside a virtual machine, not inside a Docker container. Stats for the last 7 days. Only 60 targets, statically configured. Not a Kubernetes environment. Monitoring a static environment with node, nginx, apache, coredns, proxysql & mysql exporters:
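(For illustration, a minimal sketch of a statically configured setup of the kind described above; the job names and exporter addresses are placeholders, not the commenter's actual config:)

# Static scrape jobs, no service discovery involved.
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['host-01:9100', 'host-02:9100']
  - job_name: mysql
    static_configs:
      - targets: ['db-01:9104']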
Can you attach the heap profile?
Prometheus's new storage engine memory-maps the entire storage. On a machine with lots of free memory this will make it appear as if memory usage keeps growing indefinitely. However, the memory will be released as soon as another process needs it. That's most likely what you are looking at here.

Your last screenshot, @dimgsg9, seems to show that the resident memory (the actual Go heap of the server) is fairly constant while virtual memory grows. A memory leak would show up as resident memory usage going up. @saady @Mrcortez34 your screenshots are unfortunately less revealing – memory usage numbers vary a lot depending on what metrics are being used.

TL;DR: Prometheus is probably just "using" more and more memory because nothing else needs it anyway, and it will release that memory if that changes.
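(A quick way to tell the two apart is to compare the server's own process metrics; a sketch – the job label value is an assumption about how the self-scrape job is named:)

# Resident memory – the server's actual heap/anonymous memory:
process_resident_memory_bytes{job="prometheus"}

# Virtual memory – includes the memory-mapped TSDB blocks, so it can
# grow far beyond resident usage without indicating a leak:
process_virtual_memory_bytes{job="prometheus"}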
Mrcortez34 commented Feb 9, 2018

@fabxc I was trying to figure out why the machine was rebooting all the time. What I found was that the prom process was using all of it, and then the machine would reboot. This happened all the time until I increased the memory to 128G, and now it happens every other day. @fabxc what information would you need to determine anything?
Thanks, this is much more detailed. Here resident usage does indeed appear to be creeping up. Can you run
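(The requested command was cut off here; a later comment in the thread uses the standard Go heap-profile endpoint. A sketch of collecting such a profile, assuming Prometheus listens on localhost:9090:)

# Grab an in-use-space heap profile from the running server (port assumed).
go tool pprof -symbolize=remote -inuse_space http://localhost:9090/debug/pprof/heap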
Mrcortez34 commented Feb 10, 2018

@fabxc Thanks for the information. I will get it to you on Monday.
Mrcortez34 commented Feb 12, 2018
Thanks. So interestingly enough there is no single big offender in memory usage that would point to a memory leak with strong certainty. When did you create the profile? Early on in the cycle, or shortly before it got OOM killed?

30% of memory usage is coming from the k8s client libraries and thereby the k8s service discovery integration. Given that this shouldn't need more than a few dozen MB, it is still the most likely cause. We had memory leaks coming from the k8s client in the past, but haven't for a while. It may even be triggered by the k8s cluster version. Do you know which k8s version you are on?
Mrcortez34 commented Feb 13, 2018

It was early on, because it restarted the night before. We're running 1.7.4.
nordbranch commented Feb 14, 2018

Hello. Joining the thread with similar issues. Version: 2.1.0, and then a stack dump. I ran go tool pprof -symbolize=remote -inuse_space "http://localhost/debug/pprof/heap" – here's the graphviz of the profiling. Dashboard history:

It's using ec2_sd_configs (which I'm reducing to as few jobs as possible to minimize calls to AWS), kubernetes_sd_configs, some static_configs, and I'm federating in from one Prometheus with a 15-minute retention for service stats within Kubernetes. While I try to get this 2.1 stable to replace 1.6.3, it's not evaluating any rules/alerts, and nobody but myself is querying it via Grafana or directly. I can provide a sanitized config. If there's anything else I can provide to help on this one, please let me know. Many thanks!

Late addition: we're pointing to Kubernetes 1.4.
ghost commented Feb 15, 2018

Hi. Prometheus version: 2.1.0 (https://github.com/prometheus/prometheus/releases/tag/v2.1.0). These three k8s clusters vary in size, but the Prometheus instances have almost identical configs.

*metrics is the sum of all numSeries values from the meta.json files in the data directory. Memory profiles are attached as SVG. It is OK that VSZ memory usage is high due to the implementation of tsdb (https://fabxc.org/tsdb/). But the k8s-1.8.6 Prometheus RSS usage is low – about half the dataset size.

It does look like that, but my investigations so far have not led to anything...
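(A sketch of how such a series total can be computed, assuming jq is installed, the TSDB data directory is ./data, and numSeries sits under the stats object of each block's meta.json:)

# Sum numSeries across all TSDB block meta.json files.
find ./data -name meta.json -exec cat {} + | jq -s 'map(.stats.numSeries) | add'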
Thanks to both of you for providing such detailed information! This definitely looks like a memory leak in client-go that is dependent on the k8s version. We ran into those several times across different projects. What we can do is update the client-go version once more and hope that it fixes things (without breaking others). One thing that stands out though is that all of the leakage comes from Kubernetes "node" discovery. I'll look into it.

Loving the path name, by the way :)
Mrcortez34 commented Feb 20, 2018

@fabxc Would a fix for this be in 2.2.0?
Please test the binary attached to the PR; the k8s client is a dependency nightmare, so it needs proper testing.
ghost commented Feb 28, 2018

@krasi-georgiev I tried Prometheus built from your branch and also from my branch, where I did almost the same thing (https://github.com/theairkit/prometheus/commit/d8e663b04fab9045e3c2ab7bac90d8e86bd176ac).

apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |-
global:
external_labels:
cluster: mycluster
scrape_interval: 10s
rule_files:
- "/etc/prometheus-rules/*.rules"
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "alertmanager:80"
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 10s
static_configs:
- targets: ['127.0.0.1:9090']
- job_name: 'kubelet'
scrape_interval: 10s
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics

level=info ts=2018-02-28T13:10:22.114521741Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.0-rc.1, branch=k8s-update-client, revision=89aaa3ca5a514f401832ccbea169058fd9f76220)"
level=info ts=2018-02-28T13:10:22.11461344Z caller=main.go:221 build_context="(go=go1.10, user=root@3fccbaba49db, date=20180228-12:00:40)"
level=info ts=2018-02-28T13:10:22.11464829Z caller=main.go:222 host_details="(Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 prometheus-3520576040-j556j (none))"
level=info ts=2018-02-28T13:10:22.114697596Z caller=main.go:223 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-02-28T13:10:22.117688809Z caller=web.go:381 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-02-28T13:10:22.117676028Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-02-28T13:10:22.124020709Z caller=main.go:514 msg="TSDB started"
level=info ts=2018-02-28T13:10:22.124177288Z caller=main.go:588 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-02-28T13:10:22.125594046Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
E0228 13:10:22.126933 1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/lib/go-1.10/src/runtime/asm_amd64.s:573
/usr/lib/go-1.10/src/runtime/panic.go:505
/usr/lib/go-1.10/src/runtime/panic.go:63
/usr/lib/go-1.10/src/runtime/signal_unix.go:388
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:77
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:107
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0xea0345]
goroutine 270 [running]:
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x15ec960, 0x249ae90)
/usr/lib/go-1.10/src/runtime/panic.go:505 +0x229
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewListWatchFromClient.func1(0xc420038300)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67 +0x25
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewFilteredListWatchFromClient.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x182f897, ...)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:77 +0xe6
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.SimplePageFunc.func1(0x7f5cfd17b050, 0xc420044050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39 +0x64
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.(*ListPager).List(0xc420f739f8, 0x7f5cfd17b050, 0xc420044050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77 +0x105
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*ListWatch).List(0xc4211500c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:107 +0x16b
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc42020e500, 0xc4201fa060, 0x0, 0x0)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249 +0x208
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204 +0x33
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4204c2718)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420f73f18, 0x3b9aca00, 0x0, 0xc420106c01, 0xc4201fa060)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4204c2718, 0x3b9aca00, 0xc4201fa060)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run(0xc42020e500, 0xc4201fa060)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203 +0x157
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run-fm(0xc4201fa060)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122 +0x34
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54 +0x31
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc420472140, 0xc421150160)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

For now I'm trying to investigate this...
@theairkit don't worry, I will revert the Go client to a more stable version.
As per @brancz's advice I changed the version to v6.0.0, so please test it; if it still panics I will try to replicate it as well.
ghost commented Mar 2, 2018

@krasi-georgiev Thanks for the work! I tested it, but unfortunately it still panics:

# ./prometheus --config.file=./prometheus.yml --web.listen-address="0.0.0.0:9091"
level=info ts=2018-03-02T13:23:45.535390988Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.0-rc.1, branch=k8s-update-client, revision=cbd061899074633748032533815d37ca7167bbdf)"
level=info ts=2018-03-02T13:23:45.535469836Z caller=main.go:221 build_context="(go=go1.10, user=operator@krakatau, date=20180302-13:20:27)"
level=info ts=2018-03-02T13:23:45.535491836Z caller=main.go:222 host_details="(Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 prometheus-3197024436-gf9nb (none))"
level=info ts=2018-03-02T13:23:45.535510203Z caller=main.go:223 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-03-02T13:23:45.537881034Z caller=web.go:381 component=web msg="Start listening for connections" address=0.0.0.0:9091
level=info ts=2018-03-02T13:23:45.537852889Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-03-02T13:23:45.542072153Z caller=main.go:514 msg="TSDB started"
level=info ts=2018-03-02T13:23:45.542131935Z caller=main.go:588 msg="Loading configuration file" filename=./prometheus.yml
level=info ts=2018-03-02T13:23:45.54298083Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
E0302 13:23:45.544284 352 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/lib/go/src/runtime/asm_amd64.s:573
/usr/lib/go/src/runtime/panic.go:505
/usr/lib/go/src/runtime/panic.go:63
/usr/lib/go/src/runtime/signal_unix.go:388
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:97
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/lib/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0xeb0629]
goroutine 364 [running]:
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x16183a0, 0x24eb370)
/usr/lib/go/src/runtime/panic.go:505 +0x229
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewListWatchFromClient.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x185edf7, ...)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67 +0xe9
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.SimplePageFunc.func1(0x7f6249530fd0, 0xc420132028, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39 +0x64
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.(*ListPager).List(0xc420de59f8, 0x7f6249530fd0, 0xc420132028, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77 +0x105
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*ListWatch).List(0xc4205ee640, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:97 +0x16b
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc4201400a0, 0xc42029a3c0, 0x0, 0x0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249 +0x208
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204 +0x33
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc420f15718)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420de5f18, 0x3b9aca00, 0x0, 0xc420107901, 0xc42029a3c0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc420f15718, 0x3b9aca00, 0xc42029a3c0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run(0xc4201400a0, 0xc42029a3c0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203 +0x157
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run-fm(0xc42029a3c0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122 +0x34
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54 +0x31
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc420132500, 0xc4205ee6e0)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

prometheus.yml:

# cat prometheus.yml
global:
scrape_interval: 10s
scrape_configs:
- job_name: 'kubelet'
scrape_interval: 10s
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
@theairkit OK, I will try to find some time to replicate it locally.
@theairkit I still didn't have time to test it myself, but @brancz pointed me to the solution. If you have time for another test that would be great; otherwise I will try it myself in the next few days.
ghost commented Mar 2, 2018

@krasi-georgiev I'll find some time for that today/tomorrow and test it, thanks!
ghost commented Mar 2, 2018

@krasi-georgiev It works now! I'm running Prometheus with the full configuration and will watch memory usage over the next few hours.
ghost commented Mar 7, 2018

@krasi-georgiev
If you are running 2.1, it is missing the throttling, so this might be another reason for the memory increase. By the way, did you run another pprof to see if the k8s client is the culprit again?
ghost commented Mar 7, 2018

I'll try it, thanks! I'll provide it to you, of course.
I just remembered that my branch is already in sync with master, so it includes the throttling.
Is this still happening with 2.2.1?
Mrcortez34 commented Mar 20, 2018

I'm no longer seeing it.
brian-brazil closed this Mar 20, 2018
ghost commented Mar 20, 2018

Agreed with @Mrcortez34 – I'm no longer seeing it either.
piaoyu commented May 8, 2018
piaoyu commented May 8, 2018
Looks like you are hitting the k8s client leak as well, which was fixed about a week ago. You can either build from the 2.2 or master branch, or wait for the next release.
piaoyu commented May 9, 2018

@krasi-georgiev thanks. Is it this PR: #4117?
Yep, that is the one.
piaoyu commented May 10, 2018

@krasi-georgiev I checked out release-2.2 and built it, but the problem still appears.
Comparing both, it looks like it is using half the RAM – 4 GB vs 2 GB on this one. Are you running some expensive query? The rest of the memory usage looks like JSON encoding/decoding. If you run some expensive query, you might lower the total memory usage by using a recording rule.
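(A minimal sketch of such a recording rule in the 2.x YAML rule-file format; the rule name and the http_requests_total expression are placeholders, not the reporter's actual query:)

groups:
  - name: example
    rules:
      # Precompute the expensive aggregation once per evaluation interval
      # so dashboards query the cheap recorded series instead.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))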
piaoyu commented May 10, 2018

@krasi-georgiev Yes, but k8s.io/client-go is also using a lot of RAM.
So there is still high memory usage somewhere in the k8s client. It might help to look at your config, and also if you mention the scale of your deployment and the total number of pods being discovered. @beorn7 how much improvement in memory usage did you see after applying the k8s SD refactoring?
Built from what's currently in release-2.2 and also master, I see "normal" memory behavior, i.e. the usual up and down from an expanding WAL and compaction. There is no upwards trend anymore, while the previous state was infinite growth with OOMs every few days (but only when using K8s SD). The baseline memory usage (before the growth kicks in) hasn't changed much.
@piaoyu does your Prometheus instance get OOM killed as well?
piaoyu commented May 12, 2018

@krasi-georgiev Yes. I kept one of the Prometheus instances away from the expensive queries, but it still gets OOM killed.
I am not 100% sure, but I think an OOM is normal when you have such an expensive query running so frequently that it doesn't fit in memory, so maybe your problem is there and not in the k8s SD. Have you tried using a recording rule?
piaoyu commented May 14, 2018

@krasi-georgiev Yes, we use them. I will run one of the Prometheus instances without any recording rules to test.
Expensive queries are indeed a "legitimate" way to OOM Prometheus (of any version – while we should have it, there is really no self-protection against queries that grab a lot of RAM). To diagnose a memory leak in SD (like here) or somewhere else outside the query engine (e.g. storage), it's best to run a server without any queries (none via the API, and no recording rules configured).
piaoyu commented May 14, 2018

@beorn7 Yes – no queries via the API and no recording rules configured is the setup used on one of the Prometheus instances.
@piaoyu sorry, I didn't quite understand what you mean.
piaoyu commented May 15, 2018

@krasi-georgiev I mean that I will run the Prometheus instance without any recording rules or queries to test.
If you believe you have a memory leak, please file a new bug, as it's getting a bit difficult to track all the threads in this closed bug.
piaoyu commented May 15, 2018

@brian-brazil OK. I submitted issue #4164.
lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
saady commented Jan 15, 2018

Prometheus is using about 5 GB of memory, and I'm scraping only 55 targets, which is not very high. I'm attaching a Grafana dashboard I'm using to monitor Prometheus.

Targets count:
Resident memory: