
Prometheus version 2.0 memory leak ? #3685

Closed
saady opened this Issue Jan 15, 2018 · 53 comments


saady commented Jan 15, 2018

Prometheus is using about 5 GB of memory, but I'm scraping only 55 targets, which is not very many.
I'm attaching screenshots from the Grafana dashboard I'm using to monitor Prometheus.

Targets count:

screen shot 2018-01-15 at 12 59 52 pm

Resident memory:

screen shot 2018-01-15 at 1 00 22 pm

Mrcortez34 commented Feb 7, 2018

I have the same issue: my Prometheus server keeps using memory and never releases it. It just keeps growing.
screenshot from 2018-02-07 14-05-04

Mrcortez34 commented Feb 7, 2018

Here are my scrape configs

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-nodes
data:
  prometheus.yml: |-
    global:
      scrape_interval: 1m
      scrape_timeout: 1m
      evaluation_interval: 1m
      external_labels:
        slave: prometheus-nodes
        datacenter: aus1

Mrcortez34 commented Feb 8, 2018

@saady Would you mind sharing your Targets query?

dimgsg9 commented Feb 8, 2018

I'm running Prometheus as a systemd service inside a virtual machine (CentOS 7), not inside a Docker container.

Prometheus version 2.0.0 (revision 0a74f98, go1.9.2)

Stats for the last 7 days. Only 60 targets, statically configured; this is not a Kubernetes environment. I'm monitoring a static environment with the node, nginx, apache, coredns, proxysql, and mysql exporters:

screen shot 2018-02-08 at 9 15 11 am


krasi-georgiev commented Feb 8, 2018

Can you attach the heap profile from ...:9090/debug/pprof/heap?


fabxc commented Feb 9, 2018

Prometheus's new storage engine memory-maps the entire storage. On a machine with lots of free memory this will make it appear like memory usage keeps growing indefinitely. However, the memory will be released as soon as another process needs it.

That's most likely what you are looking at here. Your last screenshot @dimgsg9 seems to show that the resident memory (actual Go heap of the server) is fairly constant while virtual memory grows. Memory leaks would show in the resident memory usage going up.

@saady @Mrcortez34 your screenshots are less revealing unfortunately – memory usage numbers vary a lot depending on what metrics are being used.

TL;DR Prometheus is probably just "using" more and more memory because nothing else needs it anyway and will release it if that changes.
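A quick way to check which case you are in is to watch the two process metrics directly. Below is a small illustrative Go sketch, not taken from this thread, that polls them via the standard /api/v1/query endpoint; the localhost:9090 address and the job="prometheus" selector are assumptions to adjust. A leak shows up as process_resident_memory_bytes growing, while process_virtual_memory_bytes is expected to grow with the memory-mapped blocks.

```go
// watchmem.go: compare resident vs. virtual memory of a Prometheus server.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// queryResponse mirrors just the parts of the /api/v1/query response we need.
type queryResponse struct {
	Data struct {
		Result []struct {
			Value [2]interface{} `json:"value"` // [unix timestamp, value as string]
		} `json:"result"`
	} `json:"data"`
}

func instantQuery(base, q string) (string, error) {
	resp, err := http.Get(base + "/api/v1/query?query=" + url.QueryEscape(q))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var qr queryResponse
	if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
		return "", err
	}
	if len(qr.Data.Result) == 0 {
		return "", fmt.Errorf("no result for %q", q)
	}
	return qr.Data.Result[0].Value[1].(string), nil
}

func main() {
	base := "http://localhost:9090" // assumed address of the server under suspicion
	for _, q := range []string{
		// Resident memory is what matters when looking for a leak.
		`process_resident_memory_bytes{job="prometheus"}`,
		// Virtual memory grows with the memory-mapped storage; that is expected.
		`process_virtual_memory_bytes{job="prometheus"}`,
	} {
		v, err := instantQuery(base, q)
		if err != nil {
			log.Printf("%s: %v", q, err)
			continue
		}
		fmt.Printf("%s = %s bytes\n", q, v)
	}
}
```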

Mrcortez34 commented Feb 9, 2018

@fabxc I was trying to figure out why the machine was rebooting all the time. What I found was that the Prometheus process was using all of the memory, and then the machine would reboot. This happened constantly until I increased the memory to 128 GB; now it happens every other day. @fabxc, what information would you need to determine anything?
screenshot from 2018-02-09 09-13-26
screenshot from 2018-02-09 09-12-52
screenshot from 2018-02-09 09-12-21


fabxc commented Feb 9, 2018

Thanks, this is much more detailed. Here indeed resident usage appears to be creeping up.

Can you run go tool pprof -symbolize=remote -inuse_space "http://<prometheus>/debug/pprof/heap" as @krasi-georgiev suggested? It will prompt you for input. Then enter svg > inuse.svg and send the file our way.
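If running the interactive pprof session directly against the server is awkward, a rough alternative (an illustrative sketch, not from this thread; the prometheus:9090 host is a placeholder) is to download the profile first and inspect it offline, e.g. with go tool pprof -svg heap.pprof > inuse.svg:

```go
// fetchheap.go: save a heap profile from a running Prometheus to a local file.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://prometheus:9090/debug/pprof/heap") // placeholder host
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The endpoint returns a binary pprof profile; just copy it to disk.
	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote heap.pprof")
}
```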

Mrcortez34 commented Feb 10, 2018

@fabxc Thanks for information. I will get it to you on Monday.

Mrcortez34 commented Feb 12, 2018

@fabxc Here is what you asked for.

inuse.zip


fabxc commented Feb 13, 2018

Thanks. Interestingly enough, there is no single big offender in memory usage that would point to a memory leak with strong certainty. When did you create the profile: early on in the cycle, or shortly before it got OOM-killed?

30% of memory usage is coming from the k8s client libraries, and thereby the k8s service discovery integration. Given that this shouldn't need more than a few dozen MB, it is still the most likely cause.

We had memory leaks coming from the k8s client in the past but haven't seen any for a while. It may even be triggered by the k8s cluster version. Do you know which k8s version you are on?

Mrcortez34 commented Feb 13, 2018

It was early on, because it had restarted the night before.

We're running Kubernetes 1.7.4.

nordbranch commented Feb 14, 2018

Hello. Joining the thread with similar issues.

Version: 2.1.0
Revision: 85f23d8
It exhausted memory on an i3.2xlarge (~59 GB RAM) in under 18 hours:

fatal error: runtime: out of memory

and then a stack dump.
Full log: https://s3-us-west-1.amazonaws.com/nordberg-temp/our_daily_liver/prometheus.log.out_of_memory

I ran go tool pprof -symbolize=remote -inuse_space "http://localhost/debug/pprof/heap"; here's the graphviz SVG of the profiling:
https://s3-us-west-1.amazonaws.com/nordberg-temp/our_daily_liver/inuse.svg

Dashboard history:
https://s3-us-west-1.amazonaws.com/nordberg-temp/our_daily_liver/prom_dash_001.png
https://s3-us-west-1.amazonaws.com/nordberg-temp/our_daily_liver/prom_dash_002.png
(Let me know if you want a zoom-in of the time period from the logs)

It's using ec2_sd_configs (which I'm reducing to as few jobs as possible to minimize calls to AWS), kubernetes_sd_configs, and some static_configs, and I'm federating in from one Prometheus with a 15-minute retention for service stats within Kubernetes. While I get 2.1 stable (to replace 1.6.3), it's not evaluating any rules/alerts, and nobody but myself is querying it via Grafana or directly. I can provide a sanitized config if needed.

If there's anything else I can provide to help on this one, please let me know. Many Thanks!

late addition: We're pointing to Kubernetes 1.4.

ghost commented Feb 15, 2018

Hi! Looks like I have a similar issue, but only when monitoring an old version of k8s.
The graphs below show Prometheus RSS memory usage for the last 24h:
2018-02-15 11 55 09

Prometheus version: 2.1.0 (https://github.com/prometheus/prometheus/releases/tag/v2.1.0)
The k8s clusters run on Debian GNU/Linux 9.3 (stretch) nodes.

These three k8s clusters vary in size, but the Prometheus instances have almost identical configs.
Some information about the number of metrics etc. (values rounded):

| k8s version | dataset, Gb (du /data) | metrics* | prometheus_tsdb_head_series | RSS*, Gb | VSZ*, Gb |
| --- | --- | --- | --- | --- | --- |
| 1.5.7 | 49 | 153000000 | 1600000 | 94 | 140 |
| 1.6.7 | 137 | 415000000 | 4300000 | 130 | 307 |
| 1.8.6 | 9 | 42000000 | 440000 | 4 | 13 |

*metrics is a sum of all numSeries in meta.json files in dataset dir:
find /data -name meta.json | xargs -n1 cat | grep numSeries | tr -d , | awk '{s+=$2}END{print s}'
*RSS is a metric: process_resident_memory_bytes{job="prometheus"}
*VSZ is a metric: process_virtual_memory_bytes{job="prometheus"}

Memory profiles is svg:
alloc-inuse.zip

It is OK that VSZ memory usage is high due to the implementation of tsdb: https://fabxc.org/tsdb/.

But:

  • For k8s-1.5.7, Prometheus RSS usage keeps increasing and is currently ~2x the dataset size.
  • For k8s-1.6.7, Prometheus RSS usage does not increase, but it is very high (132Gb), nearly the dataset size.

For k8s-1.8.6, Prometheus RSS usage is low, about half the dataset size.

@fabxc

> It may even be triggered by the k8s cluster version

It looks like that, but my investigation so far has not led to anything...
Do you have any ideas?
Thanks in advance!


fabxc commented Feb 15, 2018

Thanks to both of you for providing such detailed information! This definitely looks like a memory leak in client-go dependent on the k8s version.

We ran into these several times across different projects. What we can do is update the client-go version once more and hope that it fixes things (without breaking others).
But beyond that, we have little means to address this short of creating our own client library.

One thing that stands out though is that all of the leakage comes from Kubernetes "node" discovery.
@brancz mentioned that we "fixed" this in other projects by not using watchers anymore for node discovery but simply refreshing every few minutes.
Possibly the best way forward is to go the same route in Prometheus at the cost of higher update latencies when discovering kubelets.
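For illustration, a rough sketch of the refresh-based approach (not the actual Prometheus change; it assumes a recent client-go, an in-cluster service account, and an arbitrary 5-minute interval):

```go
// refresh-based node "discovery": list nodes periodically instead of watching.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ticker := time.NewTicker(5 * time.Minute) // example refresh interval
	defer ticker.Stop()
	for {
		nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
		if err != nil {
			log.Println("listing nodes:", err)
		} else {
			// In Prometheus this is where the discovered kubelet targets would be rebuilt.
			log.Printf("discovered %d nodes", len(nodes.Items))
		}
		<-ticker.C
	}
}
```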

I'll look into it.

.../nordberg-temp/our_daily_liver/prometheus.log.out_of_memory

Loving the path name by the way :)

Mrcortez34 commented Feb 20, 2018

@fabxc Would a fix for this be in 2.2.0?


krasi-georgiev commented Feb 28, 2018

Please test the binary attached to the PR; the k8s client is a dependency nightmare, so it needs proper testing.

ghost commented Feb 28, 2018

@krasi-georgiev I tried Prometheus built from your branch and also from my branch, where I did almost the same thing (https://github.com/theairkit/prometheus/commit/d8e663b04fab9045e3c2ab7bac90d8e86bd176ac).
In both cases I got a runtime panic when trying to start Prometheus with this config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      external_labels:
        cluster: mycluster
      scrape_interval: 10s
    rule_files:
      - "/etc/prometheus-rules/*.rules"
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager:80"
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 10s
        static_configs:
          - targets: ['127.0.0.1:9090']
      - job_name: 'kubelet'
        scrape_interval: 10s
        scheme: https
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics
level=info ts=2018-02-28T13:10:22.114521741Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.0-rc.1, branch=k8s-update-client, revision=89aaa3ca5a514f401832ccbea169058fd9f76220)"
level=info ts=2018-02-28T13:10:22.11461344Z caller=main.go:221 build_context="(go=go1.10, user=root@3fccbaba49db, date=20180228-12:00:40)"
level=info ts=2018-02-28T13:10:22.11464829Z caller=main.go:222 host_details="(Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 prometheus-3520576040-j556j (none))"
level=info ts=2018-02-28T13:10:22.114697596Z caller=main.go:223 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-02-28T13:10:22.117688809Z caller=web.go:381 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-02-28T13:10:22.117676028Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-02-28T13:10:22.124020709Z caller=main.go:514 msg="TSDB started"
level=info ts=2018-02-28T13:10:22.124177288Z caller=main.go:588 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-02-28T13:10:22.125594046Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
E0228 13:10:22.126933       1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/lib/go-1.10/src/runtime/asm_amd64.s:573
/usr/lib/go-1.10/src/runtime/panic.go:505
/usr/lib/go-1.10/src/runtime/panic.go:63
/usr/lib/go-1.10/src/runtime/signal_unix.go:388
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:77
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:107
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0xea0345]

goroutine 270 [running]:
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x15ec960, 0x249ae90)
	/usr/lib/go-1.10/src/runtime/panic.go:505 +0x229
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewListWatchFromClient.func1(0xc420038300)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67 +0x25
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewFilteredListWatchFromClient.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x182f897, ...)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:77 +0xe6
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.SimplePageFunc.func1(0x7f5cfd17b050, 0xc420044050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39 +0x64
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.(*ListPager).List(0xc420f739f8, 0x7f5cfd17b050, 0xc420044050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77 +0x105
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*ListWatch).List(0xc4211500c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:107 +0x16b
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc42020e500, 0xc4201fa060, 0x0, 0x0)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249 +0x208
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204 +0x33
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4204c2718)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420f73f18, 0x3b9aca00, 0x0, 0xc420106c01, 0xc4201fa060)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4204c2718, 0x3b9aca00, 0xc4201fa060)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run(0xc42020e500, 0xc4201fa060)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203 +0x157
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run-fm(0xc4201fa060)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122 +0x34
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54 +0x31
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc420472140, 0xc421150160)
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

For now I am trying to investigate this...


krasi-georgiev commented Feb 28, 2018

@theairkit don't worry, I will revert the Go client to a more stable version.


krasi-georgiev commented Mar 1, 2018

As per @brancz's advice I changed the version to v6.0.0, so please test it; if it still panics I will try to replicate it as well.

ghost commented Mar 2, 2018

@krasi-georgiev Thanks for your work! I tested it, but unfortunately it still panics:

# ./prometheus --config.file=./prometheus.yml --web.listen-address="0.0.0.0:9091"
level=info ts=2018-03-02T13:23:45.535390988Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.0-rc.1, branch=k8s-update-client, revision=cbd061899074633748032533815d37ca7167bbdf)"
level=info ts=2018-03-02T13:23:45.535469836Z caller=main.go:221 build_context="(go=go1.10, user=operator@krakatau, date=20180302-13:20:27)"
level=info ts=2018-03-02T13:23:45.535491836Z caller=main.go:222 host_details="(Linux 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.18-1~bpo8+1 (2017-04-10) x86_64 prometheus-3197024436-gf9nb (none))"
level=info ts=2018-03-02T13:23:45.535510203Z caller=main.go:223 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-03-02T13:23:45.537881034Z caller=web.go:381 component=web msg="Start listening for connections" address=0.0.0.0:9091
level=info ts=2018-03-02T13:23:45.537852889Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-03-02T13:23:45.542072153Z caller=main.go:514 msg="TSDB started"
level=info ts=2018-03-02T13:23:45.542131935Z caller=main.go:588 msg="Loading configuration file" filename=./prometheus.yml
level=info ts=2018-03-02T13:23:45.54298083Z caller=kubernetes.go:191 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
E0302 13:23:45.544284     352 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/lib/go/src/runtime/asm_amd64.s:573
/usr/lib/go/src/runtime/panic.go:505
/usr/lib/go/src/runtime/panic.go:63
/usr/lib/go/src/runtime/signal_unix.go:388
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:97
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54
/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/usr/lib/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0xeb0629]

goroutine 364 [running]:
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x16183a0, 0x24eb370)
	/usr/lib/go/src/runtime/panic.go:505 +0x229
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.NewListWatchFromClient.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x185edf7, ...)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:67 +0xe9
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.SimplePageFunc.func1(0x7f6249530fd0, 0xc420132028, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:39 +0x64
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager.(*ListPager).List(0xc420de59f8, 0x7f6249530fd0, 0xc420132028, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/pager/pager.go:77 +0x105
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*ListWatch).List(0xc4205ee640, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/listwatch.go:97 +0x16b
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc4201400a0, 0xc42029a3c0, 0x0, 0x0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:249 +0x208
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:204 +0x33
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc420f15718)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420de5f18, 0x3b9aca00, 0x0, 0xc420107901, 0xc42029a3c0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc420f15718, 0x3b9aca00, 0xc42029a3c0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run(0xc4201400a0, 0xc42029a3c0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/reflector.go:203 +0x157
github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run-fm(0xc42029a3c0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/tools/cache/controller.go:122 +0x34
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:54 +0x31
github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc420132500, 0xc4205ee6e0)
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/home/operator/zspace/dev/golang/src/github.com/prometheus/prometheus/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

prometheus.yml:

# cat prometheus.yml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'kubelet'
    scrape_interval: 10s
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

krasi-georgiev commented Mar 2, 2018

@theairkit ok I will try to find some time to replicate locally.


krasi-georgiev commented Mar 2, 2018

@theairkit I still haven't had time to test it myself, but @brancz pointed me to the solution. So if you have time for another test that would be great; otherwise I will try it myself in the next few days.

cache.NewListWatchFromClient(rclient, "endpoints", namespace, nil)
is now
cache.NewListWatchFromClient(rclient, "endpoints", namespace, fields.Everything())
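For context, a self-contained sketch of how that call is wired up (illustrative only, using the "nodes" resource rather than the real Prometheus code): a nil fields.Selector gets dereferenced by client-go while it builds the list options, which matches the nil pointer panics above, whereas fields.Everything() supplies a real match-all selector.

```go
package main

import (
	"log"

	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	rclient := clientset.CoreV1().RESTClient()

	// cache.NewListWatchFromClient(rclient, "nodes", "", nil) would panic as soon
	// as the reflector calls List(); always pass an explicit selector instead.
	lw := cache.NewListWatchFromClient(rclient, "nodes", "", fields.Everything())
	_ = lw // hand lw to a cache.Reflector or informer as usual
}
```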

ghost commented Mar 2, 2018

@krasi-georgiev I'll find some time for that today/tomorrow and test it, thanks!

ghost commented Mar 2, 2018

@krasi-georgiev It works now! I'm running Prometheus with the full configuration and will watch memory usage over the next few hours.

ghost commented Mar 7, 2018

@krasi-georgiev
Hi! I have been testing Prometheus for the past few days while collecting statistics, and unfortunately it still leaks. Memory consumption did not change (~5-6 Gb/hour) and still looks exactly like my statistics above in this thread (#3685 (comment)).
Looks like updating the k8s client did not eliminate the memory leak...


krasi-georgiev commented Mar 7, 2018

If you are running 2.1, it is missing the throttling, so this might be another reason for the memory increase. 2.2 is quite stable if you want to try it.

By the way, did you run another pprof to see if the k8s client is the culprit again?

ghost commented Mar 7, 2018

> If you are running 2.1, it is missing the throttling, so this might be another reason for the memory increase. 2.2 is quite stable if you want to try it.

I'll try it, thanks!

> By the way, did you run another pprof to see if the k8s client is the culprit again?

I'll provide it to you, of course.


krasi-georgiev commented Mar 7, 2018

I just remembered that my branch is already in sync with master so it includes the throttling.


brian-brazil commented Mar 20, 2018

Is this still happening with 2.2.1?

Mrcortez34 commented Mar 20, 2018

ghost commented Mar 20, 2018

Agreed with @Mrcortez34, I'm no longer seeing it either.

piaoyu commented May 8, 2018

Looks like it also happens in 2.2.1; I'm using Kubernetes 1.9.0.
dingtalk20180508111326

piaoyu commented May 8, 2018

Version: 2.2.1
Revision: bc6058c81272a8d938c05e75607371284236aadc
Branch: HEAD
(pprof) top 10
Showing nodes accounting for 7839.14MB, 76.47% of 10251.73MB total
Dropped 374 nodes (cum <= 51.26MB)
Showing top 10 nodes out of 92
      flat  flat%   sum%        cum   cum%
 2461.23MB 24.01% 24.01%  2537.24MB 24.75%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.(*jsonDecDriver).DecodeString /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/json.go
 1012.74MB  9.88% 33.89%  1012.74MB  9.88%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.newReader.func2 /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  891.05MB  8.69% 42.58%   891.05MB  8.69%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/encoding_helpers.go
  720.54MB  7.03% 49.61%  2757.76MB 26.90%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.fastpathT.DecSliceStringV /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/fast-path.generated.go
  560.95MB  5.47% 55.08%   841.96MB  8.21%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceNodeCondition /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  485.77MB  4.74% 59.82%   836.29MB  8.16%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Reader).readSymbols /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  477.30MB  4.66% 64.47%  3234.06MB 31.55%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceContainerImage /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  464.69MB  4.53% 69.01%   464.69MB  4.53%  github.com/prometheus/prometheus/pkg/labels.(*Builder).Labels /root/gopath/src/github.com/prometheus/prometheus/pkg/labels/labels.go
  446.76MB  4.36% 73.36%   469.76MB  4.58%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decResourceList /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  318.11MB  3.10% 76.47%   318.11MB  3.10%  github.com/prometheus/prometheus/pkg/textparse.(*Parser).Metric /root/gopath/src/github.com/prometheus/prometheus/pkg/textparse/parse.go

krasi-georgiev commented May 8, 2018

Looks like you are hitting the k8s client leak as well, which was fixed about a week ago. You can either build from the 2.2 or master branch, or wait for the next release.

piaoyu commented May 9, 2018

@krasi-georgiev thanks. Is it this PR: #4117?


krasi-georgiev commented May 9, 2018

Yep, that is the one.

piaoyu commented May 10, 2018

@krasi-georgiev I checked out release-2.2 and built it, but it looks like the leak still appears.

/prometheus $ prometheus --version
prometheus, version 2.2.1 (branch: HEAD, revision: 94e4a4321761f83207ad11542ee971e7d5220e80)
  build user:       root@sz-bcs-a151
  build date:       20180508-12:56:09
  go version:       go1.9.4


Dropped 381 nodes (cum <= 37.26MB)
Showing top 20 nodes out of 109
      flat  flat%   sum%        cum   cum%
 1338.13MB 17.96% 17.96%  1378.63MB 18.50%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.(*jsonDecDriver).DecodeString /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/json.go
  710.53MB  9.54% 27.49%   710.53MB  9.54%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/encoding_helpers.go
  700.89MB  9.41% 36.90%   700.89MB  9.41%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.newReader.func2 /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  527.22MB  7.08% 43.98%   527.22MB  7.08%  github.com/prometheus/prometheus/pkg/labels.(*Builder).Labels /root/gopath/src/github.com/prometheus/prometheus/pkg/labels/labels.go
  481.77MB  6.47% 50.44%   745.78MB 10.01%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Reader).readSymbols /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  387.52MB  5.20% 55.64%  1500.14MB 20.13%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.fastpathT.DecSliceStringV /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/fast-path.generated.go
  294.50MB  3.95% 59.59%   439.51MB  5.90%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceNodeCondition /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  291.10MB  3.91% 63.50%   291.10MB  3.91%  github.com/prometheus/prometheus/pkg/textparse.(*Parser).Metric /root/gopath/src/github.com/prometheus/prometheus/pkg/textparse/parse.go
  290.51MB  3.90% 67.40%   290.51MB  3.90%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/encoding_helpers.go
  267.09MB  3.58% 70.99%   557.60MB  7.48%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).decodeSeries /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
  258.03MB  3.46% 74.45%   258.03MB  3.46%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.NewXORChunk /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go
  247.64MB  3.32% 77.77%   260.14MB  3.49%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decResourceList /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  239.40MB  3.21% 80.98%  1739.55MB 23.35%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceContainerImage /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go

krasi-georgiev commented May 10, 2018

Comparing both profiles, it looks like this one is using half the RAM, 4 GB vs 2 GB.

Are you running some expensive query? The rest of the memory usage looks like JSON encoding/decoding. If you run an expensive query, you might lower the total memory usage by using a recording rule.

piaoyu commented May 10, 2018

@krasi-georgiev Yes, but k8s.io/client-go is also using a lot of RAM.

Showing top 10 nodes out of 89
      flat  flat%   sum%        cum   cum%
 2818.77MB 26.77% 26.77%  2900.77MB 27.55%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.(*jsonDecDriver).DecodeString /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/json.go
  765.05MB  7.27% 34.04%  3119.80MB 29.63%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.fastpathT.DecSliceStringV /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/fast-path.generated.go
  706.03MB  6.71% 40.74%   706.03MB  6.71%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/encoding_helpers.go
  701.39MB  6.66% 47.40%   701.39MB  6.66%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.newReader.func2 /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  644.09MB  6.12% 53.52%   952.60MB  9.05%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceNodeCondition /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  534.23MB  5.07% 58.59%   534.23MB  5.07%  github.com/prometheus/prometheus/pkg/labels.(*Builder).Labels /root/gopath/src/github.com/prometheus/prometheus/pkg/labels/labels.go
  515.88MB  4.90% 63.49%  3635.68MB 34.53%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceContainerImage /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  481.77MB  4.58% 68.07%   742.78MB  7.05%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Reader).readSymbols /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  456.76MB  4.34% 72.41%   486.26MB  4.62%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decResourceList /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  286.10MB  2.72% 75.12%   286.10MB  2.72%  github.com/prometheus/prometheus/pkg/textparse.(*Parser).Metric /root/gopath/src/github.com/prometheus/prometheus/pkg/textparse/parse.go

krasi-georgiev commented May 10, 2018

So there is still high memory usage somewhere in the k8s client.
@piaoyu, regarding the k8s lines: does the memory usage grow constantly, or does it go up and down?
What did you change last before this started happening?
Upgraded Prometheus, increased pod counts, etc.?

It might help to look at your config, and also if you mention the scale of your deployment and the total number of pods being discovered.

@beorn7 how much improvement in memory usage did you see after applying the k8s SD refactoring?


beorn7 commented May 11, 2018

Built from what's currently in the release-2.2 and also master, I see "normal" memory behavior, i.e. the usual up and down from expanding WAL and compaction. There is no upwards trend anymore, while the previous state was infinite growth with OOMs every few days (but only if using K8s SD). The baseline memory usage (before the growth kicks in) hasn't changed much.


krasi-georgiev commented May 11, 2018

@piaoyu does your Prometheus instance get OOM-killed as well?

piaoyu commented May 12, 2018

@krasi-georgiev Yes. I moved the expensive queries off one of the Prometheus instances, but it still gets OOM-killed.

Showing top 20 nodes out of 100
      flat  flat%   sum%        cum   cum%
 1472.14MB 19.41% 19.41%  1516.64MB 20.00%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.(*jsonDecDriver).DecodeString /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/json.go
  715.53MB  9.44% 28.85%   715.53MB  9.44%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/encoding_helpers.go
  713.52MB  9.41% 38.26%   713.52MB  9.41%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.newReader.func2 /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  482.70MB  6.37% 44.62%   482.70MB  6.37%  github.com/prometheus/prometheus/pkg/labels.(*Builder).Labels /root/gopath/src/github.com/prometheus/prometheus/pkg/labels/labels.go
  482.27MB  6.36% 50.98%   742.78MB  9.79%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*Reader).readSymbols /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/index.go
  435.03MB  5.74% 56.72%  1667.16MB 21.98%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.fastpathT.DecSliceStringV /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/fast-path.generated.go
  347.59MB  4.58% 61.30%   502.59MB  6.63%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceNodeCondition /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  311.11MB  4.10% 65.41%   311.11MB  4.10%  github.com/prometheus/prometheus/pkg/textparse.(*Parser).Metric /root/gopath/src/github.com/prometheus/prometheus/pkg/textparse/parse.go
  296.52MB  3.91% 69.32%  1962.68MB 25.88%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decSliceContainerImage /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  277.01MB  3.65% 72.97%   277.01MB  3.65%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*decbuf).uvarintStr /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/encoding_helpers.go
  269.65MB  3.56% 76.52%   282.66MB  3.73%  github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1.codecSelfer1234.decResourceList /root/gopath/src/github.com/prometheus/prometheus/vendor/k8s.io/client-go/pkg/api/v1/types.generated.go
  256.59MB  3.38% 79.91%   533.60MB  7.04%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.(*walReader).decodeSeries /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/wal.go
  196.29MB  2.59% 82.50%   196.29MB  2.59%  github.com/prometheus/prometheus/retrieval.(*scrapeCache).trackStaleness /root/gopath/src/github.com/prometheus/prometheus/retrieval/scrape.go
  135.02MB  1.78% 84.28%   135.02MB  1.78%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb.newMemSeries /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/head.go
  133.59MB  1.76% 86.04%   133.59MB  1.76%  reflect.unsafe_New /root/working/go/src/runtime/malloc.go
  129.03MB  1.70% 87.74%   179.03MB  2.36%  github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec.fastpathT.DecMapStringStringV /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/ugorji/go/codec/fast-path.generated.go
  127.87MB  1.69% 89.43%   127.87MB  1.69%  github.com/prometheus/prometheus/retrieval.(*scrapeCache).addRef /root/gopath/src/github.com/prometheus/prometheus/retrieval/scrape.go
  111.01MB  1.46% 90.89%   111.01MB  1.46%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc.NewXORChunk /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/chunkenc/xor.go
   88.68MB  1.17% 92.06%    88.68MB  1.17%  github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index.(*MemPostings).addFor /root/gopath/src/github.com/prometheus/prometheus/vendor/github.com/prometheus/tsdb/index/postings.go
   83.54MB  1.10% 93.16%    83.54MB  1.10%  github.com/prometheus/prometheus/pkg/pool.(*BytesPool).Get /root/gopath/src/github.com/prometheus/prometheus/pkg/pool/pool.go

krasi-georgiev commented May 14, 2018

I am not 100% sure, but I think an OOM is normal when you have an expensive query running so frequently that it doesn't fit in memory, so maybe your problem is there and not in the k8s SD.

Have you tried using a recording rule?

piaoyu commented May 14, 2018

@krasi-georgiev Yes. We use:

record: instance:container_memory_ratio
expr: container_memory_rss / container_spec_memory_limit_bytes

I will take all recording rules off one of the Prometheus instances to test.


beorn7 commented May 14, 2018

Expensive queries are indeed a “legitimate” way to OOM Prometheus of any version; while we should have it, there is currently no self-protection against queries that grab a lot of RAM.

To diagnose a memory leak in SD (like here) or somewhere else outside the query engine (e.g. storage), it's best to run a server without any queries (none via the API, and no recording rules configured).

piaoyu commented May 14, 2018

@beorn7 yes, one of the Prometheus instances is configured that way: no queries via the API, and no recording rules configured.


krasi-georgiev commented May 14, 2018

@piaoyu sorry, I didn't quite understand what you mean.

piaoyu commented May 15, 2018

@krasi-georgiev I mean that I set up that Prometheus instance with no recording rules and no queries to test, and found that it still gets OOM-killed.


brian-brazil commented May 15, 2018

If you believe you have a memory leak, please file a new bug as it's getting a bit difficult to track all the threads in this closed bug.

piaoyu commented May 15, 2018

@brian-brazil OK, I submitted issue #4164.


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
