
memory leak in kubelet 1.12.5 #73587

Closed
szuecs opened this issue Jan 31, 2019 · 35 comments · Fixed by #74636
Labels: kind/bug, priority/important-soon, sig/api-machinery, sig/node
Milestone: v1.14

@szuecs (Member) commented Jan 31, 2019

What happened:
After upgrading to Kubernetes 1.12.5 we observe failing nodes, caused by kubelet eating up all of the node's memory after some time.

[screenshot omitted]

I use the image k8s.gcr.io/hyperkube:v1.12.5 to run kubelet on 102 clusters, and for about a week we have seen some nodes leaking memory, caused by kubelet.

I investigated some of these kubelets with strace and pprof.
Within 3 seconds of running strace I saw >= 50 openat() calls to the same file from the same thread ID (pid) of kubelet:

    50 [pid 12416] openat(10, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 8
    54 [pid 12368] openat(6, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    72 [pid 12368] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    80 [pid 12368] openat(4, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
    85 [pid 12416] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 6

Profiling kubelet with pprof shows that client-go metrics and compression take up most of the compute time.

root@ip-172-31-10-50:~# go tool pprof http://localhost:10248/debug/pprof/heap
Fetching profile from http://localhost:10248/debug/pprof/heap
Saved profile in /root/pprof/pprof.hyperkube.localhost:10248.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top
2406.93MB of 2451.55MB total (98.18%)
Dropped 2863 nodes (cum <= 12.26MB)
Showing top 10 nodes out of 34 (cum >= 2411.39MB)
      flat  flat%   sum%        cum   cum%
 2082.07MB 84.93% 84.93%  2082.07MB 84.93%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.newStream (inline)
  311.65MB 12.71% 97.64%  2398.72MB 97.84%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.newSummary
   10.71MB  0.44% 98.08%  2414.43MB 98.49%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*MetricVec).getOrCreateMetricWithLabelValues
    2.50MB   0.1% 98.18%  2084.57MB 85.03%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.NewTargeted
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/cmd/kubelet/app.startKubelet.func1
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoop
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoopIteration
         0     0% 98.18%  2411.39MB 98.36%  k8s.io/kubernetes/pkg/kubelet/pod.(*basicManager).AddPod

Memory profile PNG: [image omitted]

The reflector metrics seem to be the problem:

curl -s localhost:10255/metrics | sed 's/{.*//' | sort | uniq -c | sort -nr
  25749 reflector_watch_duration_seconds
  25749 reflector_list_duration_seconds
  25749 reflector_items_per_watch
  25749 reflector_items_per_list
   8583 reflector_watches_total
   8583 reflector_watch_duration_seconds_sum
   8583 reflector_watch_duration_seconds_count
   8583 reflector_short_watches_total
   8583 reflector_lists_total
   8583 reflector_list_duration_seconds_sum
   8583 reflector_list_duration_seconds_count
   8583 reflector_last_resource_version
   8583 reflector_items_per_watch_sum
   8583 reflector_items_per_watch_count
   8583 reflector_items_per_list_sum
   8583 reflector_items_per_list_count
    165 storage_operation_duration_seconds_bucket
     51 kubelet_runtime_operations_latency_microseconds
     44 rest_client_request_latency_seconds_bucket
     33 kubelet_docker_operations_latency_microseconds
     17 kubelet_runtime_operations_latency_microseconds_sum
     17 kubelet_runtime_operations_latency_microseconds_count
     17 kubelet_runtime_operations

What you expected to happen:

I expect kubelet not to need this much memory.

How to reproduce it (as minimally and precisely as possible):

I don't know

Anything else we need to know?:

One of the affected clusters has only 120 Pods, 3 of which are in CrashLoopBackOff state; one of them has been crash-looping for 6 days on the affected node that was investigated.

Environment:

  • Kubernetes version (use kubectl version): v1.12.5
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1800.7.0
VERSION_ID=1800.7.0
BUILD_ID=2018-08-15-2254
PRETTY_NAME="Container Linux by CoreOS 1800.7.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
@szuecs added the kind/bug label on Jan 31, 2019
@k8s-ci-robot added the needs-sig label on Jan 31, 2019
@szuecs (Member, Author) commented Jan 31, 2019

/sig node

@k8s-ci-robot added the sig/node label and removed the needs-sig label on Jan 31, 2019
@Shnatsel commented:
@kubernetes/sig-node-bugs

@k8s-ci-robot (Contributor) commented:
@Shnatsel: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

@kubernetes/sig-node-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aermakov-zalando commented Jan 31, 2019
I think it might be a side effect of #71731, but I couldn't find any numbers in there. The comments mention an offline discussion; was this brought up there?

@aermakov-zalando commented:
What happens (I only investigated it very briefly) is that kubelet creates a bunch of reflectors for every pod: secret mounts, configmap mounts, token mounts, etc. Each reflector registers a number of histograms, gauges, and so on, and even when the reflector is stopped and removed, the metrics are never removed. After a few days on a busy cluster you end up with millions of metrics bloating kubelet and everything else that uses the reflectors.
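
To make that pattern concrete, here is a minimal, self-contained Go sketch of the failure mode being described. It is illustrative only, not the actual client-go code; the function name and the label are invented. Each "reflector" registers its own Summary under a unique identity, nothing ever unregisters it, and each Summary (with classic objectives) carries its own quantile streams, which is consistent with what the heap profile above attributes to beorn7/perks/quantile.newStream.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// startReflectorMetrics mimics the leaky pattern: each new reflector
// (one per secret/configmap/token mount, per pod) registers its own
// Summary, distinguished by a unique const label, and nothing ever
// calls prometheus.Unregister when the reflector is stopped.
func startReflectorMetrics(reflectorName string) prometheus.Summary {
	s := prometheus.NewSummary(prometheus.SummaryOpts{
		Name:        "reflector_list_duration_seconds",
		Help:        "How long a list call took.",
		Objectives:  map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
		ConstLabels: prometheus.Labels{"name": reflectorName}, // unique per reflector
	})
	prometheus.MustRegister(s) // stays in the default registry forever
	return s
}

func main() {
	// Simulate pods (and their volume reflectors) coming and going:
	// the reflectors are gone, but their metrics objects remain.
	for i := 0; i < 100000; i++ {
		startReflectorMetrics(fmt.Sprintf("pod-volume-reflector-%d", i))
	}
	// The default registry now holds 100000 live Summary objects,
	// each with its own quantile stream allocations.
}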

@aermakov-zalando commented:
This has been broken since v1.12.0, so it's probably unrelated to #71731.

@dims (Member) commented Jan 31, 2019

Correct, @aermakov-zalando. That PR is only in v1.14.0-alpha.2, v1.14.0-alpha.1, and master.

@szuecs (Member, Author) commented Jan 31, 2019

For everyone who finds this issue and needs a patch to disable the reflector metrics:

diff --git c/pkg/util/reflector/prometheus/prometheus.go i/pkg/util/reflector/prometheus/prometheus.go
index 958a0007cd..63657e9c55 100644
--- c/pkg/util/reflector/prometheus/prometheus.go
+++ i/pkg/util/reflector/prometheus/prometheus.go
@@ -85,8 +85,6 @@ func init() {
 	prometheus.MustRegister(watchDuration)
 	prometheus.MustRegister(itemsPerWatch)
 	prometheus.MustRegister(lastResourceVersion)
-
-	cache.SetReflectorMetricsProvider(prometheusMetricsProvider{})
 }
 
 type prometheusMetricsProvider struct{}

@feiskyer (Member) commented:
@szuecs Do you have the same metrics after disabling the ReflectorMetricsProvider?

@szuecs (Member, Author) commented Jan 31, 2019

no, it drops the reflector metrics

@yujuhong (Contributor) commented:
@wojtek-t is this indirectly caused by switching to watch-based managers?

@yujuhong added this to the v1.14 milestone on Jan 31, 2019
@yujuhong added the priority/important-soon label on Jan 31, 2019
@wojtek-t (Member) commented Feb 1, 2019

Yeah, switching to watch resulted in more extensive use of reflectors.
It seems those metrics come directly from the reflector (I wasn't aware of them), so it's highly probable that this change caused it.

I don't think we really need those metrics, so if we could switch them off in kubelets, that should solve the problem.

@wojtek-t (Member) commented Feb 1, 2019

#73624 has been sent out for review to fix this.
We should probably cherry-pick it to previous releases if we agree that's the right way to solve it.

@wojtek-t (Member) commented Feb 1, 2019

Actually, I realized that I don't fully understand the problem.
It shouldn't be a problem in default Kubernetes (what we build from this repo), because those metrics are not switched on for the kubelet:

  • the metrics factory defaults to a noop implementation
  • the prometheus metrics need to be explicitly linked and initialized, as they are in the apiserver, e.g.:
    _ "k8s.io/kubernetes/pkg/util/reflector/prometheus" // for reflector metric registration
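
For anyone unfamiliar with the mechanism behind that second bullet: the blank import works because Go runs a package's init() functions when the package is linked into a binary. A rough sketch of the shape follows; it is illustrative only, and the real package wires up a metrics provider in addition to registering collectors.

// Package reflectormetrics enables its metrics purely as a side effect of
// being imported; there is no function for the consumer to call.
package reflectormetrics

import "github.com/prometheus/client_golang/prometheus"

var listsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "reflector_lists_total",
	Help: "Total number of reflector list calls.",
})

// init runs automatically in any binary that links this package, even via a
// blank import such as
//
//	_ "k8s.io/kubernetes/pkg/util/reflector/prometheus" // for reflector metric registration
//
// so whether the metrics are active depends on which imports a given binary
// happens to pull in.
func init() {
	prometheus.MustRegister(listsTotal)
}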

@aermakov-zalando commented Feb 1, 2019

Wouldn't it be better to rewrite the reflector metrics so they're aggregated in a better way rather than relying on people not accidentally enabling them? Or at least put a huge warning on top saying "this will leak memory like crazy" just so the same situation doesn't repeat in other code using client-go?
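
One possible shape for "aggregated in a better way", sketched here purely for illustration (this is not what client-go ended up doing, and the names are invented): a single shared vector registered once, keyed by a small, bounded label such as the resource type, so the number of series stays constant no matter how many reflectors are created and destroyed.

package reflectormetrics

import "github.com/prometheus/client_golang/prometheus"

// One shared, pre-registered histogram vector for all reflectors in the
// process. The label set is bounded (resource type), so the series count
// does not grow with the number of pods or reflectors.
var listDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "reflector_list_duration_seconds",
		Help:    "List call latency, aggregated across reflectors.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"resource"},
)

func init() {
	prometheus.MustRegister(listDuration)
}

// ObserveList is what an individual reflector would call; there is no
// per-reflector registration, hence nothing to clean up on shutdown.
func ObserveList(resource string, seconds float64) {
	listDuration.WithLabelValues(resource).Observe(seconds)
}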

@wojtek-t (Member) commented Feb 1, 2019

@aermakov-zalando - whether the metrics are reasonable or not is a separate issue (and not really sig-node related).
One thing that I would like to confirm: do you customize k8s yourself? I would like to make sure that the issue you faced is in your setup and not in upstream k8s.

@mikkeloscar (Contributor) commented Feb 1, 2019

Our production configuration is this one: https://github.com/zalando-incubator/kubernetes-on-aws/tree/beta

Kubelet config: https://github.com/zalando-incubator/kubernetes-on-aws/blob/85f3ddea60900a7880d03e8c9c5ed7b158b100c7/cluster/node-pools/worker-default/userdata.clc.yaml#L144-L207

We run with an image called registry.opensource.zalan.do/teapot/hyperkube:v1.12.5, but it's just a mirror of k8s.gcr.io/hyperkube:v1.12.5, so it is upstream Kubernetes.

@szuecs (Member, Author) commented Feb 1, 2019

I tried to figure out why these metrics are even there, and I found issues from the past showing leaks already in older versions:
#52121
#53485

It sounds like we need a postmortem to keep this from happening again.

I can't find any issue in kubernetes or in client-go that could reasonably explain why these were introduced in the first place. I guess I just didn't find it, and I hope someone can point out where the decision came from.

@shrinandj commented:
I am running Kubernetes v1.12.3 and I don't see reflector metrics being used by kubelet (curl -s localhost:10255/metrics | sed 's/{.*//' | sort | uniq -c | sort -nr | grep reflector shows nothing).

FWIW, the cluster is created using kops.

Any idea how the reflector metrics get enabled?

@derekwaynecarr (Member) commented:
@sjenning and I will also look into this and see what we can find.

@aermakov-zalando commented:
I would also suggest not using imports for side effects because this usually creates hard to debug and non-obvious problems like this.

@feiskyer (Member) commented:
I would also suggest not using imports for side effects because this usually creates hard to debug and non-obvious problems like this.

@aermakov-zalando So you mean disable the reflector metrics totally?

@aermakov-zalando commented Feb 12, 2019
No, I suggest changing the code so that the end users (kubelet, apiserver, etc.) would have to explicitly enable the metrics by calling a function, instead of having this happen as a side effect of an import statement.
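
A sketch of what that explicit opt-in could look like, in contrast to the init()-based version above. The function name EnableReflectorMetrics and the package layout are hypothetical, not an existing client-go API.

package reflectormetrics

import "github.com/prometheus/client_golang/prometheus"

var watchesTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "reflector_watches_total",
	Help: "Total number of reflector watch calls.",
})

// EnableReflectorMetrics registers the reflector collectors with the given
// registerer. Importing this package does nothing by itself; a binary has to
// opt in explicitly from its main(), for example:
//
//	reflectormetrics.EnableReflectorMetrics(prometheus.DefaultRegisterer)
func EnableReflectorMetrics(r prometheus.Registerer) {
	r.MustRegister(watchesTotal)
}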

@feiskyer (Member) commented:
explicitly enable the metrics by calling a function instead of having this happen as a side effect of an import statement.

LGTM. @wojtek-t @derekwaynecarr @yujuhong WDYT? We should probably do this for both kubelet and control plane services (apiserver and controller-manager).

@wojtek-t (Member) commented:
I'm fine with that.

@yujuhong (Contributor) commented:
I'm fine with that.

+1

@derekwaynecarr (Member) commented:
+1

@logicalhan (Member) commented:
Why wouldn't we just disable this metric wholesale? Enabling this flag would basically be the same thing as enabling a memory leak, no?

@lavalamp (Member) commented:
/assign @logicalhan

@lavalamp (Member) commented:
Han points out that, since the metric has a random suffix each time, it's not very useful for monitoring anyway.
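
For context on why a suffix like that makes the series hard to use: if each reflector derives its metric identity from a per-instance disambiguator (sketched below with an atomic counter; the real client-go naming scheme may differ), every new reflector and every process restart produces a brand-new series that never receives another sample, so nothing can be meaningfully graphed or aggregated over time.

package reflectormetrics

import (
	"fmt"
	"sync/atomic"
)

var reflectorCounter int64

// metricsNameFor returns a unique identity for a reflector's metrics.
// Because the suffix changes on every reflector creation, each reflector
// becomes a distinct, short-lived time series in the monitoring system.
func metricsNameFor(callsite string) string {
	n := atomic.AddInt64(&reflectorCounter, 1)
	return fmt.Sprintf("%s-%d", callsite, n)
}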

@lavalamp (Member) commented:
/sig api-machinery

@k8s-ci-robot added the sig/api-machinery label on Feb 26, 2019
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Mar 1, 2019
…emory leak

ref: kubernetes#73587
ref: kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb
openshift-publish-robot pushed a commit to openshift/kubernetes-client-go that referenced this issue Mar 1, 2019
…emory leak

ref: kubernetes/kubernetes#73587
ref: kubernetes/kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

Kubernetes-commit: f77a2c16c80223249ead526ca12caa6962117888
deads2k pushed a commit to deads2k/kubernetes that referenced this issue Mar 14, 2019
…emory leak

ref: kubernetes#73587
ref: kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb
openshift-publish-robot pushed a commit to openshift/kubernetes-client-go that referenced this issue Mar 15, 2019
…emory leak

ref: kubernetes/kubernetes#73587
ref: kubernetes/kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

Kubernetes-commit: fd85bbcb7e0922b8889c85fad1f5f2d4ca7a3fa7