memory leak in kubelet 1.12.5 #73587

Closed
szuecs opened this issue Jan 31, 2019 · 35 comments


@szuecs (Member) commented Jan 31, 2019

What happened:
After upgrading to Kubernetes 1.12.5 we observe failing nodes, caused by kubelet consuming all available memory after some time.

[screenshot: node memory usage]

I use the image k8s.gcr.io/hyperkube:v1.12.5 to run kubelet on 102 clusters, and for about a week we have been seeing some nodes leaking memory, caused by kubelet.

I investigated some of these kubelets with strace and pprof.
Within 3 s of running strace I saw >= 50 openat() calls to the same file from the same thread ID (pid) in kubelet:

    50 [pid 12416] openat(10, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 8
    54 [pid 12368] openat(6, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    72 [pid 12368] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 4
    80 [pid 12368] openat(4, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
    85 [pid 12416] openat(8, "__pycache__", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 6

Running pprof against kubelet shows that client_go metrics and compression account for most of the compute time; the heap profile below is likewise dominated by the metrics:

root@ip-172-31-10-50:~# go tool pprof http://localhost:10248/debug/pprof/heap
Fetching profile from http://localhost:10248/debug/pprof/heap
Saved profile in /root/pprof/pprof.hyperkube.localhost:10248.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top
2406.93MB of 2451.55MB total (98.18%)
Dropped 2863 nodes (cum <= 12.26MB)
Showing top 10 nodes out of 34 (cum >= 2411.39MB)
      flat  flat%   sum%        cum   cum%
 2082.07MB 84.93% 84.93%  2082.07MB 84.93%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.newStream (inline)
  311.65MB 12.71% 97.64%  2398.72MB 97.84%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.newSummary
   10.71MB  0.44% 98.08%  2414.43MB 98.49%  k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*MetricVec).getOrCreateMetricWithLabelValues
    2.50MB   0.1% 98.18%  2084.57MB 85.03%  k8s.io/kubernetes/vendor/github.com/beorn7/perks/quantile.NewTargeted
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/cmd/kubelet/app.startKubelet.func1
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoop
         0     0% 98.18%  2412.06MB 98.39%  k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoopIteration
         0     0% 98.18%  2411.39MB 98.36%  k8s.io/kubernetes/pkg/kubelet/pod.(*basicManager).AddPod

Memory profile PNG: [mem_profile001]

The reflector metrics seem to be the problem. Each of the four summary metrics below has exactly three series per reflector name (25749 = 3 × 8583), apparently one per quantile, plus a _sum and _count series each:

curl -s localhost:10255/metrics | sed 's/{.*//' | sort | uniq -c | sort -nr
  25749 reflector_watch_duration_seconds
  25749 reflector_list_duration_seconds
  25749 reflector_items_per_watch
  25749 reflector_items_per_list
   8583 reflector_watches_total
   8583 reflector_watch_duration_seconds_sum
   8583 reflector_watch_duration_seconds_count
   8583 reflector_short_watches_total
   8583 reflector_lists_total
   8583 reflector_list_duration_seconds_sum
   8583 reflector_list_duration_seconds_count
   8583 reflector_last_resource_version
   8583 reflector_items_per_watch_sum
   8583 reflector_items_per_watch_count
   8583 reflector_items_per_list_sum
   8583 reflector_items_per_list_count
    165 storage_operation_duration_seconds_bucket
     51 kubelet_runtime_operations_latency_microseconds
     44 rest_client_request_latency_seconds_bucket
     33 kubelet_docker_operations_latency_microseconds
     17 kubelet_runtime_operations_latency_microseconds_sum
     17 kubelet_runtime_operations_latency_microseconds_count
     17 kubelet_runtime_operations

What you expected to happen:

I expect kubelet not to need this much memory.

How to reproduce it (as minimally and precisely as possible):

I don't know

Anything else we need to know?:

One of the affected clusters has only 120 pods; 3 pods are in CrashLoopBackOff state, one of them for 6 days on the affected node that was investigated.

Environment:

  • Kubernetes version (use kubectl version): v1.12.5
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
# cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1800.7.0
VERSION_ID=1800.7.0
BUILD_ID=2018-08-15-2254
PRETTY_NAME="Container Linux by CoreOS 1800.7.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

@szuecs (Member, author) commented Jan 31, 2019

/sig node

@k8s-ci-robot added sig/node and removed needs-sig labels on Jan 31, 2019

@Shnatsel commented Jan 31, 2019

@kubernetes/sig-node-bugs

@k8s-ci-robot (Contributor) commented Jan 31, 2019

@Shnatsel: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

@kubernetes/sig-node-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aermakov-zalando commented Jan 31, 2019

I think it might be a side effect of #71731, but I couldn't find any numbers in there. The comments mention an offline discussion; was this brought up there?

@aermakov-zalando commented Jan 31, 2019

What happens (I only investigated it very briefly) is that kubelet creates a bunch of reflectors for every pod — secret mounts, configmap mounts, token mounts, etc. Each reflector registers a number of histograms, gauges, and so on, and even when the reflector is stopped and removed, its metrics are never unregistered. After a few days on a busy cluster you end up with millions of metrics bloating kubelet and everything else that uses reflectors.
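
For illustration, here is a minimal sketch of that pattern in plain client_golang terms (not the actual client-go code; metric and function names are invented): every reflector creates a uniquely labelled summary child, and nothing deletes it when the reflector goes away.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// One summary vector, with one child per reflector "name" label value.
var listDuration = prometheus.NewSummaryVec(
	prometheus.SummaryOpts{
		Name: "reflector_list_duration_seconds",
		Help: "How long a reflector LIST took.",
	},
	[]string{"name"},
)

func init() { prometheus.MustRegister(listDuration) }

// startReflector stands in for kubelet starting a reflector for a pod's
// secret/configmap/token mount.
func startReflector(name string) {
	// Creating the labelled child allocates the underlying quantile stream
	// (beorn7/perks), which is what dominates the heap profile above.
	listDuration.WithLabelValues(name).Observe(0.1)
	// When the pod goes away the reflector stops, but nothing ever calls
	// listDuration.DeleteLabelValues(name), so the series stays forever.
}

func main() {
	for i := 0; i < 100000; i++ {
		startReflector(fmt.Sprintf("pod-%d-secret", i)) // unbounded cardinality
	}
}

With per-pod reflector names the label cardinality grows with pod churn, which matches the ~8,600 distinct reflector label values seen in the metrics dump above.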

@aermakov-zalando commented Jan 31, 2019

This has been broken since v1.12.0, so it's probably unrelated to #71731.

@dims (Member) commented Jan 31, 2019

Correct, @aermakov-zalando: that PR is only in v1.14.0-alpha.1, v1.14.0-alpha.2, and master.

@szuecs (Member, author) commented Jan 31, 2019

For everyone who finds this issue and needs a patch to disable the reflector metrics:

diff --git c/pkg/util/reflector/prometheus/prometheus.go i/pkg/util/reflector/prometheus/prometheus.go
index 958a0007cd..63657e9c55 100644
--- c/pkg/util/reflector/prometheus/prometheus.go
+++ i/pkg/util/reflector/prometheus/prometheus.go
@@ -85,8 +85,6 @@ func init() {
 	prometheus.MustRegister(watchDuration)
 	prometheus.MustRegister(itemsPerWatch)
 	prometheus.MustRegister(lastResourceVersion)
-
-	cache.SetReflectorMetricsProvider(prometheusMetricsProvider{})
 }
 
 type prometheusMetricsProvider struct{}

@feiskyer (Member) commented Jan 31, 2019

@szuecs Do you still have the same metrics after disabling the ReflectorMetricsProvider?

@szuecs (Member, author) commented Jan 31, 2019

no, it drops the reflector metrics

@yujuhong (Member) commented Jan 31, 2019

@wojtek-t is this indirectly caused by switching to watch-based managers?

@yujuhong added this to the v1.14 milestone on Jan 31, 2019

@wojtek-t (Member) commented Feb 1, 2019

Yeah - switching to watch resulted in more extensive use of reflectors.
It seems those metrics come directly from the reflector (I wasn't aware of them).
So it's highly probable that this change caused it.

I don't think we really need those metrics, so if we can switch them off in kubelets, that should solve the problem.

@wojtek-t (Member) commented Feb 1, 2019

#73624 has been sent out for review to fix this.
We should probably cherry-pick it to previous releases if we agree that's the right way to solve it.

@wojtek-t (Member) commented Feb 1, 2019

Actually, I realized that I don't fully understand the problem.
It shouldn't be a problem in default Kubernetes (what we build from this repo), because those metrics are not switched on for the kubelet:

  • the metrics factory defaults to noop
  • the prometheus metrics need to be explicitly linked and initialized, as they are in the apiserver, e.g. (see the sketch below):
    _ "k8s.io/kubernetes/pkg/util/reflector/prometheus" // for reflector metric registration
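
A rough sketch of that wiring (interfaces heavily simplified; the real definitions live in k8s.io/client-go/tools/cache and k8s.io/kubernetes/pkg/util/reflector/prometheus):

package cache // simplified stand-in for k8s.io/client-go/tools/cache

// GaugeMetric is a trimmed-down version of the metric interfaces used here.
type GaugeMetric interface{ Set(float64) }

type MetricsProvider interface {
	NewLastResourceVersionMetric(name string) GaugeMetric
}

// The default provider does nothing, so a plain kubelet build records no
// reflector metrics at all.
type noopMetric struct{}

func (noopMetric) Set(float64) {}

type noopMetricsProvider struct{}

func (noopMetricsProvider) NewLastResourceVersionMetric(string) GaugeMetric { return noopMetric{} }

var metricsProvider MetricsProvider = noopMetricsProvider{}

// SetReflectorMetricsProvider swaps in a real provider. The package
// pkg/util/reflector/prometheus calls this from its init(), which is why a
// blank import alone is enough to switch the metrics on.
func SetReflectorMetricsProvider(p MetricsProvider) { metricsProvider = p }

// newReflectorMetrics is roughly what a reflector does when it starts; with
// the noop default this allocates nothing.
func newReflectorMetrics(name string) GaugeMetric {
	return metricsProvider.NewLastResourceVersionMetric(name)
}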

@aermakov-zalando commented Feb 1, 2019

Wouldn't it be better to rewrite the reflector metrics so they're aggregated in a better way rather than relying on people not accidentally enabling them? Or at least put a huge warning on top saying "this will leak memory like crazy" just so the same situation doesn't repeat in other code using client-go?
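
As a sketch of what such aggregation could look like (illustrative only; not code from this thread): drop the per-reflector name label so cardinality stays bounded no matter how many reflectors come and go.

package main

import "github.com/prometheus/client_golang/prometheus"

// One bounded series per resource kind instead of one per reflector instance.
var listDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "reflector_list_duration_seconds",
		Help:    "Duration of reflector LIST calls.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"resource"},
)

func init() { prometheus.MustRegister(listDuration) }

func main() {
	// Every secret reflector shares the same series, so starting and
	// stopping thousands of reflectors adds no new series.
	listDuration.WithLabelValues("secrets").Observe(0.05)
}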

@wojtek-t (Member) commented Feb 1, 2019

@aermakov-zalando - whether the metrics are reasonable or not is a separate issue (and not really sig-node related).
One thing that I would like to confirm: do you customize k8s yourself? I would like to make sure that the issue you're facing is in your setup and not in upstream k8s.

@mikkeloscar (Contributor) commented Feb 1, 2019

Our production configuration is this one: https://github.com/zalando-incubator/kubernetes-on-aws/tree/beta

Kubelet config: https://github.com/zalando-incubator/kubernetes-on-aws/blob/85f3ddea60900a7880d03e8c9c5ed7b158b100c7/cluster/node-pools/worker-default/userdata.clc.yaml#L144-L207

We run with an image called registry.opensource.zalan.do/teapot/hyperkube:v1.12.5, but it's just a mirror of k8s.gcr.io/hyperkube:v1.12.5. So upstream Kubernetes.

@szuecs (Member, author) commented Feb 1, 2019

I tried to figure out why these metrics are even there, and I found issues from the past showing leaks already in older versions:
#52121
#53485

It sounds like we need a postmortem to make sure this doesn't happen again.

I can't find any issue in Kubernetes or client-go that reasonably explains why these metrics were introduced in the first place. I guess I just didn't find it, and I hope someone can point out where the decision came from.

@shrinandj commented Feb 4, 2019

I am running Kubernetes v1.12.3 and I don't see reflector metrics being exposed by kubelet (curl -s localhost:10255/metrics | sed 's/{.*//' | sort | uniq -c | sort -nr | grep reflector shows nothing).

FWIW, the cluster was created using kops.

Any idea how the reflector metrics get enabled?

@derekwaynecarr (Member) commented Feb 4, 2019

@sjenning and I will also look into this and see what we can find.

@szuecs (Member, author) commented Feb 6, 2019

@feiskyer (Member) commented Feb 12, 2019

It shouldn't be a problem in default Kubernetes (what we build from this repo), because those metrics are not switched on for the kubelet:

@wojtek-t The issue only exists when hyperkube is used to run kubelet, because hyperkube imports the package k8s.io/kubernetes/pkg/util/reflector/prometheus for the apiserver and controller-manager. If the kubelet binary is used, there are no reflector metrics (which matches your findings).

Since the metrics are not exposed by the kubelet command itself, what do you think of disabling them for the kubelet subcommand in hyperkube?
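
For context, a toy illustration of why this bites hyperkube specifically (hypothetical code, not from the repo): init() functions from every linked package run at process start, before the binary even looks at which subcommand was requested, so a blank import added for the apiserver path also takes effect under "hyperkube kubelet".

package main

import (
	"fmt"
	"os"
)

// Pretend this variable is the reflector metrics provider. In hyperkube the
// equivalent side effect comes from:
//   _ "k8s.io/kubernetes/pkg/util/reflector/prometheus"
var reflectorMetricsEnabled = false

// init runs unconditionally when the binary starts, for every subcommand.
func init() { reflectorMetricsEnabled = true }

func main() {
	subcommand := "kubelet"
	if len(os.Args) > 1 {
		subcommand = os.Args[1]
	}
	// Even "hyperkube kubelet" ends up with the metrics enabled.
	fmt.Printf("subcommand=%s reflectorMetricsEnabled=%v\n", subcommand, reflectorMetricsEnabled)
}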

I can't reproduce the issue (with hyperkube used) on my own cluster:

[screenshot]

Since both the apiserver and controller-manager also import the package, @szuecs have you observed the same issue for them?

@wojtek-t (Member) commented Feb 12, 2019

Since the metrics is not exposed by kubelet cmd, what do you think of disabling it for kubelet subcommand in hyperkube?

+1

@aermakov-zalando commented Feb 12, 2019

@feiskyer Yes, we observe the same thing for apiserver and controller-manager as well.

@aermakov-zalando commented Feb 12, 2019

I would also suggest not using imports for side effects, because this usually creates hard-to-debug and non-obvious problems like this one.

@feiskyer (Member) commented Feb 12, 2019

I would also suggest not using imports for side effects because this usually creates hard to debug and non-obvious problems like this.

@aermakov-zalando So you mean disabling the reflector metrics entirely?

@aermakov-zalando commented Feb 12, 2019

No, I suggest changing the code so that the end users (kubelet, apiserver, etc) would have to explicitly enable the metrics by calling a function instead of having this happen as a side effect of an import statement.
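
A sketch of the API shape being suggested (names are illustrative; this is not an existing client-go API): the metrics package exposes a plain function and has no init() side effects, so a binary only gets the metrics if its own setup code asks for them.

package reflectormetrics

import "github.com/prometheus/client_golang/prometheus"

var lastResourceVersion = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "reflector_last_resource_version",
		Help: "Last resource version seen by a reflector.",
	},
	[]string{"name"},
)

// Enable registers the reflector metrics with the given registry. A binary
// that wants them (say, kube-apiserver) calls Enable from its own startup
// code; kubelet simply never calls it. Because there is no init() here, a
// stray import can no longer switch the metrics on by accident.
func Enable(reg prometheus.Registerer) error {
	if err := reg.Register(lastResourceVersion); err != nil {
		return err
	}
	// Here it would also install the client-go provider, e.g.
	// cache.SetReflectorMetricsProvider(provider{}); omitted to keep the
	// sketch self-contained.
	return nil
}

A caller that wants the metrics would then do something like reflectormetrics.Enable(prometheus.DefaultRegisterer) during startup.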

@feiskyer (Member) commented Feb 13, 2019

explicitly enable the metrics by calling a function instead of having this happen as a side effect of an import statement.

LGTM. @wojtek-t @derekwaynecarr @yujuhong WDYT? We should probably do this for both kubelet and control plane services (apiserver and controller-manager).

@wojtek-t (Member) commented Feb 13, 2019

I'm fine with that.

@yujuhong (Member) commented Feb 13, 2019

I'm fine with that.

+1

@derekwaynecarr (Member) commented Feb 13, 2019

+1

@logicalhan (Contributor) commented Feb 26, 2019

Why wouldn't we just disable this metric wholesale? Enabling this flag would basically be the same as enabling a memory leak, no?

@lavalamp (Member) commented Feb 26, 2019

/assign @logicalhan

@lavalamp (Member) commented Feb 26, 2019

Han points out that, since the metric has a random suffix each time, it's not very useful for monitoring anyway.

@lavalamp (Member) commented Feb 26, 2019

/sig api-machinery

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Mar 1, 2019

UPSTREAM: 74636: Remove reflector metrics as they currently cause a memory leak

ref: kubernetes#73587
ref: kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

openshift-publish-robot pushed a commit to openshift/kubernetes-client-go that referenced this issue Mar 1, 2019

UPSTREAM: 74636: Remove reflector metrics as they currently cause a memory leak

ref: kubernetes/kubernetes#73587
ref: kubernetes/kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

Kubernetes-commit: f77a2c16c80223249ead526ca12caa6962117888

deads2k added a commit to deads2k/kubernetes that referenced this issue Mar 14, 2019

UPSTREAM: 74636: Remove reflector metrics as they currently cause a memory leak

ref: kubernetes#73587
ref: kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

openshift-publish-robot pushed a commit to openshift/kubernetes-client-go that referenced this issue Mar 15, 2019

UPSTREAM: 74636: Remove reflector metrics as they currently cause a memory leak

ref: kubernetes/kubernetes#73587
ref: kubernetes/kubernetes#74636

Origin-commit: 01380498b02d6dee75e52d9ce54e9a5dffef24fb

Kubernetes-commit: fd85bbcb7e0922b8889c85fad1f5f2d4ca7a3fa7