Unclosed connections to kubelet #67659
/sig node
I was about to open an issue for this from the apiserver perspective. On a 1.11.2 IBM Cloud Kubernetes Service cluster (and since reproduced in a simple kubeadm cluster), we found apiserver memory growing. Further investigation showed a large number of long-running exec requests reported in the apiserver /metrics endpoint. There was a similar number of connections open to the kubelet, but only a handful of connections open to the apiserver. We were able to reproduce this in both environments by running a simple exec request in a loop. After multiple iterations of the loop spread over an hour, metrics showed that roughly 40% of the exec sessions had "leaked", with 267 connections open to the kubelet. We see this to a lesser extent in Kubernetes 1.10.5 - perhaps one percent of requests - and there we have only reproduced a handful of occurrences by killing the
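For reference, a minimal sketch of that kind of exec-in-a-loop reproduction (the pod name `test-pod`, the default namespace, and the kubelet's default port 10250 are assumptions, not the exact commands from this report) might look like this:

```bash
#!/usr/bin/env bash
# Sketch only: drive repeated "kubectl exec" requests, then count the
# connections the kubelet still holds open. Pod name and port are assumptions.
for i in $(seq 1 100); do
  kubectl exec test-pod -- ps > /dev/null
done

# On the worker node: file descriptors held by the kubelet process and
# established connections on the kubelet API port (10250). On an affected
# node these numbers keep growing instead of returning to a small baseline.
KUBELET_PID=$(pgrep -o kubelet)
ls -1 "/proc/${KUBELET_PID}/fd" | wc -l
netstat -nat | grep -c ':10250.*ESTABLISHED'
```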
I believe this is directly related to #67913; I've posted a traceback of the issue there.
I think what I see in 1.11 as an easily reproducible problem is different from what was reported in #67913. In my 1.11 single-node kubeadm cluster I changed the apiserver and kubelet log verbosity. This is the first request that "leaked" (run from ubuntu@master-1). In the apiserver log I see:
The next attempt did not leak - the gauge is unchanged (run from ubuntu@master-1). In the apiserver log I see:
I went back and double-checked that "Error proxying data from client to backend" only shows up for "non-leaking" exec requests. I assume that message should be there, and its absence is part of the puzzle. In kubelet.log (covering both requests) I see:
The entry at 13:16:27 corresponds to an earlier exec request that did not leak. These were the last lines of the kubelet log, so there is no chance that I am omitting kubelet log entries.
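For anyone trying to follow along, one way to watch the gauge and the kubelet connection count discussed above is sketched below; the metric name `apiserver_longrunning_gauge` is an assumption based on 1.11-era apiserver metrics, not something quoted in this thread.

```bash
# Open long-running requests as reported by the apiserver, filtered to exec
# sessions. "apiserver_longrunning_gauge" is the 1.11-era metric name; adjust
# if your version exposes a different name.
kubectl get --raw /metrics | grep apiserver_longrunning_gauge | grep exec

# On the worker node: established connections held by the kubelet process.
netstat -natp 2>/dev/null | grep kubelet | grep -c ESTABLISHED
```

Leaked exec sessions show up as the gauge value staying elevated even after the corresponding kubectl clients have exited.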
@jmcmeek Yeah, agreed. There might be two issues here (or this issue and #67913 are indeed different); I have seen those messages as well. In our cluster we are seeing it consistently, rising to around 40k. Not sure what would help move this issue forward at this point, but I'd be happy to provide it.
I am also seeing this issue on 1.11.x (not sure what x is at the moment). If there are any logs or debugging information you would like to help determine the root cause, please let me know.
What I was seeing in 1.11 - frequent "leaked" connections for exec sessions to running containers - seems to be fixed in 1.12.0-beta.1 by #67288.
Can confirm - this issue seems to be fixed in 1.12.
I'm using a kubectl client at version 1.12.1 against a cluster running 1.10.7, and I still see the kubectl connection leaks.
Right... Those connections are kept alive by the kubelet, not by kubectl - that's why you're still affected. In my case, upgrading the whole cluster (including the kubelet) to 1.12 was the solution. (My cluster is used for CI purposes, and we had roughly 50k "orphan" connections after one or two weeks; since the upgrade the number of connections stays below 150-200 even after a few weeks.)
The fix is to upgrade to a release containing #67288, which should now be in all releases 1.10+, or to periodically restart the kubelet.
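For clusters that cannot upgrade right away, the restart workaround can be automated. A minimal sketch, assuming a systemd-managed kubelet and root access on the node (neither is stated in this thread), is:

```bash
# Workaround only, until the node runs a release containing #67288:
# restart the kubelet once a day so leaked connections are dropped.
# Assumes a systemd unit named "kubelet" and root access on the node.
echo '0 3 * * * root /bin/systemctl restart kubelet' | sudo tee /etc/cron.d/restart-kubelet
```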
@MHBauer: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Noticed that pods on a specific worker become unavailable after ~1 day; `kubectl exec` / `kubectl logs` return the following error:

Further investigation shows that the number of connections to the kubelet keeps increasing and the connections are never closed.
On `pod1`, commands are periodically executed (every 5 sec) from another pod, like `kubectl exec pod1 ps`, and after the introduction of this change the number of open connections on the worker started to grow.

Important points:
- Running `kubectl exec pod1 ps` from the master - the number of open connections is not growing;
- Running `kubectl exec pod1 ps` from a pod - the number of open connections is increasing;
- Restarting `kubectl exec` doesn't help - open connections aren't reset.

How to reproduce it (as minimally and precisely as possible):
On a running K8S cluster:
- Create a pod from which `kubectl exec` can be run (pod2) and a target pod `pod1`.
- Check the number of open file descriptors for the kubelet (`ls -1 /proc/<kubelet-pid>/fd | wc -l`) and check the `netstat` output for the kubelet.
- Run `kubectl` commands from pod2 inside of pod1, like `kubectl exec pod1 ps` (a condensed script is sketched below).
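A condensed version of these steps as a single script follows; the pod names `pod1`/`pod2` and the fd-count check come from the steps above, while the port number, loop length, and the assumption that pod2 has a working `kubectl` with exec permissions are illustrative only.

```bash
#!/usr/bin/env bash
# Run on the worker node hosting pod1. Assumes pod2 has kubectl installed and
# credentials that allow exec'ing into pod1.

kubelet_fds() {
  # Number of file descriptors currently held by the kubelet process.
  ls -1 "/proc/$(pgrep -o kubelet)/fd" | wc -l
}

echo "kubelet fds before: $(kubelet_fds)"
netstat -nat | grep -c ':10250.*ESTABLISHED'

# From pod2, exec into pod1 every 5 seconds, as in the setup described above.
kubectl exec pod2 -- sh -c 'for i in $(seq 1 60); do kubectl exec pod1 ps; sleep 5; done'

echo "kubelet fds after: $(kubelet_fds)"
netstat -nat | grep -c ':10250.*ESTABLISHED'
```

On an affected node the "after" counts stay well above the "before" counts and never drop back.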
Anything else we need to know?:
Example output of netstat on worker:
Connection number:
Pods behind `10.128.2.179`:

Environment:
Kubernetes version (use `kubectl version`):
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:53:20Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: k8s deployed on GCP VM instances
OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Kernel (e.g. `uname -a`):
Linux k8s-qa2-master-1 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Others:
kubelet version v1.11.0
docker Version: 17.03.2-ce
weave-kube:2.2.0
weave-npc:2.2.0
weaveexec:2.2.1
kube-apiserver-amd64:v1.11.0
kube-controller-manager-amd64:v1.11.0
kube-scheduler-amd64:v1.11.0
kube-proxy-amd64:v1.11.0
etcd-amd64:3.2.18
k8s-dns-kube-dns-amd64:1.14.10
k8s-dns-sidecar-amd64:1.14.10
k8s-dns-dnsmasq-nanny-amd64:1.14.10