Slow "leak" of kubectl exec requests in kube 1.10.5 #67913
Comments
/sig api-machinery
I witnessed similar behavior and believe it's the same issue (or at least related). I can reproduce it with a traceback. It appears to be an issue within … Effectively, to reproduce, spin up a large container that takes time to download (so it stays within the …)

(anonymized output) HTH.
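A rough guess at the (truncated) reproduction above, assuming the point is to exec into a pod whose image is still being pulled so the request hangs on the server side, then kill the client; the image and pod names below are placeholders:

```sh
# Create a pod whose image is large/slow to pull, so it sits in
# ContainerCreating for a while (image name is a placeholder).
kubectl run big-image-pod --image=example.com/some/very-large-image:latest --restart=Never

# While the image is still downloading, exec into the pod (this hangs),
# then kill the client and see whether the apiserver keeps the server-side
# exec request around.
kubectl exec big-image-pod -- sh -c 'df -P | sed 1d' &
EXEC_PID=$!
sleep 10
kill -9 "$EXEC_PID"
```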
/assign @caesarxuchao
I definitely see many stacks like that in the kubelet.log I just checked. I had noticed those before, but mistakenly associated them with the apiserver returning 404 errors, not with the long-running requests. From the apiserver perspective... when the client has closed its connection, can the apiserver consider the exec request complete and close its connection to the kubelet, etc.? That might make the apiserver more tolerant of this kind of situation.
@caesarxuchao This appears to have fixed a related issue I saw in 1.11.2 (#67659); I do not see that issue in 1.12 beta.1. What I have read suggests that the PR should fix this issue as well. We're awaiting the cherry-picks to earlier releases to confirm that.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Over a period of a week or so, we see apiservers grow to over 11 GB of memory. The apiserver /metrics endpoint showed a huge number of long-running exec requests:
And completed requests:
Output from `ss` showed a similar number of connections from the apiserver to kubelets, but only a small number of client connections to the apiserver. All the exec requests are `sh -c "df -P | sed 1d"` and complete successfully and quickly when run from kubectl. The 404 responses appear to be due to the client making exec requests to pods that no longer exist.

We have seen multiple clusters doing this. All clusters are at 1.10.5 (though I do not know if the workload is also being run on 1.9 or earlier). If we restart the apiserver, memory growth and accumulation of long-running exec requests resumes. The clusters in question are running customer applications that appear to loop through all pods in a given namespace, issuing `df -P` requests every 10 seconds or so. apiserver log entries show some number of these requests completing with times ranging from 1 to at least 12 hours.
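For reference, a rough sketch of the checks described above; `apiserver_longrunning_gauge` is the metric name we believe this apiserver version exposes for long-running requests, and 10250 is the default kubelet port, so adjust both if your setup differs:

```sh
# Long-running requests currently tracked by the apiserver, filtered to exec
# (metric name assumed from the 1.10-era apiserver; it may differ elsewhere).
kubectl get --raw /metrics | grep apiserver_longrunning_gauge | grep 'subresource="exec"'

# On the apiserver host: established connections from the apiserver to
# kubelets (10250 is the default kubelet port); skip the ss header line.
ss -tn state established '( dport = :10250 )' | tail -n +2 | wc -l
```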
What you expected to happen:
The exec request ought to complete and the connection to the kubelet be closed. On the surface, given that the client connections no longer exist, it seems the apiserver ought to be able to terminate / clean up a request that really hasn't completed.
How to reproduce it (as minimally and precisely as possible):
We have not been able to reproduce this problem in a portable manner. Killing `kubectl exec` sometimes results in "leaked" exec requests, but as far as we know, the actual requests are running normally.
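For illustration, a minimal sketch of that manual check, assuming a running pod named `some-pod` (a placeholder) and the metric name used above:

```sh
# Start an exec that will run for a while, then kill the kubectl client
# while the server side is still streaming.
kubectl exec some-pod -- sh -c 'sleep 300' &
CLIENT_PID=$!
sleep 5
kill -9 "$CLIENT_PID"

# If the request "leaks", it keeps showing up as a long-running exec request
# on the apiserver well after the client process is gone.
kubectl get --raw /metrics | grep apiserver_longrunning_gauge | grep 'subresource="exec"'
```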
Anything else we need to know?:
Kubernetes 1.11.2 seems to have similar behavior on a much worse scale, and it is readily reproducible. See #67659.
Environment:
- Kubernetes version (use `kubectl version`):
- Kernel (e.g. `uname -a`):