Investigate goroutine leakage issue in Kubelet #15805
Similar goroutine leakage from my cluster, which had been running for 5 hours. My kubelet happily accumulated over 18000 goroutines.
For the record: I measured this before the 1.0 release and found some leakages from cAdvisor. After that fix, we reasonably kept the goroutine count down to a small number (around 100). This should be a regression, and it explains the increasing Kubelet CPU and memory usage.
I found the culprit. TL;DR: go-dockerclient now saves and reuses the connection. Kubelet calls cadvisor to get version info whenever it updates the node status. A recent go-dockerclient commit fsouza/go-dockerclient@1632686 switched to using http.Client for HTTP requests over the UNIX domain socket. This changes the behavior of the client: it no longer closes the connection immediately after the request; instead, it saves it for future use. Note that the cadvisor repository still uses the older go-dockerclient, hence could not have caught the leakage. Removing this issue from the v1.1 milestone since it only affects HEAD (verified). Simple code to reproduce this issue:
Classy digging 🏆
Some cadvisor PRs (#909, @jimmidyson's prometheus PRs) and a pending PR for issue #920 might have to be cherry-picked into v1.1. I think resolving this issue in cadvisor right away is necessary.
We should resolve this issue in cadvisor, but as long as we don't bump the go-dockerclient version in v1.1, v1.1 wouldn't be affected. |
Update: cadvisor also creates a new docker client for each container. With the new go-dockerclient, it would use one connection per container and never reclaim those connections after the containers die.
Here is the reason I moved this back to the v1.1 milestone: cc @pmorie, do we really need the AdditionalGroups feature for the 1.1 release? If not, we don't need to bump the go-dockerclient library to include commit fsouza/go-dockerclient@1632686.
All required PRs from cAdvisor were merged; we just need to cut a new release (google/cadvisor#935) and update our Godep here.
@dchen1107 I would like to have SupplementalGroups for use in 1.1, yeah. |
Is this what causes my Kubelets to use a massive amount of memory? |
@paralin If you are running kubelet from HEAD, you might be hit by this issue, caused by cAdvisor's misuse of the new go-dockerclient library. But if you are running the 1.0 release, that should be a different issue.
@dchen1107 It's on HEAD. Is there a fix PR yet? |
Needs a cherry-pick, also.
I profiled the kubelet without much workload (one or two pods) and found that both goroutine and heap counts increased rapidly. For example, on a node:
/debug/pprof/
profiles:
0 block
1010 goroutine
1658 heap
12 threadcreate
And the counts are still climbing.
Among the 1010 goroutines, more than 450 share the same stack trace (two variants, elided above), and another 477 are in net/http.(*persistConn).writeLoop.