Many WC processes #39334

Closed
daniel-yavorovich opened this issue Dec 31, 2016 · 28 comments
Labels
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@daniel-yavorovich

Dear Sirs/Madams,

I use Kubernetes v1.4.6 and see many wc processes. Some of them have been around for a month.

# ps aux | grep -c '[wc]'
3790 
# ps auxffww | grep '[wc]' | head -125 | tail -10
root     29274  0.9  0.4 5705996 147268 ?      Ssl  Nov24 505:52 /usr/bin/dockerd -H fd:// --mtu=1472 --bip=10.1.42.1/24
root     29281  0.0  0.0 3388080 20264 ?       Ssl  Nov24   4:53  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
root     29533  0.0  0.0 282564   936 ?        Sl   Nov24   5:16  |   \_ docker-containerd-shim 43862be3c4f16a0df9258806dff9bb3b4870d72e32b644442d2a57d5c9466a61 /var/run/docker/libcontainerd/43862be3c4f16a0df9258806dff9bb3b4870d72e32b644442d2a57d5c9466a61 docker-runc
root     29549  4.7  0.3 815028 119500 ?       Ssl  Nov24 2536:05  |   |   \_ /hyperkube kubelet --allow-privileged --api-servers=http://localhost:8080 --config=/etc/kubernetes/manifests-multi --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --hostname-override=10.42.233.138 --v=2
root     29585  0.0  0.0 379220 15696 ?        S    Nov24   0:39  |   |       \_ journalctl -k -f
root     32761  0.0  0.0      0     0 ?        Z    Nov26   0:00  |   |       \_ [wc] <defunct>
root      9058  0.0  0.0      0     0 ?        Z    Nov27   0:00  |   |       \_ [wc] <defunct>
root     25592  0.0  0.0      0     0 ?        Z    Nov30   0:00  |   |       \_ [wc] <defunct>
root     12461  0.0  0.0      0     0 ?        Z    Dec01   0:00  |   |       \_ [wc] <defunct>
root     27458  0.0  0.0      0     0 ?        Z    Dec12   0:00  |   |       \_ [wc] <defunct>

Can you tell me what generates these processes, and is this a bug?

Thank you.
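
A note on the count above: `grep '[wc]'` is a bracket expression that matches any line containing a `w` or a `c`, so 3790 likely includes ordinary processes such as the dockerd and kubelet lines shown. A minimal sketch of a stricter count, assuming a procps-style `ps` that supports `-eo` with the `stat` and `comm` columns; the `count_defunct` helper name is hypothetical:

```shell
# Count zombie (state Z) processes whose command name matches exactly.
# Reads "pid ppid stat comm" rows on stdin, one process per line.
count_defunct() {
    awk -v name="$1" '$3 ~ /^Z/ && $4 == name { n++ } END { print n + 0 }'
}

# Usage on a live node (assumes procps-style ps):
#   ps -eo pid=,ppid=,stat=,comm= | count_defunct wc
```

Matching on the state column rather than on the text of the line avoids counting processes that merely contain those letters.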

@Typositoire

I just restarted the kubelet (after a week of uptime) and it had 9000 wc zombies... I'm running 1.5.1.

@fraenkel
Contributor

fraenkel commented Jan 3, 2017

This might be the same as #38894

@BugRoger
Contributor

It's not necessarily the same cause as in #38894. We're experiencing the same problem but have a full-featured GNU find. It is related, though.

@0xmichalis
Contributor

@kubernetes/sig-node-misc

@euank
Contributor

euank commented Jan 16, 2017

The fixes for #38894 should be in master, just not in any release branch.

@calebamiles wanna see about getting the two commits referenced in #38894 (comment) into the 1.5 release branch?

@dashpole
Contributor

The cAdvisor change (#1558) still needs to be vendored into Kubernetes and cherry-picked to 1.4.

@calebamiles
Contributor

I actually don't know how cAdvisor is vendored into Kubernetes. @dashpole if you ping me on the PR that vendors into Kubernetes I can try and get it cherry picked.

cc: @euank, @saad-ali

@MartinPyka

+1 for adding the fix for #38894 into the next release

@Bregor

Bregor commented Mar 9, 2017

@daniel-yavorovich, @Typositoire do you use weave-net?

@daniel-yavorovich
Author

@Bregor no, I use flannel only.

Thank you.

@Bregor

Bregor commented Mar 10, 2017

@daniel-yavorovich ok, thank you.

@stuszynski

stuszynski commented Mar 15, 2017

We're running on v1.5.3 and we have exactly the same wc process leak. A few days after the last kubelet restart, we have hundreds of defunct wc processes with hyperkube kubelet as their parent.

$ ps ax | grep wc 
31902  0.0  0.0      0     0 ?        Z    10:51   0:00 [wc] <defunct>
32042  0.0  0.0      0     0 ?        Z    11:47   0:00 [wc] <defunct>
...
$ ps ax | grep wc | wc -l
1038
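
`ps ax | grep wc` can also match the `grep` process itself and any command line that happens to contain `wc`, so the figure is approximate. To confirm which parent is holding the zombies, they can be grouped by parent PID. A sketch, assuming a procps-style `ps` with `-eo` and the `stat` column; `defunct_by_parent` is a hypothetical helper:

```shell
# Print "count ppid" for zombie processes, grouped by parent PID,
# with the parent holding the most zombies first.
# Reads "pid ppid stat comm" rows on stdin, one process per line.
defunct_by_parent() {
    awk '$3 ~ /^Z/ { c[$2]++ } END { for (p in c) print c[p], p }' | sort -rn
}

# Usage on a live node (assumes procps-style ps):
#   ps -eo pid=,ppid=,stat=,comm= | defunct_by_parent
```

If the top parent PID is the kubelet's, that matches the process tree reported at the start of this issue.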

I also found related messages in the kubelet log:

Mar 14 21:51:19 kubernetes2 hyperkube[16528]: I0314 21:51:19.284652   16528 fs.go:495] killing cmd [find /var/lib/docker/overlay/4ea252565a33e2f5763648f95eeaa44d9d614fdb212b3d442d1915b1cba06393 -xdev -printf .], and cmd [wc -c] due to timeout(2m0s)

It looks like this has already been fixed by commit c03ec46 (#39477), but it hasn't been backported to v1.5 or v1.4 yet.
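
For context on the log line above: the two commands appear to form a pipeline in which `find` prints one `.` per entry it visits (`-xdev` keeps it on one filesystem) and `wc -c` counts those bytes, giving an entry count for the directory. A zombie is a child that has exited but has not been waited on by its parent, so killing the pair on timeout without reaping them would leave entries like the ones shown. A rough shell equivalent, assuming GNU `find` (for `-printf`) and coreutils `timeout`; `count_entries` is a hypothetical name and this is not the kubelet's code:

```shell
# Count entries under a directory in the style of the logged pipeline:
# find emits one "." per entry (including the directory itself) and
# wc -c counts the dots. Assumes GNU find and coreutils timeout.
count_entries() {
    dir=$1
    limit=${2:-120}   # seconds; mirrors the 2m0s timeout in the log line
    timeout "$limit" find "$dir" -xdev -printf . | wc -c
}

# Usage (the path is a placeholder):
#   count_entries /var/lib/docker/overlay/<layer-id>
```

In a shell the pipeline is reaped automatically when it finishes, which is why running this by hand does not leave defunct processes behind.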

@dashpole
Contributor

Working on getting it into 1.5 and 1.4: #43113

@golfreeze

I've also hit this bug on v1.5.3.
To fix it, do I need to upgrade Kubernetes?
If yes, could you show the steps?
Thank you.

@dashpole
Contributor

dashpole commented Apr 7, 2017

The fix is in v1.5.6

@golfreeze

Thank you dashpole!!

@calebamiles
Contributor

@dashpole are you still planning on fixing this for 1.4? Otherwise we could probably close this issue.

@dashpole
Contributor

dashpole commented Apr 7, 2017

I am still planning on it. I ran into a roadblock cherry-picking it to 0.24.2 in the cAdvisor branch, but I don't remember what it was... I'll try to do that soon.

@calebamiles
Contributor

Thanks for the update, @dashpole.

@calebamiles
Contributor

any news, @dashpole?

@dashpole
Contributor

dashpole commented May 1, 2017

Oh, sorry. I don't think I'll be able to get to this soon. For those affected, 1.5 and 1.6 both have the fix for this. We can probably just close this.

@k8s-github-robot added the needs-sig label May 31, 2017
@0xmichalis
Contributor

/sig node

@k8s-ci-robot added the sig/node label Jun 21, 2017
@k8s-github-robot removed the needs-sig label Jun 21, 2017
@euank
Contributor

euank commented Jun 21, 2017

/close

@k8s-ci-robot
Contributor

@euank: you can't close an issue unless you authored it or you are assigned to it.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@euank
Contributor

euank commented Jul 26, 2017

Monthly reminder; I think this can be closed as fixed in 1.5+ if someone with the power to close it wishes to do so.

@varsharaja

I am seeing the same issue in 1.5.5. What exactly should I do to fix this? Our Kubernetes cluster was installed with the old setup-aws scripts, not with kops.

@dashpole
Contributor

@varsharaja as I mentioned above, upgrading to 1.5.6 will fix this.
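
A quick way to check whether a node's version already includes the fix is to compare it against v1.5.6. A minimal sketch, assuming GNU `sort` with `-V` (version sort) and that the version string has already been obtained by other means; `version_at_least` is a hypothetical helper:

```shell
# Succeed if version $1 is at least version $2.
# A leading "v" is stripped from both before comparing.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_at_least() {
    have=${1#v}
    want=${2#v}
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage:
#   version_at_least v1.5.5 v1.5.6 || echo "upgrade needed"
```

Version sort is used instead of a plain string comparison so that a release such as 1.5.10 is ordered after 1.5.6.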

@varsharaja

varsharaja commented Jan 13, 2018 via email
