kube-proxy process silently hangs on one of nodes #38372
Comments
Reproduced on my environment.
Kubernetes version:
Environment:
cc @kubernetes/sig-network-bugs
@ivan4th has investigated the issue and opened this discussion in the golang project: golang/go#18925
https://github.com/ivan4th/hoseproxy is a small project by @ivan4th to reproduce this bug.
If kube-proxy is running containerized, then it could be a candidate for health checks. Otherwise, I think handling this is probably outside the scope of kube-proxy itself, and is instead a general issue for various Kubernetes components, not the proxy specifically. Could we close this issue in favor of the Golang one, since the hang seems to be a Go thing and not actually kube-proxy?
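One way to turn the health-check suggestion above into something concrete is a probe command against kube-proxy's health endpoint. This is only a sketch, not anything from this issue: the port (10256 here) is an assumption and has varied between releases, so check your version's flags before relying on it.

```shell
# Hedged sketch of a liveness check for a containerized kube-proxy.
# The healthz port (10256 here) is an assumption; some releases served
# health/metrics on 10249 instead. Adjust to your version.
if curl -fsS --max-time 5 http://127.0.0.1:10256/healthz >/dev/null; then
  echo "kube-proxy is responding"
else
  echo "kube-proxy appears hung; a restart may be needed" >&2
  exit 1
fi
```

A check like this only helps against the hang described here if the health endpoint is served by the same goroutines that stall; otherwise the probe may keep passing while event processing is stuck.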
We have a very similar problem with kubelet. An underlying bug in Go would explain this. Is anyone else experiencing this issue with kubelet as well, or is it just us?
So this looks like it's not a bug in kube itself?
We had the same issue at least twice.
@equinox0815 I have also experienced this bug with kubelet.
Kubernetes version (use kubectl version): Kubernetes v1.4.3+coreos.0
Environment:
6 bare-metal nodes
Ubuntu 16.04.1 LTS (Xenial Xerus)
Kernel (uname -a): Linux node1 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Kargo
Calico 0.21
What happened:
There is a pod (OpenStack Keystone) that runs on node1, with a service associated with it. Initially the service is reachable from all nodes; e.g. a request to http://keystone.ccp:5000/ replies with a valid HTTP response.
The Keystone pod is restarted; as a result it gets a different IP address, and the DNS record is updated to point to the new address. However, the service is not reachable from node2.
The reason is that the iptables rules are not being updated:
good node
bad node
and the service:
The kube-proxy pod is running on all nodes and is in Ready state:
lsof output shows that the kube-proxy process has an established connection to http://127.0.0.1:8080 (the master address) and a number of open listening sockets.
However, strace shows that the process and all of its threads are blocked in futex; the typical picture is:
the last messages in log file are:
Note that logs from other (live) kube-proxy instances also contain the same complaints about iptables-restore, so that may not be the issue.
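The observations above (lsof, strace, the hung threads) can be sketched as shell commands. This is a hedged reconstruction of the diagnosis, not the reporter's exact invocations; the process name and master port are taken from this report.

```shell
# Sketch of the diagnostics described above, run on the bad node.
PID="$(pidof kube-proxy)"

# Established connection to the master (127.0.0.1:8080 per this
# report) and the open listening sockets:
lsof -nP -p "$PID" | grep -E '8080|LISTEN'

# Check whether the process and all of its threads are parked in
# futex(2):
strace -f -p "$PID" 2>&1 | head

# A SIGQUIT makes the Go runtime print a goroutine dump to stderr,
# which shows exactly where kube-proxy is stuck. Note that this
# terminates the process, so capture its stderr first.
kill -QUIT "$PID"
```

The goroutine dump in particular is what made it possible to trace this to the Go runtime rather than to kube-proxy's own logic.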
What you expected to happen:
Kubernetes should somehow detect that kube-proxy has hung and stopped processing events. I'd expect automatic detection of the issue and a restart of the kube-proxy pod.
How to reproduce it (as minimally and precisely as possible):
The issue happened under very low load; the overall number of pods is about 100, with about 20 services.