/opt/kubernetes/helpers/docker-healthcheck low timeout can cause DEADLOCK of the node #5434
Workaround (switching off the dockerd healthcheck) applied to kops instance groups avoids the "timeout node deadlock"; it works well on small AWS instances:
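A minimal sketch of such a workaround on the node itself, assuming the healthcheck is driven by a systemd timer named `docker-healthcheck.timer` (the unit name is an assumption; verify it on your nodes before relying on this):

```shell
# Sketch: disable the periodic docker-healthcheck on a node.
# Assumes the check runs from a systemd timer called docker-healthcheck.timer;
# check the real unit name with `systemctl list-timers` first.
sudo systemctl stop docker-healthcheck.timer
sudo systemctl disable docker-healthcheck.timer
```

In kops this would typically be rolled out as an instance-group hook rather than run by hand, so it survives node replacement.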
In case of increased I/O load, the 10-second timeout is not enough on small or heavily loaded systems, so I propose 60 seconds. The kubelet timeout for detecting health problems is 2 minutes (120 seconds) by default. Secondly, a docker restart can heavily load the host OS, even on large systems, because many pods initialize at the same time; a continuous dockerd restart loop, a deadlock of the node, is observed. Thirdly, because of the forcibly closed sockets and the kernel TCP TIME_WAIT state, the TCP sockets are not usable immediately after a restart; waiting for FIN_TIMEOUT is necessary before starting services. Workaround kubernetes#1 for: kubernetes#5434
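The failure mode hinges on how the `timeout` utility behaves: when the wrapped command does not finish in time, `timeout` kills it and exits with status 124, which the healthcheck treats as an unhealthy docker even if dockerd is merely slow. A quick illustration, using `sleep` as a stand-in for a `docker ps` that is slow under I/O load:

```shell
# `sleep 2` stands in for a `docker ps` that takes longer than the limit.
timeout 1 sleep 2
echo "exit status: $?"   # 124 indicates the command was killed by timeout
```

So under heavy I/O a perfectly healthy dockerd that needs 11 seconds to answer is indistinguishable, to the script, from a hung one.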
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
This has been fixed and will be released in Kops 1.11.0. /close
@rifelpet: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
High I/O or swap-file usage by memory-intensive applications deployed to an AWS instance can kill the instance, and can deadlock the node if all of the pods are restarted at the same time by the healthcheck-driven dockerd restart. A typical case is multiple JVM/Java-based applications running on the node.
We faced this issue several times.
Use of a swap file can help with spikes in JVM memory usage and helps calibrate the use of many low-CPU but memory-intensive apps like Spring Boot.
I made this reproducible and simulate it below using the `stress` command.
The dockerd restart causes many problems:
`kube-proxy` restarts, but it cannot immediately re-attach to services because the kernel TCP bind fails (sockets already in use). Result: the K8s services remain unreachable from/on the node.
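The bind failures come from sockets lingering in TIME_WAIT after the forced close; how long they linger is governed by the kernel FIN timeout. On the node you can inspect both (the `/proc` path is standard Linux; 60 seconds is the usual default):

```shell
# FIN timeout in seconds (Linux default: 60) - how long closed sockets linger
cat /proc/sys/net/ipv4/tcp_fin_timeout
# Count sockets currently stuck in TIME_WAIT
ss -tan state time-wait | tail -n +2 | wc -l
```

Until those sockets age out, a restarted kube-proxy cannot rebind the same ports, which is why services stay unreachable for a while after every forced restart.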
Many pods lose external network access because of the calico pod restart.
The pod service restarts cause high I/O load, which makes the healthcheck fail again, feeding the deadlock loop above.
Only a node restart helps.
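To make the proposed fix concrete: a sketch of a healthcheck along the lines of `/opt/kubernetes/helpers/docker-healthcheck`, assuming it is essentially a `timeout`-wrapped `docker ps` (the real helper also restarts docker on failure; the variable and function names here are illustrative, not the script's actual contents):

```shell
# Illustrative healthcheck with the proposed longer timeout.
# The current helper uses `timeout 10`; 60s is the value proposed here.
TIMEOUT_SECS="${TIMEOUT_SECS:-60}"

health_check() {
  # Succeeds only if the given command completes within TIMEOUT_SECS.
  if timeout "${TIMEOUT_SECS}" "$@" > /dev/null 2>&1; then
    echo "docker healthy"
    return 0
  else
    echo "docker unhealthy (or check timed out)"
    # the real helper restarts docker at this point
    return 1
  fi
}

# On a node this would be invoked as: health_check docker ps
```

With a 60-second budget, a dockerd that is slow under I/O load still passes, while a genuinely hung daemon is still caught well within the kubelet's 2-minute window.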
------------- BUG REPORT TEMPLATE --------------------
What `kops` version are you running? The command `kops version` will display this information.

```
ubuntu@kubernetes-test-bastion:~$ kops version
Version 1.9.1 (git-ba77c9ca2)
```
What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a `kops` flag.

What cloud provider are you using? AWS
Prerequisites:
Running cluster; you can access the node.
Swap enabled in the kops deployment according to https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/:
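For reference, a minimal sketch of enabling swap on a node (the kubelet must also be started with `--fail-swap-on=false` to tolerate swap, per the kubelet reference linked above; run as root, and the path and size here are illustrative):

```shell
# Create and enable a 2 GiB swap file (illustrative size and path).
fallocate -l 2G /var/swapfile
chmod 600 /var/swapfile
mkswap /var/swapfile
swapon /var/swapfile
# Persist across reboots:
echo '/var/swapfile none swap sw 0 0' >> /etc/fstab
```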
Instancegroup configuration in kops YAML:
I have a running cluster; deploy a simple pod, e.g. a simple webserver.
Attach to a pod running on the node.
Install the `stress` command in the pod: `apt-get update && apt-get install stress`
Force it to run out into swap with the `stress` command:
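For example (flag values are illustrative; size `--vm-bytes` times `--vm` above the node's free RAM so the workers spill into swap):

```shell
# Spawn 4 workers, each allocating and touching 512 MiB, for 2 minutes.
stress --vm 4 --vm-bytes 512M --timeout 120s
```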
On the node, `/opt/kubernetes/helpers/docker-healthcheck` fails because of `timeout 10`, but the node and dockerd itself are healthy. The `kubelet` default timeout is 2 minutes; the 10-second healthcheck is a bit of an overkill. Increase the healthcheck timeout from 10 seconds to 1 minute.
Please run `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

Please run the commands with the most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or into a gist, and provide the gist link here.
Anything else we need to know?
------------- FEATURE REQUEST TEMPLATE --------------------
Increase healthcheck timeout to 1m