
kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy #51369

Merged: 1 commit, merged Sep 3, 2017

Conversation

@luxas (Member) commented Aug 25, 2017

What this PR does / why we need it:

To improve the UX when the kubelet is unhealthy or stopped, kubeadm now polls the kubelet's API after 40 and 60 seconds, performing an exponential backoff for a total of 155 seconds.

If the kubelet endpoint is not returning ok by then, kubeadm gives up and exits.

This will mitigate at least 60% of our "[apiclient] Created API client, waiting for control plane to come up" issues in the kubeadm issue tracker 🎉, as kubeadm now informs the user what's wrong and no longer deadlocks like before.
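
For illustration, here is a minimal Go sketch of the kind of delayed, exponentially backed-off health probe described above. The function name probeKubeletHealthz, the retry interval, and the loop structure are assumptions made for this sketch, not the exact helpers or constants kubeadm uses:

package main

import (
    "fmt"
    "net/http"
    "time"
)

// probeKubeletHealthz waits for an initial delay, then polls the kubelet's
// read-only healthz endpoint with exponential backoff until the total budget
// is spent. Names and timings here are illustrative only.
func probeKubeletHealthz(url string, initialDelay, budget time.Duration) error {
    time.Sleep(initialDelay)

    deadline := time.Now().Add(budget)
    interval := 5 * time.Second
    for time.Now().Before(deadline) {
        resp, err := http.Get(url)
        if err == nil {
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                return nil // the kubelet answered; nothing more to do
            }
        }
        fmt.Println("[kubelet-check] It seems like the kubelet isn't running or healthy.")
        time.Sleep(interval)
        interval *= 2 // exponential backoff between attempts
    }
    return fmt.Errorf("the kubelet never became healthy at %s", url)
}

func main() {
    err := probeKubeletHealthz("http://localhost:10255/healthz", 40*time.Second, 155*time.Second)
    if err != nil {
        fmt.Println("error:", err)
    }
}

In kubeadm itself the equivalent logic sits behind the apiclient.Waiter abstraction quoted later in this review; the sketch only shows the delay-then-backoff shape of the check.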

Demo:

lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 40.502199 seconds
[markmaster] Will mark node thegopher as master by adding a label and a taint
[markmaster] Master thegopher tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 5776d5.91e7ed14f9e274df
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 5776d5.91e7ed14f9e274df 192.168.1.115:6443 --discovery-token-ca-cert-hash sha256:6f301ce8c3f5f6558090b2c3599d26d6fc94ffa3c3565ffac952f4f0c7a9b2a9

lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
lucas@THEGOPHER:~/luxas/kubernetes$ sudo systemctl stop kubelet
lucas@THEGOPHER:~/luxas/kubernetes$ sudo ./kubeadm init --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [thegopher kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.115]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by that:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
	- There is no internet connection; so the kubelet can't pull the following control plane images:
		- gcr.io/google_containers/kube-apiserver-amd64:v1.7.4
		- gcr.io/google_containers/kube-controller-manager-amd64:v1.7.4
		- gcr.io/google_containers/kube-scheduler-amd64:v1.7.4

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

In this demo, I'm first starting kubeadm normally and everything works as usual.
In the second case, I'm explicitly stopping the kubelet so it doesn't run, and skipping preflight checks so that kubeadm doesn't even try to exec systemctl start kubelet as it usually does.
That obviously results in a non-working system, but now kubeadm tells the user what the problem is instead of waiting forever.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Fixes: kubernetes/kubeadm#377

Special notes for your reviewer:

Release note:

kubeadm: Detect kubelet readiness and error out if the kubelet is unhealthy

@kubernetes/sig-cluster-lifecycle-pr-reviews @pipejakob

cc @justinsb @kris-nova @lukemarsden as well, since you wanted this feature :)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 25, 2017
@luxas luxas added this to the v1.8 milestone Aug 25, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 25, 2017
@luxas (Member, Author) commented Aug 27, 2017

/retest

@luxas (Member, Author) commented Aug 27, 2017

/retest

@mattmoyer (Contributor) left a comment:

I'm really happy to have better diagnostic output here. It should be a big help to admins when things go wrong.

I'm slightly concerned that 155 seconds will not be enough. It seems hard to pick a safe timeout for this because it needs to cover a relatively large download over the user's internet connection which might be almost arbitrarily slow. Maybe we could keep trying forever, but start printing helpful warning/error messages after a certain amount of time? We might also just print out a message saying that kubeadm init has given up, but the kubelet might still get things up and running if you are able to restart or otherwise fix it.

My other style concern is that the number of timeout/retry constants here is kind of complex because of the way TryRunCommand and waitForAPIAndKubelet are nested. Perhaps there's a way we could refactor so there's a single select that blocks until the first failure condition is met? This is not a blocker.

The failing test looks like #51429.

Unfortunately, an error has occurred:
{{ .Error }}

This error is likely caused by that:

Contributor:

Nit: maybe "This error is likely caused by one of these problems:"?


go func(errC chan error, waiter apiclient.Waiter) {
// This goroutine can only make kubeadm init fail. If this check succeeds, it won't do anything special
if err := waiter.WaitForHealthyKubelet(40*time.Second, "http://localhost:10255/healthz"); err != nil {

Contributor:

Is it okay to assume the port number here? Isn't it configurable?

Member Author:

I think it's ok to assume that; it's the standard for the kubelet read-only port. If you change that manually, you will have way larger problems :)

@luxas (Member, Author) commented Sep 1, 2017

I'm slightly concerned that 155 seconds will not be enough. It seems hard to pick a safe timeout for this because it needs to cover a relatively large download over the user's internet connection which might be almost arbitrarily slow

That timeout is currently 30 minutes; we can increase it if you think that'd be useful.

My other style concern is that the number of timeout/retry constants here is kind of complex because of the way TryRunCommand and waitForAPIAndKubelet are nested. Perhaps there's a way we could refactor so there's a single select that blocks until the first failure condition is met? This is not a blocker.

They aren't nested. TryRunCommand is used for checking the kubelet's health, and the polling in WaitForAPI waits for the API server to become available in a second goroutine.

Maybe we could keep trying forever

That is what we explicitly want to avoid in prod scenarios.

Note that there are three different goroutines:

  • WaitForAPI waits for the API server to become running.
    • This requires the kubelet to come up healthy, the images to be pullable and the API server to start correctly
    • Times out after 30 minutes
  • TryRunCommand that polls /healthz -- which is kubelet liveness
    • Starts 40 seconds after WaitForAPI (as the kubelet should have come up successfully in >90% of cases by then; we don't want to start too early)
    • Times out after 155 seconds
  • TryRunCommand that polls /healthz/syncloop -- which is kubelet readiness AFAIU
    • Starts 60 seconds after WaitForAPI (as the kubelet should have come up successfully in >90% of cases by then; we don't want to start too early)
    • Times out after 155 seconds

And there is a single wait block; the first goroutine to return becomes the return value used by the function (a rough sketch of this pattern follows below). This means that if the /healthz check still isn't passing after 195 seconds, an error is returned and kubeadm init fails.
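
A rough sketch of that "first result wins" wait block, using a shared error channel. The function and parameter names here are placeholders, not the actual kubeadm code:

package main

import (
    "errors"
    "fmt"
    "time"
)

// waitForAPIAndKubeletSketch runs the API-server wait and the two kubelet
// health probes concurrently; whichever goroutine finishes first decides the
// outcome. Names and durations are placeholders for illustration.
func waitForAPIAndKubeletSketch(waitForAPI, checkLiveness, checkReadiness func() error) error {
    errC := make(chan error, 3) // buffered so the losing goroutines don't block

    go func() { errC <- waitForAPI() }()     // times out after 30 minutes in kubeadm
    go func() { errC <- checkLiveness() }()  // starts after 40s, gives up after 155s
    go func() { errC <- checkReadiness() }() // starts after 60s, gives up after 155s

    // Single wait block: the first value received (nil or an error) is what
    // kubeadm init acts on, so a failed /healthz probe fails init even while
    // the longer API wait is still in flight.
    return <-errC
}

func main() {
    err := waitForAPIAndKubeletSketch(
        func() error { time.Sleep(2 * time.Second); return nil },
        func() error { time.Sleep(1 * time.Second); return errors.New("kubelet /healthz never became healthy") },
        func() error { time.Sleep(3 * time.Second); return nil },
    )
    fmt.Println("result:", err) // the liveness failure wins the race in this example
}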

@luxas (Member, Author) commented Sep 1, 2017

/retest

1 similar comment
@luxas (Member, Author) commented Sep 1, 2017

/retest

@mattmoyer (Contributor):

/lgtm

We can always make the timeout configurable later if it is an issue in real environments. This change should be a strict improvement over the current behavior, so I don't want to block merging over that.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2017
@k8s-github-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: luxas, mattmoyer

Associated issue: 377

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@luxas (Member, Author) commented Sep 1, 2017

We can always make the timeout configurable later if it is an issue in real environments

Note that this 30min timeout already exists at HEAD; it isn't added or changed in this PR

/retest

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

2 similar comments
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@luxas (Member, Author) commented Sep 2, 2017

/retest

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment
@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2017
@luxas (Member, Author) commented Sep 3, 2017

Rebased, re-applying LGTM

@luxas luxas added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 3, 2017
@luxas (Member, Author) commented Sep 3, 2017

/retest

@fejta-bot:

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-ci-robot (Contributor):

@luxas: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-e2e-kops-aws
Commit: 92c5997 (link)
Rerun command: /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 51682, 51546, 51369, 50924, 51827)

fmt.Printf("[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory %q\n", kubeadmconstants.GetStaticPodDirectory())
fmt.Println("[init] This often takes around a minute; or longer if the control plane images have to be pulled.")

go func(errC chan error, waiter apiclient.Waiter) {

Contributor:

Shouldn't this be based on --skip-preflight-checks? If this flag is set, we shouldn't check the status of the kubelet; if not, it makes sense to check the status.

Member Author:

It's a hard line to balance... I'd prefer to have this as-is, although there is a small chance it will time out before all the images have been pulled.

Member Author:

If you want to discuss this, please open a new issue in kubernetes/kubeadm though
