Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition when joining nodes #72030

Merged

Conversation

@ereslibre
Copy link
Member

commented Dec 13, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
Despite we were checking for the kubelet kubeconfig file to be present, the
kubelet first writes this file and then the certificates the kubeconfig file
refers to. This represents a race condition in kubeadm in which when we confirm
that the kubelet's kubeconfig file is present we continue creating a clientset
out of it. However, the clientset creation will ensure that the certificates the
kubeconfig file refers to exist on the filesystem.

To fix this problem, not only wait for the kubelet's kubeconfig file to be
present, but also ensure that we can create a clientset ouf of it on our polling
process, while we wait for the kubelet to have performed the TLS bootstrap.

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1319

Does this PR introduce a user-facing change?:

Fix a race condition in which kubeadm only waits for the kubelets kubeconfig file when it has performed the TLS bootstrap, but wasn't waiting for certificates to be present in the filesystem
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

Hi @ereslibre. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ereslibre ereslibre force-pushed the ereslibre:fix-race-condition-on-node-join branch from e0752d5 to 043b025 Dec 13, 2018
@k8s-ci-robot k8s-ci-robot added size/XS and removed size/S labels Dec 13, 2018
@ereslibre ereslibre force-pushed the ereslibre:fix-race-condition-on-node-join branch from 043b025 to 4f5448e Dec 13, 2018
Despite we were checking for the kubelet kubeconfig file to be present, the
kubelet first writes this file and then the certificates the kubeconfig file
refers to. This represents a race condition in kubeadm in which when we confirm
that the kubelet's kubeconfig file is present we continue creating a clientset
out of it. However, the clientset creation will ensure that the certificates the
kubeconfig file refers to exist on the filesystem.

To fix this problem, not only wait for the kubelet's kubeconfig file to be
present, but also ensure that we can create a clientset ouf of it on our polling
process, while we wait for the kubelet to have performed the TLS bootstrap.
@ereslibre ereslibre force-pushed the ereslibre:fix-race-condition-on-node-join branch from 4f5448e to 6a8a832 Dec 13, 2018
Copy link
Member

left a comment

@ereslibre Thanks! For this PR
I'm ok with improving this wait cycle in order to make it more robust.

However, I kindly ask opinion from @luxas / @timothysc as well, because I think that if there is a flake in the kubelet TLS bootstrap might be better to get it fixed in kubelet as well.
I did a little bit of research in kubelet code base, and if I got it right the kubelet.conf file is written here, but if I got this right this happen after all the certificates are written to disk...

@ereslibre

This comment has been minimized.

Copy link
Member Author

commented Dec 14, 2018

@fabriziopandini As far as I could see from the kubelet bootstrap logic:

Between writeKubeconfigFromBootstrapping() and the last step is the window in which the kubeconfig is written to disk, but is referencing authentication files that don't exist yet until they are finally written and symlinked.

This is what I understood from a first inspection of the source code, if I'm wrong please correct me.

I don't think the way the kubelet bootstraps is flake itself per-se, but how kubeadm is solely waiting for the kubelets final kubeconfig to be present, and assuming that at that very moment it exists we can continue creating a clientset out of it (when the kubelet might not yet have written the certificates to disk and created the symlink for the current). The kubelet always bootstraps just fine, but is kubeadm the one that aborts the execution when trying to create the clientset; causing the cri annotation to not be created on the new node (and potentially further operations that could be added in the future to not happen after that failure).

Let's wait for @luxas and/or @timothysc to confirm, thanks for your time @fabriziopandini!

@rosti
rosti approved these changes Dec 14, 2018
Copy link
Member

left a comment

Thanks for the proposed solution @ereslibre !
I kind of share the opinion of @fabriziopandini, that this is probably a kubelet problem, that needs to be fixed on their side. We may ping some SIG-Node folks for this.
However, if we need to fix it on our side, then I am OK with the proposed solution.
/ok-to-test

@ereslibre

This comment has been minimized.

Copy link
Member Author

commented Dec 14, 2018

For the sake of completion, apart from the error itself, here's more evidence about the times these files have been created:

vagrant@worker:~$ sudo ls -la --time-style=full-iso /etc/kubernetes/
total 20
drwxr-xr-x  3 root root 4096 2018-12-14 11:15:38.666832056 +0000 .
drwxr-xr-x 94 root root 4096 2018-12-14 11:15:53.246117528 +0000 ..
-rw-------  1 root root 1783 2018-12-14 11:15:37.394196102 +0000 bootstrap-kubelet.conf
-rw-------  1 root root 1853 2018-12-14 11:15:38.666832056 +0000 kubelet.conf
drwxr-xr-x  2 root root 4096 2018-12-14 11:15:37.394196102 +0000 pki
vagrant@worker:~$ sudo ls -la --time-style=full-iso /var/lib/kubelet/pki
total 20
drwxr-xr-x 2 root root 4096 2018-12-14 11:15:38.818908050 +0000 .
drwx------ 9 root root 4096 2018-12-14 11:15:38.806902051 +0000 ..
-rw------- 1 root root 1110 2018-12-14 11:15:38.818908050 +0000 kubelet-client-2018-12-14-11-15-38.pem
lrwxrwxrwx 1 root root   59 2018-12-14 11:15:38.818908050 +0000 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2018-12-14-11-15-38.pem
-rw-r--r-- 1 root root 2153 2018-12-14 11:15:38.650824056 +0000 kubelet.crt
-rw------- 1 root root 1671 2018-12-14 11:15:38.650824056 +0000 kubelet.key

As you can see kubelet.conf is created before kubelet-client-2018-12-14-11-15-38.pem and its symlink kubelet-client-current.pem.

@fabriziopandini

This comment has been minimized.

Copy link
Member

commented Dec 14, 2018

@ereslibre thanks for the super detailed analysis!
/approve

I let the final lgtm to @luxas / @timothysc

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Dec 14, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fabriziopandini

This comment has been minimized.

Copy link
Member

commented Dec 18, 2018

Since we are getting near to v1.13.2 I make this merge on master in order to start working in the cherry pick. I defer @timothysc / @luxas opinion to the cherry pick PR
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 18, 2018
@k8s-ci-robot k8s-ci-robot merged commit 73b9eff into kubernetes:master Dec 18, 2018
18 checks passed
18 checks passed
cla/linuxfoundation ereslibre authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@ereslibre ereslibre deleted the ereslibre:fix-race-condition-on-node-join branch Dec 18, 2018
@rosti

This comment has been minimized.

Copy link
Member

commented Dec 18, 2018

@ereslibre can you add a release note to this PR as it's needed for the cherry-pick PR too?

k8s-ci-robot added a commit that referenced this pull request Dec 28, 2018
…030-upstream-release-1.13

Automated cherry pick of #72030: Fix race condition when joining nodes
@ereslibre ereslibre referenced this pull request Jan 27, 2019
6 of 6 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.