Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition when joining nodes #72030

Merged
merged 1 commit into from
Dec 18, 2018
Merged

Fix race condition when joining nodes #72030

merged 1 commit into from
Dec 18, 2018

Conversation

ereslibre
Copy link
Contributor

@ereslibre ereslibre commented Dec 13, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
Despite we were checking for the kubelet kubeconfig file to be present, the
kubelet first writes this file and then the certificates the kubeconfig file
refers to. This represents a race condition in kubeadm in which when we confirm
that the kubelet's kubeconfig file is present we continue creating a clientset
out of it. However, the clientset creation will ensure that the certificates the
kubeconfig file refers to exist on the filesystem.

To fix this problem, not only wait for the kubelet's kubeconfig file to be
present, but also ensure that we can create a clientset ouf of it on our polling
process, while we wait for the kubelet to have performed the TLS bootstrap.

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1319

Does this PR introduce a user-facing change?:

Fix a race condition in which kubeadm only waits for the kubelets kubeconfig file when it has performed the TLS bootstrap, but wasn't waiting for certificates to be present in the filesystem

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 13, 2018
@k8s-ci-robot
Copy link
Contributor

Hi @ereslibre. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 13, 2018
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 13, 2018
Despite we were checking for the kubelet kubeconfig file to be present, the
kubelet first writes this file and then the certificates the kubeconfig file
refers to. This represents a race condition in kubeadm in which when we confirm
that the kubelet's kubeconfig file is present we continue creating a clientset
out of it. However, the clientset creation will ensure that the certificates the
kubeconfig file refers to exist on the filesystem.

To fix this problem, not only wait for the kubelet's kubeconfig file to be
present, but also ensure that we can create a clientset ouf of it on our polling
process, while we wait for the kubelet to have performed the TLS bootstrap.
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ereslibre Thanks! For this PR
I'm ok with improving this wait cycle in order to make it more robust.

However, I kindly ask opinion from @luxas / @timothysc as well, because I think that if there is a flake in the kubelet TLS bootstrap might be better to get it fixed in kubelet as well.
I did a little bit of research in kubelet code base, and if I got it right the kubelet.conf file is written here, but if I got this right this happen after all the certificates are written to disk...

@ereslibre
Copy link
Contributor Author

ereslibre commented Dec 14, 2018

@fabriziopandini As far as I could see from the kubelet bootstrap logic:

Between writeKubeconfigFromBootstrapping() and the last step is the window in which the kubeconfig is written to disk, but is referencing authentication files that don't exist yet until they are finally written and symlinked.

This is what I understood from a first inspection of the source code, if I'm wrong please correct me.

I don't think the way the kubelet bootstraps is flake itself per-se, but how kubeadm is solely waiting for the kubelets final kubeconfig to be present, and assuming that at that very moment it exists we can continue creating a clientset out of it (when the kubelet might not yet have written the certificates to disk and created the symlink for the current). The kubelet always bootstraps just fine, but is kubeadm the one that aborts the execution when trying to create the clientset; causing the cri annotation to not be created on the new node (and potentially further operations that could be added in the future to not happen after that failure).

Let's wait for @luxas and/or @timothysc to confirm, thanks for your time @fabriziopandini!

Copy link
Contributor

@rosti rosti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposed solution @ereslibre !
I kind of share the opinion of @fabriziopandini, that this is probably a kubelet problem, that needs to be fixed on their side. We may ping some SIG-Node folks for this.
However, if we need to fix it on our side, then I am OK with the proposed solution.
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 14, 2018
@ereslibre
Copy link
Contributor Author

For the sake of completion, apart from the error itself, here's more evidence about the times these files have been created:

vagrant@worker:~$ sudo ls -la --time-style=full-iso /etc/kubernetes/
total 20
drwxr-xr-x  3 root root 4096 2018-12-14 11:15:38.666832056 +0000 .
drwxr-xr-x 94 root root 4096 2018-12-14 11:15:53.246117528 +0000 ..
-rw-------  1 root root 1783 2018-12-14 11:15:37.394196102 +0000 bootstrap-kubelet.conf
-rw-------  1 root root 1853 2018-12-14 11:15:38.666832056 +0000 kubelet.conf
drwxr-xr-x  2 root root 4096 2018-12-14 11:15:37.394196102 +0000 pki
vagrant@worker:~$ sudo ls -la --time-style=full-iso /var/lib/kubelet/pki
total 20
drwxr-xr-x 2 root root 4096 2018-12-14 11:15:38.818908050 +0000 .
drwx------ 9 root root 4096 2018-12-14 11:15:38.806902051 +0000 ..
-rw------- 1 root root 1110 2018-12-14 11:15:38.818908050 +0000 kubelet-client-2018-12-14-11-15-38.pem
lrwxrwxrwx 1 root root   59 2018-12-14 11:15:38.818908050 +0000 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2018-12-14-11-15-38.pem
-rw-r--r-- 1 root root 2153 2018-12-14 11:15:38.650824056 +0000 kubelet.crt
-rw------- 1 root root 1671 2018-12-14 11:15:38.650824056 +0000 kubelet.key

As you can see kubelet.conf is created before kubelet-client-2018-12-14-11-15-38.pem and its symlink kubelet-client-current.pem.

@fabriziopandini
Copy link
Member

@ereslibre thanks for the super detailed analysis!
/approve

I let the final lgtm to @luxas / @timothysc

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ereslibre, fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 14, 2018
@fabriziopandini
Copy link
Member

Since we are getting near to v1.13.2 I make this merge on master in order to start working in the cherry pick. I defer @timothysc / @luxas opinion to the cherry pick PR
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2018
@k8s-ci-robot k8s-ci-robot merged commit 73b9eff into kubernetes:master Dec 18, 2018
@rosti
Copy link
Contributor

rosti commented Dec 18, 2018

@ereslibre can you add a release note to this PR as it's needed for the cherry-pick PR too?

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 18, 2018
k8s-ci-robot added a commit that referenced this pull request Dec 28, 2018
…030-upstream-release-1.13

Automated cherry pick of #72030: Fix race condition when joining nodes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition when joining nodes
4 participants