Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make kubeadm join discovery wait for a finite time #80804

Merged
merged 1 commit into from Aug 1, 2019

Conversation

@olivierlemasle
Copy link
Contributor

commented Jul 31, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

Add a timeout to discovery in kubeadm join, when using a discovery file.

Currently the discovery logic runs indefinitely. During review of #80675, @neolit123 requested to add a timeout: #80675 (comment).

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#1693

Special notes for your reviewer:

Timeout value is cfg.Discovery.Timeout.Duration, which is already the timeout for token-based discovery. It can be customized in JoinConfiguration and defaults to v1beta2.DefaultDiscoveryTimeout, which is 5 minutes. This should be sufficient for worker nodes waiting for control-plane nodes in case of parallel provisioning.

Does this PR introduce a user-facing change?:

"kubeadm join" fails if file-based discovery is too long, with a default timeout of 5 minutes.

/sig cluster-lifecycle
/cc fabriziopandini neolit123
/priority important-longterm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

Hi @olivierlemasle. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yagonobre
Copy link
Member

left a comment

Thanks @olivierlemasle
/lgtm
/ok-to-test

if err == wait.ErrWaitTimeout {
timeoutErr := errors.Errorf("Abort reading the %s ConfigMap after timeout of %v", bootstrapapi.ConfigMapClusterInfo, discoveryTimeout)
klog.V(1).Infof("[discovery] %v\n", timeoutErr)
return nil, timeoutErr

This comment has been minimized.

Copy link
@neolit123

neolit123 Jul 31, 2019

Member

should be enough to just:

return nil, errors.Errorf("Abort reading the %s ConfigMap after timeout of %v", bootstrapapi.ConfigMapClusterInfo, discoveryTimeout)

without the klog v1, otherwise for --v=1 the error will be printed twice.

@neolit123

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 31, 2019

@olivierlemasle olivierlemasle force-pushed the olivierlemasle:add-join-timeout branch from 5ac4bd0 to 1289ce5 Jul 31, 2019

@olivierlemasle

This comment has been minimized.

Copy link
Contributor Author

commented Jul 31, 2019

Thanks for your review @neolit123

I fixed the failing unit tests and applied your suggestion regarding logging.

i think we should also change these polls:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/discovery/token/token.go#L78
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/discovery/token/token.go#L142

I actually did see these wait.PollImmediateInfinite in token-based discovery before opening the PR; however in this case discovery is always finite, because they're wrapped in fetchKubeConfigWithTimeout, which fails after a timeout (see here). Actually, it uses the same timeout cfg.Discovery.Timeout I used in this PR.
So I suppose we can keep these wait.PollImmediateInfinite, right?

@neolit123

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

/assign @rosti

@rosti
Copy link
Member

left a comment

Thank you @olivierlemasle !
Can you squash your commits?

Ideally, the HTTPS case should have the fetch timeout into this one, but that would complicate the code. Thus, I am file with how things look ATM. Let's merge it as is.

/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olivierlemasle, rosti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Make discovery wait for a finite time
Add a timeout to discovery in `kubeadm join`,
when using a discovery file.

@olivierlemasle olivierlemasle force-pushed the olivierlemasle:add-join-timeout branch from 1289ce5 to 5c61056 Aug 1, 2019

@olivierlemasle

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

@rosti, thanks, I've squashed the commits.

@olivierlemasle

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

/test pull-kubernetes-e2e-gce-100-performance

@yagonobre

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Aug 1, 2019

@k8s-ci-robot k8s-ci-robot merged commit 547617a into kubernetes:master Aug 1, 2019

23 checks passed

cla/linuxfoundation olivierlemasle authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 1, 2019

@olivierlemasle olivierlemasle deleted the olivierlemasle:add-join-timeout branch Aug 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.