New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: Add configurable control plane up timeout #70480

Merged
merged 1 commit into from Nov 5, 2018

Conversation

@rosti
Contributor

rosti commented Oct 31, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:

Until now the control plane timeout was fixed to 4 minutes and users did not have the ability to change it. This commit allows that timeout to be configured via the new timeoutForControlPlane option in the API server config (itself a member of the ClusterConfiguration).

The default timeout is still 4 minutes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Refs kubernetes/kubeadm#911

Special notes for your reviewer:

This PR depends on:

Please, review only the last commit here.

This PR does not modify the self hosting timeout code, which is going to be deleted in another PR:

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/area kubeadm
/assign @fabriziopandini
/assign @timothysc

Does this PR introduce a user-facing change?:

kubeadm: timeoutForControlPlane is introduced as part of the API Server config, that controls the timeout for the wait for control plane to be up. Default value is 4 minutes.
@neolit123

thanks @rosti

@@ -212,6 +212,7 @@ limitations under the License.
// certSANs:
// - "10.100.1.1"
// - "ec2-10-100-0-1.compute-1.amazonaws.com"
// timeoutForControlPlane: 3m0s

This comment has been minimized.

@neolit123

neolit123 Oct 31, 2018

Member

should we use the default here 4m?

This comment has been minimized.

@rosti

rosti Oct 31, 2018

Contributor

I don't think it's necessary.

This comment has been minimized.

@timothysc

timothysc Nov 1, 2018

Member

I agree with @neolit123, please change to 4 b/c that's the default used all over.

@timothysc

/approve

Generally looks like what we talked about, thx for the patch.

@@ -212,6 +212,7 @@ limitations under the License.
// certSANs:
// - "10.100.1.1"
// - "ec2-10-100-0-1.compute-1.amazonaws.com"
// timeoutForControlPlane: 3m0s

This comment has been minimized.

@timothysc

timothysc Nov 1, 2018

Member

I agree with @neolit123, please change to 4 b/c that's the default used all over.

@fabriziopandini

Thanks @rosti!
/approve
Waiting for dependency to merge before final lgtm

@rosti

This comment has been minimized.

Contributor

rosti commented Nov 2, 2018

Hold because of unmerged dependency PR.

/hold

kubeadm: Add configurable control plane up timeout
Until now the control plane timeout was fixed to 4 minutes and users did not
have the ability to change it. This commit allows that timeout to be configured
via the new `timeoutForControlPlane` option in the API server config (itself a
member of the ClusterConfiguration).

The default timeout is still 4 minutes.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
@rosti

This comment has been minimized.

Contributor

rosti commented Nov 5, 2018

/hold cancel

@neolit123

This comment has been minimized.

Member

neolit123 commented Nov 5, 2018

@rosti @fabriziopandini

https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/apiclient/wait.go#L159
^ it runs a couple of go routines concurrently an whichever fails first returns an error. so if f() there is for instance WaitForAPI() (for init / wait-control-plane phase) the control plane timeout would be respected but the other target is always WaitForHealthyKubelet() which has a timeout of total 195 seconds. so even if the user sets DefaultControlPlaneTimeout to 10 minutes it will always fail in no more than 195 seconds.

@fabriziopandini

This comment has been minimized.

Contributor

fabriziopandini commented Nov 5, 2018

@neolit123 @rosti
In my opionin it makes sense

  • if kubelet doesn't start, fail fast after 195 sec (does not make sense wait for the control plane)
    – if kubelet starts, wait for control plane max 4 minutes
@neolit123

This comment has been minimized.

Member

neolit123 commented Nov 5, 2018

i think what has to be done here is not use exponential backoff in WaitForHealthyKubelet, but rather allow us to make the function match a timeout the user has specified - e..g. DefaultKubeletTimeout

@dixudx

dixudx approved these changes Nov 5, 2018

// TODO: NewControlPlaneWaiter should be converted to private after the self-hosting phase is removed.
waiter, err := phases.NewControlPlaneWaiter(i.dryRun, client, i.outputWriter)
timeout := i.cfg.ClusterConfiguration.APIServer.TimeoutForControlPlane.Duration

This comment has been minimized.

@dixudx

dixudx Nov 5, 2018

Member

Why declaring this var, which is used only once?

This comment has been minimized.

@rosti

rosti Nov 5, 2018

Contributor

That's because the expression to take the timeout value is huge and best viewed on a single line. Also the call to NewControlPlaneWaiter on the following line continues to fit on screen.

@rosti

This comment has been minimized.

Contributor

rosti commented Nov 5, 2018

@neolit123 I agree with you, that the waiter code can be simplified a bit (to say the least), but perhaps this is a job for another PR?

@fabriziopandini

@rosti Thanks for this PR!
/approve
/lgtm

@neolit123 @dixudx we can eventually iterate on details, but I'm moving on with this PR to get all the API changes in before code slush

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 5, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Nov 5, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, rosti, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b3d81e5 into kubernetes:master Nov 5, 2018

18 checks passed

cla/linuxfoundation rosti authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details

@rosti rosti deleted the rosti:controlplane-timeout branch Nov 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment