Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: fix the bug that 'kubeadm upgrade' hangs in single node cluster #88434

Merged
merged 1 commit into from Feb 24, 2020

Conversation

@SataQiu
Copy link
Member

SataQiu commented Feb 23, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
kubeadm: fix the bug that 'kubeadm upgrade' hangs in single node cluster

Which issue(s) this PR fixes:

Fixes kubernetes/kubeadm#2035

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: fix the bug that 'kubeadm upgrade' hangs in single node cluster

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE
@SataQiu

This comment has been minimized.

Copy link
Member Author

SataQiu commented Feb 23, 2020

/test pull-kubernetes-e2e-gce-100-performance

Copy link
Member

neolit123 left a comment

/assign @rajansandeep

// If we're dry-running, we don't need to wait for the new DNS addon to become ready
if !dryRun {
nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
FieldSelector: fields.Set{"spec.unschedulable": "false"}.AsSelector().String(),

This comment has been minimized.

Copy link
@neolit123

neolit123 Feb 23, 2020

Member

i do not recall observing nodes being unschedulable while the CoreDNS addon is being upgraded during "kubeadm upgrade". is this something you have seen @SataQiu ?

This comment has been minimized.

Copy link
@SataQiu

SataQiu Feb 24, 2020

Author Member

According to this guide https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-the-first-control-plane-node, kubectl drain <cp-node-name> --ignore-daemonsets in step 2 will make the control plane node unschedulable.
If we are using a single node cluster, the only node will be marked as unschedulable. In this case, new DNS deployment will never be ready. That's why the program is stuck.

This comment has been minimized.

Copy link
@neolit123

neolit123 Feb 24, 2020

Member

that is true.

Copy link
Contributor

chrisohaver left a comment

lgtm

Copy link
Member

rajansandeep left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 24, 2020
@rajansandeep

This comment has been minimized.

Copy link
Member

rajansandeep commented Feb 24, 2020

Is it worth having a single node cluster upgrade test in https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm to catch cases like these? Currently, all the tests are multi-node ones.

@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Feb 24, 2020

Is it worth having a single node cluster upgrade test in https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm to catch cases like these? Currently, all the tests are multi-node ones.

given the minimal bandwidth that we have to monitor and update our e2e tests, i'd argue that the maintenance burden will not be justified for having single-CP upgrade tests for all branches.
possibly having only one for the master branch is manageable as a start.

but something to note here is that our current e2e tests does not drain/cordon at all:
https://github.com/kubernetes/kubeadm/blob/adeeff900fc6024f36817468bcf43a455e18e2e2/kinder/pkg/cluster/manager/actions/kubeadm-upgrade.go#L33

so possibly this is something that can be done first.

/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 24, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit b68f869 into kubernetes:master Feb 24, 2020
16 checks passed
16 checks passed
cla/linuxfoundation SataQiu authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kind Job succeeded.
Details
pull-kubernetes-e2e-kind-ipv6 Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.