Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: upgrade node for HA #78408

Merged
merged 2 commits into from May 30, 2019

Conversation

@fabriziopandini
Copy link
Contributor

commented May 27, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
kubeadm upgrade for HA cluster is not optimal

As of today, the user has to run:

  • kubeadm upgrade apply on a first control-plane node
  • kubeadm upgrade node experimental-control-plane and kubeadm upgrade node config on secondary control-plane nodes
  • kubeadm upgrade node config on worker nodes

with this PR, the UX gets simplified

  • kubeadm upgrade apply on a first control-plane node
  • kubeadm upgrade node on all the other nodes (secondary control-plane nodes and worker nodes)

Which issue(s) this PR fixes:
Fixes #kubernetes/kubeadm#1071
Rif #kubernetes/kubeadm#1567

Special notes for your reviewer:
The first commits only move two func to phases/upgrade in order to avoid circular references.
The second commit is the actual PR

Does this PR introduce a user-facing change?:

kubeadm: a new command `kubeadm upgrade node` is introduced for upgrading nodes (both secondary control-plane nodes and worker nodes) 
The command `kubeadm upgrade node config` is now deprecated; use `kubeadm upgrade node` instead.
The command `kubeadm upgrade node experimental-control-plane` is now deprecated; use `kubeadm upgrade node` instead.

/sig cluster-lifecycle
/area kubeadm
/priority important-soon
@kubernetes/sig-cluster-lifecycle-pr-reviews

/assign @neolit123
/assign @rosti
/cc @yagonobre
/cc @ereslibre

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 27, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123
Copy link
Member

left a comment

thanks for this work @fabriziopandini
added minor comments.

Show resolved Hide resolved cmd/kubeadm/app/cmd/phases/upgrade/node/controlplane.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/phases/upgrade/node/kubeletconfig.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/phases/upgrade/node/controlplane.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/node.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/node.go Outdated
}
// binds the Runner to kubeadm upgrade node command by altering
// command help, adding --skip-phases flag and by adding phases subcommands
nodeRunner.BindToCommand(cmd)

This comment has been minimized.

Copy link
@neolit123

neolit123 May 27, 2019

Member

in an ideal world kubeadm upgrade apply should have been the UX that works all use cases:

  • primary CP
  • secondary CP
  • worker
    (as long as we can detect primary vs secondary)

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

@neolit123 that's a nice UX suggestion, but a bit hard and unreliable to implement in my opinion. The primary CP does not need to be invoked on the kubeadm init node. It just needs to be done on a CP node only once. Then it should store somewhere a result of this operation and allow for the same command to detect it on other nodes.
This may be unreliable, hence we should always have the manual interface as a safe backup.

This comment has been minimized.

Copy link
@fabriziopandini

fabriziopandini May 28, 2019

Author Contributor

I'm not sure it is a good idea, but this is something out of scope of this PR

This comment has been minimized.

Copy link
@neolit123

neolit123 May 28, 2019

Member

clearly out of scope of this PR. just saying that the apply vs node is not exactly intuitive.

This comment has been minimized.

Copy link
@ereslibre

ereslibre Jun 6, 2019

Member

@rosti, @neolit123, checking the logic for kubeadm upgrade apply I'm wondering what is preventing it from being idempotent, assuming the same version is passed to kubeadm upgrade apply on all control plane nodes.

  • PerformControlPlaneUpgrade -> PerformStaticPodUpgrade is trivial and always happens.
  • PerformPostUpgradeTasks from what I could see should be safe to re-run.

Of course is not a "supported" way of upgrading, but I fail to see what is preventing us from doing so.

In general (and if we are really planning to improve the upgrade UX), once that we have run kubeadm upgrade apply once in one control plane, we'll have upgraded the ClusterConfiguration with the new Kubernetes version, and in secondary control planes we could know from what version we are trying to update to what version (present in the kubeadm-config configmap in the cluster configuration).

Unsure if I'm missing something...

This comment has been minimized.

Copy link
@neolit123

neolit123 Jun 6, 2019

Member

once that we have run kubeadm upgrade apply once in one control plane, we'll have upgraded the ClusterConfiguration with the new Kubernetes version, and in secondary control planes we could know from what version we are trying to update to what version (present in the kubeadm-config configmap in the cluster configuration).

this is similar to what i was thinking without doing much research on this topic.

Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/node.go Outdated
Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/node.go
return cmd
}

// NewCmdUpgradeControlPlane returns the cobra.Command for upgrading the controlplane instance on this node

This comment has been minimized.

Copy link
@neolit123

neolit123 May 27, 2019

Member

add TODO to remove when 1.18 is released?

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

The experimental prefix of the original command implies an alpha state of this. Hence we can remove it. Although, I wouldn't mind keeping it for a cycle or two. Both options work for me.

Is this even plugged into the command line interface somewhere? NewCmdUpgradeControlPlane seems unused ATM.

This comment has been minimized.

Copy link
@fabriziopandini

fabriziopandini May 28, 2019

Author Contributor

Now it is plugged in the command line
Even if it is alpha, the approach I'm proposing is to keep this around for 1 cycle, so we are not breaking users

This comment has been minimized.

Copy link
@neolit123

neolit123 May 28, 2019

Member

+1 for keeping even if alpha.
we are GA and alpha will still break users.

This comment has been minimized.

Copy link
@fabriziopandini

fabriziopandini Jun 2, 2019

Author Contributor

Probably it is necessary to make more clear what is GA and what not

@neolit123

This comment has been minimized.

Copy link
Member

commented May 27, 2019

release note:

The command kubeadm upgrade node experimental-control-plane was removed; use kubeadm upgrade node instead.

deprecated instead of removed?
https://github.com/kubernetes/kubernetes/pull/78408/files#diff-6e2b4c55127823adea7f886e312a729dL123

@rosti
Copy link
Member

left a comment

Thanks @fabriziopandini !

// NewControlPlane creates a kubeadm workflow phase that implements handling of control-plane upgrade.
func NewControlPlane() workflow.Phase {
phase := workflow.Phase{
Name: "control-plane",

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

An alias for controlplane will be nice.

This comment has been minimized.

Copy link
@fabriziopandini

fabriziopandini May 28, 2019

Author Contributor

kubeadm init phase control-plane does not provide an alias, so I don't think we should add one here.
Eventually, this can be done in a follow up PR, but it should be done consistently across all control-plane commands

This comment has been minimized.

Copy link
@neolit123

neolit123 May 28, 2019

Member

our naming has the - between words everywhere.

Show resolved Hide resolved cmd/kubeadm/app/cmd/upgrade/node.go Outdated
return cmd
}

// NewCmdUpgradeControlPlane returns the cobra.Command for upgrading the controlplane instance on this node

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

The experimental prefix of the original command implies an alpha state of this. Hence we can remove it. Although, I wouldn't mind keeping it for a cycle or two. Both options work for me.

Is this even plugged into the command line interface somewhere? NewCmdUpgradeControlPlane seems unused ATM.

}
// binds the Runner to kubeadm upgrade node command by altering
// command help, adding --skip-phases flag and by adding phases subcommands
nodeRunner.BindToCommand(cmd)

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

@neolit123 that's a nice UX suggestion, but a bit hard and unreliable to implement in my opinion. The primary CP does not need to be invoked on the kubeadm init node. It just needs to be done on a CP node only once. Then it should store somewhere a result of this operation and allow for the same command to detect it on other nodes.
This may be unreliable, hence we should always have the manual interface as a safe backup.

// isControlPlane checks if a node is a control-plane node by looking up
// the kube-apiserver manifest file
isControlPlaneNode := true
filepath := kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, kubeadmconstants.GetStaticPodDirectory())

This comment has been minimized.

Copy link
@rosti

rosti May 28, 2019

Member

This exact check is present somewhere in reset (I think) as a separate func. Probably extracting it as an utility func somewhere is a good idea.

This comment has been minimized.

Copy link
@fabriziopandini

fabriziopandini May 28, 2019

Author Contributor

I will open an issue to track this

@fabriziopandini fabriziopandini force-pushed the fabriziopandini:upgrade-node-ha branch from 7c219fe to 2cc0288 May 28, 2019

@fabriziopandini

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

@rosti @neolit123 thanks for the feedback!
I've addressed almost all the comments; other answers are in the comments

@neolit123

This comment has been minimized.

Copy link
Member

commented May 28, 2019

/lgtm
/hold
feel free to unhold.

@yagonobre
Copy link
Member

left a comment

Thanks @fabriziopandini, I added some non-blocking comments.
/lgtm

return phase
}

func runControlPlane() func(c workflow.RunData) error {

This comment has been minimized.

Copy link
@yagonobre

yagonobre May 29, 2019

Member

Optionally it can be just func runControlPlane(c workflow.RunData) error


// if this is not a control-plande node, this phase should not be executed
if !data.IsControlPlaneNode() {
fmt.Printf("[upgrade] Skipping phase. Not a control plane node")

This comment has been minimized.

Copy link
@yagonobre

yagonobre May 29, 2019

Member

Should it be [control-plane] ...?

kubeletVersionStr := cfg.ClusterConfiguration.KubernetesVersion
if data.KubeletVersion() != "" && data.KubeletVersion() != kubeletVersionStr {
kubeletVersionStr = data.KubeletVersion()
fmt.Printf("[upgrade] Using kubelet config version %s, while kubernetes-version is %s\n", kubeletVersionStr, cfg.ClusterConfiguration.KubernetesVersion)

This comment has been minimized.

Copy link
@yagonobre

yagonobre May 29, 2019

Member

Should it be [kubelet-config] ...?


// printFilesIfDryRunning prints the Static Pod manifests to stdout and informs about the temporary directory to go and lookup
func printFilesIfDryRunning(dryRun bool, kubeletDir string) error {
if !dryRun {

This comment has been minimized.

Copy link
@yagonobre

yagonobre May 29, 2019

Member

Due we are calling it from if dryRun {} we can remove this if and rename to printFiles

@rosti

rosti approved these changes May 29, 2019

Copy link
Member

left a comment

Thanks @fabriziopandini !
I am happy with it, so let's get this on the merge queue ASAP. CF is rapidly approaching...
/lgtm
/hold cancel

@ereslibre

This comment has been minimized.

Copy link
Member

commented May 29, 2019

Will LGTM as soon as the conflict is resolved. Thanks @fabriziopandini!

fabriziopandini added some commits May 29, 2019

@fabriziopandini fabriziopandini force-pushed the fabriziopandini:upgrade-node-ha branch from 2cc0288 to 67d76c4 May 29, 2019

@fabriziopandini

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

@ereslibre
rebased (a simple hack/update-bazel.sh resolved all conflicts, no other changes) & squashed

@neolit123

This comment has been minimized.

Copy link
Member

commented May 29, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 29, 2019

@ereslibre

This comment has been minimized.

Copy link
Member

commented May 29, 2019

/lgtm

@neolit123

This comment has been minimized.

Copy link
Member

commented May 29, 2019

/retest

@rosti

This comment has been minimized.

Copy link
Member

commented May 29, 2019

/test pull-kubernetes-integration

@k8s-ci-robot k8s-ci-robot merged commit de81752 into kubernetes:master May 30, 2019

21 checks passed

cla/linuxfoundation fabriziopandini authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@fabriziopandini fabriziopandini deleted the fabriziopandini:upgrade-node-ha branch May 30, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.