
Enhanced kubernetes version upgrades for workload clusters #3203

Closed
JTarasovic opened this issue Jun 17, 2020 · 27 comments
Labels
  • area/upgrades Issues or PRs related to upgrades
  • kind/feature Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Milestone

Comments

@JTarasovic

User Story

As an operator, I would like to be able to easily update the Kubernetes version of my workload clusters to be able to stay on top of security patches and new features.

Detailed Description

The current procedure for updating the k8s version is to copy the MachineTemplate for the KCP, then update the KCP with the new version and a reference to the new MachineTemplate, which triggers a rollout. Rinse and repeat for the MachineDeployments.

Ideally, I'd be able to declare my intent to upgrade the workload cluster and that would be reconciled and rolled out for me.
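
For concreteness, a minimal sketch of that manual flow, assuming the Docker provider and the v1alpha3 API that was current at the time; all names and versions here are placeholders:

```yaml
# Step 1: templates are immutable, so copy the control plane's MachineTemplate
# under a new name (provider and names are illustrative).
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: DockerMachineTemplate
metadata:
  name: my-cluster-control-plane-v1-19-1
spec:
  template:
    spec: {}   # same infrastructure settings as the previous template
---
# Step 2: bump the version on the KubeadmControlPlane and point it at the new
# template; this triggers a control plane rollout.
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  replicas: 3
  version: v1.19.1
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: DockerMachineTemplate
    name: my-cluster-control-plane-v1-19-1
  # kubeadmConfigSpec and the remaining fields stay as they were.
# Step 3: repeat the copy-and-update steps for each MachineDeployment
# (spec.template.spec.version plus its infrastructureRef).
```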

Anything else you would like to add:

Discussed at the 17 June 2020 weekly meeting.

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 17, 2020
@fabriziopandini
Member

This issue requires a certain degree of coordination across several components, so the first question in my mind is where to implement this logic.
I don't think this should go at the cluster level, because the Cluster's main responsibility is the cluster infrastructure, so what about assuming this should be implemented in a separate extension (with its own CRD/controller)?

@vincepri
Member

/milestone v0.4.0

We should revisit in the v1alpha4 timeframe; this probably needs a more detailed proposal.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Jun 18, 2020
@CecileRobertMichon
Contributor

cc @rbitia

Ria, this might fit into your "cluster group" proposal?

@JTarasovic
Author

JTarasovic commented Jun 18, 2020

We have a relatively small (but growing) number of clusters so we're currently doing upgrades sort of manually. Conceptually, we think about our clusters in 3 streams - alpha, beta and stable - and roll out upgrades and configuration changes according to stream.

Our plan right now is to have common configuration for a stream in a CR (StreamConfig) w/ a controller. The StreamConfig controller would reconcile to ClusterConfigs based on a label/annotation, with that resource's controller handling the actual cluster resource reconciliation (e.g. creation, k8s version upgrades, etc).[1]

I don't think that it's CAPI's responsibility to implement all of that (or any of it), but if we can do some of the common stuff (version upgrades) here, that seems like it would be super valuable for the whole community. It also seems like the logic would be broadly applicable - copy template, update KCP, rollout, copy template, update MDs, rollout, profit.[2]


[1] Names are illustrative and not definitive. Something, something hard problems in Computer Science.
[2] Grossly over-simplified here for effect.
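
As a rough illustration of the shape such a CR could take (entirely hypothetical; neither the kind nor its fields exist in CAPI):

```yaml
apiVersion: example.io/v1alpha1        # hypothetical API group
kind: StreamConfig
metadata:
  name: beta
spec:
  kubernetesVersion: v1.19.1           # desired version for every cluster in this stream
  clusterSelector:
    matchLabels:
      stream: beta                     # clusters opt in to the stream via a label
```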

@vincepri
Member

Thanks for the extra context @JTarasovic. From everything I'm hearing here, it might be worth considering some extra utilities/libraries/commands under clusterctl which could perform some variations of the concepts described above.

@seh

seh commented Jun 23, 2020

Ideally, I'd be able to declare my intent to upgrade the workload cluster and that would be reconciled and rolled out for me.

I find that if I change the "spec.version" field in an existing KubeadmControlPlane object and apply the change, usually the controllers will upgrade my control plane, without me introducing a new (AWS)MachineTemplate. It sounds like that's not supposed to work, and yet it does—most of the time. Why is that?

@JTarasovic
Author

Does it actually change the version of the running cluster - eg kubectl get no -o wide shows the new version?

It did not in our experience. It would roll the control plane instances but they'd still be on the previous version.

@CecileRobertMichon
Contributor

CecileRobertMichon commented Jun 24, 2020

This is how upgrading k8s version on control planes works currently: https://cluster-api.sigs.k8s.io/tasks/kubeadm-control-plane.html?highlight=rolling#how-to-upgrade-the-kubernetes-control-plane-version

Note that you might need to update the image as well if you are specifying the image to use in the machine template.
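
For example, with the AWS provider the machine template typically pins an AMI, so the copied template for the new version also needs its image updated (a sketch; names and the AMI ID are placeholders):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: my-cluster-control-plane-v1-19-1   # new copy, since templates are immutable
spec:
  template:
    spec:
      instanceType: t3.large
      sshKeyName: default
      ami:
        id: ami-0123456789abcdef0           # placeholder: an image built for v1.19.1
```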

@seh

seh commented Jun 24, 2020

Does it actually change the version of the running cluster - eg kubectl get no -o wide shows the new version?

Yes, it shows the new version there.

@Arvinderpal Arvinderpal mentioned this issue Aug 3, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2020
@vincepri
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 28, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2020
@vincepri
Member

vincepri commented Jan 4, 2021

Any updates or action items here?

@JTarasovic
Author

I think the clusterctl rollout issue linked above is a good first approximation but I agree w/ @detiber's comment there:

propose support in upstream Kubernetes/kubectl/kubebuilder for a sub-resource type

as that should allow folks to build controllers on top of it.

I'm cool with closing this issue in favor of that.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 3, 2021
@CecileRobertMichon
Contributor

I think the clusterctl rollout feature doesn't solve the problem of having to update the image + k8s version for every machine deployment / machine pool / kubeadm control plane that you want to upgrade as a user, although it does give more control over the rollout of machines. It would still be nice to have some sort of higher-order "upgrade my cluster" automation.

@craiglpeters @devigned and I were discussing this earlier today, and one thing that came up was maybe having a way to tell your management cluster which image to use for which k8s version and having the machine template look that up, instead of having to individually update the image version on each cluster. This would also allow patching images across all your clusters if you have to rebuild an image for the same k8s version (e.g. because of a CVE).
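
One way to picture that version-to-image mapping (purely illustrative; no such API exists in CAPI) is a lookup table in the management cluster that templates or controllers could resolve against:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubernetes-version-images     # hypothetical lookup table
  namespace: capi-system
data:
  v1.19.1: ami-0123456789abcdef0      # placeholder image IDs, one per Kubernetes version
  v1.18.8: ami-0fedcba9876543210      # republishing an entry would roll a patched image everywhere
```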

@CecileRobertMichon
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 10, 2021
@fiunchinho
Contributor

fiunchinho commented Mar 9, 2021

We have a relatively small (but growing) number of clusters so we're currently doing upgrades sort of manually. Conceptually, we think about our clusters in 3 streams - alpha, beta and stable - and roll out upgrades and configuration changes according to stream.

Our plan right now is to have common configuration for a stream in a CR (StreamConfig) w/ a controller. The StreamConfig controller would reconcile to ClusterConfigs based on a label/annotation, with that resource's controller handling the actual cluster resource reconciliation (e.g. creation, k8s version upgrades, etc).

I don't think that it's CAPI's responsibility to implement all of that (or any of it), but if we can do some of the common stuff (version upgrades) here, that seems like it would be super valuable for the whole community. It also seems like the logic would be broadly applicable - copy template, update KCP, rollout, copy template, update MDs, rollout, profit.

We are in a really similar situation with a large number of clusters and three different pipelines/streams for development/staging/production clusters. We are starting the development of a new component to handle this in a similar fashion (copy template, update KCP, update MachinePool, etc), so it'd be great if we could share tooling. We were also interested in making this component capable of orchestrating this upgrade process so we could, for instance, decide to upgrade node pools one after the other, with some wait period in between, instead of all at once.

If I understand it correctly, this proposal adds kubectl rollout-like subcommands to clusterctl, but this wouldn't solve the use cases discussed above.

Should we submit a new CAEP proposal for discussion?
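
As a strawman for the node-pool-by-node-pool orchestration described above (a hypothetical CRD, not a concrete API proposal):

```yaml
apiVersion: example.io/v1alpha1        # hypothetical API group; nothing like this exists in CAPI
kind: ClusterUpgrade
metadata:
  name: prod-cluster-to-v1-19-1
spec:
  clusterName: prod-cluster
  kubernetesVersion: v1.19.1
  nodePoolOrder:                       # pools are upgraded one after the other
    - control-plane
    - pool-a
    - pool-b
  pauseBetweenPools: 30m               # wait period before moving on to the next pool
```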

@enxebre
Member

enxebre commented Apr 8, 2021

Same use case here: looping over scalable machine resources, e.g. MachineDeployments, to upgrade them one by one against the current control plane version.

For scenarios where more control is required, it would possibly be good to have an autoUpgrade: false/true control per scalable machine resource, so you can leverage a more controlled upgrade for a given machine pool, e.g. #4346.
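
A sketch of that per-resource control (the autoUpgrade field is hypothetical; the rest follows the v1alpha3 MachineDeployment shape, with required fields trimmed for brevity):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: pool-a
spec:
  clusterName: my-cluster
  autoUpgrade: false        # hypothetical field: opt this pool out of automatic upgrades
  template:
    spec:
      clusterName: my-cluster
      version: v1.18.8      # stays pinned until the pool is explicitly upgraded
      # bootstrap.configRef, infrastructureRef, and the other fields are unchanged
```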

@smcaine

smcaine commented Apr 14, 2021

We have a similar use case: we are using GitOps + CAPI to upgrade our clusters. For now we have to create a new MachineTemplate, update the KCP, wait for that to finish, delete the old template, create a new MachineTemplate for the MachineDeployment, wait for the rollout, and delete the old MachineTemplate. An operator or an additional feature/resource that could handle this lifecycle as a whole (declaratively) would be ideal for us, so we could update the KCP and MachineDeployment machineTemplate references at the same time, let the cluster reconcile and upgrade the control plane and workers in the correct order, and then purge the unwanted MachineTemplates.

@enxebre
Member

enxebre commented Apr 21, 2021

This relates to the ClusterClass discussion #4430.
This will require a considerable amount of work and thinking to get it right. @vincepri is this work still intended to make it into v1alpha4, or can we move it to the next milestone?

/area upgrades

@k8s-ci-robot k8s-ci-robot added the area/upgrades Issues or PRs related to upgrades label Apr 21, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2021
@fabriziopandini
Member

What about closing this given the ClusterClass work?

@sbueringer
Member

Agree. This will be 100% covered by what we want to do with ClusterClass.
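
For context, a rough sketch of how that ends up looking with the ClusterClass/topology work: the desired version is declared once on the Cluster object and the topology controller rolls it out to the control plane and worker pools (class and pool names are placeholders):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  topology:
    class: my-cluster-class        # placeholder ClusterClass name
    version: v1.22.0               # bumping this single field upgrades the whole cluster
    workers:
      machineDeployments:
        - class: default-worker    # placeholder worker class defined in the ClusterClass
          name: pool-a
          replicas: 3
```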

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 21, 2021
@fabriziopandini
Member

/close
As per comment above this is part of ClusterClass; ongoing work in #5059

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
As per comment above this is part of ClusterClass; ongoing work in #5059

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
