
clusterctl rollout #3439

Open
6 of 9 tasks
Arvinderpal opened this issue Aug 3, 2020 · 27 comments
Labels
area/clusterctl Issues or PRs related to clusterctl
kind/feature Categorizes issue or PR as related to a new feature.
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@Arvinderpal
Contributor

Arvinderpal commented Aug 3, 2020

As an operator I would like a convenient and consistent mechanism through which I can rollout updates to my control-plane and worker nodes.

As an operator I would like to inspect a rollout as it occurs, roll back changes if needed, and view the rollout history.

Detailed Description

Motivated by kubectl rollout.

The idea is to create a new clusterctl sub-command: clusterctl rollout.

Issue/PR Tracker:

Related:
Issue #3401
Issue #3203

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 3, 2020
@vincepri
Member

vincepri commented Aug 3, 2020

+1 this feature makes sense, we might need a small RFE/proposal

@Arvinderpal
Contributor Author

Common usage patterns may include:

1. Immediate Rollouts:
clusterctl rollout machinedeployment/my-cluster-md-0
clusterctl rollout kubeadmcontrolplane/my-cluster-control-plane

2. Rollout based on a specific infra machine template. For example, modify the existing MachineDeployment to reference the new infra (e.g. docker) machine template resource. It's assumed that the user has created my-cluster-md-0-rev-1 beforehand (see the sketch after this list):
clusterctl rollout machinedeployment/my-cluster-md-0 --template dockermachinetemplate/my-cluster-md-0-rev-1

3. Monitor status:
clusterctl rollout status machinedeployment/my-cluster-md-0
clusterctl rollout status kubeadmcontrolplane/my-cluster-control-plane

4. Rollback to the previous deployment or a specific revision:
clusterctl rollout undo machinedeployment/my-cluster-md-0
clusterctl rollout undo machinedeployment/my-cluster-md-0 --to-revision=2

5. History:
clusterctl rollout history machinedeployment/my-cluster-md-0
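
For pattern 2, the proposed --template flag would presumably boil down to re-pointing the MachineDeployment's infrastructureRef at the new template. A minimal sketch of the equivalent manual step with plain kubectl, assuming my-cluster-md-0-rev-1 already exists:

# Point the MachineDeployment at the pre-created template revision; the
# MachineDeployment controller then rolls out replacement Machines according
# to spec.strategy (RollingUpdate by default).
kubectl patch machinedeployment my-cluster-md-0 --type merge -p \
  '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"my-cluster-md-0-rev-1"}}}}}'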

@Arvinderpal
Contributor Author

> +1 this feature makes sense, we might need a small RFE/proposal

More than happy to put together a proposal and a POC if we agree that this is the right way to go about this.

@vincepri
Member

vincepri commented Aug 3, 2020

cc @wfernandes @fabriziopandini

/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Aug 3, 2020
@detiber
Member

detiber commented Aug 3, 2020

+1 from me to the high-level approach as a near-term solution to the problem. It might also make sense to propose support in upstream Kubernetes/kubectl/kubebuilder for a subresource-type interface, so that we could eventually have direct support in kubectl, similar to what we have with the scale subresource today.
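
For comparison, the scale subresource already gives kubectl native handling of Cluster API resources today; a rollout subresource could work the same way. For example:

# Works today because MachineDeployment exposes the scale subresource; no
# CAPI-specific logic is needed in kubectl.
kubectl scale machinedeployment my-cluster-md-0 --replicas=5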

@fabriziopandini
Member

I'm ok with the proposal, but I agree with @detiber that the long-term solution is to make this work in kubectl.

@Arvinderpal
Contributor Author

I added a link to the proposal. PTAL

@Arvinderpal
Contributor Author

I'm going to start implementing a PoC -- focusing just on MachineDeployments for now.
I wanted to ask: are people okay with having a top-level command like clusterctl rollout, or would you prefer something else like (i) clusterctl experimental rollout, (ii) clusterctl workload-cluster rollout, (iii) ...?

@fabriziopandini @wfernandes

@vincepri
Member

clusterctl alpha <>? So we can follow the alpha phases we have in other tools

@fabriziopandini
Member

/area clusterctl

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2021
@fabriziopandini
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 5, 2021
@Arvinderpal
Contributor Author

The remaining MD commands -- status and history -- depend on conditions in MD. Here is the tracker for that: #3486

@vincepri
Member

/milestone v1.0

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4, v1.0 Oct 19, 2021
@chrischdi
Member

Just because I did some research, here is some context to consider for an implementation of clusterctl alpha rollout undo for KCP: if it gets implemented, it should take care not to allow downgrades of ControlPlane nodes, which could break a cluster.

  • We should always take https://kubernetes.io/releases/version-skew-policy/ into account, which from a ControlPlane perspective more or less means that no MachineDeployments or MachinePools in the cluster should run a kubelet at a minor version newer than the ControlPlane Kubernetes version we would downgrade to (see the sketch below).
  • Also from an etcd perspective: according to the 3.3 -> 3.4 and 3.4 -> 3.5 upgrade docs, downgrading an etcd minor version is not allowed once a cluster has been fully upgraded to that minor version.
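
A pre-flight check along these lines could catch the kubelet-skew case before an undo (a hypothetical sketch; TARGET_VERSION stands in for the control-plane version the undo would restore, and the jsonpath follows the v1beta1 API types):

# List each MachineDeployment with the Kubernetes version its Machines run,
# to compare against the version the control plane would be rolled back to.
TARGET_VERSION="v1.24.0"  # hypothetical undo target
kubectl get machinedeployments -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.spec.template.spec.version}{"\n"}{end}'
# Abort the undo if any listed minor version is newer than $TARGET_VERSION
# (e.g. compare with sort -V).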

Some more context from upstream discussions about downgrades is available at:

(which got closed as rotten, not resolved).

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Aug 5, 2022
@hiromu-a5a
Contributor

/assign

@killianmuldoon
Contributor

@hiromu-a5a Good to see somebody picking up this work! I just wanted to mention that some parts of this - if they involve changes to the MachineDeployment controller - might overlap with work ongoing in #7730

I think it might be a good idea to sync on those parts of the work to ensure stability on main (and have fewer rebases 😄 ).

Thanks again for picking this up though! I think the pieces that impact clusterctl (like #7988) should have no / few clashes with the MD work.

@hiromu-a5a
Contributor

hiromu-a5a commented Feb 15, 2023

While trying the existing rollout undo command, I felt that a rollout could easily and accidentally violate the version skew policy. I'd like to suggest emitting a warning if a user's operation breaks the version skew policy. What do you think?
If you agree, I'll open another issue.

@fabriziopandini
Member

I'm +1 to opening a discussion on how to prevent undo operations that can lead to issues.

@hiromu-a5a
Contributor

Posted the discussion: #8170.

@hiromu-a5a
Contributor

I couldn't find any responses to #8170.
Please let me know if there is a more appropriate forum for discussion.
(If you meant something different, such as raising it at office hours, I apologize.)

@fabriziopandini
Member

@hiromu-a5a I'm not sure I understand why this topic is not gaining traction after call-outs at office hours.
My only guess is that very few users rely on this feature, which somehow matches the fact that no one reported the other issues we found while working on label propagation (off the top of my head: history was not tracking in-place changes, clusterctl rollout was not considering all the versions a MachineSet might have, and probably more).

What I can suggest at this stage is to continue collecting ideas on this issue, or to take the initiative in defining what should be improved in this feature and how.

@hiromu-a5a
Contributor

Thank you for your feedback.

To take the initiative, I've opened an issue for now: #8408. I think this should be discussed in a separate issue rather than as a sub-topic of this one.

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 27, 2024
@fabriziopandini
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 11, 2024
@fabriziopandini
Member

/unassign @hiromu-a5a

My personal understanding of this feature is that it is becoming less and less relevant, considering GitOps, ClusterClass, the lack of requests/queries/feedback from the community, etc.

Considering that, the fact that we never completed this feature, that there are pending issues, and that we carry maintenance costs related to it, I think that as a project we should ask ourselves whether it is time to deprecate and remove it.

/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 22, 2024