Update clusters according to the version skew policy #9375

Merged
merged 4 commits into kubermatic:master from new-update-controller on Mar 24, 2022

Conversation


@xrstf xrstf commented Mar 17, 2022

What does this PR do / Why do we need it:

This PR adds a new update controller that implements version-skew-policy-aware updating for user clusters. It can properly handle cluster.spec.version jumping from 1.20 to 1.23.

The old update controller

KKP already had an update controller, which was responsible for watching Clusters and applying the auto-update rules to bump the cluster's version accordingly. Likewise, it would update MachineDeployments whenever an auto-node-update rule matched.

This controller still exists and does exactly the same thing, but has been renamed to auto-update-controller. Note that this controller does not know or care how to gradually roll out a version change. It could very well jump from 1.20 to 1.69 (right now it doesn't, but it could).

The new update controller

The new update-controller implements the gradual cluster update. It uses the new version information in the ClusterStatus, the version from the cluster spec, and version information gathered from the ReplicaSets in the cluster namespace. Based on these three pieces of information it decides whether to update at all and which component to update next. These decisions are then persisted in the ClusterStatus, which in turn leads to reconciliations by other controllers.

In contrast to the auto-update controller, the new update controller does not care why .Spec.Version was updated; it just focuses on making it happen.

Just as importantly, all our reconciling code (the creators that create the apiserver Deployment, the etcd StatefulSet, etc.) simply relies on the new ClusterStatus. None of it contains any update logic; the creators do not care why apiserver vX.Y is going to be deployed, and they do not need to dynamically query the control plane status. That makes the creators faster, easier to understand and easier to test. Creators should always be stateless.

Cluster.Spec.Version now considered harmful

Since we now keep track of the effective current versions of the control plane components in the ClusterStatus, pretty much no code should ever use .Spec.Version when it needs to know the current cluster version. .Spec.Version could be far, far in the future and might not even be reachable yet (because of old nodes or because no update path is configured).

Instead, use Cluster.Status.Versions.ControlPlane to know what is actually running right now.
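
To illustrate the point, here is a minimal Go sketch (using simplified stand-in types rather than the actual kubermaticv1 API) of how consuming code should pick the version it acts on:

```go
package main

import "fmt"

// Simplified stand-ins for the Cluster fields discussed above; the real
// types live in KKP's kubermaticv1 API package.
type Cluster struct {
	Spec struct {
		Version string // desired target version, possibly far in the future
	}
	Status struct {
		Versions struct {
			ControlPlane string // what is actually running right now
		}
	}
}

// effectiveVersion returns the version that is actually running.
// Spec.Version is only the goal and may not even be reachable yet.
func effectiveVersion(c *Cluster) string {
	if v := c.Status.Versions.ControlPlane; v != "" {
		return v
	}
	// Fallback for clusters whose status has not been initialized yet
	// (an assumption made for this sketch).
	return c.Spec.Version
}

func main() {
	c := &Cluster{}
	c.Spec.Version = "1.22.7"
	c.Status.Versions.ControlPlane = "1.21.10"

	fmt.Println(effectiveVersion(c)) // prints 1.21.10, not the 1.22.7 target
}
```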

The node-version controller

A new controller in the user-cluster-controller-manager is responsible for effectively caching node version information. It watches all Nodes in the user cluster and stores the oldest version in Cluster.Status.Versions.OldestNodeVersion. This both speeds up the update controller (which no longer needs to query the nodes every time a Cluster object changes) and provides a nice trigger signal for when old nodes disappear from the cluster.
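
A minimal sketch of the idea behind this controller (not the actual implementation), assuming the kubelet version is taken from each Node's status:

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/version"
)

// oldestNodeVersion returns the lowest kubelet version among the given nodes;
// the node-version controller persists this value in
// Cluster.Status.Versions.OldestNodeVersion.
func oldestNodeVersion(nodes []corev1.Node) (string, error) {
	var (
		oldest    *version.Version
		oldestRaw string
	)

	for _, node := range nodes {
		raw := strings.TrimPrefix(node.Status.NodeInfo.KubeletVersion, "v")

		v, err := version.ParseSemantic(raw)
		if err != nil {
			return "", fmt.Errorf("node %s reports unparseable kubelet version %q: %w", node.Name, raw, err)
		}

		if oldest == nil || v.LessThan(oldest) {
			oldest, oldestRaw = v, raw
		}
	}

	return oldestRaw, nil
}

func main() {
	nodes := []corev1.Node{
		{Status: corev1.NodeStatus{NodeInfo: corev1.NodeSystemInfo{KubeletVersion: "v1.21.10"}}},
		{Status: corev1.NodeStatus{NodeInfo: corev1.NodeSystemInfo{KubeletVersion: "v1.20.14"}}},
	}

	oldest, _ := oldestNodeVersion(nodes)
	fmt.Println(oldest) // 1.20.14
}
```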

Update Procedure

As per the version skew policy, the new controller will always first ensure that the cluster namespace is completely healthy before doing anything. This is required so that when, for example, etcd is updated (because the apiserver went from 1.21 to 1.22), we wait for etcd to roll out completely before progressing.

The controller then queries the cluster namespace and collects version information from the ReplicaSets and the OldestNodeVersion field. It now knows what is actually running in the cluster, what is currently intended (ClusterStatus.Versions), and the ultimate goal (.Spec.Version).

The current apiserver version is always mirrored in Cluster.Status.Versions.ControlPlane, so once a new apiserver comes alive, this field is updated and in turn triggers a whole lot of other reconciliations by other controllers. So first Versions.Apiserver is updated, then we wait, and once all apiservers have been rotated we bump Versions.ControlPlane.
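
For illustration, a hedged sketch of the "wait until all apiservers are rotated" check; the version label used here is just a placeholder, and the real controller derives a ReplicaSet's version differently:

```go
package update

import appsv1 "k8s.io/api/apps/v1"

// apiserverRotated reports whether the apiserver rollout has finished: every
// ReplicaSet that is not at the target version must be scaled down to zero,
// and the ReplicaSets at the target version must have all replicas ready.
// Only once this holds would Versions.ControlPlane be bumped.
func apiserverRotated(replicaSets []appsv1.ReplicaSet, targetVersion string) bool {
	for _, rs := range replicaSets {
		// Placeholder: assume the version is exposed as a label on the ReplicaSet.
		atTarget := rs.Labels["app.kubernetes.io/version"] == targetVersion

		if !atTarget {
			if rs.Status.Replicas > 0 {
				return false // an old apiserver ReplicaSet still has pods
			}
			continue
		}

		if rs.Spec.Replicas == nil || rs.Status.ReadyReplicas < *rs.Spec.Replicas {
			return false // new apiservers are not fully rolled out yet
		}
	}

	return true
}
```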

Once the apiserver is updated, controller-manager and scheduler are updated. Then we wait again for them to roll out and for the cluster namespace to become healthy again.

Once all 3 control plane components are updated, the controller checks the OldestNodeVersion. If this version is more than 1 minor release away from the current cluster version, no further updates will happen, as the nodes need to be rotated/replaced first; that is outside the responsibility of this controller.
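
Putting the one-minor-step rule and the node gate together, a conceptual sketch of the decision could look like the following (the real controller additionally consults the configured update rules, health checks and concrete patch releases):

```go
package update

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/version"
)

// nextControlPlaneVersion computes the next version the control plane should
// move toward: one minor release at a time until the desired spec version is
// reached. It returns false when no update should happen right now, e.g.
// because the oldest node is more than one minor release behind the current
// control plane and needs to be rotated/replaced first.
func nextControlPlaneVersion(current, desired, oldestNode *version.Version) (*version.Version, bool) {
	// Already at (or beyond) the desired version: nothing to do.
	if !current.LessThan(desired) {
		return nil, false
	}

	// Node gate: refuse further updates while nodes lag too far behind.
	if oldestNode != nil && current.Minor() > oldestNode.Minor()+1 {
		return nil, false
	}

	// Same minor, only the patch level differs: go straight to the target.
	if current.Major() == desired.Major() && current.Minor() == desired.Minor() {
		return desired, true
	}

	// Otherwise step exactly one minor release ahead (the ".0" patch level is
	// a placeholder; a concrete patch release would be chosen in practice).
	next := version.MustParseSemantic(fmt.Sprintf("%d.%d.0", current.Major(), current.Minor()+1))

	return next, true
}
```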

Debuggability

Putting the version numbers in the ClusterStatus makes debugging much, much easier. It also allows for emergency interventions, where a KKP admin might choose to manually edit the ClusterStatus. It also made writing and testing the controller pretty straightforward, as no reconciling is directly involved.

To further improve the process, the new controller emits an event every time it changes something. It also makes use of a new ClusterCondition to inform about the current state (e.g. whether or not it is waiting right now, and what it is waiting for). Running kubectl describe on a cluster would look like this (in this example, I updated a 1.20 cluster to 1.22):

Status:
  Conditions:
    Update Progress:
      Kubermatic Version:   750cdd363e205ef0f655cb222cf58e49d3f7fbda
      Last Heartbeat Time:  2022-03-18T20:37:12Z
      Message:              No update in progress, cluster has reached its desired version.
      Reason:               UpToDate
      Status:               True
  Versions:
    Apiserver:            1.22.7
    Control Plane:        1.22.7
    Controller Manager:   1.22.7
    Oldest Node Version:  1.20.14
    Scheduler:            1.22.7
Events:
  Type     Reason                      Age                  From                        Message
  ----     ------                      ----                 ----                        -------
  Normal   ApiserverUpdated            2m6s                 kkp-update-controller       Kubernetes apiserver was updated to version 1.21.10.
  Normal   ControlPlaneVersionChanged  115s                 kkp-update-controller       Cluster control plane has reached version 1.21.10.
  Normal   ControllerManagerUpdated    115s                 kkp-update-controller       Kubernetes controller-manager was updated to version 1.21.10.
  Normal   SchedulerUpdated            115s                 kkp-update-controller       Kubernetes scheduler was updated to version 1.21.10.
  Normal   ApiserverUpdated            85s                  kkp-update-controller       Kubernetes apiserver was updated to version 1.22.7.
  Normal   ControlPlaneVersionChanged  64s                  kkp-update-controller       Cluster control plane has reached version 1.22.7.
  Normal   ControllerManagerUpdated    21s                  kkp-update-controller       Kubernetes controller-manager was updated to version 1.22.7.
  Normal   SchedulerUpdated            21s                  kkp-update-controller       Kubernetes scheduler was updated to version 1.22.7.

(Note that in the example above, the controller would refuse to update to 1.23.x, as there are still 1.20 nodes in the cluster.)

If one follows the individual steps of this cluster upgrade, it would look like this:

--- (none)
+++ Cluster cxl7d75cwb v2289 (2022-03-18T21:30:54+01:00) (gen. 1)
@@ -0 +1,2 @@
+spec:
+  version: 1.20.14        <- cluster was created

--- Cluster cxl7d75cwb v2289 (2022-03-18T21:30:54+01:00) (gen. 1)
+++ Cluster cxl7d75cwb v2292 (2022-03-18T21:30:54+01:00) (gen. 1)
@@ -1,2 +1,8 @@
 spec:
   version: 1.20.14
+status:                   <- initial version gets set
+  versions:
+    apiserver: 1.20.14
+    controlPlane: 1.20.14
+    controllerManager: 1.20.14
+    scheduler: 1.20.14

--- Cluster cxl7d75cwb v3676 (2022-03-18T21:32:27+01:00) (gen. 9)
+++ Cluster cxl7d75cwb v4277 (2022-03-18T21:34:53+01:00) (gen. 9)
@@ -5,4 +5,5 @@
     apiserver: 1.20.14
     controlPlane: 1.20.14
     controllerManager: 1.20.14
+    oldestNodeVersion: 1.20.14           <- the first nodes appear a few minutes later
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4277 (2022-03-18T21:34:53+01:00) (gen. 9)
+++ Cluster cxl7d75cwb v4386 (2022-03-18T21:35:20+01:00) (gen. 10)
@@ -1,5 +1,5 @@
 spec:
-  version: 1.20.14
+  version: 1.22.7                   <- I triggered the update
 status:
   versions:
     apiserver: 1.20.14

--- Cluster cxl7d75cwb v4386 (2022-03-18T21:35:20+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4387 (2022-03-18T21:35:20+01:00) (gen. 10)
@@ -2,7 +2,7 @@
   version: 1.22.7
 status:
   versions:
-    apiserver: 1.20.14
+    apiserver: 1.21.10               <- time to go to the next apiserver
     controlPlane: 1.20.14
     controllerManager: 1.20.14
     oldestNodeVersion: 1.20.14

--- Cluster cxl7d75cwb v4589 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4590 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -3,7 +3,7 @@
 status:
   versions:
     apiserver: 1.21.10
-    controlPlane: 1.20.14
+    controlPlane: 1.21.10               <- new apiserver is alive, no 1.20 apiservers remain
     controllerManager: 1.20.14
     oldestNodeVersion: 1.20.14
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4592 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4593 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -4,6 +4,6 @@
   versions:
     apiserver: 1.21.10
     controlPlane: 1.21.10
-    controllerManager: 1.20.14
+    controllerManager: 1.21.10           <- make ctrl-mgr follow
     oldestNodeVersion: 1.20.14
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4595 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4596 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -6,4 +6,4 @@
     controlPlane: 1.21.10
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14
-    scheduler: 1.20.14
+    scheduler: 1.21.10                   <- make scheduler follow

--- Cluster cxl7d75cwb v4872 (2022-03-18T21:36:01+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4873 (2022-03-18T21:36:01+01:00) (gen. 10)
@@ -2,7 +2,7 @@
   version: 1.22.7
 status:
   versions:
-    apiserver: 1.21.10
+    apiserver: 1.22.7              <- final update step
     controlPlane: 1.21.10
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14

--- Cluster cxl7d75cwb v4920 (2022-03-18T21:36:02+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5147 (2022-03-18T21:36:22+01:00) (gen. 10)
@@ -3,7 +3,7 @@
 status:
   versions:
     apiserver: 1.22.7
-    controlPlane: 1.21.10
+    controlPlane: 1.22.7              <- new apiservers came online
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14
     scheduler: 1.21.10

--- Cluster cxl7d75cwb v5527 (2022-03-18T21:37:05+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5528 (2022-03-18T21:37:05+01:00) (gen. 10)
@@ -4,6 +4,6 @@
   versions:
     apiserver: 1.22.7
     controlPlane: 1.22.7
-    controllerManager: 1.21.10
+    controllerManager: 1.22.7               <- make ctrl-mgr follow (a minute later, because since the last update, etcd was updated as well!)
     oldestNodeVersion: 1.20.14
     scheduler: 1.21.10

--- Cluster cxl7d75cwb v5528 (2022-03-18T21:37:05+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5530 (2022-03-18T21:37:05+01:00) (gen. 10)
@@ -6,4 +6,4 @@
     controlPlane: 1.22.7
     controllerManager: 1.22.7
     oldestNodeVersion: 1.20.14
-    scheduler: 1.21.10
+    scheduler: 1.22.7                  <- finally, scheduler is updated

Does this PR close any issues?:
Fixes #5949

Does this PR introduce a user-facing change?:

Kubermatic now updates clusters according to the Kubernetes version skew policy.

@kubermatic-bot kubermatic-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. sig/api Denotes a PR or issue as being assigned to SIG API. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. sig/crd size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2022
@xrstf xrstf force-pushed the new-update-controller branch 5 times, most recently from dcc57f5 to c4c7ff5, on March 18, 2022 at 20:21
@xrstf xrstf changed the title WIP - New update controller WIP - Update clusters according to the version skew policy Mar 18, 2022
@kubermatic-bot kubermatic-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2022
@xrstf xrstf changed the title WIP - Update clusters according to the version skew policy Update clusters according to the version skew policy Mar 21, 2022
@kubermatic-bot kubermatic-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2022
@xrstf xrstf removed the sig/api Denotes a PR or issue as being assigned to SIG API. label Mar 21, 2022
@kron4eg kron4eg self-assigned this Mar 22, 2022

kron4eg commented Mar 22, 2022

/approve

@kubermatic-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kron4eg, xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


kron4eg commented Mar 24, 2022

/lgtm

@kubermatic-bot kubermatic-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2022
@kubermatic-bot

LGTM label has been added.

Git tree hash: 28f833e6d5b569dc2fe14ec9e6f35f9c5406ce6e


@xmudrii xmudrii left a comment


/lgtm
/approve


xrstf commented Mar 24, 2022

/hold

prioritize other PRs to get merged first

@kubermatic-bot kubermatic-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2022

xmudrii commented Mar 24, 2022

/hold cancel

@kubermatic-bot kubermatic-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2022

xmudrii commented Mar 24, 2022

/retest

@kubermatic-bot kubermatic-bot merged commit 6f06b3c into kubermatic:master Mar 24, 2022
@xrstf xrstf deleted the new-update-controller branch March 24, 2022 12:31