Update clusters according to the version skew policy #9375

Merged
merged 4 commits into kubermatic:master from new-update-controller on Mar 24, 2022

Conversation


@xrstf xrstf commented Mar 17, 2022

What does this PR do / Why do we need it:

This PR adds a new update controller that implements version-skew-policy-aware updating for user clusters. It can properly handle cluster.spec.version jumping from 1.20 to 1.23.

The old update controller

KKP already had an update controller, which was responsible for watching Clusters and applying the auto-update rules to bump the cluster's version accordingly. Likewise, it would update MachineDeployments whenever an auto-node-update rule matched.

This controller still exists and does exactly the same thing, but has been renamed to auto-update-controller. Note that this controller does not know or care how to gradually roll out a version change. It could very well jump from 1.20 to 1.69 (right now it doesn't, but it could).

The new update controller

The new update-controller implements the gradual cluster update. It uses the new version information in the ClusterStatus, the version from the cluster spec, and version information gathered from the ReplicaSets in the cluster namespace. Based on these three pieces of information it decides whether to update at all and which component to update next. These decisions are then persisted in the ClusterStatus, which in turn leads to reconciliations by other controllers.

In contrast to the auto-update controller, the new update controller does not care why .Spec.Version was updated; it just focuses on making it happen.

Just as importantly, all our reconciling code (the creators that create the apiserver Deployment, the etcd StatefulSet, etc.) simply relies on the new ClusterStatus. None of it contains any update logic; the creators do not care why apiserver vX.Y is going to be deployed, and they do not need to dynamically query the control plane status. That makes the creators faster, easier to understand and easier to test. Creators should always be stateless.

Cluster.Spec.Version now considered harmful

Since we now keep track of the effective current versions of the control plane components in the ClusterStatus, pretty much no code should ever use .Spec.Version when it needs to know the current cluster version. .Spec.Version could be far, far in the future and might not even be reachable yet (because of old nodes or because no update path is configured).

Instead, use Cluster.Status.Versions.ControlPlane to know what is actually running right now.
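
To illustrate the point, here is a minimal Go sketch (using simplified stand-in types rather than the actual kubermaticv1 API) of how consuming code should pick the version it acts on:

```go
package main

import "fmt"

// Simplified stand-ins for the Cluster fields discussed above; the real
// types live in KKP's kubermaticv1 API package.
type Cluster struct {
	Spec struct {
		Version string // desired target version, possibly far in the future
	}
	Status struct {
		Versions struct {
			ControlPlane string // what is actually running right now
		}
	}
}

// effectiveVersion returns the version that is actually running.
// Spec.Version is only the goal and may not even be reachable yet.
func effectiveVersion(c *Cluster) string {
	if v := c.Status.Versions.ControlPlane; v != "" {
		return v
	}
	// Fallback for clusters whose status has not been initialized yet
	// (an assumption made for this sketch).
	return c.Spec.Version
}

func main() {
	c := &Cluster{}
	c.Spec.Version = "1.22.7"
	c.Status.Versions.ControlPlane = "1.21.10"

	fmt.Println(effectiveVersion(c)) // prints 1.21.10, not the 1.22.7 target
}
```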

The node-version controller

A new controller in the user-cluster-controller-manager is responsible for effectively caching node version information. It watches all Nodes in the user cluster and stores the oldest version in Cluster.Status.Versions.OldestNodeVersion. This both speeds up the update controller (which no longer needs to query the nodes every time a Cluster object changes) and provides a nice trigger signal for when old nodes disappear from the cluster.
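
A minimal sketch of the idea behind this controller (not the actual implementation), assuming the kubelet version is taken from each Node's status:

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/version"
)

// oldestNodeVersion returns the lowest kubelet version among the given nodes;
// the node-version controller persists this value in
// Cluster.Status.Versions.OldestNodeVersion.
func oldestNodeVersion(nodes []corev1.Node) (string, error) {
	var (
		oldest    *version.Version
		oldestRaw string
	)

	for _, node := range nodes {
		raw := strings.TrimPrefix(node.Status.NodeInfo.KubeletVersion, "v")

		v, err := version.ParseSemantic(raw)
		if err != nil {
			return "", fmt.Errorf("node %s reports unparseable kubelet version %q: %w", node.Name, raw, err)
		}

		if oldest == nil || v.LessThan(oldest) {
			oldest, oldestRaw = v, raw
		}
	}

	return oldestRaw, nil
}

func main() {
	nodes := []corev1.Node{
		{Status: corev1.NodeStatus{NodeInfo: corev1.NodeSystemInfo{KubeletVersion: "v1.21.10"}}},
		{Status: corev1.NodeStatus{NodeInfo: corev1.NodeSystemInfo{KubeletVersion: "v1.20.14"}}},
	}

	oldest, _ := oldestNodeVersion(nodes)
	fmt.Println(oldest) // 1.20.14
}
```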

Update Procedure

As per the version skew policy, the new controller will always first ensure that the cluster namespace is completely healthy before doing anything. This is required so that when, for example, etcd is updated (because the apiserver went from 1.21 to 1.22), we wait for etcd to roll out completely before progressing.

The controller then queries the cluster namespace and collects version information from the ReplicaSets and the OldestNodeVersion field. It now knows what is actually running in the cluster, what is currently intended (ClusterStatus.Versions), and the ultimate goal (.Spec.Version).

The current apiserver version is always mirrored in Cluster.Status.Versions.ControlPlane, so once a new apiserver comes alive, this field is updated and in turn triggers a whole lot of other reconciliations by other controllers. So first Versions.Apiserver is updated, then we wait, and once all apiservers have been rotated we bump Versions.ControlPlane.
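
For illustration, a hedged sketch of the "wait until all apiservers are rotated" check; the version label used here is just a placeholder, and the real controller derives a ReplicaSet's version differently:

```go
package update

import appsv1 "k8s.io/api/apps/v1"

// apiserverRotated reports whether the apiserver rollout has finished: every
// ReplicaSet that is not at the target version must be scaled down to zero,
// and the ReplicaSets at the target version must have all replicas ready.
// Only once this holds would Versions.ControlPlane be bumped.
func apiserverRotated(replicaSets []appsv1.ReplicaSet, targetVersion string) bool {
	for _, rs := range replicaSets {
		// Placeholder: assume the version is exposed as a label on the ReplicaSet.
		atTarget := rs.Labels["app.kubernetes.io/version"] == targetVersion

		if !atTarget {
			if rs.Status.Replicas > 0 {
				return false // an old apiserver ReplicaSet still has pods
			}
			continue
		}

		if rs.Spec.Replicas == nil || rs.Status.ReadyReplicas < *rs.Spec.Replicas {
			return false // new apiservers are not fully rolled out yet
		}
	}

	return true
}
```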

Once the apiserver is updated, controller-manager and scheduler are updated. Then we wait again for them to roll out and for the cluster namespace to become healthy again.

Once all 3 control plane components are updated, the controller checks the OldestNodeVersion. If this version is more than 1 minor release away from the current cluster version, no further updates will happen, as the nodes need to be rotated/replaced first; that is outside the responsibility of this controller.
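
Putting the one-minor-step rule and the node gate together, a conceptual sketch of the decision could look like the following (the real controller additionally consults the configured update rules, health checks and concrete patch releases):

```go
package update

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/version"
)

// nextControlPlaneVersion computes the next version the control plane should
// move toward: one minor release at a time until the desired spec version is
// reached. It returns false when no update should happen right now, e.g.
// because the oldest node is more than one minor release behind the current
// control plane and needs to be rotated/replaced first.
func nextControlPlaneVersion(current, desired, oldestNode *version.Version) (*version.Version, bool) {
	// Already at (or beyond) the desired version: nothing to do.
	if !current.LessThan(desired) {
		return nil, false
	}

	// Node gate: refuse further updates while nodes lag too far behind.
	if oldestNode != nil && current.Minor() > oldestNode.Minor()+1 {
		return nil, false
	}

	// Same minor, only the patch level differs: go straight to the target.
	if current.Major() == desired.Major() && current.Minor() == desired.Minor() {
		return desired, true
	}

	// Otherwise step exactly one minor release ahead (the ".0" patch level is
	// a placeholder; a concrete patch release would be chosen in practice).
	next := version.MustParseSemantic(fmt.Sprintf("%d.%d.0", current.Major(), current.Minor()+1))

	return next, true
}
```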

Debuggability

Putting the version numbers in the ClusterStatus makes debugging much, much easier. It also allows for emergency interventions, where a KKP admin might choose to manually edit the ClusterStatus. It also made writing and testing the controller pretty straightforward, as no reconciling is directly involved.

To further improve the process, the new controller emits an event every time it changes something. It also makes use of a new ClusterCondition to inform about the current state (e.g. whether or not it is waiting right now, and what it is waiting for). Running kubectl describe on a cluster would look like this (in this example, I updated a 1.20 cluster to 1.22):

Status:
  Conditions:
    Update Progress:
      Kubermatic Version:   750cdd363e205ef0f655cb222cf58e49d3f7fbda
      Last Heartbeat Time:  2022-03-18T20:37:12Z
      Message:              No update in progress, cluster has reached its desired version.
      Reason:               UpToDate
      Status:               True
  Versions:
    Apiserver:            1.22.7
    Control Plane:        1.22.7
    Controller Manager:   1.22.7
    Oldest Node Version:  1.20.14
    Scheduler:            1.22.7
Events:
  Type     Reason                      Age                  From                        Message
  ----     ------                      ----                 ----                        -------
  Normal   ApiserverUpdated            2m6s                 kkp-update-controller       Kubernetes apiserver was updated to version 1.21.10.
  Normal   ControlPlaneVersionChanged  115s                 kkp-update-controller       Cluster control plane has reached version 1.21.10.
  Normal   ControllerManagerUpdated    115s                 kkp-update-controller       Kubernetes controller-manager was updated to version 1.21.10.
  Normal   SchedulerUpdated            115s                 kkp-update-controller       Kubernetes scheduler was updated to version 1.21.10.
  Normal   ApiserverUpdated            85s                  kkp-update-controller       Kubernetes apiserver was updated to version 1.22.7.
  Normal   ControlPlaneVersionChanged  64s                  kkp-update-controller       Cluster control plane has reached version 1.22.7.
  Normal   ControllerManagerUpdated    21s                  kkp-update-controller       Kubernetes controller-manager was updated to version 1.22.7.
  Normal   SchedulerUpdated            21s                  kkp-update-controller       Kubernetes scheduler was updated to version 1.22.7.

(Note that in the example above, the controller would refuse to update to 1.23.x, as there are still 1.20 nodes in the cluster.)

If one follows the individual steps of this cluster upgrade, it would look like this:

--- (none)
+++ Cluster cxl7d75cwb v2289 (2022-03-18T21:30:54+01:00) (gen. 1)
@@ -0 +1,2 @@
+spec:
+  version: 1.20.14        <- cluster was created

--- Cluster cxl7d75cwb v2289 (2022-03-18T21:30:54+01:00) (gen. 1)
+++ Cluster cxl7d75cwb v2292 (2022-03-18T21:30:54+01:00) (gen. 1)
@@ -1,2 +1,8 @@
 spec:
   version: 1.20.14
+status:                   <- initial version gets set
+  versions:
+    apiserver: 1.20.14
+    controlPlane: 1.20.14
+    controllerManager: 1.20.14
+    scheduler: 1.20.14

--- Cluster cxl7d75cwb v3676 (2022-03-18T21:32:27+01:00) (gen. 9)
+++ Cluster cxl7d75cwb v4277 (2022-03-18T21:34:53+01:00) (gen. 9)
@@ -5,4 +5,5 @@
     apiserver: 1.20.14
     controlPlane: 1.20.14
     controllerManager: 1.20.14
+    oldestNodeVersion: 1.20.14           <- the first nodes appear a few minutes later
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4277 (2022-03-18T21:34:53+01:00) (gen. 9)
+++ Cluster cxl7d75cwb v4386 (2022-03-18T21:35:20+01:00) (gen. 10)
@@ -1,5 +1,5 @@
 spec:
-  version: 1.20.14
+  version: 1.22.7                   <- I triggered the update
 status:
   versions:
     apiserver: 1.20.14

--- Cluster cxl7d75cwb v4386 (2022-03-18T21:35:20+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4387 (2022-03-18T21:35:20+01:00) (gen. 10)
@@ -2,7 +2,7 @@
   version: 1.22.7
 status:
   versions:
-    apiserver: 1.20.14
+    apiserver: 1.21.10               <- time to go to the next apiserver
     controlPlane: 1.20.14
     controllerManager: 1.20.14
     oldestNodeVersion: 1.20.14

--- Cluster cxl7d75cwb v4589 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4590 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -3,7 +3,7 @@
 status:
   versions:
     apiserver: 1.21.10
-    controlPlane: 1.20.14
+    controlPlane: 1.21.10               <- new apiserver is alive, no 1.20 apiservers remain
     controllerManager: 1.20.14
     oldestNodeVersion: 1.20.14
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4592 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4593 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -4,6 +4,6 @@
   versions:
     apiserver: 1.21.10
     controlPlane: 1.21.10
-    controllerManager: 1.20.14
+    controllerManager: 1.21.10           <- make ctrl-mgr follow
     oldestNodeVersion: 1.20.14
     scheduler: 1.20.14

--- Cluster cxl7d75cwb v4595 (2022-03-18T21:35:31+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4596 (2022-03-18T21:35:31+01:00) (gen. 10)
@@ -6,4 +6,4 @@
     controlPlane: 1.21.10
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14
-    scheduler: 1.20.14
+    scheduler: 1.21.10                   <- make scheduler follow

--- Cluster cxl7d75cwb v4872 (2022-03-18T21:36:01+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v4873 (2022-03-18T21:36:01+01:00) (gen. 10)
@@ -2,7 +2,7 @@
   version: 1.22.7
 status:
   versions:
-    apiserver: 1.21.10
+    apiserver: 1.22.7              <- final update step
     controlPlane: 1.21.10
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14

--- Cluster cxl7d75cwb v4920 (2022-03-18T21:36:02+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5147 (2022-03-18T21:36:22+01:00) (gen. 10)
@@ -3,7 +3,7 @@
 status:
   versions:
     apiserver: 1.22.7
-    controlPlane: 1.21.10
+    controlPlane: 1.22.7              <- new apiservers came online
     controllerManager: 1.21.10
     oldestNodeVersion: 1.20.14
     scheduler: 1.21.10

--- Cluster cxl7d75cwb v5527 (2022-03-18T21:37:05+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5528 (2022-03-18T21:37:05+01:00) (gen. 10)
@@ -4,6 +4,6 @@
   versions:
     apiserver: 1.22.7
     controlPlane: 1.22.7
-    controllerManager: 1.21.10
+    controllerManager: 1.22.7               <- make ctrl-mgr follow (a minute later, because since the last update, etcd was updated as well!)
     oldestNodeVersion: 1.20.14
     scheduler: 1.21.10

--- Cluster cxl7d75cwb v5528 (2022-03-18T21:37:05+01:00) (gen. 10)
+++ Cluster cxl7d75cwb v5530 (2022-03-18T21:37:05+01:00) (gen. 10)
@@ -6,4 +6,4 @@
     controlPlane: 1.22.7
     controllerManager: 1.22.7
     oldestNodeVersion: 1.20.14
-    scheduler: 1.21.10
+    scheduler: 1.22.7                  <- finally, scheduler is updated

Does this PR close any issues?:
Fixes #5949

Does this PR introduce a user-facing change?:

Kubermatic now updates clusters according to the Kubernetes version skew policy.

@kubermatic-bot kubermatic-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Denotes that all commits in the pull request have the valid DCO signoff message. sig/api Denotes a PR or issue as being assigned to SIG API. sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. sig/crd size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2022
@xrstf xrstf force-pushed the new-update-controller branch 5 times, most recently from dcc57f5 to c4c7ff5, on March 18, 2022 at 20:21
@xrstf xrstf changed the title WIP - New update controller WIP - Update clusters according to the version skew policy Mar 18, 2022
@kubermatic-bot kubermatic-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 21, 2022
@xrstf xrstf changed the title WIP - Update clusters according to the version skew policy Update clusters according to the version skew policy Mar 21, 2022
@kubermatic-bot kubermatic-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2022
@xrstf xrstf removed the sig/api Denotes a PR or issue as being assigned to SIG API. label Mar 21, 2022
@kron4eg kron4eg self-assigned this Mar 22, 2022

kron4eg commented Mar 22, 2022

/approve

@kubermatic-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kron4eg, xrstf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


kron4eg commented Mar 24, 2022

/lgtm

@kubermatic-bot kubermatic-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 24, 2022
@kubermatic-bot

LGTM label has been added.

Git tree hash: 28f833e6d5b569dc2fe14ec9e6f35f9c5406ce6e


@xmudrii xmudrii left a comment


/lgtm
/approve


xrstf commented Mar 24, 2022

/hold

prioritize other PRs to get merged first

@kubermatic-bot kubermatic-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2022

xmudrii commented Mar 24, 2022

/hold cancel

@kubermatic-bot kubermatic-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2022

xmudrii commented Mar 24, 2022

/retest

@kubermatic-bot kubermatic-bot merged commit 6f06b3c into kubermatic:master Mar 24, 2022
@xrstf xrstf deleted the new-update-controller branch March 24, 2022 12:31