RKE2 node driver cluster | nodes are replaced when upgrading Rancher from 2.6.3 to 2.6-head #36627

Closed · sgapanovich opened this issue Feb 24, 2022 · 6 comments

Labels: area/capr/rke2 · area/provisioning-v2 · kind/bug-qa · release-note · team/hostbusters
Milestone: v2.6.4

@sgapanovich

Rancher Server Setup

  • Rancher version: start on 2.6.3 and upgrade to 2.6-head 8c785a1
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1
  • Proxy/Cert Details: self-signed

Information about the Cluster

  • Kubernetes version: v1.21.9+rke2r1
  • Cluster Type (Local/Downstream): downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): RKE2 DigitalOcean node driver

Describe the bug

Downstream RKE2 node driver cluster has its nodes redeployed when upgrading Rancher from 2.6.3 to 2.6-head

To Reproduce

  1. Create an HA Rancher install
  2. Deploy a downstream DigitalOcean RKE2 node driver cluster with 3 nodes (one per role)
  3. Upgrade Rancher to 2.6-head (example Helm commands below)
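
For step 3, a minimal sketch of the Helm upgrade, assuming the standard `rancher-latest/rancher` chart and that the original install values are reused (`rancherImageTag` is the usual way a head build is pinned; the exact repo and values depend on the original install):

```sh
# Upgrade an existing Helm-installed Rancher to a 2.6 head build.
# Chart repo, namespace, and values are assumptions based on a standard install.
helm repo update
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --reuse-values \
  --set rancherImageTag=v2.6-head
```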

Result

Extra nodes are deployed in the downstream cluster; once they are up, the old nodes are deleted.

Expected Result

Downstream cluster nodes should not be affected by Rancher upgrade

Screenshots

As soon as Rancher is upgraded, new nodes start provisioning:

[screenshot 1]

Each pool got a new node:

[screenshot 2]

When the new nodes are provisioned, the old ones are deleted:

[screenshot 3]

Additional context

sgapanovich self-assigned this on Feb 24, 2022
sgapanovich added the area/provisioning-v2, area/capr/rke2, team/hostbusters, and kind/bug-qa labels on Feb 24, 2022
sgapanovich added this to the v2.6.4 milestone on Feb 24, 2022
thedadams self-assigned this on Feb 24, 2022
jakefhyde self-assigned this on Feb 25, 2022
@thedadams (Contributor) commented:

A lot has changed in RKE2 provisioning; one example is that users are now able to specify DrainBeforeDelete.

To support these changes, the MachineDeployments are updated. When that happens (and updating them is all that Rancher controls), CAPI creates new machines and deletes the old ones.

Given the "Tech Preview" status of RKE2 provisioning, this may be a necessary evil as we move to GA.

@Oats87 (Contributor) commented Mar 1, 2022:

Can we invert the behavior on our end and just have DrainBeforeDelete on by default?

@snasovich (Collaborator) commented:

@Oats87, I believe that's not the only reason it's triggered.
At this time I think the current behavior is acceptable given the Tech Preview status of RKE2 provisioning in 2.6.3. We would like to release-note it.

snasovich added the release-note label on Mar 1, 2022
snasovich assigned himself and unassigned thedadams and jakefhyde on Mar 1, 2022
@snasovich (Collaborator) commented Mar 9, 2022:

Moving to test to ensure QA validates that the cluster comes back up Active after the upgrade (but per this issue, it's acceptable that the nodes will be replaced).

Though at this time it's blocked by #36807.
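
For that validation, one quick way to watch the rollout and confirm the cluster returns to Active (namespace and resource names below are the usual provisioning-v2 defaults, assumed here):

```sh
# Watch the downstream cluster's CAPI Machines in the local cluster while Rancher
# upgrades: replacement Machines should appear, then the old ones should be deleted.
kubectl -n fleet-default get machines.cluster.x-k8s.io -w

# Afterwards, confirm the provisioning cluster object reports Ready again.
kubectl -n fleet-default get clusters.provisioning.cattle.io
```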

@jakefhyde (Contributor) commented:

> Can we invert the behavior on our end and just have DrainBeforeDelete on by default?

Just wanted to add that although CAPI drains by default, we chose to default DrainBeforeDelete to false for parity with RKE1.
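
For reference, the field under discussion lives on the machine pools of the provisioning-v2 cluster spec; a minimal sketch, assuming the `provisioning.cattle.io/v1` schema and with cluster/config names made up for illustration:

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: example-do-rke2              # hypothetical cluster name
  namespace: fleet-default
spec:
  kubernetesVersion: v1.21.9+rke2r1
  rkeConfig:
    machinePools:
      - name: pool1
        etcdRole: true
        controlPlaneRole: true
        workerRole: true
        quantity: 1
        drainBeforeDelete: false     # Rancher default, kept off for RKE1 parity even though CAPI drains by default
        machineConfigRef:
          kind: DigitaloceanConfig
          name: example-do-config    # hypothetical machine config name
```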

@sowmyav27 (Contributor) commented Mar 11, 2022:

On an upgrade from 2.6.3 to 2.6-head (commit id: bab14a3):

@snasovich @Oats87 Is this expected? Noticed that the nodes got deleted in parallel: 1 etcd, 1 control plane, and 1 worker node were deleted in parallel and reprovisioned.
