RKE2 node driver cluster | nodes are replaced when upgrading Rancher from 2.6.3 to 2.6-head #36627

Closed · sgapanovich opened this issue Feb 24, 2022 · 6 comments

Labels: area/capr/rke2 · area/provisioning-v2 · kind/bug-qa · release-note · team/hostbusters
Milestone: v2.6.4

@sgapanovich

Rancher Server Setup

  • Rancher version: start on 2.6.3 and upgrade to 2.6-head 8c785a1
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE1
  • Proxy/Cert Details: self-signed

Information about the Cluster

  • Kubernetes version: v1.21.9+rke2r1
  • Cluster Type (Local/Downstream): downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): RKE2 DigitalOcean node driver

Describe the bug

Downstream RKE2 node driver cluster has its nodes redeployed when upgrading Rancher from 2.6.3 to 2.6-head

To Reproduce

  1. Create an HA Rancher install
  2. Deploy a downstream DigitalOcean RKE2 node driver cluster with 3 nodes (one per role)
  3. Upgrade Rancher to 2.6-head (example Helm commands below)
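
For step 3, a minimal sketch of the Helm upgrade, assuming the standard `rancher-latest/rancher` chart and that the original install values are reused (`rancherImageTag` is the usual way a head build is pinned; the exact repo and values depend on the original install):

```sh
# Upgrade an existing Helm-installed Rancher to a 2.6 head build.
# Chart repo, namespace, and values are assumptions based on a standard install.
helm repo update
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --reuse-values \
  --set rancherImageTag=v2.6-head
```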

Result

Extra nodes are deployed in the downstream cluster; once they are up, the old nodes are deleted.

Expected Result

Downstream cluster nodes should not be affected by Rancher upgrade

Screenshots

As soon as Rancher is upgraded, new nodes start provisioning:

[screenshot 1]

Each pool got a new node:

[screenshot 2]

When the new nodes are provisioned, the old ones are deleted:

[screenshot 3]

Additional context

sgapanovich self-assigned this on Feb 24, 2022
sgapanovich added the area/provisioning-v2, area/capr/rke2, team/hostbusters, and kind/bug-qa labels on Feb 24, 2022
sgapanovich added this to the v2.6.4 milestone on Feb 24, 2022
thedadams self-assigned this on Feb 24, 2022
jakefhyde self-assigned this on Feb 25, 2022
@thedadams (Contributor) commented:

A lot has changed in RKE2 provisioning; one example is that users are now able to specify DrainBeforeDelete.

To support these changes, the MachineDeployments are updated. When that happens (and updating them is all that Rancher controls), CAPI creates new machines and deletes the old ones.

Given the "Tech Preview" status of RKE2 provisioning, this may be a necessary evil as we move to GA.

@Oats87 (Contributor) commented Mar 1, 2022:

Can we invert the behavior on our end and just have DrainBeforeDelete on by default?

@snasovich (Collaborator) commented:

@Oats87, I believe that's not the only reason it's triggered.
At this time I think the current behavior is acceptable given the Tech Preview status of RKE2 provisioning in 2.6.3. We would like to release-note it.

snasovich added the release-note label on Mar 1, 2022
snasovich assigned himself and unassigned thedadams and jakefhyde on Mar 1, 2022
@snasovich (Collaborator) commented Mar 9, 2022:

Moving to test to ensure QA validates that the cluster comes back up Active after the upgrade (but per this issue, it's acceptable that the nodes will be replaced).

Though at this time it's blocked by #36807.
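
For that validation, one quick way to watch the rollout and confirm the cluster returns to Active (namespace and resource names below are the usual provisioning-v2 defaults, assumed here):

```sh
# Watch the downstream cluster's CAPI Machines in the local cluster while Rancher
# upgrades: replacement Machines should appear, then the old ones should be deleted.
kubectl -n fleet-default get machines.cluster.x-k8s.io -w

# Afterwards, confirm the provisioning cluster object reports Ready again.
kubectl -n fleet-default get clusters.provisioning.cattle.io
```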

@jakefhyde (Contributor) commented:

> Can we invert the behavior on our end and just have DrainBeforeDelete on by default?

Just wanted to add that although CAPI drains by default, we chose to default DrainBeforeDelete to false for parity with RKE1.
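
For reference, the field under discussion lives on the machine pools of the provisioning-v2 cluster spec; a minimal sketch, assuming the `provisioning.cattle.io/v1` schema and with cluster/config names made up for illustration:

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: example-do-rke2              # hypothetical cluster name
  namespace: fleet-default
spec:
  kubernetesVersion: v1.21.9+rke2r1
  rkeConfig:
    machinePools:
      - name: pool1
        etcdRole: true
        controlPlaneRole: true
        workerRole: true
        quantity: 1
        drainBeforeDelete: false     # Rancher default, kept off for RKE1 parity even though CAPI drains by default
        machineConfigRef:
          kind: DigitaloceanConfig
          name: example-do-config    # hypothetical machine config name
```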

@sowmyav27 (Contributor) commented Mar 11, 2022:

On an upgrade from 2.6.3 to 2.6-head (commit id: bab14a3):

@snasovich @Oats87 Is this expected? Noticed that the nodes got deleted in parallel: 1 etcd, 1 control plane, and 1 worker node were deleted in parallel and reprovisioned.
