
cluster goes into updating state after rancher upgrade #32002

Closed
sowmyav27 opened this issue Apr 9, 2021 · 5 comments
Labels: kind/bug-qa (Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement), kind/enhancement (Issues that improve or augment existing functionality), priority/0, release-note (Note this issue in the milestone's release notes)

sowmyav27 commented Apr 9, 2021

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • Deploy a HA setup - 2.4.5
  • Deploy a custom cluster - 1.18.17
  • Upgrade rancher to 2.5.8-rc2
  • The custom cluster goes into Updating state and performs an rke up; it even goes through updating the worker nodes

Expected Result:
The cluster should not go through an upgrade/reconcile as a result of the Rancher upgrade

Other details that may be helpful:

  • 1.17.17 and 1.18.14 clusters also go into Updating state - a full reconcile happens after an upgrade from 2.4.15 to 2.5-head (commit id: 6f71d4c)
  • On an upgrade from 2.4.15 to 2.5.7, a 1.18.14 cluster also went into Updating state.
  • A K8s v1.18.12 cluster did not go into Updating/provisioning state

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): 2.5.8-rc2
  • Installation option (single install/HA): Single

Cluster information

  • Cluster type (Hosted/Infrastructure Provider/Custom/Imported): custom RKE
  • Kubernetes version (use kubectl version):
1.18
@sowmyav27 sowmyav27 self-assigned this Apr 9, 2021
@sowmyav27 sowmyav27 added the kind/bug-qa Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement label Apr 9, 2021
@sowmyav27 sowmyav27 added this to the v2.5.8 milestone Apr 9, 2021
@kinarashah kinarashah self-assigned this Apr 9, 2021
kinarashah (Member) commented:

This issue will reproduce for the following k8s versions:
v1.18.14-rancher1-1, v1.18.15-rancher1-1, v1.18.16-rancher1-1 and v1.18.17-rancher1-1.

Steps:

  • Start from any Rancher server version v2.4.x (>=v2.4.5)
  • Create an RKE cluster with any of the above-mentioned k8s versions.
  • Upgrade to Rancher server version v2.5.x
  • Observe that the cluster automatically goes into Updating state and goes through all the provisioning steps again before becoming Active.

This issue will reproduce for the following k8s versions:
v1.17.16-rancher1-1, v1.17.17-rancher1-1 and v1.17.17-rancher2-1.

Steps:

  • Start from any Rancher server version v2.4.x (>=v2.4.4)
  • Create an RKE cluster with any of the above-mentioned k8s versions.
  • Upgrade to Rancher server version v2.5.x
  • Observe that the cluster automatically goes into Updating state and goes through all the provisioning steps again before becoming Active.

Root cause:
System images were added for these k8s versions in order to introduce support for the ACI CNI image. These images are not present in v2.4.x but were added for the same version strings in v2.5.x. ACI was first introduced with v1.17.16 and v1.18.14. Since then, we introduced more versions through KDM and Rancher releases; they were all based off the same changes, so this continues up to the latest v1.17.17 and v1.18.17.
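As an illustrative sketch (the image lists below are made up for the example, not the real KDM data), the effect reduces to a simple comparison: the system-image set registered under the same k8s version string gained an entry in v2.5.x, so the computed spec no longer matches the applied spec and Rancher triggers a full reconcile:

```shell
# Hypothetical system-image lists for the SAME k8s version string.
# (Illustration only -- the real lists come from KDM metadata.)
images_v24="hyperkube coredns flannel"
images_v25="hyperkube coredns flannel aci-containers"   # ACI CNI image added only in v2.5.x

# Rancher compares what it computes against what was last applied;
# any difference sends the cluster into Updating and runs rke up.
if [ "$images_v24" != "$images_v25" ]; then
  echo "system images changed -> cluster goes into Updating"
fi
```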

kinarashah commented Apr 13, 2021

Note: The code fix alone isn't enough for this issue; an extra step is required to ensure clusters don't update unless required. We'll need to document it.

Steps:

  • Any Rancher server version v2.4.x (>=v2.4.5)

  • Create an RKE cluster with any of these k8s versions (use Edit as YAML on the cluster if the version doesn't show up in the UI)
    v1.17.16-rancher1-1, v1.17.17-rancher1-1, v1.17.17-rancher2-1
    v1.18.14-rancher1-1, v1.18.15-rancher1-1, v1.18.16-rancher1-1 and v1.18.17-rancher1-1.

  • Refresh KDM, dev-v2.4 (after the fix is merged)

  • Extra step: this brings the version in sync with the v2.5.x versions so that the cluster doesn't upgrade. Make sure the cluster remains in the Active state after the kubectl edit.

    kubectl edit cluster c-xxxx -o yaml. Increment the X in rancher1-X for the field kubernetesVersion in both spec
    and applied (status).
    For example, if the cluster version is v1.18.15-rancher1-1, change it to v1.18.15-rancher1-2 in both spec and status.
    If the cluster version is v1.17.17-rancher2-1, change it to v1.17.17-rancher2-2 in both spec and status.

  • Upgrade to Rancher server version v2.5.x

  • Refresh KDM, dev-v2.5 (after the fix is merged)

  • Make sure cluster remains in the Active state.

  • If users want to stay on the same exact version (v1.18.15-rancher1-2 in our example) and still get support for ACI in Rancher v2.5.x, they can upgrade to v1.18.15-rancher1-3 for that purpose. It won't show in the UI dropdown; use the Edit as YAML option while editing the cluster to set the kubernetes_version.
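The version bump in the extra step can be computed mechanically. A minimal sketch (the `bump_version` helper is ours, not a Rancher tool) that produces the new string to paste into both spec and status during the kubectl edit:

```shell
# Increment the trailing -X of a vA.B.C-rancherN-X version string.
# Illustrative helper only; the actual change is applied via
# `kubectl edit cluster c-xxxx -o yaml` as described above.
bump_version() {
  # split on "-", add 1 to the last field, rejoin with "-"
  echo "$1" | awk -F- '{ $NF = $NF + 1; print }' OFS=-
}

bump_version "v1.18.15-rancher1-1"   # -> v1.18.15-rancher1-2
bump_version "v1.17.17-rancher2-1"   # -> v1.17.17-rancher2-2
```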

@kinarashah kinarashah added the release-note (Note this issue in the milestone's release notes) label Apr 13, 2021
sowmyav27 commented Apr 14, 2021

On 2.4.15

  • Created custom and AWS EC2 rke clusters with these k8s versions:
v1.17.16-rancher1-1 
v1.17.17-rancher1-1
v1.17.17-rancher2-1
v1.18.14-rancher1-1
v1.18.15-rancher1-1
v1.18.16-rancher1-1 
v1.18.17-rancher1-1
  • Changed KDM to point to dev-v2.4
  • Did a kubectl edit of each cluster and changed the k8s version suffix to -2
  • The clusters remained Active and did not go into Updating state
For example, in the management plane, exec into the rancher docker container (single-node docker install of Rancher). For the v1.17.16-rancher1-1 cluster:

- kubectl edit cluster <cluster-id> -o yaml
- changed v1.17.16-rancher1-1 to v1.17.16-rancher1-2
- saved the changes
- The cluster did not go into Updating state. Repeat this for all clusters.
  • Upgraded rancher to 2.5.8-rc3
  • Manually changed KDM to point to dev-v2.5
  • The clusters did not go into Updating state.
  • Changed the k8s version of each cluster through cluster.yml in the Rancher UI to -3

- For example, for the v1.17.16-rancher1-2 cluster: edit the cluster
- In cluster.yaml, changed v1.17.16-rancher1-2 to v1.17.16-rancher1-3
- Saved the changes

  • The cluster goes into Updating/Provisioning state and comes back Active, as expected.
  • The system workloads in the cluster are up and Active.

Tejeev commented Jun 23, 2021

If the code fix is not enough, have we documented this anywhere yet? If not, I don't think we can close this bug.
Maybe we missed it; if so, can you please point us to the docs and we'll close the bug.

@Tejeev Tejeev reopened this Jun 23, 2021
kinarashah commented Jun 23, 2021

@Tejeev Yep, we documented it in the release notes for 2.5.8 under upgrade notes https://github.com/rancherlabs/release-notes/blob/main/rancher/v2.5.md#installupgrade-notes.

I didn't put it in the docs since the behavior is scoped only to a Rancher upgrade, and once the cluster goes into Updating state after the Rancher upgrade, it doesn't happen again. I can also add it to the docs if there's a page/location you'd prefer for this information. Let me know.

@cbron cbron closed this as completed Oct 4, 2021