Local k3s clusters were not able to be upgraded within Rancher post upgrade from 2.5 to 2.6 #35797

Closed
deniseschannon opened this issue Dec 8, 2021 · 10 comments
Labels: internal, team/hostbusters (the team that is responsible for provisioning/managing downstream clusters + K8s version support)

@deniseschannon

deniseschannon commented Dec 8, 2021

This seems to be fixed by the latest SUC (system-upgrade-controller) version introduced in 2.6.3, but this is an explicit request to test this use case and confirm.

SURE-3539
SURE-3388

Before rancher server upgrade: Rancher version - v2.5.9, k3s v1.20.9+k3s1, SUC: 0.6.2

Post rancher server upgrade: Rancher version - v2.6.2, k3s v1.20.11+k3s2, SUC: 0.7.5

  1. Created a k3s HA cluster on the version 1.20.9
  2. Installed rancher v2.5.9 on this k3s cluster
  3. Upgraded the k3s version to v1.20.11+k3s2
  4. Verified all the nodes were upgraded - SUC version 0.6.2 and the service account name is system-upgrade
    kubectl get serviceaccount -A | grep -i system-upgrade
    apps & market --> installed apps --> SUC app installed
  5. Upgraded the rancher version to v2.6.2
  6. Verified all rancher pods were upgraded. SUC version is 0.7.5
  7. Upgraded the k3s version in the local cluster from the UI to v1.21.4+k3s2
  8. The first node on which the upgrade was attempted was stuck in a cordoned state, the local cluster was stuck in Upgrading, and the service account name is now system-upgrade-controller
  9. Following error is seen in the cluster events:
    Error creating: pods "apply-k3s-master-plan-on-ip-172-31-8-247-with-57eb7803114-9fcec-" is forbidden: error looking up service account cattle-system/system-upgrade: serviceaccount "system-upgrade" not found
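The mismatch in steps 4, 8, and 9 (the pods ask for `system-upgrade`, but only `system-upgrade-controller` exists after the Rancher upgrade) can be checked from the command line. The following is a minimal sketch, not an official procedure: the function names are hypothetical, and it assumes the SUC Plan resources are served as `plans.upgrade.cattle.io` in the `cattle-system` namespace.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Rancher or SUC.

# Extract "namespace/name" of the missing service account from an event
# message like the one quoted above (reads text on stdin).
missing_service_account() {
  sed -n 's/.*error looking up service account \([^:]*\):.*/\1/p'
}

# List the service accounts that actually exist in cattle-system.
list_suc_service_accounts() {
  kubectl get serviceaccount -n cattle-system | grep -i system-upgrade
}

# Show which service account each Plan requests.
# Assumption: the Plan CRD is plans.upgrade.cattle.io and the plans live in cattle-system.
list_plan_service_accounts() {
  kubectl get plans.upgrade.cattle.io -n cattle-system \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceAccountName}{"\n"}{end}'
}

# Usage against a live cluster (not run here):
#   kubectl get events -n cattle-system | missing_service_account
#   list_suc_service_accounts
#   list_plan_service_accounts
```

If the name printed by `missing_service_account` does not appear in the output of `list_suc_service_accounts`, the cluster is in the state described in this issue.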
deniseschannon added this to the v2.6.3 milestone Dec 8, 2021
snasovich added the team/hostbusters label Dec 8, 2021
snasovich self-assigned this Dec 8, 2021
@snasovich
Collaborator

Assigning myself for tracking purposes.
@sowmyav27, please assign QA and test. Thank you.

@jmcsagdc

jmcsagdc commented Dec 12, 2021

verified fix in v2.6-e62e1ae34eca46bd733b7f67d1cea055395ac45d-head

  1. Created a k3s HA cluster on the version 1.20.9
  2. Installed rancher v2.5.9 on this k3s cluster
  3. Upgraded the k3s version to v1.20.12+k3s1
  4. Verified all the nodes were upgraded
  5. kubectl get serviceaccount -A | grep -i system-upgrade (present as expected)
  6. apps & market --> installed apps --> SUC app installed (yes)
  7. Upgraded the rancher version to v2.6-head
  8. Verified all rancher pods were upgraded. SUC version is 0.6.2
  9. Upgraded the k3s version in the local cluster from the UI to v1.22.4+k3s1

RESULTS:
Upgrades complete successfully: there are no cordoned nodes, no pods in an error state, no missing system-upgrade service account error, and no forbidden error. The stack is up and usable.
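The checks behind these results could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure QA actually used: the function names are hypothetical, and the column positions assume the default output of `kubectl get nodes --no-headers` and `kubectl get pods -A --no-headers`.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that filter kubectl's default table output.

# Print the names of cordoned nodes.
# Input: output of `kubectl get nodes --no-headers` (STATUS is column 2).
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Print namespace/name of pods that are neither Running nor Completed.
# Input: output of `kubectl get pods -A --no-headers` (STATUS is column 4).
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Usage against a live cluster (not run here):
#   kubectl get nodes --no-headers | cordoned_nodes
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output from both filters corresponds to the "no cordoned nodes, no pods in an error state" result.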

[Screenshots: k3s-upgrade-25, k3s-upgrade-26]

@sowmyav27
Contributor

Reopening to validate the k3s version upgrade from 1.20 to 1.21.

@jmcsagdc

jmcsagdc commented Dec 14, 2021

On 2.6.3-rc4 the issue remains:

  1. Created a k3s HA cluster on the version 1.20.9
  2. Installed rancher v2.5.9 on this k3s cluster
  3. Upgraded the k3s version to v1.20.12+k3s1
  4. Verified all the nodes were upgraded
  5. kubectl get serviceaccount -A | grep -i system-upgrade (present as expected)
  6. apps & market --> installed apps --> SUC app installed (yes)
  7. Upgraded the rancher version to v2.6.3-rc4
  8. Verified all rancher pods were upgraded.
  9. Upgraded the k3s version in the local cluster from the UI to v1.21.5+k3s2 (rancher/system-upgrade-controller:v0.8.0 - visible in local / all namespaces > cattle-system > workload tab)

EXPECTED:
Upgrade to proceed as normal

ACTUAL:
Get the forbidden error and the cluster is stuck in "Upgrading" status

[Screenshot: 1 21 fails]

@jmcsagdc

The timing of the SUC versions, when they launch, and what they are is included here.

2.5.9

[Screenshot: 1-detail-fail]

Post-Upgrade to 2.6-head

[Screenshot: 2-detail-fail]

@kinarashah
Member

Available to test with v2.6-head after https://drone-publish.rancher.io/rancher/rancher/7030 passes.

@jmcsagdc

Fixed in v2.6-bbde38e8cbf1007ba81ce5e68f674948a0033c0d-head

SUC 0.8.1

[Screenshot: fixed-in-head]

@Vashiru

Vashiru commented Dec 21, 2021

I just ran into this with one of my downstream clusters rather than my local cluster.

  1. Install Rancher 2.5.9
  2. Manage a downstream cluster running v1.20.9+k3s1
  3. Upgrade to Rancher 2.5.11
  4. Upgrade to Rancher 2.6.2
  5. Upgrade the downstream cluster from v1.20.9+k3s1 to v1.21.7+k3s1 (1.20.12 was NOT offered to me, so I figured I was on the latest 1.20 release, turns out I wasn't...)
  6. I get stuck here:

[Screenshot]

How do I recover from this now? The upgrade has been stuck for well over an hour now on my Raspberry Pi 4 (4 GB).

@dnoland1
Contributor

@Vashiru The following worked for me:

kubectl get serviceaccount -n cattle-system system-upgrade-controller -o yaml > system-upgrade.yaml
vi system-upgrade.yaml    # change metadata.name to system-upgrade
kubectl apply -f system-upgrade.yaml
kubectl get clusterrolebinding system-upgrade-controller -o yaml > crb.yaml
vi crb.yaml     # change metadata.name and subjects name to system-upgrade
kubectl apply -f crb.yaml
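
A non-interactive variant of the same workaround, as a rough sketch: it replaces the two `vi` edits with a `sed` filter. The function name `rename_to_system_upgrade` is hypothetical. The filter assumes the binding's `roleRef.name` is not itself `system-upgrade-controller` (it would be renamed too), so inspect the exported YAML before applying it.

```shell
#!/bin/sh
# Hypothetical helper: rewrites only lines of the exact form
# "name: system-upgrade-controller" (optionally prefixed by "- "),
# i.e. metadata.name and subjects[].name in kubectl's YAML output.
# Token secret names and keys such as "app.kubernetes.io/name" are left untouched.
rename_to_system_upgrade() {
  sed -E 's/^( *(- )?name: )system-upgrade-controller$/\1system-upgrade/'
}

# Usage against a live cluster (not run here):
#   kubectl get serviceaccount -n cattle-system system-upgrade-controller -o yaml \
#     | rename_to_system_upgrade | kubectl apply -f -
#   kubectl get clusterrolebinding system-upgrade-controller -o yaml \
#     | rename_to_system_upgrade | kubectl apply -f -
```

Piping through a filter keeps the original `system-upgrade-controller` objects in place and only adds copies under the old `system-upgrade` name, which is what the stuck upgrade pods look up.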

@Vashiru

Vashiru commented Dec 21, 2021

@dnoland1 Cheers mate, appreciated! That did the trick for me too.

For anyone else running into this: it might still throw a message about the system-upgrade-token volume not being attached at first, but wait a few more minutes. Shortly afterwards it pulled some old images and then started upgrading. A couple of minutes later my Raspberry Pi was purring like a kitten running v1.21.7. I'll wait for Rancher 2.6.3 before upgrading my local cluster.
