
Upgrading from k8s 1.23.14 to 1.23.15 fails in Rancher 2.6.9 #40280

Closed
wargamez opened this issue Jan 24, 2023 · 17 comments
Labels: kind/bug, team/hostbusters, team/infracloud


wargamez commented Jan 24, 2023

In Cluster Manager I tried to upgrade to Kubernetes 1.23.15 from 1.23.14, but the process ends with "upgrade failed" and an error message (see screenshots). However, when it says failed, I hit Edit Cluster again and, to my surprise, 1.23.14 is still a selectable option. If I choose that and hit Upgrade (downgrade) again, the cluster becomes green again. However, the cluster info now says 1.23.15 and all nodes show 1.23.15. Is this a known bug?

[Screenshots: rancherprobl1, rancherprobl2, rancherprobl3]

wargamez added the kind/bug label Jan 24, 2023

wargamez commented Jan 24, 2023

It seems the worker nodes are stuck at 1.23.14, but the control-plane and etcd nodes are upgraded to 1.23.15...


gestgithub commented Jan 30, 2023

I can confirm that I also have the same issue. I tried several times with both Rancher 2.6.9 and 2.7.0 to upgrade to 1.23.15 as well as 1.24.9.

It seems that under "Workload" --> "Jobs" in the kube-system namespace you can see that the job rke-network-plugin-deploy-job keeps failing with the messages mentioned in this issue: projectcalico/calico#6258
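
For anyone checking the same thing from the command line, the failing job's output can also be pulled with kubectl (a minimal check, assuming the default kube-system job name above):

kubectl -n kube-system logs job/rke-network-plugin-deploy-job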

However, not all of our clusters have this issue. All of them but one did the upgrade fine.


mat1010 commented Jan 31, 2023

We can confirm the behaviour while upgrading from 1.24.4 to 1.24.9 with Rancher 2.7.1.

@jimliming

Confirmed that editing these CRDs allowed the update to complete successfully with RKE 1.4.2 & k8s 1.21.5 => 1.24.9:

rancher/kontainer-driver-metadata@608a5ff

@niusmallnan

I am trying to figure out this problem. Considering that it may be caused by the upgrade of the CNI components, I simulated the version upgrade by adjusting KDM (https://github.com/rancher/kontainer-driver-metadata).

Rancher: v2.6.10
KDM 1: https://raw.githubusercontent.com/rancher/kontainer-driver-metadata/08-30-2022/data/data.json
KDM 2: https://releases.rancher.com/kontainer-driver-metadata/release-v2.6/data.json

Cases (From / To / Result):

  • Case A: from KDM 1 (RKE 1.24.4-rancher1-1, Calico v3.22.0) to KDM 2 (RKE 1.24.9-rancher1-1, Calico v3.22.5): Success
  • Case B: from KDM 1 (RKE 1.22.13-rancher1-1, Calico v3.21.1) to KDM 2 (RKE 1.22.17-rancher1-1, Calico v3.22.5): Success
  • Case C: from KDM 2 (RKE v1.21.14-rancher1-1, Calico v3.19.2) to KDM 2 (RKE 1.22.17-rancher1-1, Calico v3.22.5): Success
  • Case D: from KDM 2 (RKE v1.20.15-rancher2-2, Calico v3.17.2) to KDM 2 (RKE 1.22.17-rancher1-1, Calico v3.22.5): Success

I haven't found a way to reproduce it yet.

Is the CNI using the default configuration, or has it been changed since the initial installation? Maybe this is a clue?


rootwuj commented Feb 1, 2023

I tried to follow these steps to test but did not reproduce the issue. The cluster can be upgraded.

Rancher: v2.6.9

KDM1: https://raw.githubusercontent.com/rancher/kontainer-driver-metadata/11-28-2022/data/data.json
KDM2: https://releases.rancher.com/kontainer-driver-metadata/release-v2.6/data.json

Steps:

  1. Install v2.6.9.
  2. Update settings -> rke-metadata-config URL to KDM1.
  3. Create an RKE cluster, select k8s version v1.23.14-rancher1-1 and the Calico network provider; the cluster becomes active.
  4. Update settings -> rke-metadata-config URL to KDM2.
  5. Upgrade the k8s version of the cluster to v1.23.15-rancher1-1. The cluster can be upgraded successfully.
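
(For reference, the KDM source Rancher is actually using can also be read from the Rancher local cluster. A minimal check, assuming kubectl access to the local cluster and that the setting object keeps the UI name rke-metadata-config and stores its data in a value field:)

kubectl get settings.management.cattle.io rke-metadata-config -o jsonpath='{.value}'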


fengxx commented Feb 1, 2023

There are RKE1 clusters set up two years ago which used v1beta1 CRDs, where the spec.preserveUnknownFields value is set to true (v1beta1 defaults to true).
If you use a recent k8s version and want to reproduce the issue, just edit the CRD and set preserveUnknownFields to true before upgrading.

workaround:

kubectl get crd ipamblocks.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
kubectl get crd felixconfigurations.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
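
To confirm the change was applied before retrying the upgrade, the field can be read back (a small check against the same two CRDs; an empty result also means false, since false is the v1 default):

kubectl get crd ipamblocks.crd.projectcalico.org -o jsonpath='{.spec.preserveUnknownFields}'
kubectl get crd felixconfigurations.crd.projectcalico.org -o jsonpath='{.spec.preserveUnknownFields}'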

P.S. Upstream already added a backwards-compatible fix in projectcalico/calico#6242, but only in 3.24.


niusmallnan commented Feb 1, 2023

Using the v1beta1 CRD as a clue, I did this test.

I checked the setting of preserveUnknownFields:
kubectl get crd ipamblocks.crd.projectcalico.org -o yaml | grep preserveUnknownFields

Calico has used v1 CRDs since 3.15.

Rancher: v2.6.10
Upgrade path:

  • RKE v1.18.18-rancher1-2, Calico/Canal v3.13.4, preserveUnknownFields=true, Success
  • RKE v1.19.16-rancher2-1, Calico/Canal v3.16.5, preserveUnknownFields=true, Success
  • RKE v1.20.15-rancher2-1, Calico/Canal v3.17.2, preserveUnknownFields=true, Success
  • RKE v1.21.14-rancher1-1, Calico/Canal v3.19.2, preserveUnknownFields=true, Success
  • RKE v1.22.17-rancher1-1, Calico/Canal v3.22.5, preserveUnknownFields=true, Fail

When the upgrade fails, I can see that the rke-network-plugin-deploy-job fails to run. It shows some error logs:

for: "/etc/config/rke-network-plugin.yaml": customresourcedefinitions.apiextensions.k8s.io "felixconfigurations.crd.projectcalico.org" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
for: "/etc/config/rke-network-plugin.yaml": customresourcedefinitions.apiextensions.k8s.io "ipamblocks.crd.projectcalico.org" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

I ran the following commands, edited the cluster in the UI, and clicked the Save button, and the cluster upgraded successfully.

kubectl get crd ipamblocks.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
kubectl get crd felixconfigurations.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
kubectl annotate crd felixconfigurations.crd.projectcalico.org kubectl.kubernetes.io/last-applied-configuration-
kubectl annotate crd ipamblocks.crd.projectcalico.org kubectl.kubernetes.io/last-applied-configuration-
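
An equivalent way to apply the same change without round-tripping the YAML through sed would be a merge patch over the same two CRDs (a sketch, not tested here):

for crd in ipamblocks.crd.projectcalico.org felixconfigurations.crd.projectcalico.org; do
  # set preserveUnknownFields to false (valid for apiextensions.k8s.io/v1 CRDs)
  kubectl patch crd "$crd" --type=merge -p '{"spec":{"preserveUnknownFields":false}}'
  # drop the stale last-applied-configuration annotation, as in the commands above
  kubectl annotate crd "$crd" kubectl.kubernetes.io/last-applied-configuration-
done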

The Calico v3.22.5 version may have some special changes; from the information provided, all RKE clusters that fail to upgrade use Calico v3.22.5. In any case, there should be a solution here.

If anyone finds other errors in rke-network-plugin-deploy-job, please provide clues. I can update the workaround.


fengxx commented Feb 1, 2023

The Calico 3.22 CRDs introduced new fields with default values, so preserveUnknownFields has to be set to false. Please help review the PR rancher/kontainer-driver-metadata#1069.


pkhamre commented Feb 2, 2023

I run the following command, edit the cluster on the UI, and click the save button to upgrade the cluster successfully.

kubectl get crd ipamblocks.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
kubectl get crd felixconfigurations.crd.projectcalico.org -o yaml |sed 's#preserveUnknownFields: true#preserveUnknownFields: false#' |kubectl apply -f -
kubectl annotate crd felixconfigurations.crd.projectcalico.org kubectl.kubernetes.io/last-applied-configuration-
kubectl annotate crd ipamblocks.crd.projectcalico.org kubectl.kubernetes.io/last-applied-configuration-

I can confirm this workaround worked successfully on our Rancher environment, thanks.

Sahota1225 added the team/hostbusters label Feb 2, 2023
@gregfurman

We used the workaround provided by @niusmallnan to go from K8s v1.22.9 and Rancher v2.6.5 to K8s v1.24.9 and Rancher v2.7.1.


wargamez commented Feb 4, 2023

I can confirm that this worked as well. Thank you so much!

@snasovich

/forwardport v2.7.2


jloisel commented Feb 10, 2023

Same issue here, upgrading a cluster from v1.23.14 to v1.23.15 on Rancher 2.6.10. Can confirm we have an RKE1 cluster from 3 years ago.

@rayandas

Steps to validate:

  • Set up Rancher v2.6.5 using Docker.
  • Create a v1.18.20-rancher1 cluster.
  • Upgrade to Rancher v2.6.10 using these steps.
  • Upgrade the existing v1.18.20-rancher1 cluster to v1.23.16-rancher1 OR v1.24.10-rancher1.

The job rke-network-plugin-deploy-job shouldn't fail and calico/canal pods should be running.
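
A quick way to check both (a sketch; the canal label selector is an assumption and may differ depending on the CNI in use):

kubectl -n kube-system get job rke-network-plugin-deploy-job
kubectl -n kube-system get pods -l k8s-app=canal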

cc: @rishabhmsra

@rishabhmsra

Validated using the steps below:

  • On Rancher v2.6.5, provisioned a k8s v1.18.20-rancher1-1 EC2 node driver cluster (1 control plane, 1 etcd, 1 worker).
  • Upgraded the Rancher server to version v2.6.10.
  • Upgraded the existing k8s cluster to version v1.23.16-rancher1-1.
  • Validated the rke-network-plugin-deploy-job and pods.

Result:

  • Cluster upgraded successfully to v1.23.16-rancher1-1 and rke-network-plugin-deploy-job is in Completed state with canal pods up and running:

[Screenshots: canal pods running; rke-network-plugin-deploy-job completed]


rishabhmsra commented Feb 20, 2023

Validated again using the steps below:

  • On Rancher v2.6.5, provisioned a k8s v1.18.20-rancher1-3 EC2 node driver cluster (1 control plane, 1 etcd, 1 worker).
  • Upgraded the Rancher server to v2.6-head (5d91d1a) successfully, with KDM pointing to dev-v2.6.
  • Upgraded the existing k8s cluster to version v1.23.16-rancher2-1.
  • Validated the rke-network-plugin-deploy-job and pods.

Result:

  • Cluster upgraded successfully to v1.23.16-rancher2-1 and rke-network-plugin-deploy-job is in Completed state, with all pods, including the canal pods, up and running:

[Screenshots: canal pods running; rke-network-plugin-deploy-job pod completed]
