
[v2.6] KDM Update for K8s 2022 April patch release #37388

Closed
snasovich opened this issue Apr 18, 2022 · 14 comments

@snasovich (Collaborator) commented Apr 18, 2022

KDM Updates

TO BE RELEASED: Enabled OOB after Rancher v2.6.4 (likely before v2.6.5).

| New Kubernetes Version | Min Rancher Version | Min RKE Version |
| --- | --- | --- |
| 1.23.6-rancher1-1 | 2.6.4-rc0 | 1.3.3-rc0 |
| 1.22.9-rancher1-1 | 2.6.3-rc0 | 1.3.3-rc0 |
| 1.21.12-rancher1-1 | 2.6.0-rc0 | 1.3.0-rc0 |

Rancher System Images & Add-ons

Note: Please update system images and add-ons in the template as needed.

TBD

Developer Changelog

  • Add templates for new K8s versions. If a template already exists but hasn't been released yet, update it in place.
  • Update the default K8s version, and add or update the version information (MinRancherVersion and MinRKEVersion) in k8s_version_info.go for new K8s versions if needed, using the table above (see the sketch after this list).
  • Confirm the versions for system images and add-on templates are correct using the versioning reference table below (expand details to view the table). Update the table if new versions are introduced.
  • Confirm the add-on template version constraints in template.go are correct using the versioning reference table below (expand details to view the table). Update the table if needed.
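
For illustration, a minimal sketch in Go of the kind of entries the k8s_version_info.go step adds, using the table above; the struct and map names here are hypothetical, not the file's actual layout:

```go
package kdm

// K8sVersionInfo pairs a K8s version with the minimum Rancher and RKE
// versions allowed to use it (hypothetical type for illustration).
type K8sVersionInfo struct {
	MinRancherVersion string
	MinRKEVersion     string
}

// Entries for the April 2022 patch versions, taken from the table above.
var k8sVersionInfo = map[string]K8sVersionInfo{
	"v1.23.6-rancher1-1":  {MinRancherVersion: "2.6.4-rc0", MinRKEVersion: "1.3.3-rc0"},
	"v1.22.9-rancher1-1":  {MinRancherVersion: "2.6.3-rc0", MinRKEVersion: "1.3.3-rc0"},
	"v1.21.12-rancher1-1": {MinRancherVersion: "2.6.0-rc0", MinRKEVersion: "1.3.0-rc0"},
}
```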

Versioning Reference

Expand for system images & add-on templates versioning table

Note: A truncated repository name ending in * indicates there are multiple images with that prefix. All repository names matching a truncated prefix should be checked for the same version stated in the table.
For example: weaveworks/weave-* matches weaveworks/weave-kube and weaveworks/weave-npc, and both should have the same version.
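
As a toy illustration of that prefix check (the images map is a hypothetical repo-to-tag listing, not a real KDM structure):

```go
package main

import (
	"fmt"
	"strings"
)

// sameTagForPrefix reports whether every image whose repository name starts
// with the given prefix carries the same tag, mirroring the note above.
func sameTagForPrefix(images map[string]string, prefix string) bool {
	tag := ""
	for repo, t := range images {
		if !strings.HasPrefix(repo, prefix) {
			continue
		}
		if tag == "" {
			tag = t
		} else if t != tag {
			return false
		}
	}
	return true
}

func main() {
	images := map[string]string{
		"weaveworks/weave-kube": "2.8.1",
		"weaveworks/weave-npc":  "2.8.1",
	}
	fmt.Println(sameTagForPrefix(images, "weaveworks/weave-")) // true
}
```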

v1.23.6-rancher1-1

| Repository | Tag |
| --- | --- |
| rancher/mirrored-coreos-etcd | v3.4.16-rancher1 |
| rancher/hyperkube | v1.23.6-rancher1 |
| rancher/rke-tools | v0.1.78 |
| rancher/mirrored-k8s-dns-* | 1.17.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/mirrored-coreos-flannel | v0.15.1 |
| rancher/flannel-cni | v0.3.0-rancher6 |
| rancher/mirrored-calico-* | v3.19.1 |
| weaveworks/weave-* | 2.8.1 |
| noiro/* (All ACI images) | 5.1.1.0.1ae238a |
| rancher/mirrored-pause | 3.4.1 |
| rancher/nginx-ingress-controller | nginx-1.1.0-rancher1 |
| rancher/mirrored-nginx-ingress-controller-defaultbackend | 1.5-rancher1 |
| rancher/mirrored-jettech-kube-webhook-certgen | v1.1.1 |
| rancher/mirrored-metrics-server | v0.5.1 |
| rancher/mirrored-coredns-coredns | 1.8.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/kubelet-pause | v0.1.6 |
| rancher/mirrored-k8s-dns-node-cache | 1.18.0 |

| Add-on Name | Add-on Template Version Constraint |
| --- | --- |
| Calico | calicov319 |
| Canal | canalv3211 |
| Flannel | flannelv0140 |
| CoreDNS | coreDnsv183 |
| KubeDNS | kubeDnsv116 |
| MetricsServer | metricsServerv050 |
| Weave | weavev120 |
| Aci | aciv500 |
| NginxIngress | nginxIngressv110Rancher3 |
| Nodelocal | nodelocalv121 |

v1.22.9-rancher1-1

| Repository | Tag |
| --- | --- |
| rancher/mirrored-coreos-etcd | v3.4.16-rancher1 |
| rancher/hyperkube | v1.22.9-rancher1 |
| rancher/rke-tools | v0.1.78 |
| rancher/mirrored-k8s-dns-* | 1.17.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/mirrored-coreos-flannel | v0.15.1 |
| rancher/flannel-cni | v0.3.0-rancher6 |
| rancher/mirrored-calico-* | v3.19.1 |
| weaveworks/weave-* | 2.8.1 |
| noiro/* (All ACI images) | 5.1.1.0.1ae238a |
| rancher/mirrored-pause | 3.4.1 |
| rancher/nginx-ingress-controller | nginx-1.1.0-rancher1 |
| rancher/mirrored-nginx-ingress-controller-defaultbackend | 1.5-rancher1 |
| rancher/mirrored-jettech-kube-webhook-certgen | v1.1.1 |
| rancher/mirrored-metrics-server | v0.5.1 |
| rancher/mirrored-coredns-coredns | 1.8.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/kubelet-pause | v0.1.6 |
| rancher/mirrored-k8s-dns-node-cache | 1.18.0 |

| Add-on Name | Add-on Template Version Constraint |
| --- | --- |
| Calico | calicov319 |
| Canal | canalv3211 |
| Flannel | flannelv0140 |
| CoreDNS | coreDnsv183 |
| KubeDNS | kubeDnsv116 |
| MetricsServer | metricsServerv050 |
| Weave | weavev120 |
| Aci | aciv500 |
| NginxIngress | nginxIngressv110Rancher3 |
| Nodelocal | nodelocalv121 |

v1.21.12-rancher1-1

| Repository | Tag |
| --- | --- |
| rancher/mirrored-coreos-etcd | v3.4.16-rancher1 |
| rancher/hyperkube | v1.21.12-rancher1 |
| rancher/rke-tools | v0.1.78 |
| rancher/mirrored-k8s-dns-* | 1.17.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/mirrored-coreos-flannel | v0.15.1 |
| rancher/flannel-cni | v0.3.0-rancher6 |
| rancher/mirrored-calico-* | v3.19.1 |
| weaveworks/weave-* | 2.8.1 |
| noiro/* (All ACI images) | 5.1.1.0.1ae238a |
| rancher/mirrored-pause | 3.4.1 |
| rancher/nginx-ingress-controller | nginx-1.1.0-rancher1 |
| rancher/mirrored-nginx-ingress-controller-defaultbackend | 1.5-rancher1 |
| rancher/mirrored-jettech-kube-webhook-certgen | v1.1.1 |
| rancher/mirrored-metrics-server | v0.5.0 |
| rancher/mirrored-coredns-coredns | 1.8.4 |
| rancher/mirrored-cluster-proportional-autoscaler | 1.8.3 |
| rancher/kubelet-pause | v0.1.6 |
| rancher/mirrored-k8s-dns-node-cache | 1.18.0 |

| Add-on Name | Add-on Template Version Constraint |
| --- | --- |
| Calico | calicov319 |
| Canal | canalv319 |
| Flannel | flannelv115 |
| CoreDNS | coreDnsv183 |
| KubeDNS | kubeDnsv116 |
| MetricsServer | metricsServerv050 |
| Weave | weavev120 |
| Aci | aciv500 |
| NginxIngress | nginxIngressv110Rancher3 |
| Nodelocal | nodelocalv121 |
@snasovich added the team/hostbusters and team/infracloud labels Apr 18, 2022
@sowmyav27 changed the title from "[v2.6] KDM Update for K8s 2022 March patch release" to "[v2.6] KDM Update for K8s 2022 April patch release" Apr 18, 2022
@rayandas self-assigned this Apr 19, 2022
@snasovich (Collaborator, Author) commented

One of the reasons we're pushing these versions forward is to bump etcd to 3.5.3, as it has the fix for a possible data corruption issue. It will need to be bumped for the 1.22 and 1.23 versions.
@kinarashah, am I correct that we just need to add this new version to https://github.com/rancher/image-mirror/blob/master/images-list and then update the referenced image in KDM?
cc: @rayandas
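
If so, the addition would be a one-line entry in images-list. A sketch, assuming the file's `<source image> <mirrored name> <tag>` line format and that the upstream image is quay.io/coreos/etcd (both are assumptions, not confirmed in this thread):

```
quay.io/coreos/etcd rancher/mirrored-coreos-etcd v3.5.3
```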

@vivek-shilimkar self-assigned this Apr 20, 2022
@rayandas (Contributor) commented

Alright! I am raising a PR to add etcd 3.5.3 if we're going to bump the etcd version in the April patch, and will then update the templates for 1.22 and 1.23 with the updated etcd image.

@kinarashah (Member) commented

@snasovich yeah, that's correct. @rayandas, if you run into issues bringing up the new versions because they need changes in client code, let me know.

@rayandas (Contributor) commented Apr 21, 2022

@kinarashah Sure. Will the April patches go in with 2.6.5, or out of band after 2.6.5?

@snasovich (Collaborator, Author) commented

@rayandas, at this time we're shooting for OOB before 2.6.5.

@rayandas (Contributor) commented

@snasovich the PR rancher/kontainer-driver-metadata#884 is merged, but it says "Enabled in v2.6.5". Should I update that in another PR?
@kinarashah

@kinarashah (Member) commented

@rayandas No worries, I have a PR open to update nginx; I can update the release information there to out-of-band post v2.6.4.

@rayandas (Contributor) commented

Thanks @kinarashah

@rishabhmsra (Contributor) commented

Minimum Rancher version checks - Pass

  • On Rancher v2.6.4, with KDM pointing to dev-v2.6: cluster creation succeeds for k8s version v1.23.6-rancher1-1 with all network providers; network-related checks look good.
  • On Rancher v2.6.3, with KDM pointing to dev-v2.6: cluster creation succeeds for k8s version v1.22.9-rancher1-1 with all network providers; network-related checks look good.
  • On Rancher v2.6.0, with KDM pointing to dev-v2.6: cluster creation succeeds for k8s version v1.21.12-rancher1-1 with all network providers; network-related checks look good.
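
Here, "KDM pointing to dev-v2.6" refers to Rancher's `rke-metadata-config` setting. A sketch of what that value looks like for the dev-v2.6 branch (the URL pattern follows the usual KDM release endpoint; treat the exact value as an assumption rather than a prescription):

```json
{
  "refresh-interval-minutes": "1440",
  "url": "https://releases.rancher.com/kontainer-driver-metadata/dev-v2.6/data.json"
}
```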

@vivek-shilimkar (Member) commented

Fresh install checks and RKE checks look good.

Provisioning Checks on Fresh Install of Rancher 2.6.5-rc3 - PASS

  1. KDM points to the dev-v2.6 branch.
  2. Cluster creation succeeds for the 1.23.6-rancher1-1, 1.22.9-rancher1-1, and 1.21.12-rancher1-1 k8s versions with all network providers.
  3. Network checks succeeded for all new clusters.
  4. Ran validation tests for a Canal cluster.
  5. Network checks and Canal checks pass.

RKE checks using 1.3.10-rc5 passed for the v1.23.6-rancher1-1, v1.22.9-rancher1-1, and v1.21.12-rancher1-1 k8s versions.

RKE default k8s version -
dev-v2.6 should have the default as v1.22.9-rancher1-1:

```
{
  "RKEDefaultK8sVersions": {
    "0.3": "v1.16.3-rancher1-1",
    "default": "v1.22.9-rancher1-1"
  }
}
```
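
For a local sanity check of an RKE build, RKE's documented `rke config` helpers can list the supported versions and default system images (a sketch; run against the 1.3.10-rc binary under test):

```sh
# List every K8s version this RKE build supports; the three April patches
# and the default above should line up with this output
rke config --list-version --all

# Print the system images for the default K8s version, to spot-check image
# tags against the versioning tables in the issue description
rke config --system-images
```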

@markusewalker (Contributor) commented Apr 26, 2022

TEST CASE
Upgrade RKE1 cluster from v1.22.9-rancher1-1 to v1.23.6-rancher1-1

TEST RESULT
PASS

VERIFICATION STEPS

  1. Set up Rancher v2.6-head and created a downstream Linode RKE1 cluster with the following:
    • 3 etcd
    • 2 cp
    • 3 worker
  2. Once it was up and running, created a test deployment and test daemonset in the new RKE1 cluster.
  3. Took a snapshot of the RKE1 cluster.
  4. Upgraded the cluster to v1.23.6-rancher1-1. Verified all of the nodes were updated, all of my workloads were Active, and the cluster itself was listed as Active.
  5. Copied the cluster's kubeconfig to a local client machine. From there, ssh'ed into one of my etcd machines.
  6. Verified that the etcd version is 3.5.3:
```
root@...:~# docker exec -it etcd sh
# etcd version
{"level":"info","ts":"2022-04-26T23:18:30.883Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","version"]}
{"level":"warn","ts":"2022-04-26T23:18:30.883Z","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"'version' is not a valid flag"}
# etcdctl version
etcdctl version: 3.5.3
API version: 3.5
#
```
  7. Back in the Rancher UI, created a post-upgrade test deployment and daemonset once more.
  8. Took a new snapshot at this version.
  9. Restored my v1.22.9-rancher1-1 snapshot.
  10. Verified my cluster was listed as Active along with all of my workloads. Additionally, all of my nodes were reverted back to version v1.22.9 and listed as Active. My post-upgrade test deployment and daemonset were not present, as expected.

@Josh-Diamond (Contributor) commented Apr 27, 2022

Test Cases - Etcd Backup/Restore RKE1 - PASS ✅

Test Case #1

With Docker on a single-node instance, and on rancher v2.6-7db4fe4e81939a5f6b420f303478a169cc35588a-head:

  1. Fresh install of rancher 2.6-head
  2. Provision a downstream RKE1 Custom cluster with 3 etcd, 2 cp, and 3 worker nodes, on k8s v1.22.9
  3. Once Active, deploy a couple of workloads: wk1, wk2
  4. Take an etcd snapshot - snap_A
  5. Deploy a couple more workloads: wk3, wk4
  6. Take another etcd snapshot - snap_B
  7. Restore the cluster to snap_A with the etcd-only option selected
  8. Verified - cluster goes into Updating state
  9. Verified - cluster comes back up Active
  10. Verified - wk1 + wk2 are the only workloads available
  11. Verified - all components on the System project page are Active
  12. Verified - etcd version is 3.5.3

Screenshot: Etcd version (1229-version)


Test Case #2

With Docker on a single-node instance, and on rancher v2.6-7db4fe4e81939a5f6b420f303478a169cc35588a-head:

  1. Fresh install of rancher 2.6-head
  2. Provision a downstream RKE1 Custom cluster with 3 etcd, 2 cp, and 3 worker nodes, on k8s v1.23.6
  3. Once Active, deploy a couple of workloads: wk1, wk2
  4. Take an etcd snapshot - snap_A
  5. Deploy a couple more workloads: wk3, wk4
  6. Take another etcd snapshot - snap_B
  7. Restore the cluster to snap_A with the etcd-only option selected
  8. Verified - cluster goes into Updating state
  9. Verified - cluster comes back up Active
  10. Verified - wk1 + wk2 are the only workloads available
  11. Verified - all components on the System project page are Active
  12. Verified - etcd version is 3.5.3

Screenshot: Etcd version (1236-version)


Test Case #3

With Docker on a single-node instance, and on rancher v2.6-7db4fe4e81939a5f6b420f303478a169cc35588a-head:

  1. Fresh install of rancher 2.6-head
  2. Provision a downstream RKE1 Custom cluster with 3 etcd, 2 cp, and 3 worker nodes, on k8s v1.21.12
  3. Once Active, deploy a couple of workloads: wk1, wk2
  4. Take an etcd snapshot - snap_A
  5. Upgrade to k8s v1.22.9
  6. Deploy a couple more workloads: wk3, wk4
  7. Take another etcd snapshot - snap_B
  8. Restore the cluster to snap_A with the etcd, k8s, and config options selected
  9. Verified - cluster goes into Updating state
  10. Verified - cluster comes back up Active
  11. Verified - cluster k8s version is 1.21.12
  12. Verified - all components on the System project page are Active
  13. Verified - only the wk1 + wk2 workloads are available in the cluster
  14. Verified - etcd version is 3.4.16 with k8s v1.21.12; when restored forward to snap_B with k8s v1.22.9, the etcd version is 3.5.3

Screenshots: Etcd version - snap_A (scenario2-snap1); Etcd version - snap_B

@vivek-shilimkar (Member) commented

Fresh install checks and RKE checks look good.

Provisioning Checks on Fresh Install of Rancher 2.6.4 - PASS

  1. On Rancher v2.6.4, KDM points to the dev-v2.6 branch.
  2. Cluster creation succeeds for the 1.23.6-rancher1-1, 1.22.9-rancher1-1, and 1.21.12-rancher1-1 k8s versions with all network providers.
  3. Network checks succeeded for all new clusters.
  4. Ran validation tests for a Canal cluster.
  5. Network checks and Canal checks pass.

RKE checks using 1.3.10-rc6 passed for the v1.23.6-rancher1-1, v1.22.9-rancher1-1, and v1.21.12-rancher1-1 k8s versions.

Provisioning Checks after Upgrading to new KDM - PASS

  1. On the latest released v2.6.4, KDM points to release-v2.6.
  2. Cluster creation succeeds for old k8s versions with all network providers.
  3. Ran pre-upgrade checks.
  4. Pointed KDM to the dev-v2.6 branch.
  5. Upgraded k8s to the latest patches: 1.23.6-rancher1-1, 1.22.9-rancher1-1, 1.21.12-rancher1-1.
  6. Ran post-upgrade checks.
  7. Pre-upgrade and post-upgrade checks pass.

@anupama2501 (Contributor) commented

All checks look good. Closing the issue.
