Have k3s-upgrade bail out when user tries to downgrade a cluster #7537

Closed · cwayne18 opened this issue May 11, 2023 · 7 comments

Assignees: osodracnai
Labels: waiting-for-RC (Issue is available to test only after we have an RC)
Milestone: v1.27.3+k3s1
@cwayne18 (Collaborator)

Currently, when a user writes a SUC plan that would downgrade their k8s cluster, k3s-upgrade goes ahead, tries to follow the plan, and then fails. Rather than allowing downgrades (which we don't want to do), we should simply bail out of an attempted downgrade and alert the user that downgrades are not supported.

ref: SURE-5230

@brandond brandond self-assigned this May 11, 2023
@brandond brandond added this to the v1.27.3+k3s1 milestone May 11, 2023
@brandond (Contributor) commented May 11, 2023

We should probably also refuse to "upgrade" when:

  • Performing an upgrade that is actually to an older release; from 1.25.10 to 1.26.0, for example. It's a greater version according to strict version-comparison rules, but it will ship older versions of packaged components, and we don't support that. Upgrades should always be to a newer release.
  • Performing an upgrade that skips minors, like from 1.25.x to 1.27.x. You need to go to 1.26.x first.
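
A minimal pre-flight check along those lines might look like the sketch below. This is illustrative only, not the actual k3s-upgrade code; the version strings are example values, and it does not cover the "newer semver but older packaged components" case from the first bullet, which needs release data rather than pure version comparison.

# Hypothetical pre-flight check, NOT the actual k3s-upgrade implementation.
# Rejects plain downgrades and minor-version skips before replacing the binary.
CURRENT_VERSION="v1.25.10+k3s1"   # would come from the running k3s binary
NEW_VERSION="v1.27.1+k3s1"        # would come from the Plan's version field

# Strip the leading "v" and the "+k3sN" build metadata for comparison.
cur=${CURRENT_VERSION#v}; cur=${cur%%+*}
new=${NEW_VERSION#v};     new=${new%%+*}

# Downgrade check: the requested version must not sort before the current one.
if [ "$(printf '%s\n%s\n' "$cur" "$new" | sort -V | head -n1)" != "$cur" ]; then
    echo "[ERROR] downgrade from $CURRENT_VERSION to $NEW_VERSION is not supported" >&2
    exit 1
fi

# Minor-skip check: allow at most one minor-version jump (e.g. 1.25 -> 1.26).
cur_minor=$(echo "$cur" | cut -d. -f2)
new_minor=$(echo "$new" | cut -d. -f2)
if [ $((new_minor - cur_minor)) -gt 1 ]; then
    echo "[ERROR] cannot skip minor versions; upgrade to 1.$((cur_minor + 1)).x first" >&2
    exit 1
fi

echo "[INFO] version check passed: $CURRENT_VERSION -> $NEW_VERSION"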

@cwayne18 (Collaborator, Author)

@osodracnai if you're up for it I think this would be a good way to get some visibility into SUC/upgrades, want to pair with @brandond on it? cc @dereknola

@brandond brandond removed their assignment May 12, 2023
@osodracnai osodracnai self-assigned this May 15, 2023
@rancher-max rancher-max added the waiting-for-RC Issue is available to test only after we have an RC label Jul 12, 2023
@est-suse (Contributor) commented Jul 26, 2023

Just updating for visibility: dev is aware of the current state.

I am able to downgrade the cluster from v1.27.4-rc3+k3s1 to v1.26.7-rc3+k3s1.

No error message is displayed, and the downgrade finishes without any issue.

Before applying the plan:

NAME             STATUS   ROLES                       AGE     VERSION
ip-172-31-0-95   Ready    control-plane,etcd,master   3m1s    v1.27.4-rc3+k3s1
ip-172-31-2-93   Ready    <none>                      2m19s   v1.27.4-rc3+k3s1
ip-172-31-5-40   Ready    control-plane,etcd,master   4m51s   v1.27.4-rc3+k3s1
ip-172-31-7-59   Ready    control-plane,etcd,master   3m17s   v1.27.4-rc3+k3s1

After applying the plan:


NAME             STATUS   ROLES                       AGE     VERSION
ip-172-31-0-95   Ready    control-plane,etcd,master   6m31s   v1.26.7-rc3+k3s1
ip-172-31-2-93   Ready    <none>                      5m49s   v1.26.7-rc3+k3s1
ip-172-31-5-40   Ready    control-plane,etcd,master   8m21s   v1.26.7-rc3+k3s1
ip-172-31-7-59   Ready    control-plane,etcd,master   6m47s   v1.26.7-rc3+k3s1
Upgrade pod log:

Defaulted container "upgrade" out of: upgrade, cordon (init)
+ upgrade
+ get_k3s_process_info
+ awk '{print $2}'
+ grep -E -v '(init|grep|channelserver|supervise-daemon)'
+ + psgrep -ef -E
 '( |/)k3s .*(server|agent)'
+ K3S_PID=5748
+ echo+ wc -w
 5748
+ '[' 1 '!=' 1 ]
+ '[' -z 5748 ]
+ echo+ wc -w
 5748
+ '[' 1 '!=' 1 ]
+ ps+  -p 5748 -o 'ppid='awk '{print $1}'

+ K3S_PPID=1
+ info 'K3S binary is running with pid 5748, parent pid 1'
+ echo '[INFO] ' 'K3S binary is running with pid 5748, parent pid 1'
[INFO]  K3S binary is running with pid 5748, parent pid 1
+ '[' 1 '!=' 1 ]
+ '[' 5748 '=' 1 ]
+ awk 'NR==1 {print $1}' /host/proc/5748/cmdline
+ K3S_BIN_PATH=/usr/local/bin/k3s
+ '[' -z /usr/local/bin/k3s ]
+ '[' '!' -e /host/usr/local/bin/k3s ]
+ return
+ replace_binary
+ NEW_BINARY=/opt/k3s
+ FULL_BIN_PATH=/host/usr/local/bin/k3s
+ '[' '!' -f /opt/k3s ]
[INFO]  Comparing old and new binaries
+ info 'Comparing old and new binaries'
+ echo '[INFO] ' 'Comparing old and new binaries'
+ sha256sum /opt/k3s /host/usr/local/bin/k3s
+ BIN_CHECKSUMS='e9a240d72181bbeccffe941055f63d7a1d7a1970f66954c45c1545442bd009ee  /opt/k3s
e9a240d72181bbeccffe941055f63d7a1d7a1970f66954c45c1545442bd009ee  /host/usr/local/bin/k3s'
+ '[' 0 '!=' 0 ]
+ uniq
+ awk '{print $1}'
+ echo 'e9a240d72181bbeccffe941055f63d7a1d7a1970f66954c45c1545442bd009ee  /opt/k3s
e9a240d72181bbeccffe941055f63d7a1d7a1970f66954c45c1545442bd009ee  /host/usr/local/bin/k3s'
+ wc -l
[INFO]  Binary already been replaced
+ BIN_COUNT=1
+ '[' 1 '=' 1 ]
+ info 'Binary already been replaced'
+ echo '[INFO] ' 'Binary already been replaced'
+ exit 0

Steps to validate:

Create a cluster with 3 servers and 1 agent

Apply the system-upgrade-controller manifest:

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.0/system-upgrade-controller.yaml
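
To confirm the controller is running before applying a plan (assuming the default manifest, which deploys into the system-upgrade namespace):

kubectl -n system-upgrade rollout status deploy/system-upgrade-controller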

Apply the plan:

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 3
  version: v1.26.7-rc3+k3s1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
  #drain:
  #  force: true
  upgrade:
    image: rancher/k3s-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
  labels:
    k3s-upgrade: agent
spec:
  concurrency: 1
  version: v1.26.7-rc3+k3s1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  drain:
    force: true
  upgrade:
    image: rancher/k3s-upgrade

@est-suse (Contributor) commented Jul 27, 2023

It can be tested after the release.

@est-suse (Contributor) commented Aug 4, 2023

Attempting to downgrade from v1.27.4+k3s1 to v1.26.7+k3s1:

NAME               STATUS                     ROLES                       AGE   VERSION
ip-172-31-0-16     Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-10-23    Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-13-213   Ready,SchedulingDisabled   control-plane,etcd,master   19h   v1.27.4+k3s1
ip-172-31-2-13     Ready                      <none>                      19h   v1.27.4+k3s1
ubuntu@ip-172-31-13-213:~$ kubectl get pods -A
NAMESPACE        NAME                                                              READY   STATUS    RESTARTS   AGE
kube-system      coredns-77ccd57875-9ng74                                          1/1     Running   0          19h
kube-system      local-path-provisioner-957fdf8bc-9vwzn                            1/1     Running   0          19h
kube-system      metrics-server-648b5df564-wzbnh                                   1/1     Running   0          19h
kube-system      svclb-traefik-0bda8e84-hbjq8                                      2/2     Running   0          19h
kube-system      svclb-traefik-0bda8e84-jg94l                                      2/2     Running   0          19h
kube-system      svclb-traefik-0bda8e84-qkcs7                                      2/2     Running   0          19h
kube-system      svclb-traefik-0bda8e84-tfhjq                                      2/2     Running   0          19h
kube-system      traefik-64f55bb67d-4mm6s                                          1/1     Running   0          19h
system-upgrade   apply-k3s-server-on-ip-172-31-0-16-with-7af95590a5af8e8c3-2cdc6   0/1     Error     0          9m25s
system-upgrade   apply-k3s-server-on-ip-172-31-0-16-with-7af95590a5af8e8c3-gvm7n   0/1     Error     0          14m
system-upgrade   apply-k3s-server-on-ip-172-31-0-16-with-7af95590a5af8e8c3-vz6hs   0/1     Error     0          17m
system-upgrade   apply-k3s-server-on-ip-172-31-10-23-with-7af95590a5af8e8c-9xvwg   0/1     Error     0          14m
system-upgrade   apply-k3s-server-on-ip-172-31-10-23-with-7af95590a5af8e8c-h2hbt   0/1     Error     0          9m27s
system-upgrade   apply-k3s-server-on-ip-172-31-10-23-with-7af95590a5af8e8c-x9nmf   0/1     Error     0          17m
system-upgrade   apply-k3s-server-on-ip-172-31-13-213-with-7af95590a5af8e8-8j72v   0/1     Error     0          18m
system-upgrade   apply-k3s-server-on-ip-172-31-13-213-with-7af95590a5af8e8-crzj7   0/1     Error     0          17m
system-upgrade   apply-k3s-server-on-ip-172-31-13-213-with-7af95590a5af8e8-kctnj   0/1     Error     0          14m
system-upgrade   apply-k3s-server-on-ip-172-31-13-213-with-7af95590a5af8e8-mk4xr   0/1     Error     0          9m22s
system-upgrade   system-upgrade-controller-7c4b84d5d9-kkzr6                        1/1     Running   0          20m
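
The reason for the failures can be read from any of the errored apply pods, using a pod name from the listing above, e.g.:

kubectl -n system-upgrade logs apply-k3s-server-on-ip-172-31-0-16-with-7af95590a5af8e8c3-2cdc6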

@brandond (Contributor) commented Aug 4, 2023

I believe this is working as intended. If you configure the Plan to cordon the nodes, they won't uncordon until the plan is completed successfully. Since the downgrade is rejected, the plan will never succeed, and the nodes will remain cordoned.

The alternative would be for the version checks to fail silently and report the upgrade as successful, essentially making the downgrade a no-op instead of a failure. To me this is worse, as the administrator may not notice that the plan didn't actually take effect until they think to check the versions and see that the cluster never switched to the version they asked for.
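
If an administrator decides to abandon a rejected plan, the cordon can be cleared manually with standard kubectl; the plan and node names below are just the ones from this test, shown as an example:

kubectl -n system-upgrade delete plan k3s-server k3s-agent
kubectl uncordon ip-172-31-0-16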

@rancher-max (Contributor)

Closing as we have decided this is working per the design and have opened a docs issue to address common pitfalls.
