
upgrade k3s from 1.24.x to 1.25.x #7846

Closed
andriiraiskyi opened this issue Jul 3, 2023 · 7 comments

Comments

@andriiraiskyi

Environmental Info:
K3s Version:
1.24.8 (but also tested on other versions)

Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04.2 LTS
AMD EPYC 7000
CPU: 2
Memory: 8 GB
AWS m5a.large

Cluster Configuration:
3 etcd nodes
2-3 master nodes (usually 3, but also tested with 2)

Describe the bug:
During an upgrade from 1.24.8 to 1.25.11 the cluster is unstable and in a degraded state.
The master nodes are cordoned and drained one by one and upgraded to the newer version.
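(For reference, the per-node cordon/drain flow described above would look roughly like this; a minimal sketch, the node name is illustrative and the drain flags may need adjusting for your workloads.)

kubectl cordon ip-10-8-16-244.ec2.internal
kubectl drain ip-10-8-16-244.ec2.internal --ignore-daemonsets --delete-emptydir-data
# replace/upgrade the node, then either remove it or bring it back:
kubectl uncordon ip-10-8-16-244.ec2.internal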

NAME                          STATUS                     ROLES                  AGE     VERSION
ip-10-8-16-11.ec2.internal    Ready                      etcd                   4h34m   v1.24.8+k3s1
ip-10-8-16-138.ec2.internal   Ready                      control-plane,master   3m44s   v1.25.11+k3s1
ip-10-8-16-188.ec2.internal   Ready                      <none>                 4h33m   v1.24.8+k3s1
ip-10-8-16-229.ec2.internal   Ready                      etcd                   4h33m   v1.24.8+k3s1
ip-10-8-16-244.ec2.internal   Ready,SchedulingDisabled   control-plane,master   4h34m   v1.24.8+k3s1
ip-10-8-16-29.ec2.internal    Ready                      control-plane,master   4h34m   v1.24.8+k3s1
ip-10-8-17-107.ec2.internal   Ready                      etcd                   4h33m   v1.24.8+k3s1
ip-10-8-17-18.ec2.internal    Ready                      <none>                 4h30m   v1.24.8+k3s1
ip-10-8-17-48.ec2.internal    Ready                      <none>                 4h30m   v1.24.8+k3s1
ip-10-8-17-73.ec2.internal    Ready                      <none>                 4h28m   v1.24.8+k3s1
NAME                          STATUS   ROLES                  AGE     VERSION
ip-10-8-16-11.ec2.internal    Ready    etcd                   6h28m   v1.24.8+k3s1
ip-10-8-16-138.ec2.internal   Ready    control-plane,master   117m    v1.25.11+k3s1
ip-10-8-16-188.ec2.internal   Ready    <none>                 6h26m   v1.24.8+k3s1
ip-10-8-16-229.ec2.internal   Ready    etcd                   6h26m   v1.24.8+k3s1
ip-10-8-16-29.ec2.internal    Ready    control-plane,master   6h28m   v1.24.8+k3s1
ip-10-8-17-107.ec2.internal   Ready    etcd                   6h26m   v1.24.8+k3s1
ip-10-8-17-18.ec2.internal    Ready    <none>                 6h23m   v1.24.8+k3s1
ip-10-8-17-48.ec2.internal    Ready    <none>                 6h23m   v1.24.8+k3s1
ip-10-8-17-73.ec2.internal    Ready    <none>                 6h21m   v1.24.8+k3s1

During the last node replacement:

NAME                          STATUS                        ROLES                  AGE     VERSION
ip-10-8-16-11.ec2.internal    Ready                         etcd                   6h55m   v1.24.8+k3s1
ip-10-8-16-138.ec2.internal   NotReady                      control-plane,master   144m    v1.25.11+k3s1
ip-10-8-16-188.ec2.internal   Ready                         <none>                 6h54m   v1.24.8+k3s1
ip-10-8-16-229.ec2.internal   NotReady                      etcd                   6h54m   v1.24.8+k3s1
ip-10-8-16-29.ec2.internal    NotReady,SchedulingDisabled   control-plane,master   6h55m   v1.24.8+k3s1
ip-10-8-16-60.ec2.internal    Ready                         control-plane,master   24m     v1.25.11+k3s1
ip-10-8-17-107.ec2.internal   Ready                         etcd                   6h54m   v1.24.8+k3s1
ip-10-8-17-18.ec2.internal    NotReady                      <none>                 6h51m   v1.24.8+k3s1
ip-10-8-17-48.ec2.internal    NotReady                      <none>                 6h51m   v1.24.8+k3s1
ip-10-8-17-73.ec2.internal    Ready                         <none>                 6h49m   v1.24.8+k3s1

While the old revision still exists in the cluster:

NAME                          STATUS   ROLES                  AGE     VERSION
ip-10-8-16-11.ec2.internal    Ready    etcd                   7h16m   v1.24.8+k3s1
ip-10-8-16-138.ec2.internal   Ready    control-plane,master   165m    v1.25.11+k3s1
ip-10-8-16-188.ec2.internal   Ready    <none>                 7h14m   v1.24.8+k3s1
ip-10-8-16-229.ec2.internal   Ready    etcd                   7h14m   v1.24.8+k3s1
ip-10-8-16-60.ec2.internal    Ready    control-plane,master   44m     v1.25.11+k3s1
ip-10-8-17-107.ec2.internal   Ready    etcd                   7h14m   v1.24.8+k3s1
ip-10-8-17-18.ec2.internal    Ready    <none>                 7h11m   v1.24.8+k3s1
ip-10-8-17-48.ec2.internal    Ready    <none>                 7h11m   v1.24.8+k3s1
ip-10-8-17-73.ec2.internal    Ready    <none>                 7h9m    v1.24.8+k3s1

Steps To Reproduce:
Explained above.

Expected behavior:
Successful replacement of the nodes.

Additional context / logs:
In the middle of the replacement the logs are almost the same as described in #7123:

time="2023-06-26T12:41:04Z" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://cd.e2e-ref-4735-cluster.e2e-ref-4735-cluster.somedomain.info:6443/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
time="2023-06-26T12:23:47Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 26): map[listener.cattle.io/cn-10.11.0.1:10.11.0.1 listener.cattle.io/cn-10.8.16.12:10.8.16.12 listener.cattle.io/cn-10.8.16.205:10.8.16.205 listener.cattle.io/cn-10.8.16.212:10.8.16.212 listener.cattle.io/cn-10.8.16.225:10.8.16.225 listener.cattle.io/cn-10.8.16.28:10.8.16.28 listener.cattle.io/cn-10.8.16.70:10.8.16.70 listener.cattle.io/cn-10.8.17.15:10.8.17.15 listener.cattle.io/cn-10.8.17.50:10.8.17.50 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-cd.e2e-ref-4735-cluster.e2e-ref-47-cd15b1:cd.e2e-ref-4735-cluster.e2e-ref-4735-cluster.somedomain.info listener.cattle.io/cn-cp.e2e-ref-4735-cluster.e2e-ref-47-2032de:cp.e2e-ref-4735-cluster.e2e-ref-4735-cluster.somedomain.info listener.cattle.io/cn-ip-10-8-16-12.ec2.internal:ip-10-8-16-12.ec2.internal listener.cattle.io/cn-ip-10-8-16-205.ec2.internal:ip-10-8-16-205.ec2.internal listener.cattle.io/cn-ip-10-8-16-212.ec2.internal:ip-10-8-16-212.ec2.internal listener.cattle.io/cn-ip-10-8-16-225.ec2.internal:ip-10-8-16-225.ec2.internal listener.cattle.io/cn-ip-10-8-16-28.ec2.internal:ip-10-8-16-28.ec2.internal listener.cattle.io/cn-ip-10-8-16-70.ec2.internal:ip-10-8-16-70.ec2.internal listener.cattle.io/cn-ip-10-8-17-15.ec2.internal:ip-10-8-17-15.ec2.internal listener.cattle.io/cn-ip-10-8-17-50.ec2.internal:ip-10-8-17-50.ec2.internal listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=E78E87A1CD6FF30BD8B4066608868B967DAE3CD7]"
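(For reference, whether the fixed registration address from the fatal error above is reachable from the joining node can be checked with something like the following; the URL is taken from that error.)

curl -vk --max-time 10 https://cd.e2e-ref-4735-cluster.e2e-ref-4735-cluster.somedomain.info:6443/cacerts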
brandond (Member) commented Jul 3, 2023

What is cd.e2e-ref-4735-cluster.e2e-ref-4735-cluster.somedomain.info? I don't see this listed as any of your nodes, so I suspect it's an external load-balancer? Why isn't this node able to connect to it - is it perhaps not performing health-checks, and is attempting to send connections to the node that is currently being upgraded?

@andriiraiskyi (Author)

@brandond this is the endpoint for the kube-api; it's an AWS network load balancer. The problem is that I see this in the logs even though the node is already in the cluster and connected. And the error goes away once the old (1.24.x) nodes are removed from the cluster.

brandond (Member) commented Jul 5, 2023

Just because the node is already in the cluster doesn't mean that it skips validating the cluster connection on startup.

Are you using health-checks on that load-balancer? I suspect that it's trying to send the request to a node that is currently down for maintenance or otherwise unavailable. The fact that the error goes away once the old nodes have been removed from the LB would seem to confirm this.
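(If the load balancer is an AWS NLB, one way to see which targets it currently considers healthy is something like the following; the target group ARN is a placeholder.)

aws elbv2 describe-target-health --target-group-arn <target-group-arn>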

@andriiraiskyi (Author)

@brandond it's definitely not related to the load balancer. I created a new cluster and removed the node from the load balancer, but left it inside the k8s cluster.
I have a simple health check that only verifies the port on the node is open, nothing else.
I compared the certificates on the new and old nodes (request-headers*, server-ca*, client-ca*) and they are identical.
logs on old node 1.24.8:

time="2023-07-06T15:54:04Z" level=info msg="Active TLS secret kube-system/k3s-serving (ver=254687) (count 24): map[listener.cattle.io/cn-10.11.0.1:10.11.0.1 listener.cattle.io/cn-10.8.16.118:10.8.16.118 listener.cattle.io/cn-10.8.16.217:10.8.16.217 listener.cattle.io/cn-10.8.16.240:10.8.16.240 listener.cattle.io/cn-10.8.16.57:10.8.16.57 listener.cattle.io/cn-10.8.16.74:10.8.16.74 listener.cattle.io/cn-10.8.17.83:10.8.17.83 listener.cattle.io/cn-10.8.17.98:10.8.17.98 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-cd.e2e-ref-4822-cluster.e2e-ref-48-ef1b68:cd.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-cp.e2e-ref-4822-cluster.e2e-ref-48-9ed8df:cp.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-ip-10-8-16-118.ec2.internal:ip-10-8-16-118.ec2.internal listener.cattle.io/cn-ip-10-8-16-217.ec2.internal:ip-10-8-16-217.ec2.internal listener.cattle.io/cn-ip-10-8-16-240.ec2.internal:ip-10-8-16-240.ec2.internal listener.cattle.io/cn-ip-10-8-16-57.ec2.internal:ip-10-8-16-57.ec2.internal listener.cattle.io/cn-ip-10-8-16-74.ec2.internal:ip-10-8-16-74.ec2.internal listener.cattle.io/cn-ip-10-8-17-83.ec2.internal:ip-10-8-17-83.ec2.internal listener.cattle.io/cn-ip-10-8-17-98.ec2.internal:ip-10-8-17-98.ec2.internal listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=B2A32D142F57E1187E6560C59064625CDF28F5C8]"
time="2023-07-06T15:54:05Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.11.0.1:10.11.0.1 listener.cattle.io/cn-10.8.16.118:10.8.16.118 listener.cattle.io/cn-10.8.16.217:10.8.16.217 listener.cattle.io/cn-10.8.16.240:10.8.16.240 listener.cattle.io/cn-10.8.16.57:10.8.16.57 listener.cattle.io/cn-10.8.16.74:10.8.16.74 listener.cattle.io/cn-10.8.17.83:10.8.17.83 listener.cattle.io/cn-10.8.17.98:10.8.17.98 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-cd.e2e-ref-4822-cluster.e2e-ref-48-ef1b68:cd.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-cp.e2e-ref-4822-cluster.e2e-ref-48-9ed8df:cp.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-ip-10-8-16-118.ec2.internal:ip-10-8-16-118.ec2.internal listener.cattle.io/cn-ip-10-8-16-217.ec2.internal:ip-10-8-16-217.ec2.internal listener.cattle.io/cn-ip-10-8-16-240.ec2.internal:ip-10-8-16-240.ec2.internal listener.cattle.io/cn-ip-10-8-16-57.ec2.internal:ip-10-8-16-57.ec2.internal listener.cattle.io/cn-ip-10-8-16-74.ec2.internal:ip-10-8-16-74.ec2.internal listener.cattle.io/cn-ip-10-8-17-83.ec2.internal:ip-10-8-17-83.ec2.internal listener.cattle.io/cn-ip-10-8-17-98.ec2.internal:ip-10-8-17-98.ec2.internal listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=B2A32D142F57E1187E6560C59064625CDF28F5C8]"

log from new node 1.25.11:

time="2023-07-06T15:57:40Z" level=error msg="Failed to save TLS secret for kube-system/k3s-serving: Operation cannot be fulfilled on secrets \"k3s-serving\": the object has been modified; please apply your changes to the latest version and try again"
I0706 15:57:41.563380     853 request.go:690] Waited for 2.795809527s due to client-side throttling, not priority and fairness, request: PUT:https://127.0.0.1:6444/api/v1/namespaces/kube-system/secrets/k3s-serving
time="2023-07-06T15:57:42Z" level=info msg="Active TLS secret kube-system/k3s-serving (ver=260788) (count 24): map[listener.cattle.io/cn-10.11.0.1:10.11.0.1 listener.cattle.io/cn-10.8.16.118:10.8.16.118 listener.cattle.io/cn-10.8.16.217:10.8.16.217 listener.cattle.io/cn-10.8.16.240:10.8.16.240 listener.cattle.io/cn-10.8.16.57:10.8.16.57 listener.cattle.io/cn-10.8.16.74:10.8.16.74 listener.cattle.io/cn-10.8.17.83:10.8.17.83 listener.cattle.io/cn-10.8.17.98:10.8.17.98 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-cd.e2e-ref-4822-cluster.e2e-ref-48-ef1b68:cd.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-cp.e2e-ref-4822-cluster.e2e-ref-48-9ed8df:cp.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com listener.cattle.io/cn-ip-10-8-16-118.ec2.internal:ip-10-8-16-118.ec2.internal listener.cattle.io/cn-ip-10-8-16-217.ec2.internal:ip-10-8-16-217.ec2.internal listener.cattle.io/cn-ip-10-8-16-240.ec2.internal:ip-10-8-16-240.ec2.internal listener.cattle.io/cn-ip-10-8-16-57.ec2.internal:ip-10-8-16-57.ec2.internal listener.cattle.io/cn-ip-10-8-16-74.ec2.internal:ip-10-8-16-74.ec2.internal listener.cattle.io/cn-ip-10-8-17-83.ec2.internal:ip-10-8-17-83.ec2.internal listener.cattle.io/cn-ip-10-8-17-98.ec2.internal:ip-10-8-17-98.ec2.internal listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=B2A32D142F57E1187E6560C59064625CDF28F5C8]"

Node list from kubectl:

NAME                          STATUS   ROLES                  AGE    VERSION         INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ip-10-8-16-217.ec2.internal   Ready    etcd                   173m   v1.24.8+k3s1    10.8.16.217   <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-16-240.ec2.internal   Ready    control-plane,master   174m   v1.24.8+k3s1    10.8.16.240   <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-16-57.ec2.internal    Ready    etcd                   174m   v1.24.8+k3s1    10.8.16.57    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-16-74.ec2.internal    Ready    control-plane,master   58m    v1.25.11+k3s1   10.8.16.74    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.7.1-k3s1
ip-10-8-17-12.ec2.internal    Ready    <none>                 170m   v1.24.8+k3s1    10.8.17.12    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-17-23.ec2.internal    Ready    <none>                 168m   v1.24.8+k3s1    10.8.17.23    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-17-39.ec2.internal    Ready    <none>                 173m   v1.24.8+k3s1    10.8.17.39    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-17-52.ec2.internal    Ready    <none>                 170m   v1.24.8+k3s1    10.8.17.52    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-17-83.ec2.internal    Ready    etcd                   173m   v1.24.8+k3s1    10.8.17.83    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1
ip-10-8-17-98.ec2.internal    Ready    control-plane,master   174m   v1.24.8+k3s1    10.8.17.98    <none>        Ubuntu 22.04.2 LTS   5.19.0-1024-aws   containerd://1.6.8-k3s1

Also, the k3s-serving secret still lists an old node (10.8.16.118):

{
  "kind": "Secret",
  "apiVersion": "v1",
  "metadata": {
    "name": "k3s-serving",
    "namespace": "kube-system",
    "uid": "a763bfe1-1eff-4295-9375-e13c6faf038c",
    "resourceVersion": "271174",
    "creationTimestamp": "2023-07-06T13:04:32Z",
    "labels": {
      "k8slens-edit-resource-version": "v1"
    },
    "annotations": {
      "listener.cattle.io/cn-10.11.0.1": "10.11.0.1",
      "listener.cattle.io/cn-10.8.16.118": "10.8.16.118",
      "listener.cattle.io/cn-10.8.16.217": "10.8.16.217",
      "listener.cattle.io/cn-10.8.16.240": "10.8.16.240",
      "listener.cattle.io/cn-10.8.16.57": "10.8.16.57",
      "listener.cattle.io/cn-10.8.16.74": "10.8.16.74",
      "listener.cattle.io/cn-10.8.17.83": "10.8.17.83",
      "listener.cattle.io/cn-10.8.17.98": "10.8.17.98",
      "listener.cattle.io/cn-127.0.0.1": "127.0.0.1",
      "listener.cattle.io/cn-__1-f16284": "::1",
      "listener.cattle.io/cn-cd.e2e-ref-4822-cluster.e2e-ref-48-ef1b68": "cd.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com",
      "listener.cattle.io/cn-cp.e2e-ref-4822-cluster.e2e-ref-48-9ed8df": "cp.e2e-ref-4822-cluster.e2e-ref-4822-cluster.somedomain.com",
      "listener.cattle.io/cn-ip-10-8-16-118.ec2.internal": "ip-10-8-16-118.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-16-217.ec2.internal": "ip-10-8-16-217.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-16-240.ec2.internal": "ip-10-8-16-240.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-16-57.ec2.internal": "ip-10-8-16-57.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-16-74.ec2.internal": "ip-10-8-16-74.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-17-83.ec2.internal": "ip-10-8-17-83.ec2.internal",
      "listener.cattle.io/cn-ip-10-8-17-98.ec2.internal": "ip-10-8-17-98.ec2.internal",
      "listener.cattle.io/cn-kubernetes": "kubernetes",
      "listener.cattle.io/cn-kubernetes.default": "kubernetes.default",
      "listener.cattle.io/cn-kubernetes.default.svc": "kubernetes.default.svc",
      "listener.cattle.io/cn-kubernetes.default.svc.cluster.local": "kubernetes.default.svc.cluster.local",
      "listener.cattle.io/cn-localhost": "localhost",
      "listener.cattle.io/fingerprint": "SHA1=B2A32D142F57E1187E6560C59064625CDF28F5C8"
    },
    "managedFields": [
      {
        "manager": "deploy@ip-10-8-16-240.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T13:04:32Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:data": {},
          "f:metadata": {
            "f:annotations": {
              ".": {},
              "f:listener.cattle.io/cn-10.11.0.1": {},
              "f:listener.cattle.io/cn-10.8.16.240": {},
              "f:listener.cattle.io/cn-127.0.0.1": {},
              "f:listener.cattle.io/cn-__1-f16284": {},
              "f:listener.cattle.io/cn-ip-10-8-16-240.ec2.internal": {},
              "f:listener.cattle.io/cn-kubernetes": {},
              "f:listener.cattle.io/cn-kubernetes.default": {},
              "f:listener.cattle.io/cn-kubernetes.default.svc": {},
              "f:listener.cattle.io/cn-kubernetes.default.svc.cluster.local": {},
              "f:listener.cattle.io/cn-localhost": {}
            }
          },
          "f:type": {}
        }
      },
      {
        "manager": "deploy@ip-10-8-16-57.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T13:04:42Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              "f:listener.cattle.io/cn-10.8.16.57": {},
              "f:listener.cattle.io/cn-cd.e2e-ref-4822-cluster.e2e-ref-48-ef1b68": {},
              "f:listener.cattle.io/cn-ip-10-8-16-57.ec2.internal": {}
            }
          }
        }
      },
      {
        "manager": "deploy@ip-10-8-17-98.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T13:05:02Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              "f:listener.cattle.io/cn-10.8.17.98": {},
              "f:listener.cattle.io/cn-cp.e2e-ref-4822-cluster.e2e-ref-48-9ed8df": {},
              "f:listener.cattle.io/cn-ip-10-8-17-98.ec2.internal": {}
            }
          }
        }
      },
      {
        "manager": "deploy@ip-10-8-17-83.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T13:05:33Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              "f:listener.cattle.io/cn-10.8.17.83": {},
              "f:listener.cattle.io/cn-ip-10-8-17-83.ec2.internal": {}
            }
          }
        }
      },
      {
        "manager": "deploy@ip-10-8-16-217.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T13:05:45Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              "f:listener.cattle.io/cn-10.8.16.217": {},
              "f:listener.cattle.io/cn-ip-10-8-16-217.ec2.internal": {}
            }
          }
        }
      },
      {
        "manager": "node-fetch",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T15:30:55Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:labels": {
              ".": {},
              "f:k8slens-edit-resource-version": {}
            }
          }
        }
      },
      {
        "manager": "k3s-supervisor@ip-10-8-16-74.ec2.internal",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2023-07-06T15:30:58Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:data": {
            "f:tls.crt": {},
            "f:tls.key": {}
          },
          "f:metadata": {
            "f:annotations": {
              "f:listener.cattle.io/cn-10.8.16.118": {},
              "f:listener.cattle.io/cn-10.8.16.74": {},
              "f:listener.cattle.io/cn-ip-10-8-16-118.ec2.internal": {},
              "f:listener.cattle.io/cn-ip-10-8-16-74.ec2.internal": {},
              "f:listener.cattle.io/fingerprint": {}
            }
          }
        }
      }
    ]
  },
  "data": {
    "tls.crt": <removed for ticket>,
    "tls.key": <removed for ticket>
  },
  "type": "kubernetes.io/tls"
}

I tried upgrading the etcd nodes first and then the masters; nothing changed.
But I found another issue there: once the leader is removed from the etcd cluster, kubectl is not available while that node is still running. Once k3s is stopped on it, everything is fine, but I think that is a separate issue.

# kubectl get nodes
Error from server: etcdserver: server stopped
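(For context, a quick way to check etcd health and leadership from a server node is something along these lines; the cert paths assume k3s's default etcd TLS directory.)

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  endpoint status --cluster -w table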

brandond (Member) commented Jul 7, 2023

secret k3s-serving contains old node in a list

Yes, there is currently no logic to prune CNs from the certificate.

once leader removed from etcd cluster - kubectl is not available until that node still running

It looks like you're trying to run kubectl on a node that has been removed from the cluster? If so, this is expected behavior. If not, I'd be curious to see logs from the remaining etcd nodes. As long as you still have quorum, etcd leadership should not matter - leadership will be transferred to another node within milliseconds.
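(If the stale CNs in the dynamic listener certificate ever need to be cleared, a commonly suggested workaround, not confirmed in this thread, so verify against the k3s docs for your version, is to delete the serving secret and the cached dynamic cert on a server node and restart k3s so they are regenerated with the current names.)

kubectl -n kube-system delete secret k3s-serving
rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
systemctl restart k3s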

@andriiraiskyi (Author)

@brandond no, I'm trying to run it from any master node; the status of etcd looks fine from etcdctl.

brandond (Member) commented Jul 13, 2023

no, I'm trying to run from any master node, status of etcd looks fine from etcdctl
Error from server: etcdserver: server stopped

What you're reporting doesn't make any sense. The only reason that you should ever get server stopped from etcd is when it is deleted from the cluster, and is waiting for the service to be shut down cleanly. This is a terminal error as far as the apiserver is concerned; it should immediately either reconnect to etcd, or exit and cause the entire k3s process to restart (which would also cause it to reconnect to etcd).

Can you attach the complete k3s service logs from the etcd nodes in question?
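(For reference, the requested logs can usually be collected on each etcd node with something like the following, assuming k3s runs as a systemd service.)

journalctl -u k3s --no-pager > k3s-$(hostname).log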
