
Update from version 1.24.10 to 1.25.7 results in Failed to save TLS secret for kube-system/k3s-serving #7123

Closed
dcarrion87 opened this issue Mar 20, 2023 · 10 comments

Comments


dcarrion87 commented Mar 20, 2023

Environmental Info:
K3s Version: upgrading from 1.24.10 to 1.25.7

Node(s) CPU architecture, OS, and Version:

Linux ip-10-X-X-X 5.15.0-1026-aws #30-Ubuntu SMP Wed Nov 23 17:01:09 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration:

2 servers, 1 agent

Describe the bug:

We're testing an upgrade from 1.24.10 to 1.25.7 on the servers. After the upgrade, these errors keep repeating in the logs:

Mar 20 22:46:23 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:23Z" level=error msg="Failed to save TLS secret for kube-system/k3s-serving: Operation cannot be fulfilled on secrets \"k3s-serving\": the object has been modified; please apply your changes to the latest version and >
Mar 20 22:46:22 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:22Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>
Mar 20 22:46:22 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:22Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>
Mar 20 22:46:21 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:21Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>
Mar 20 22:46:21 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:21Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>
Mar 20 22:46:21 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:21Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>

Steps To Reproduce:

  • Updated k3s binaries on server 1
  • Updated k3s binaries on server 2
  • Checked journalctl logs

brandond commented Mar 20, 2023

We're testing upgrade between 1.24.10 and 1.25.10 on the servers

Can you confirm the versions that you are upgrading from and to? There is no v1.25.10 yet; the latest GA 1.25 release available is v1.25.7+k3s1 - which is the version you mentioned elsewhere.

Mar 20 22:46:21 ip-10-X-X-X k3s[3915]: time="2023-03-20T22:46:21Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X listener.cattle.io/cn-10.X.X.X:10.X.X.X>

The logs are truncated at the end of the line; can you attach the complete logs from journald without any terminal-width truncation?
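For reference, something like this captures the full lines without terminal-width truncation (assuming k3s is running under the standard k3s systemd unit; the output path is just an example):

journalctl -u k3s --no-pager > /tmp/k3s.log
# or follow just the TLS secret messages live:
journalctl -u k3s -f --no-pager | grep "TLS secret for kube-system/k3s-serving"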

2 servers, 1 agent

What are you using for the datastore on your two server nodes?

Have you upgraded both servers, or just one?


dcarrion87 commented Mar 20, 2023

  • 1.24.10 to 1.25.7
  • Logs re-attached:
Mar 20 23:00:51 ip-X-X-4-252 k3s[3915]: time="2023-03-20T23:00:51Z" level=error msg="Failed to save TLS secret for kube-system/k3s-serving: Operation cannot be fulfilled on secrets \"k3s-serving\": the object has been modified; please apply your changes to the latest version and try again"
Mar 20 23:00:51 ip-X-X-4-252 k3s[3915]: time="2023-03-20T23:00:51Z" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 24): map[listener.cattle.io/cn-X.X.4.221:X.X.4.221 listener.cattle.io/cn-X.X.4.252:X.X.4.252 listener.cattle.io/cn-X.X.5.109:X.X.5.109listener.cattle.io/cn-X.X.5.194:X.X.5.194 listener.cattle.io/cn-X.X.5.69:X.X.5.69 listener.cattle.io/cn-X.X.5.84:X.X.5.84 listener.cattle.io/cn-X.X.6.32:X.X.6.32 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.17.0.1:172.17.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-host.docker.internal:host.docker.internal listener.cattle.io/cn-ip-X-X-4-221:ip-X-X-4-221 listener.cattle.io/cn-ip-X-X-4-252:ip-X-X-4-252 listener.cattle.io/cn-ip-X-X-5-109:ip-X-X-5-109 listener.cattle.io/cn-ip-X-X-5-194:ip-X-X-5-194 listener.cattle.io/cn-ip-X-X-5-69:ip-X-X-5-69 listener.cattle.io/cn-ip-X-X-5-84:ip-X-X-5-84 listener.cattle.io/cn-ip-X-X-6-32:ip-X-X-6-32 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-REDACTED-57fe667ded5ca-319b95:REDACTED-57fe667ded5caa94.elb.ap-southeast-2.amazonaws.com listener.cattle.io/fingerprint:SHA1=A4F0DBA2A022067940113F2F8B2174F53F70BE63]"
  • Datastore is Postgres
  • Both servers have been upgraded
  • I have just tried restarting both again now and it's still throwing that error endlessly (seconds apart).

@brandond

Can you confirm that the CLI flags on both servers (the tls-san values in particular) are in sync?

Have you tried stopping one of the servers for a period of time so that the other can start up and successfully update the secret?

Because you've posted just a single redacted log line from a single server, I can't really tell what it's trying to change on the cert. Can you compare the logged certificate annotations between the two servers to see what they are alternately trying to set?
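For example (just a sketch, assuming the standard k3s systemd unit; the server1/server2 filenames are placeholders), grab the most recent annotation map on each server and diff the two:

# on each server:
journalctl -u k3s --no-pager | grep "Updating TLS secret for kube-system/k3s-serving" | tail -n 1 > /tmp/k3s-serving-$(hostname).txt
# then, with both files copied to one machine:
diff /tmp/k3s-serving-server1.txt /tmp/k3s-serving-server2.txt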


dcarrion87 commented Mar 20, 2023

Is that something we should be accommodating when upgrading the servers?

Only one server was on the new version for about 5 minutes during the upgrade. I thought it was supported to have all servers running during the upgrade?

I appreciate it's difficult with redacted values. I can confirm the tls-san values are in sync. Config below.

#  -----  Ansible Managed  -----  #

datastore-endpoint: postgres://REDACTED
token: REDACTED
tls-san: REDACTED.elb.ap-southeast-2.amazonaws.com
agent-token: REDACTED
etcd-disable-snapshots: true
cluster-cidr: 172.16.0.0/16
service-cidr: 172.17.0.0/16
flannel-backend: none
disable-network-policy: true
write-kubeconfig: /root/.kube/config
disable: traefik
node-taint: CriticalAddonsOnly=true:NoExecute
kube-scheduler-arg:
- "config=/etc/rancher/k3s/scheduler-config.yaml"
kube-apiserver-arg:
- "enable-admission-plugins=AlwaysPullImages"
- "service-account-jwks-uri=https://REDACTED.s3.amazonaws.com/REDACTED-cluster/openid/v1/jwks"
- "service-account-issuer=https://REDACTED.s3.amazonaws.com/REDACTED-cluster"


dcarrion87 commented Mar 21, 2023

@brandond could you point me at any docs that talk about what the k3s-serving secret is and what it's actually trying to do?

@dcarrion87

How odd, it's now stopped... Restarted k3s a few more times and it's fine...


brandond commented Mar 21, 2023

There aren't any docs about this specific implementation detail, but that secret is used to store the dynamically-generated server certificate for the apiserver/supervisor listener on port 6443. The certificate is updated with SAN entries for any requested names and addresses, as well as any names or addresses requested by clients. The observed behavior suggests that there were some hostnames or addresses that both nodes were attempting to add, but were unable to due to recurring conflicts.

It's hard to tell specifically what they were conflicting on without looking at full unredacted logs.
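If you want to look at the current state yourself, something like the following shows the listener annotations on the secret and the SANs on the certificate actually served on 6443 (a sketch; it assumes kubectl access to the cluster and openssl on the node):

kubectl get secret -n kube-system k3s-serving -o yaml | grep "listener.cattle.io"
echo | openssl s_client -connect 127.0.0.1:6443 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"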

The upgrade was done with only one server with that version for 5 minutes. I thought it was supported to have all server running during upgrade?

Yes, that's fine. We haven't really touched anything in this space in quite a while, so I'm not sure what exactly would be causing this. I suggested stopping one of the servers for a bit because that would break the conflict cycle and allow the other node to make whatever changes it's trying to make.
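Concretely that would just be something like the following, run on one server only while the other stays up (assuming the standard k3s systemd service):

systemctl stop k3s
# give the other server a few minutes and watch its logs settle, then:
systemctl start k3s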

@caroline-suse-rancher

Closing since the problem resolved itself. Please reopen if it re-emerges.


oivindoh commented Jul 5, 2023

In case I stumble upon this again: I also hit this issue in a four-node cluster, with all the nodes trying to apply seemingly identical updates after two nodes had been restarted on 1.25.10. I eventually decided to manually upgrade k3s on the remaining nodes by replacing the binary and restarting them. This seemed to clear it up.
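(For future me, the manual swap was roughly the following on each remaining node, one at a time; this assumes the default /usr/local/bin/k3s install path and an arm64 node, and the exact release asset/version should be checked against the k3s releases page:)

systemctl stop k3s
curl -Lo /usr/local/bin/k3s "https://github.com/k3s-io/k3s/releases/download/v1.25.10%2Bk3s1/k3s-arm64"
chmod +x /usr/local/bin/k3s
systemctl start k3s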

@sidewinder12s

Also hit this on the 1.24 -> 1.25 upgrade. Once we got all control-plane nodes moved over to 1.25 it went away, so I'm assuming some part of the secret/SAN handling changed between 1.24 and 1.25, with the nodes fighting over it until the last 1.24 node went away.
