Update from version 1.24.10 to 1.25.7 results in Failed to save TLS secret for kube-system/k3s-serving #7123
Can you confirm the versions that you are upgrading from and to? There is no v1.25.10 yet; the latest GA 1.25 release available is
The logs are truncated at the end of the line; can you attach the complete logs from journald without any terminal-width truncation?
What are you using for the datastore on your two server nodes? Have you upgraded both servers, or just one?
Can you confirm that the CLI flags on both servers (the tls-san values in particular) are in sync? Have you tried stopping one of the servers for a period of time, so that the other can start up and successfully update the secret? Since you've posted only a single redacted log line from a single server, I can't really tell what it's trying to change on the cert. Can you compare the logged certificate annotations between the two servers to see what they are alternately trying to set?
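One way to do that comparison is to dump the secret from each server (for example with `kubectl get secret -n kube-system k3s-serving -o json`) and diff the annotation maps. A minimal sketch, with inline sample data standing in for the dumped manifests (the annotation keys shown are illustrative, not taken from the reporter's redacted logs):

```python
import json

def annotation_diff(a: dict, b: dict) -> dict:
    """Return keys whose values differ, or that exist on only one side."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# In practice these would come from the dumped manifests, e.g.:
#   server1 = json.load(open("server1.json"))["metadata"]["annotations"]
# Sample data for illustration only:
server1 = {"listener.cattle.io/cn-10.0.0.1": "10.0.0.1",
           "listener.cattle.io/cn-kubernetes": "kubernetes"}
server2 = {"listener.cattle.io/cn-10.0.0.2": "10.0.0.2",
           "listener.cattle.io/cn-kubernetes": "kubernetes"}

for key, (v1, v2) in sorted(annotation_diff(server1, server2).items()):
    print(f"{key}: server1={v1!r} server2={v2!r}")
```

Any key that shows up on only one side is a name or address that one server wants on the certificate and the other does not.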
Is that something we should be accommodating when upgrading the servers? The upgrade was done with only one server on that version for 5 minutes. I thought it was supported to have all servers running during the upgrade? I appreciate it's difficult with redacted values. I can confirm the tls-san values are in sync. Config below.
@brandond could you point me at any docs that talk about what that secret is for?
How odd, it's now stopped... Restarted k3s a few more times and it's fine...
There aren't any docs about this specific implementation detail, but that secret is used to store the dynamically generated server certificate for the apiserver/supervisor listener on port 6443. The certificate is updated with SAN entries for any requested names and addresses, as well as any names or addresses requested by clients. The observed behavior suggests that there were some hostnames or addresses that both nodes were attempting to add, but were unable to due to recurring conflicts. It's hard to tell specifically what they were conflicting on without looking at full unredacted logs.
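The conflict cycle described above can be pictured as optimistic-concurrency writes against a shared secret: each server reads the secret, merges in the SANs it wants, and writes back only if the resourceVersion hasn't changed since its read. A toy simulation (not k3s code; all names here are invented for illustration):

```python
class Secret:
    """Stand-in for a Kubernetes secret with optimistic locking."""
    def __init__(self):
        self.resource_version = 0
        self.sans = set()

    def update(self, expected_version: int, sans: set) -> bool:
        """Compare-and-swap write, like the API server's resourceVersion check."""
        if expected_version != self.resource_version:
            return False  # 409 Conflict: someone else wrote first
        self.sans = sans
        self.resource_version += 1
        return True

def reconcile(secret: Secret, wanted: set) -> bool:
    """One server's loop body: read, merge, attempt a CAS write."""
    version, current = secret.resource_version, set(secret.sans)
    if wanted <= current:
        return True  # nothing to add, no write needed
    return secret.update(version, current | wanted)

secret = Secret()
server_a = {"kubernetes", "10.0.0.1"}
server_b = {"kubernetes", "10.0.0.2"}

# If both servers read the same version and then write, one write loses...
v = secret.resource_version
ok_a = secret.update(v, server_a)
ok_b = secret.update(v, server_b)   # conflict: the version has moved on
print(ok_a, ok_b)                   # True False

# ...but a retry that re-reads and merges converges instead of cycling:
reconcile(secret, server_b)
print(sorted(secret.sans))          # ['10.0.0.1', '10.0.0.2', 'kubernetes']
```

A persistent cycle like the one in this issue would mean each node keeps losing the race or keeps disagreeing about the desired SAN set, so neither write ever sticks for long.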
Yes, that's fine. We haven't really touched anything in this space in quite a while, so I'm not sure what exactly would be causing this. I suggested stopping one of the servers for a bit because that would break the conflict cycle and allow the other node to do whatever it wanted to do.
Closing since the problem is resolved. Please reopen if it re-emerges.
In case I stumble upon this again: I also hit this issue in a four-node cluster, with all the nodes trying to apply seemingly identical updates, after two nodes had been restarted with 1.25.10. Eventually I decided to manually upgrade k3s by replacing the binary and restarting the remaining nodes. This seemed to clear it up.
Also hit this on the 1.24 -> 1.25 upgrade. Once we got all control plane nodes moved over to 1.25 it went away, so I am assuming some part of the secret/SAN handling changed between 1.24 and 1.25, with the nodes fighting over it until the last 1.24 node went away.
Environmental Info:
K3s Version: 1.24.10 to 1.25.7
Node(s) CPU architecture, OS, and Version:
Linux ip-10-X-X-X 5.15.0-1026-aws #30-Ubuntu SMP Wed Nov 23 17:01:09 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Cluster Configuration:
2 servers, 1 agent
Describe the bug:
We're testing the upgrade from 1.24.10 to 1.25.7 on the servers. After the upgrade, these errors are logged repeatedly:
Steps To Reproduce: