
x509: certificate has expired or is not yet valid #5163

Closed
kingsd041 opened this issue Feb 25, 2022 · 34 comments

@kingsd041

Environmental Info:
K3s Version: v1.22.6+k3s1 and v1.23.4+k3s1

Node(s) CPU architecture, OS, and Version: ubuntu 1804

Cluster Configuration: 1 server

Describe the bug:

Cannot rotate k3s-serving certificate after restarting k3s

Steps To Reproduce:

  1. Make sure all pods are up and running
root@ip-172-31-15-171:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-96cc4f57d-xpppw                   1/1     Running     0          70s
kube-system   local-path-provisioner-84bb864455-lkc65   1/1     Running     0          70s
kube-system   helm-install-traefik-crd--1-6mw65         0/1     Completed   0          70s
kube-system   helm-install-traefik--1-qbr25             0/1     Completed   1          70s
kube-system   svclb-traefik-hxggr                       2/2     Running     0          40s
kube-system   metrics-server-ff9dbcb6c-txhfq            1/1     Running     0          70s
kube-system   traefik-55fdc6d984-c28rn                  1/1     Running     0          40s
  2. Current date is: Fri Feb 25 02:37:07 UTC 2022
  3. Change the OS date to after the certificate expires:
root@ip-172-31-15-171:~# timedatectl set-ntp no
root@ip-172-31-15-171:~# date -s 20230303
Fri Mar  3 00:00:00 UTC 2023
root@ip-172-31-15-171:~# date
Fri Mar  3 00:00:03 UTC 2023
  4. Restart K3s
systemctl restart k3s
  5. After a few minutes, query the k3s-serving expiration time
root@ip-172-31-15-171:~# kubectl --insecure-skip-tls-verify  get secret -n kube-system k3s-serving -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | grep Not

            Not Before: Feb 25 02:34:03 2022 GMT
            Not After : Feb 25 02:34:04 2023 GMT
  6. At this point, kubectl cannot be used due to an expired certificate
root@ip-172-31-15-171:~# kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2023-03-03T00:04:06Z is after 2023-02-25T02:34:04Z

Expected behavior:

Restarting k3s should automatically rotate the certificates.

Actual behavior:

After restarting k3s, the k3s-serving certificate is not automatically rotated.

But I can manually rotate the k3s-serving certificate by:

kubectl --insecure-skip-tls-verify delete secret k3s-serving -n kube-system
rm -rf /var/lib/rancher/k3s/server/tls/dynamic-cert.json
systemctl restart k3s

Additional context / logs:

Backporting

  • Needs backporting to older releases
@brandond
Contributor

Generally it's best to stop k3s before jumping the time forward; there are bits of Kubernetes that don't react well to time skips like that. We've tested certificate renewal on startup extensively in the past and I would be disappointed if it had regressed; can you confirm that this is still an issue if K3s is stopped when you move the time forward?

@kingsd041
Author

@brandond I changed the OS time to 20230222 (so the certificate expires in less than 90 days). After restarting k3s, k3s-serving still does not rotate:

root@ip-172-31-13-124:~# date -s 20230222
Wed Feb 22 00:00:00 UTC 2023
root@ip-172-31-13-124:~# date
Wed Feb 22 00:00:16 UTC 2023
root@ip-172-31-13-124:~# systemctl restart k3s

root@ip-172-31-13-124:~# for i in `ls /var/lib/rancher/k3s/server/tls/*.crt`; do echo $i; openssl x509 -enddate -noout -in $i; done
/var/lib/rancher/k3s/server/tls/client-admin.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-ca.crt
notAfter=Feb 23 06:37:01 2032 GMT
/var/lib/rancher/k3s/server/tls/client-controller.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-k3s-cloud-controller.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-k3s-controller.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-kube-proxy.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/client-scheduler.crt
notAfter=Feb 22 00:00:33 2024 GMT
/var/lib/rancher/k3s/server/tls/request-header-ca.crt
notAfter=Feb 23 06:37:01 2032 GMT
/var/lib/rancher/k3s/server/tls/server-ca.crt
notAfter=Feb 23 06:37:01 2032 GMT
/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt
notAfter=Feb 22 00:00:33 2024 GMT
root@ip-172-31-13-124:~# kubectl --insecure-skip-tls-verify  get secret -n kube-system k3s-serving -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | grep Not
            Not Before: Feb 25 06:37:01 2022 GMT
            Not After : Feb 25 06:37:01 2023 GMT

@brandond
Contributor

brandond commented Feb 25, 2022

You appear to still have K3s running while moving the time forward? Stop it before changing the time.

@kingsd041
Author

@brandond Hi, I followed your suggestion and re-tested, but I still hit the same problem. The steps are as follows:

  1. Launch k3s
 curl -sfL https://get.k3s.io | sh -
  2. Make sure all pods are running
root@ip-172-31-7-59:~# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-84bb864455-fx2wd   1/1     Running     0          70s
kube-system   coredns-96cc4f57d-ptmht                   1/1     Running     0          70s
kube-system   helm-install-traefik-crd--1-pq9rs         0/1     Completed   0          70s
kube-system   helm-install-traefik--1-j9k6x             0/1     Completed   1          70s
kube-system   svclb-traefik-4tc9g                       2/2     Running     0          41s
kube-system   metrics-server-ff9dbcb6c-fdwjj            1/1     Running     0          70s
kube-system   traefik-55fdc6d984-dv49p                  1/1     Running     0          42s
  3. Current time: Tue Mar 1 06:28:24 UTC 2022
  4. Stop k3s
systemctl stop k3s
  5. Change the OS date to after the certificate expires
root@ip-172-31-7-59:~# timedatectl set-ntp no
root@ip-172-31-7-59:~# date -s 20230303
Fri Mar  3 00:00:00 UTC 2023
root@ip-172-31-7-59:~# date
Fri Mar  3 00:00:01 UTC 2023
  6. Start k3s

Then, querying the k3s-serving certificate expiration time again shows that it still has not rotated:

root@ip-172-31-7-59:~# kubectl --insecure-skip-tls-verify  get secret -n kube-system k3s-serving -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | grep Not
            Not Before: Mar  1 06:26:50 2022 GMT
            Not After : Mar  1 06:26:50 2023 GMT

@venkyfff

venkyfff commented Mar 2, 2022

I am facing the same issue in v1.20.0-rc5+k3s1.

@YibiaoWu

I also found the same problem on k3s version 1.22.6

@niusmallnan
Contributor

I am facing the same issue in v1.23.8+k3s. I did some digging, and these clues might be useful.

Normally, K3s uses CertificateRenewDays to determine whether a renewed certificate is required. After a new certificate is generated, it is stored in memory, on disk, and in a Kubernetes secret.
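
For readers following along, the renewal decision boils down to a check of roughly this shape (a minimal Go sketch, not the actual dynamiclistener code; the 90-day window and the certificate path are assumptions for illustration):

package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"time"
)

// certificateRenewDays mirrors the idea behind CertificateRenewDays: renew when
// the certificate has fewer than this many days of validity left. The value 90
// is an assumption for illustration.
const certificateRenewDays = 90

// needsRenewal reports whether a PEM-encoded certificate should be regenerated.
func needsRenewal(pemBytes []byte, now time.Time) (bool, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return false, fmt.Errorf("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	// Renew if the certificate is already expired or expires within the window.
	return now.Add(certificateRenewDays * 24 * time.Hour).After(cert.NotAfter), nil
}

func main() {
	pemBytes, err := os.ReadFile("/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt")
	if err != nil {
		panic(err)
	}
	renew, err := needsRenewal(pemBytes, time.Now())
	if err != nil {
		panic(err)
	}
	fmt.Println("needs renewal:", renew)
}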

I added a log line which shows that the new certificate has been generated. Comparing the fingerprints, I'm sure this is the new certificate.

diff --git a/factory/gen.go b/factory/gen.go
index e620c5f..0bb4f86 100644
--- a/factory/gen.go
+++ b/factory/gen.go
@@ -187,6 +188,7 @@ func (t *TLS) generateCert(secret *v1.Secret, cn ...string) (*v1.Secret, bool, e
        secret.Data[v1.TLSCertKey] = certBytes
        secret.Data[v1.TLSPrivateKeyKey] = keyBytes
        secret.Annotations[fingerprint] = fmt.Sprintf("SHA1=%X", sha1.Sum(newCert.Raw))
+       logrus.Infof("generateCert fingerprint %s", secret.Annotations[fingerprint])

        return secret, true, nil
 }

level=info msg="generateCert fingerprint SHA1=0290514BBED867747CCC20314D93B571F2EA582E"

However, when the secret was finally updated, the new certificate was not written; the fingerprint is still the old one:

$ journalctl -u k3s | grep "Updating TLS secret for kube-system/k3s-serving"
level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 10): map[listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.31.30.159:172.31.30.159 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-ip-172-31-30-159:ip-172-31-30-159 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=166934922354B5491A77F6714E1875B7FB0E8EB7]"

@brandond
I went through some PRs, and I suspect this bug was introduced by rancher/dynamiclistener#49.
The change in factory/gen.go removed the fingerprint check, and as a result the new certificate is not written to the secret.

@brandond
Contributor

brandond commented Jul 14, 2022

Right, that change in behavior is discussed in the docstring for the function.

// If the merge would not add any CNs to the additional Secret, the additional
// Secret is returned, to allow for certificate rotation/regeneration.

Checking the fingerprint annotation should not be necessary since the second (newer) secret is always returned by the merge function if the new secret contains all the CNs from the current secret, as is the case when updating a certificate to extend its duration.
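
To picture that merge decision, here is a rough Go sketch of the CN-superset check (illustrative only; it borrows the listener.cattle.io/cn- annotation prefix visible in the logs above and is not the real merge function):

package main

import (
	"fmt"
	"strings"
)

const cnPrefix = "listener.cattle.io/cn-"

// cnSet collects the listener.cattle.io/cn-* keys from a secret's annotations.
func cnSet(annotations map[string]string) map[string]struct{} {
	cns := map[string]struct{}{}
	for k := range annotations {
		if strings.HasPrefix(k, cnPrefix) {
			cns[k] = struct{}{}
		}
	}
	return cns
}

// preferNewer reports whether the newer secret covers every CN of the current
// secret, in which case the newer secret (e.g. a re-issued certificate with a
// longer validity) can simply replace the current one.
func preferNewer(current, newer map[string]string) bool {
	newerCNs := cnSet(newer)
	for cn := range cnSet(current) {
		if _, ok := newerCNs[cn]; !ok {
			return false
		}
	}
	return true
}

func main() {
	current := map[string]string{"listener.cattle.io/cn-localhost": "localhost"}
	newer := map[string]string{
		"listener.cattle.io/cn-localhost":  "localhost",
		"listener.cattle.io/cn-kubernetes": "kubernetes",
	}
	fmt.Println(preferNewer(current, newer)) // true: the newer secret covers all current CNs
}

Under a rule like this the fingerprint annotation plays no part in the decision, which is why dropping that check should be safe.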

@smiggiddy

I just moved to k3s and did a fresh install, and I'm facing the cert issue described above on v1.23.8+k3s2.
After I restart k3s.service I regain some functionality, but I'm not sure what's really going on.

@niusmallnan
Contributor

@brandond Thanks for your reply, you are right. There is really no need to check the fingerprint.

I have some new discoveries. If the k3s-serving certificate needs to be renewed, saveInK8s is executed twice at startup.

  1. Since initComplete has not finished, the certificate is not actually written to the secret.
  2. The second call is triggered by the tls-storage OnChange handler after the informer sync, and it does not write the renewed certificate either.

Maybe initComplete completes quickly in some setups, so the first saveInK8s completes the write. However, I can reproduce the problem 100% on my AWS t3a.medium instance.
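
The sequencing described above can be pictured with a toy sketch like this (purely illustrative; the names saveInK8s and initComplete are taken from the comment above, not from the actual source):

package main

import (
	"fmt"
	"sync"
)

// store mimics the sequencing problem described above: a write that arrives
// before initialization completes is only kept in memory, and nothing
// re-applies it once the apiserver is ready. Purely illustrative.
type store struct {
	mu           sync.Mutex
	initComplete bool
	pending      string // renewed certificate that never reaches the secret
}

func (s *store) saveInK8s(cert string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.initComplete {
		// First call at startup: apiserver not ready yet, cache the cert in memory only.
		s.pending = cert
		return
	}
	fmt.Println("writing secret with cert:", cert)
}

func (s *store) onInformerSync(existing string) {
	s.mu.Lock()
	s.initComplete = true
	s.mu.Unlock()
	// Second call, triggered after the informer sync, re-saves the existing
	// (still expired) secret instead of the renewed one held in s.pending.
	s.saveInK8s(existing)
}

func main() {
	s := &store{}
	s.saveInK8s("renewed-cert") // dropped: init not complete
	s.onInformerSync("expired-cert")
}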

@brandond
Contributor

@niusmallnan how are you reproducing this? Are you changing the system time or something else to simulate an expired cert? For me, simply restarting the process reliably renews the certificate when it has less than 90 days of validity remaining.

@niusmallnan
Contributor

@brandond I reproduce it the way described in #5163 (comment).
I'm not sure whether CPU power is related, but I can reproduce it on an AWS t3a.medium.

@brandond
Contributor

Hmm, interesting. Clearly we need a bit more testing in this area.

Jul 19 13:53:12 debian01.lan.khaus k3s[30586]: time="2022-07-19T13:53:12-07:00" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 10): map[listener.cattle.io/cn-10.0.1.227:10.0.1.227 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-debian01.lan.khaus:debian01.lan.khaus listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=6DE892F043766D0B02A768010DB6BFB5B8357208]"
Jul 19 13:53:13 debian01.lan.khaus k3s[30586]: time="2022-07-19T13:53:13-07:00" level=info msg="Active TLS secret kube-system/k3s-serving (ver=239) (count 10): map[listener.cattle.io/cn-10.0.1.227:10.0.1.227 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-debian01.lan.khaus:debian01.lan.khaus listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=6DE892F043766D0B02A768010DB6BFB5B8357208]"
Jan 01 00:00:19 debian01.lan.khaus k3s[33158]: time="2024-01-01T00:00:19-08:00" level=info msg="certificate CN=k3s,O=k3s will expire in -165.463341 days at 2023-07-19 20:53:07 +0000 UTC"
Jan 01 00:00:19 debian01.lan.khaus k3s[33158]: time="2024-01-01T00:00:19-08:00" level=info msg="certificate CN=k3s,O=k3s signed by CN=k3s-server-ca@1658263986: notBefore=2022-07-19 20:53:06 +0000 UTC notAfter=2024-12-31 08:00:19 +0000 UTC"
Jan 01 00:00:24 debian01.lan.khaus k3s[33158]: time="2024-01-01T00:00:24-08:00" level=info msg="Updating TLS secret for kube-system/k3s-serving (count: 10): map[listener.cattle.io/cn-10.0.1.227:10.0.1.227 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-__1-f16284:::1 listener.cattle.io/cn-debian01.lan.khaus:debian01.lan.khaus listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/fingerprint:SHA1=6DE892F043766D0B02A768010DB6BFB5B8357208]"

@brandond
Contributor

Ah, I see. The locally cached secret is used initially, but when the apiserver comes up, the datastore secret is merged into it and replaces it. That is what we want in general, to ensure that the datastore certificate is used in favor of the one from the local node. However, we shouldn't do this if the one from the datastore is expired.
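
A minimal sketch of what "prefer the datastore copy unless it is expired" could look like, assuming PEM-encoded inputs (an illustration of the idea, not the actual K3s/dynamiclistener code):

package certpick

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"time"
)

// parseCert decodes the first PEM block into an x509 certificate.
func parseCert(pemBytes []byte) (*x509.Certificate, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, fmt.Errorf("no PEM block found")
	}
	return x509.ParseCertificate(block.Bytes)
}

// pickServingCert prefers the datastore certificate so that all servers agree
// on it, but keeps the locally cached (freshly renewed) one when the datastore
// copy has already expired.
func pickServingCert(localPEM, datastorePEM []byte, now time.Time) ([]byte, error) {
	dsCert, err := parseCert(datastorePEM)
	if err != nil {
		return nil, err
	}
	if now.After(dsCert.NotAfter) {
		// The datastore copy is expired: do not let it overwrite the local cert.
		return localPEM, nil
	}
	return datastorePEM, nil
}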

@niusmallnan
Contributor

I use this workaround to get the k3s-serving certificate renewed:

kubectl --insecure-skip-tls-verify delete secret -n kube-system k3s-serving
rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json

systemctl restart k3s

@LarsBingBong

This comment was marked as off-topic.

@brandond
Contributor

@LarsBingBong no, what you're reporting does not have anything to do with the dynamic certificate expiring. Please open a new issue.

@ryankurte

ryankurte commented Jul 22, 2022

I'm seeing what I think might be a related problem, running on a Celeron N5095 under TrueNAS SCALE (it's plausible this is part of TrueNAS, but it occurs when starting the k3s service manually too).

k3s came up, generated the certificate, and was working okay; then I corrected my timezone settings and restarted, and now the certificate is not yet valid, causing k3s to panic on boot. So there's definitely no certificate regeneration happening, and I can't use kubectl to clear out the existing secrets.

The panic I'm seeing is:
Jul 22 18:55:59 truenas k3s[1697126]: E0722 18:55:59.543714 1697126 runtime.go:76] Observed a panic: F0722 18:55:59.543496 1697126 controller.go:170] Unable to perform initial IP allocation check: unable to refresh the service IP block: Get "https://127.0.0.1:6444/api/v1/services": x509: certificate has expired or is not yet valid: current time 2022-07-22T18:55:59+12:00 is before 2022-07-22T07:45:55Z
Jul 22 18:55:59 truenas k3s[1697126]: goroutine 5142 [running]:
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/apimachinery/pkg/util/runtime.logPanic({0x56dece0, 0xc000d43480})
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.23.5-k3s1/pkg/util/runtime/runtime.go:74 +0x7d
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1})
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apimachinery@v1.23.5-k3s1/pkg/util/runtime/runtime.go:48 +0x75
Jul 22 18:55:59 truenas k3s[1697126]: panic({0x56dece0, 0xc000d43480})
Jul 22 18:55:59 truenas k3s[1697126]:         /usr/lib/go-1.17/src/runtime/panic.go:1038 +0x215
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/klog/v2.(*loggingT).output(0x93311a0, 0x3, 0x0, 0xc0005147e0, 0x0, {0x7779b63, 0x1}, 0xc000d43460, 0x0)
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/klog/v2@v2.30.0-k3s1/klog.go:982 +0x625
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/klog/v2.(*loggingT).printf(0x4, 0x0, 0x0, {0x0, 0x0}, {0x602f074, 0x31}, {0xc000d43460, 0x1, 0x1})
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/klog/v2@v2.30.0-k3s1/klog.go:753 +0x1c5
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/klog/v2.Fatalf(...)
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/klog/v2@v2.30.0-k3s1/klog.go:1513
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/kubernetes/pkg/controlplane.(*Controller).Start(0xc000b6a000)
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes@v1.23.5-k3s1/pkg/controlplane/controller.go:170 +0x56e
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/kubernetes/pkg/controlplane.(*Controller).PostStartHook(...)
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes@v1.23.5-k3s1/pkg/controlplane/controller.go:135
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/apiserver/pkg/server.runPostStartHook.func1({0xc000e507d0, {0xc001cf7500, 0xc00377a1e8}, 0xc00093a840}, {0xc00130c900, 0xc0012a3080}, 0xc00006ff70)
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apiserver@v1.23.5-k3s1/pkg/server/hooks.go:198 +0x8e
Jul 22 18:55:59 truenas k3s[1697126]: k8s.io/apiserver/pkg/server.runPostStartHook({0x5f76c3f, 0x14}, {0xc000e507d0, {0xc001cf7500, 0xc000323d50}, 0xc00093a840}, {0xc00130c900, 0xc0012a3080})
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apiserver@v1.23.5-k3s1/pkg/server/hooks.go:199 +0x85
Jul 22 18:55:59 truenas k3s[1697126]: created by k8s.io/apiserver/pkg/server.(*GenericAPIServer).RunPostStartHooks
Jul 22 18:55:59 truenas k3s[1697126]:         /root/go/pkg/mod/github.com/k3s-io/kubernetes/staging/src/k8s.io/apiserver@v1.23.5-k3s1/pkg/server/hooks.go:165 +0xf6

@brandond
Contributor

brandond commented Jul 22, 2022

That particular crash is deep inside the Kubernetes code; it looks like they don't expect that sort of error.

I will say that our cert renewal code checks whether certificates are expired, but it does not check whether they are from the future. I am not sure I'm interested in fixing that particular case; don't set your system clock backwards.
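
For completeness, checking both ends of the validity window would look roughly like this (a sketch, not the actual renewal code):

package certcheck

import (
	"crypto/x509"
	"fmt"
	"time"
)

// checkValidityWindow verifies both ends of the validity window: an expired
// certificate and a certificate "from the future" are each rejected.
func checkValidityWindow(cert *x509.Certificate, now time.Time) error {
	if now.Before(cert.NotBefore) {
		return fmt.Errorf("certificate not valid until %s (clock moved backwards?)", cert.NotBefore)
	}
	if now.After(cert.NotAfter) {
		return fmt.Errorf("certificate expired at %s", cert.NotAfter)
	}
	return nil
}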

@ryankurte

ryankurte commented Jul 22, 2022

Thanks for the response! I waited it out, so I'm back up and running now ^_^

I am not sure I'm interested in fixing that particular case;

It's probably not something that comes up a lot in the world of real computers, but it is reasonably common for embedded devices not to have RTCs, so time travel is part of the experience. Most often this is in the forward direction, which should already be handled, but it can happen backwards if you're unlucky, and you certainly can't expect the system time to be correct at boot.

don't set your system clock backwards.

Well, I didn't intend to set the system clock backwards, but my timezone was wrong and I live in the future (GMT+12) 😅

@xzycn
Contributor

xzycn commented Jul 28, 2022

Hi, I'm facing the same issue now:

Jul 28 11:43:46 k3s-master k3s[2198409]: E0728 11:43:46.946770 2198409 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2022-07-28T11:43:46+08:00 is after 2022-07-27T12:50:00Z, verifying certificate SN=6732235054609479369, SKID=, AKID=58:8D:E1:8E:CE:CB:F4:0C:10:12:90:C9:25:64:0F:2C:EF:CD:3E:55 failed: x509: certificate has expired or is not yet valid: current time 2022-07-28T11:43:46+08:00 is after 2022-07-27T12:50:00Z]"

I have been running the cluster normally for almost a year.

K3s version: v1.21.3+k3s1

@brandond
Contributor

@xzycn please see the steps at #5163 (comment)

@xzycn
Contributor

xzycn commented Jul 29, 2022

@brandond
Do we have to do that manually every few months?
I only restarted the k3s server (just systemctl restart k3s), and then I could use kubectl again, but the client-certificate-data and client-key-data changed (so the kubeconfig has to be updated). Meanwhile, I see some errors on the other worker node:

level=warning msg="Unable to watch for tunnel endpoints: the server has asked for the client to provide credentials (get endpoints)

After I restarted k3s-agent on the worker node, the errors disappeared. So yes, I have to repeat the action on every node.

Are there any official docs about the certificates in a k3s cluster?

@xzycn
Contributor

xzycn commented Jul 29, 2022

After I restarted the k3s server, the certificates seem not to have changed (Not Before is still last year), as shown below:
image

But there are some changes in the tls folder:

image

@xzycn
Contributor

xzycn commented Jul 29, 2022

Another question:
When the certificate error appears on the worker node, can we still upgrade a deployment whose Pod is assigned to that worker node?
The test result is yes: the Pod is recreated and gets a new IP. In my opinion the communication between agent and server is broken, so how can the agent still upgrade a deployment?

Thank you in advance :)

@eradnab

eradnab commented Aug 18, 2022

Problem exists on 1.23.9+k3s1 as well.

The workaround works:

kubectl --insecure-skip-tls-verify delete secret -n kube-system k3s-serving
sudo rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
sudo systemctl restart k3s

I see a potential fix (#5951) has been merged into 1.23. When do you think a release with this fix in place will be made?

Also, is there a way I can install the latest 1.23.x build so I can test it on a VM and report back that it's fixed?

@bguzman-3pillar

Validated on version v1.24.4-rc1+k3s1

$ k3s -v
k3s version v1.24.4-rc1+k3s1 (c3f830e9)
go version go1.18.1

Environment Details

Infrastructure

  • [x] Cloud

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"

Cluster Configuration:

1 server

Testing Steps

  1. Install K3S curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.24.4-rc1+k3s1 INSTALL_K3S_EXEC="server" sh -
  2. Make sure all pods are running
$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   coredns-b96499967-6qvbc                   1/1     Running     0          3m2s
kube-system   helm-install-traefik-crd-527jx            0/1     Completed   0          3m2s
kube-system   helm-install-traefik-hbkvs                0/1     Completed   1          3m2s
kube-system   local-path-provisioner-7b7dc8d6f5-6mrrm   1/1     Running     0          3m2s
kube-system   metrics-server-668d979685-x8wzw           1/1     Running     0          3m2s
kube-system   svclb-traefik-74309c0a-5f7kt              2/2     Running     0          2m46s
kube-system   traefik-7cd4fcff68-x4vb5                  1/1     Running     0          2m46s
  3. See current time
$ date
Mon Aug 22 19:51:17 UTC 2022
  4. Stop k3s
systemctl stop k3s
  5. Change the OS date to after the certificate expires
$ sudo timedatectl set-ntp no
$ sudo date -s 20230808
Tue Aug  8 00:00:00 UTC 2023
$ date
Tue Aug  8 00:00:05 UTC 2023
  6. Start K3S
sudo systemctl start k3s
$ kubectl --insecure-skip-tls-verify  get secret -n kube-system k3s-serving -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | grep Not
            Not Before: Aug 22 19:47:50 2022 GMT
            Not After : Aug  7 00:00:50 2024 GMT

Validation Results:
The message Unable to connect to the server: x509: certificate has expired or is not yet valid:... is no longer present.

$ kubectl get nodes -A 
NAME               STATUS   ROLES                       AGE    VERSION
ip-172-31-47-144   Ready    control-plane,etcd,master   350d   v1.24.4-rc1+k3s1

@eradnab

eradnab commented Aug 23, 2022

I have tested 1.23.10-rc1+k3s1 and the problem has been resolved. Thanks.

@marcelstoer

In my case I found the following.

  • /var/lib/rancher/k3s/server/cred/api-server.kubeconfig points to
    • client-certificate: /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt
    • client-key: /var/lib/rancher/k3s/server/tls/client-kube-apiserver.key
  • Those certificates and keys are valid
[root@k8s-master cred]# openssl x509  -in /var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt   -text -noout
...
            Not Before: Mar  3 11:32:28 2020 GMT
            Not After : Jul  1 01:00:02 2023 GMT 
  • Yet, when I queried the API server it returned an expired certificate (Mar 23 23:29:31 2023 GMT)
[root@k8s-master ~]# curl -vvv https://localhost:6443
* About to connect() to localhost port 6443 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 6443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Server certificate:
* 	subject: CN=k3s,O=k3s
* 	start date: Mar 03 11:32:28 2020 GMT
* 	expire date: Mar 23 23:29:31 2023 GMT
* 	common name: k3s
* 	issuer: CN=k3s-server-ca@1583235148
* NSS error -8181 (SEC_ERROR_EXPIRED_CERTIFICATE)
* Peer's Certificate has expired.
* Closing connection 0

--> Contrary to my belief, that certificate is the (generated) one from /var/lib/rancher/k3s/server/tls/dynamic-cert.json (-> data.tls.crt) and not the one referenced in api-server.kubeconfig
--> The only remedy was the one documented by @niusmallnan at #5163 (comment)
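
For reference, the same check as the curl call above can be done with a few lines of Go that print the certificate actually served on port 6443 (a small illustrative helper, not part of K3s):

package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Connect to the local apiserver and print the certificate it actually
	// serves, mirroring the curl output above. InsecureSkipVerify is needed
	// precisely because that certificate may already be expired.
	conn, err := tls.Dial("tcp", "127.0.0.1:6443", &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	cert := conn.ConnectionState().PeerCertificates[0]
	fmt.Println("subject:  ", cert.Subject)
	fmt.Println("notBefore:", cert.NotBefore)
	fmt.Println("notAfter: ", cert.NotAfter)
}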

@brandond
Contributor

brandond commented Mar 24, 2023

@marcelstoer you're not looking at the server certificate for the apiserver. As suggested by the filename, that is a client certificate. The server cert and key used by the listener are stored in the secret and cached locally in the json file. Can you confirm what version of K3s you're using?

@marcelstoer

Thanks for the clarification, Brad. I observed this on an ancient v1.17.15+k3s1 installation.

@brandond
Contributor

Ah yes, that is ancient, and I would expect that version to still be affected by this issue. It has long been fixed on newer releases.

@marcelstoer

marcelstoer commented Mar 25, 2023

So far I've been reluctant to upgrade this installation, as I'm really not a k3s expert (or a k8s expert, for that matter). However, according to https://docs.k3s.io/upgrades/manual#manually-upgrade-k3s-using-the-binary it should be as easy as replacing the k3s binary. What can go wrong...

@brandond
Contributor

brandond commented Mar 26, 2023

Step through each Kubernetes minor version one at a time, and be ready to upgrade any user manifests that use old API versions.
