
kubelet and kube-proxy fail to reload certificates when they are updated #46287

Closed

Spindel opened this issue May 23, 2017 · 69 comments
Assignees
Labels
  • area/kube-proxy
  • area/kubelet
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • sig/auth: Categorizes an issue or PR as relevant to SIG Auth.
  • sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
  • sig/network: Categorizes an issue or PR as relevant to SIG Network.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments

@Spindel

Spindel commented May 23, 2017

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): tls certificate reload


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-25T14:48:12Z", GoVersion:"go1.8.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1+coreos.0", GitCommit:"9212f77ed8c169a0afa02e58dce87913c6387b3e", GitTreeState:"clean", BuildDate:"2017-04-04T00:32:53Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
    Digital Ocean / custom setup

  • OS (e.g. from /etc/os-release):
    coreos VERSION=1353.7.0

  • Kernel (e.g. uname -a):
    Linux coreos01.kub.do.modio.se 4.9.24-coreos #1 SMP Wed Apr 26 21:44:23 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz GenuineIntel GNU/Linux

  • Install tools:
    Ansible / CoreOS getting started guide

  • Others:

What happened:
We reached our scheduled update of the TLS client certificates; the certs were updated properly on disk, but kubelet and kube-proxy keep the old certs in memory. This causes them to fail when communicating with the API server.

What you expected to happen:
kubelet & kube proxy should reload the certificates from disk.

How to reproduce it (as minimally and precisely as possible):
Generate a cert with a short lifetime (a minimal sketch follows below), set up your cluster, wait a while, and then replace the cert with a longer-lived one.
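
For a quick repro without a full PKI, something like the following Go sketch can emit a deliberately short-lived self-signed client cert; the CN and the five-minute lifetime are illustrative assumptions, not our actual setup:

package main

// Minimal repro helper: print a short-lived self-signed client cert
// and its key as PEM. The CN and the five-minute lifetime are
// illustrative only.
import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

func main() {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kubelet-client-test"}, // hypothetical CN
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(5 * time.Minute), // deliberately short-lived
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		panic(err)
	}
	pem.Encode(os.Stdout, &pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
}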

Anything else we need to know:
We're attempting to run with short-lived client certificates. This has surfaced some issues with how Kubernetes handles them, and will likely cause hard-to-debug problems for others in the future.
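
For context on why the processes hold stale certs: a certificate loaded once into tls.Config.Certificates is never re-read by crypto/tls. Below is a minimal sketch of the callback-based alternative that picks up replaced files on each handshake. This is the generic Go pattern, not kubelet's actual code, and the paths are hypothetical:

package main

// Generic Go pattern, not kubelet's actual code: re-read the client
// cert/key pair from disk on every TLS handshake, so replacing the
// files on disk takes effect without a process restart.
import (
	"crypto/tls"
	"net/http"
)

func clientFor(certFile, keyFile string) *http.Client {
	cfg := &tls.Config{
		// Called on each handshake in which the server requests a client cert.
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair(certFile, keyFile)
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
	return &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}
}

func main() {
	// Hypothetical paths, for illustration only.
	c := clientFor("/etc/kubernetes/client.pem", "/etc/kubernetes/client-key.pem")
	_ = c // use c.Get(...) against the API server
}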

@cmluciano

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label May 24, 2017
@alkar

alkar commented Nov 3, 2017

Experiencing the same issue.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-12T00:44:36Z", GoVersion:"go1.9.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1+coreos.0", GitCommit:"59359d9fdce74738ac9a672d2f31e9a346c5cece", GitTreeState:"clean", BuildDate:"2017-10-12T21:53:13Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

@FarhadF

FarhadF commented Nov 12, 2017

Experiencing the same issue. I just tried to replace the certificates with new ones.

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:38:10Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

@FarhadF

FarhadF commented Nov 12, 2017

My workaround:

  1. Replace the certificates with new ones.
  2. Change the systemd service ExecStart to a one-liner (I was using \ for multi-line formatting; you can also run the one-liner directly in a terminal for quick verification):

ExecStart=/usr/bin/kubelet   --kubeconfig=/etc/kubelet/kubeconfig   --allow-privileged=true   --cluster-dns=10.96.0.10   --cluster-domain=cluster.local   --container-runtime=docker   --docker=unix:///var/run/docker.sock   --network-plugin=cni   --serialize-image-pulls=false   --tls-cert-file=/etc/kubernetes/k2.pem   --tls-private-key-file=/etc/kubernetes/k2-key.pem   --cni-conf-dir=/etc/cni/net.d   --cni-bin-dir=/opt/cni/bin   --v=2

  3. systemctl daemon-reload (followed by a restart of the service so the new ExecStart takes effect)
  4. openssl s_client -connect <nodename>:10250 -showcerts

You should see the cert chain with no self-signed errors. I also verified the new certificate against the respective file:

Certificate chain
 0 s:/CN=k2
   i:/CN=kube-ca
Server certificate
subject=/CN=k2
issuer=/CN=kube-ca
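
The same check can be scripted from Go instead of openssl; here is a small sketch. The host name is a placeholder, port 10250 matches the check above, and InsecureSkipVerify is used only because we just want to print the chain:

package main

// Print the subject/issuer of each certificate a TLS server presents,
// analogous to `openssl s_client -connect <nodename>:10250 -showcerts`.
import (
	"crypto/tls"
	"fmt"
)

func main() {
	conn, err := tls.Dial("tcp", "nodename:10250", &tls.Config{ // hypothetical host
		InsecureSkipVerify: true, // inspection only; do not use for real traffic
	})
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	for i, cert := range conn.ConnectionState().PeerCertificates {
		fmt.Printf("%d s:%s\n   i:%s\n", i, cert.Subject, cert.Issuer)
	}
}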

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 12, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@george-angel
Contributor

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@george-angel: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 11, 2018
@george-angel
Contributor

Can someone please re-open this issue? It's quite a significant one for us. We currently need Trello cards with a due date set a year ahead as a reminder to restart kube-proxy.

@george-angel
Contributor

Ping

@dims
Member

dims commented Aug 1, 2019

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 1, 2019
@k8s-ci-robot
Contributor

@dims: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@riking

riking commented Aug 13, 2019

Is this a duplicate of #4672? Or possibly a subset (this one calls out kubelet/kube-proxy specifically).

@george-angel
Contributor

It can be considered a subset depending on how #4672 unfolds; currently it's very broad.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 11, 2019
@george-angel
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 11, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2020
@george-angel
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2020
@kfox1111

Semi-related: I'm also interested in seeing whether either this mechanism or direct integration with SPIRE is possible for the certs.

@shaneutt
Member

Gotcha! I don't know of anyone right now with the bandwidth/priority to take this one on (which is why it keeps going stale), but if you think you might have some time to spare, I'd encourage you not to worry about being a kubelet or kube-proxy expert. Reach out to the community (Slack, Zoom calls) and let them know you're trying to learn how to get this fixed: I expect you'll find the community can provide some level of assistance (even if they themselves don't have the bandwidth/priority to take the whole thing on).

@shaneutt
Member

While we're still open to someone with capacity taking this one on, it's been some time without anyone who can, so it seems the previous lifecycle was accurate:

/lifecycle rotten

If you're interested in picking this one up, let us know and we will support you!

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 22, 2024
@kfox1111

It hasn't risen to the top of the to-do list yet, especially since it requires so much searching around for help, and it's not clear to me who can provide it (sig-auth, sig-node, sig-cluster-lifecycle, other?).

If we could identify some developers with knowledge of roughly what needs to be done, and where, that would go a long way.

@shaneutt
Member

Is this something you'd like to put on the agenda for our next SIG Network meeting, so you can come talk it through with us and we can start figuring it out?

@aojea
Member

aojea commented Mar 23, 2024

@kfox1111 can you expand on how you update the certificates in kube-proxy today? Also, how do you deploy kube-proxy? Are you using a DaemonSet?
It would be useful to understand your workflow better so we can come up with the best solution.

@kfox1111

@shaneutt https://docs.google.com/document/d/1_w77-zG_Xj0zYvEMfQZTQ-wPP4kXkpGD8smVtW_qqWM/edit Mar 28 at 9am PST? I think I can make that, if so.

@aojea I'm not yet doing anything beyond deploying with kubeadm. Ideally, though, I'd like to be able to use a SPIRE chain of trust for the cluster, attesting with either TPMs on bare metal or a cloud-based node attestor for VMs. They have already done a lot of the heavy lifting; we just need a mechanism to get the certs from SPIRE to Kubernetes.

This would have some big benefits:

  • Much shorter-lived certificates, rotated automatically (hours or days).
  • Node attestation. Rather than an initial bootstrap join-token-like mechanism, stronger mechanisms like TPMs can be used to prove identity. No need to SSH in (how do you validate the SSH host signature?) and copy join material from the control plane to the node. It can be automatic and safe.
  • Periodic re-attestation. On certificate refresh, continued proof of identity can be performed, for example by handshaking with the TPM again to ensure it is still on the same node.
  • kubeadm has had a long-standing issue with kubelet server certs being self-signed. The same mechanism could be used to give kubelet (and maybe some other services) proper certificates in a verifiable chain of trust.

@kfox1111

Was referred to sig-auth from sig-network on this issue.

@aojea
Member

aojea commented Mar 28, 2024

@enj we were discussing this issue today during the SIG Network meeting, and our understanding is that it is related to how client-go loads the certificates, so we are moving it to SIG Auth.

/sig auth

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Mar 28, 2024
@kfox1111

client-go is half of the issue.

The server being able to use updated certificates without a restart is also important, and that path does not use client-go.
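
For the server half, crypto/tls has an analogous hook, GetCertificate, which resolves the serving cert per handshake. A minimal sketch of that generic pattern follows; it is not the actual kubelet implementation, and the paths simply reuse the example from earlier in the thread:

package main

// Generic server-side pattern: resolve the serving certificate on each
// handshake instead of loading it once at startup, so rotated files on
// disk are picked up without a restart. Not kubelet's actual code.
import (
	"crypto/tls"
	"net/http"
)

func main() {
	cfg := &tls.Config{
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			// A real implementation would cache and re-read only on change.
			cert, err := tls.LoadX509KeyPair("/etc/kubernetes/k2.pem", "/etc/kubernetes/k2-key.pem")
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
	srv := &http.Server{Addr: ":10250", TLSConfig: cfg}
	// Empty cert/key arguments: the certificate comes from GetCertificate.
	panic(srv.ListenAndServeTLS("", ""))
}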

@stlaz
Member

stlaz commented Apr 8, 2024

per triage:
Someone from the auth team should have a look and see what the issue is, based on the comments here. The original report does not explain which exact certificates are not being reloaded.

@george-angel
Contributor

  1. tlsCertFile, as part of KubeletConfiguration. We set this to a 7d TTL and refresh it daily. Kubelet needs to be restarted to use the new certificate.
  2. clientCAFile, as part of:
authentication:
  x509:
    clientCAFile: "/etc/kubernetes/ssl/ca.pem"

config in the same KubeletConfiguration file. This has 1-2 yr validity for us, and again kubelet (and kube-proxy) need a restart to use it.
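
Both of those settings end up in a tls.Config built once at startup, which is why a restart is needed. Here is a simplified sketch of that startup-time pattern; it is illustrative only, not kubelet's actual code, and the serving-cert paths are hypothetical:

package main

// Simplified sketch of why a restart is needed: both the serving
// keypair and the client CA bundle are read once, at startup, and
// frozen into the tls.Config. Illustrative, not kubelet's code.
import (
	"crypto/tls"
	"crypto/x509"
	"os"
)

func buildTLSConfig(certFile, keyFile, clientCAFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(clientCAFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	return &tls.Config{
		Certificates: []tls.Certificate{cert}, // frozen until restart
		ClientCAs:    pool,                    // frozen until restart
		ClientAuth:   tls.VerifyClientCertIfGiven,
	}, nil
}

func main() {
	// Hypothetical serving-cert paths; the CA path is from the comment above.
	cfg, err := buildTLSConfig("/etc/kubernetes/ssl/server.pem",
		"/etc/kubernetes/ssl/server-key.pem",
		"/etc/kubernetes/ssl/ca.pem")
	if err != nil {
		panic(err)
	}
	_ = cfg
}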

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) May 9, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kfox1111

Still an issue. Please reopen

@thockin thockin reopened this May 10, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 10, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shaneutt
Member

/assign @aroradaman

@zhangweikop
Contributor

zhangweikop commented Jun 6, 2024

client-go is half of the issue.

The server being able to use updated certificates without a restart is also important, and that path does not use client-go.

Yes.
The recent kubelet change #124574 resolves the second half.

@kfox1111

kfox1111 commented Jun 8, 2024

Awesome. :)

Does it do the CAs too, or just the client/server certs?

@zhangweikop
Contributor

Awesome. :)

Does it do the CAs too, or just the client/server certs?

Not the CA, only the certs.

From what I know, for kubelet:
neither the server TLS config nor the client-go TLS config does dynamic reloading of the CA file as of today.
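
For the CA half, crypto/tls does offer a hook that could support it: GetConfigForClient can return a fresh config, with a re-read CA pool, for each incoming handshake. The sketch below shows what that could look like; it is a generic pattern, not something kubelet does today per the comment above:

package main

// Generic pattern for dynamic client-CA reload: return a fresh
// tls.Config, with a re-read CA pool, for every incoming handshake.
// Per the comment above, kubelet does NOT do this today; this is a
// sketch of what such support could look like.
import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

func configWithLiveCA(base *tls.Config, caFile string) *tls.Config {
	cfg := base.Clone()
	cfg.GetConfigForClient = func(*tls.ClientHelloInfo) (*tls.Config, error) {
		caPEM, err := os.ReadFile(caFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, errors.New("no certificates found in " + caFile)
		}
		fresh := base.Clone()
		fresh.ClientCAs = pool
		fresh.ClientAuth = tls.VerifyClientCertIfGiven
		return fresh, nil
	}
	return cfg
}

func main() {
	// Illustrative path taken from the KubeletConfiguration example above.
	cfg := configWithLiveCA(&tls.Config{}, "/etc/kubernetes/ssl/ca.pem")
	_ = cfg
}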

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 9, 2024