
/srv/kubernetes/kubelet-server.crt expired, did not auto-renew #15970

Closed
darintay opened this issue Sep 27, 2023 · 5 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@darintay
/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.25.4

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.23.5

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
A node that was running for 400+ days had its kubelet-server.crt certificate expire, which broke all pods on the node.

$ sudo openssl x509 -enddate -noout -in /srv/kubernetes/kubelet-server.crt
notAfter=Sep 27 06:52:10 2023 GMT
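
For reference, the same check looped over every certificate under /srv/kubernetes (paths assumed from a standard kops node layout):

$ for crt in /srv/kubernetes/*.crt; do echo -n "$crt: "; sudo openssl x509 -enddate -noout -in "$crt"; done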

Is something supposed to be auto-renewing this certificate? I couldn't see anything in the kubelet logs about it. I know that kubelets have certificate rotation (https://kubernetes.io/docs/tasks/tls/certificate-rotation/) but I don't know if that is supposed to be covering this file, or if it's something on the kops side.

Not sure if there's an easy way to test/reproduce this due to the duration of these certs.

(I know ideally I'd be doing control plane upgrades frequently enough that this doesn't matter, but I'd like to sort this out for if/when upgrades slip again in the future.)

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 27, 2023
@johngmyers
Member

No, kops expects you to update nodes at least every 455 days.
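
The usual remediation is a rolling update of the cluster before the certificates expire. A minimal sketch, assuming kops credentials and KOPS_STATE_STORE are already configured ($CLUSTER_NAME is a placeholder for your cluster name):

$ kops update cluster --name $CLUSTER_NAME --yes
$ kops rolling-update cluster --name $CLUSTER_NAME --yes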

@darintay
Author

OK, good to at least know that's expected behavior, thanks.

@doryer

doryer commented Dec 3, 2023

@johngmyers isn't rotateCertificates used for this kind of use case?
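
As far as I understand, kubelet's rotateCertificates covers the kubelet client certificate, not the serving cert that kops provisions at /srv/kubernetes/kubelet-server.crt. A generic way to check which rotation flags a node's running kubelet was actually started with:

$ ps aux | grep '[k]ubelet' | tr ' ' '\n' | grep -i rotate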

@aviadhaham

Hey @darintay @johngmyers, following the above:

My company has inherited k8s clusters built and maintained with kOps, and one of them (fortunately, a non-production one) has started having this issue of nodes going offline, seemingly due to the expiration of kubelet-server.crt.

I’m wondering what’s the best approach here?

I noticed you mentioned that a manual kops update is needed at least every 455 days, but what should I do if this cluster wasn't updated in time and some of its nodes already have expired certificates?
When I try to update, the cluster validation step fails because of the problematic nodes (the ones whose certificates expired), whose validation status is returned as below:

node "ip-172-23-45-127.ec2.internal" of role "node" is not ready

We're pretty stuck here, and we have more (production) clusters that we're afraid will hit the same issue and that we won't know how to handle properly.

Thank you in advance!

@schwing

schwing commented Feb 27, 2024

@aviadhaham This may be too late, but I'll comment here in case anyone else runs into this issue.

The --cloudonly flag is necessary when the control plane is down.

If you're attempting to recover without also applying an update, to limit the number of moving parts during recovery, you'll need the --force flag to tell kops to roll the nodes even if no updates are required.

Fixing the control plane first before moving on to nodes is a good idea, so an example of doing that first: kops rolling-update cluster --instance-group-roles master --cloudonly --force --yes. Once the control plane is healthy and the Kubernetes API is working again, do similar for the node instance group role or individual instance groups.

Of course, updating more often in the future to avoid this should be a priority, but this will get things running again so you can focus on updating.
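
Putting that together, the full recovery sequence might look like the following sketch (flags taken from the comment above; adjust the instance-group roles or names to your cluster):

# 1. Roll the control plane first; the API is down, so skip validation with --cloudonly
$ kops rolling-update cluster --instance-group-roles master --cloudonly --force --yes

# 2. Once the Kubernetes API responds again, roll the workers the same way
#    (--cloudonly is still useful here, since the nodes with expired certs will fail validation)
$ kops rolling-update cluster --instance-group-roles node --cloudonly --force --yes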
