Certificate and access management for edge computing #2342

Closed
sraillard opened this issue Oct 5, 2020 · 10 comments

@sraillard

sraillard commented Oct 5, 2020

Is your feature request related to a problem? Please describe.
The CA certificates are issued for 10 years and the other certificates for 1 year.
When the k3s server is restarted, these certificates are renewed if they expire within 90 days.
In edge computing, you may have a single node that you don't want to reboot, and if you deploy many nodes, you may not want to reboot them at regular intervals just to renew certificates when everything is otherwise working fine.
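A minimal sketch, assuming the 90-day renewal window described above and the default k3s data directory, that lists which certificates the next k3s restart would renew. It relies on `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds; the helper name `certs_due_for_renewal` is hypothetical.

```shell
#!/bin/bash
# certs_due_for_renewal (hypothetical helper): print every *.crt under DIR whose
# first certificate expires within DAYS days (default 90, the renewal window
# described in this issue). DIR defaults to the standard k3s server TLS dir.
certs_due_for_renewal() {
  local dir="${1:-/var/lib/rancher/k3s/server/tls}"
  local days="${2:-90}"
  local fn
  find "$dir" -name '*.crt' | sort | while read -r fn; do
    # -checkend N exits with status 1 if the cert expires within N seconds
    if ! openssl x509 -checkend $((days * 86400)) -noout -in "$fn" >/dev/null; then
      echo "$fn expires $(openssl x509 -noout -enddate -in "$fn" | cut -d= -f2)"
    fi
  done
}
```

For example, `certs_due_for_renewal` with no arguments checks the server TLS directory (including its etcd subfolder) against the 90-day window.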

Describe the solution you'd like
For this kind of node, is it possible to change the default validity period of each issued certificate (CA and others)? That way, we could choose a less risky interval than 1 year.

Describe alternatives you've considered
For remote scripting against the API server, once the admin certificate expires (after one year), we will no longer be able to interact with the k3s API server. Even if the k3s server is restarted and the certificates are renewed, we will need to copy the new admin certificate from the k3s server. In versions prior to 1.19, the admin user authenticated with a password instead of a certificate. Would it be possible to add an installation option to set an admin password? Then we could go back to password authentication (which never expires) instead of a certificate.

Additional context
I tried adding this line:

6e3cc80fffe8da7e33c1dfd65a6451f3,admin,admin,system:masters

to the end of the file /var/lib/rancher/k3s/server/cred/passwd and then restarted the k3s server, but it doesn't seem to work.

@brandond
Contributor

brandond commented Oct 5, 2020

Support for Basic authentication has been removed from upstream Kubernetes. Note that this is not just deprecated or disabled, but deleted from the codebase: kubernetes/kubernetes#89069

K3s has retained some basic auth code in the interim, but it is only used for bootstrapping new cluster members. In the future this will probably go away in favor of token auth or something else more in line with upstream.

@sraillard
Author

OK, good to know that basic auth has been removed. Since token auth (or an equivalent) may not be ready soon, is there a way or a setting to extend the validity period of the X.509 certificates created during the k3s setup?

We have already had practical cases where edge servers have been running in the field for more than 10 years (yes, it's possible) with their certificates expired; this isn't a good situation.

@brandond
Contributor

brandond commented Oct 5, 2020

You have servers that have been up without a reboot or service restart for 10 years? All you have to do is schedule a periodic restart of the k3s service and the certs will get updated. Keep in mind that upstream Kubernetes itself has a much shorter support lifecycle so you're really pushing things with more than a couple years.
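As a sketch of the periodic restart suggested above, assuming a systemd-managed `k3s` service (on agent-only nodes the service installed by the k3s install script is `k3s-agent`) and a cron daemon that reads `/etc/cron.d`, the hypothetical helper below writes a cron entry that restarts k3s at 03:00 on the first day of each month. Monthly is an illustrative choice that stays well inside the 90-day renewal window.

```shell
#!/bin/bash
# install_k3s_restart_cron (hypothetical helper): write a cron.d entry that
# restarts the k3s service monthly so certificates nearing expiry get renewed.
#   $1 target file (default /etc/cron.d/k3s-restart; writing there requires root)
install_k3s_restart_cron() {
  local cron_file="${1:-/etc/cron.d/k3s-restart}"
  # Fields: minute hour day-of-month month day-of-week user command
  cat > "$cron_file" <<'EOF'
0 3 1 * * root systemctl restart k3s
EOF
  chmod 0644 "$cron_file"
}
```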

If you want to generate an admin client certificate that is valid for longer than the stock one, you can do so using the cert and key at /var/lib/rancher/k3s/server/tls/client-ca.* (client-ca.crt and client-ca.key). The certificate subject should be as described here: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#x509-client-certs

For example, the default admin certificate has a subject of O=system:masters, CN=system:admin - this corresponds to a username of system:admin who is a member of the system:masters group. You can use the same username and group, or add custom RBAC roles to your clusters and use whatever usernames and groups you prefer.
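A minimal sketch of that procedure, assuming the default k3s paths: it creates a fresh, unencrypted key and a CSR with the subject `O=system:masters, CN=system:admin`, then signs it with the client CA. The helper name, the output prefix, and the 3650-day lifetime are illustrative assumptions.

```shell
#!/bin/bash
# make_admin_cert (hypothetical helper): issue a client certificate for
# system:admin in the system:masters group, signed by the k3s client CA.
#   $1 TLS directory holding client-ca.crt/.key (default: k3s server TLS dir)
#   $2 output file prefix (default: ./admin-longlived)
#   $3 validity in days (default: 3650)
make_admin_cert() {
  local tls="${1:-/var/lib/rancher/k3s/server/tls}"
  local out="${2:-./admin-longlived}"
  local days="${3:-3650}"
  # New unencrypted private key plus a CSR; O becomes the group, CN the username
  openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$out.key" -out "$out.csr" \
    -subj "/O=system:masters/CN=system:admin"
  # Sign the CSR with the CA the API server trusts for client certificates
  openssl x509 -req -in "$out.csr" \
    -CA "$tls/client-ca.crt" -CAkey "$tls/client-ca.key" -CAcreateserial \
    -days "$days" -out "$out.crt"
}
```

Note that `-CAcreateserial` writes a `client-ca.srl` file next to the CA, so the TLS directory must be writable (typically as root).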

@sraillard
Author

Running for 10 years, yes, but with reboots of course! I understand that if we want to keep k3s running, we need to restart the k3s service every week or month or so, so that all the generated certificates get renewed.

The problem is the admin certificate with its 1-year validity: I guess it will be renewed, but we have to fetch the new one each time. The best option seems to be creating a new admin certificate with a longer validity, as you recommend.

I have seen some examples in #684 of how to sign the new certificate, and you gave me the information about the values to put in the subject (though I think I can copy the subject of the default admin certificate to get the same rights).

@sraillard
Author

Small bump here: is there a way to push out the expiration date of all certificates issued when k3s is installed? That would solve the whole issue (no need to restart k3s, no need to issue a new admin cert).

@sraillard
Author

I was able to get an admin certificate valid for 10 years using these commands:

openssl x509 -x509toreq -in /var/lib/rancher/k3s/server/tls/client-admin.crt -out /var/lib/rancher/k3s/server/tls/client-admin.csr -signkey /var/lib/rancher/k3s/server/tls/client-admin.key
openssl x509 -req -in /var/lib/rancher/k3s/server/tls/client-admin.csr -CA /var/lib/rancher/k3s/server/tls/client-ca.crt -CAkey /var/lib/rancher/k3s/server/tls/client-ca.key -CAcreateserial -days 3650 | base64 -w0
cat /var/lib/rancher/k3s/server/tls/client-admin.key | base64 -w0
cat /var/lib/rancher/k3s/server/tls/server-ca.crt | base64 -w0

A CSR is extracted from the current admin certificate and used to create a new certificate, signed by the client CA, with a 10-year validity. The private key and the API server's CA certificate are also printed (base64-encoded); these are needed for a remote connection.
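To show where those three base64 values end up, here is a sketch that writes a single-cluster kubeconfig from them. The helper name, the cluster/user/context names, and the server URL are assumptions (6443 is the default k3s API server port); the certificate, key, and CA arguments are expected to be the already base64-encoded strings printed by the commands above.

```shell
#!/bin/bash
# write_kubeconfig (hypothetical helper): print a kubeconfig that authenticates
# with an embedded client certificate.
#   $1 API server URL, e.g. https://edge-node:6443 (hypothetical host)
#   $2 base64 client certificate, $3 base64 client key, $4 base64 server CA
write_kubeconfig() {
  local server="$1" cert="$2" key="$3" ca="$4"
  cat <<EOF
apiVersion: v1
kind: Config
clusters:
- name: k3s
  cluster:
    server: $server
    certificate-authority-data: $ca
users:
- name: k3s-admin
  user:
    client-certificate-data: $cert
    client-key-data: $key
contexts:
- name: k3s
  context:
    cluster: k3s
    user: k3s-admin
current-context: k3s
EOF
}
```

For example, `write_kubeconfig https://edge-node:6443 "$CERT" "$KEY" "$CA" > edge.kubeconfig` followed by `kubectl --kubeconfig edge.kubeconfig get nodes` would exercise the new credentials.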

But I'm asking again: if a setting could make all the certificates created during installation valid for 10 years, that would be great and would solve all these certificate issues.

@stale

stale bot commented Jul 31, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jul 31, 2021
@stale stale bot closed this as completed Aug 14, 2021
@lyuma

lyuma commented May 25, 2022

Thank you, @sraillard. Your comment was extremely helpful after I ran into this on my one-year-old cluster.

k3s sets the default client cert for root to expire after about a year. Once it expires, you can't administer your own cluster because your root account's cert is no longer valid. There's nothing as frustrating as being locked out of your own server, so I had to figure out how to fix this.

How to manually renew k3s certs

First, list the expiry date of every certificate:

find /var/lib/rancher/k3s/server/tls -name '*.crt' | while read -r fn; do echo "$fn ==========="; openssl x509 -noout -text -in "$fn" | grep 'Not After'; done

Once you have found the expired certs (they will be the ones that expire one year after you set up your cluster), here is the script I ran on each of them:

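#!/bin/bash
# Usage: run from /var/lib/rancher/k3s/server/tls with the cert name minus its extension, e.g. client-admin
set -e
set -x
# Reuse an existing CSR, otherwise derive one from the current cert and key
[ -e "$1.csr" ] || openssl x509 -x509toreq -in "$1.crt" -out "$1.csr" -signkey "$1.key"
# Back up the old cert with a timestamp suffix
mv -i "$1.crt" "$1.crt.bak.$(date +%s)"
# Sign a new cert with the client CA, valid for 3650 days (10 years)
openssl x509 -req -in "$1.csr" -CA "client-ca.crt" -CAkey "client-ca.key" -CAcreateserial -days 3650 -out "$1.crt"
# Append the CA cert so the file contains the new cert followed by the CA
cat "client-ca.crt" >> "$1.crt"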

You can use this process to update the client cert in /root/.kube/config as follows:

  1. Because we need to modify /root/.kube/config, I ran sudo -i to log in to a root shell.
  2. cat /var/lib/rancher/k3s/server/tls/client-admin.key | base64 -w0; echo
  3. cat /var/lib/rancher/k3s/server/tls/client-admin.crt /var/lib/rancher/k3s/server/tls/server-ca.crt | base64 -w0; echo
  4. Edit .kube/config and replace the client-key-data: and client-certificate-data: values with the outputs of steps 2 and 3, respectively.
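Steps 2 to 4 can also be scripted. The sketch below is a hypothetical helper that assumes the kubeconfig holds a single user whose `client-certificate-data:` and `client-key-data:` values each sit on one line (as in a k3s-generated file). It keeps a timestamped backup and rewrites both values in place with GNU `sed`, building the certificate data exactly as in step 3.

```shell
#!/bin/bash
# update_kubeconfig_creds (hypothetical helper): replace the embedded client
# cert/key in a single-user kubeconfig with the renewed admin files.
#   $1 kubeconfig path (default /root/.kube/config)
#   $2 TLS directory (default /var/lib/rancher/k3s/server/tls)
update_kubeconfig_creds() {
  local cfg="${1:-/root/.kube/config}"
  local tls="${2:-/var/lib/rancher/k3s/server/tls}"
  local key cert
  key=$(base64 -w0 < "$tls/client-admin.key")
  # As in step 3: admin cert followed by server-ca.crt, base64-encoded
  cert=$(cat "$tls/client-admin.crt" "$tls/server-ca.crt" | base64 -w0)
  cp "$cfg" "$cfg.bak.$(date +%s)"
  # base64 output never contains '|', '&' or '\', so it is safe inside sed
  sed -i \
    -e "s|^\( *client-certificate-data:\).*|\1 $cert|" \
    -e "s|^\( *client-key-data:\).*|\1 $key|" \
    "$cfg"
}
```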

Note that etcd/client.crt is actually signed by the etcd folder's own server-ca.key rather than client-ca, so the script needs a slight modification for that one. I don't know what each of these certs does, so I renewed all of them.
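The modification for etcd/client.crt amounts to pointing the script at a different CA. Here is a sketch that follows the same base-name convention as the script above but takes the CA base name as an optional second argument (default `client-ca`) and checks the new certificate with `openssl verify` before appending the CA. The helper name `renew_cert` is hypothetical.

```shell
#!/bin/bash
# renew_cert (hypothetical helper): re-sign CERT.crt with CA.crt/CA.key for
# 3650 days, reusing the existing key via a CSR derived from the old cert.
#   $1 cert base name, e.g. client-admin or etcd/client
#   $2 CA base name (default client-ca; e.g. etcd/server-ca for etcd/client)
renew_cert() {
  local cert="$1" ca="${2:-client-ca}"
  [ -e "$cert.csr" ] || openssl x509 -x509toreq -in "$cert.crt" -out "$cert.csr" -signkey "$cert.key" || return 1
  mv "$cert.crt" "$cert.crt.bak.$(date +%s)"
  openssl x509 -req -in "$cert.csr" -CA "$ca.crt" -CAkey "$ca.key" \
    -CAcreateserial -days 3650 -out "$cert.crt" || return 1
  # Fail loudly if the new certificate does not chain to the intended CA
  openssl verify -CAfile "$ca.crt" "$cert.crt" || return 1
  cat "$ca.crt" >> "$cert.crt"
}
```

Run from /var/lib/rancher/k3s/server/tls, `renew_cert etcd/client etcd/server-ca` would handle the etcd client cert, while `renew_cert client-admin` behaves like the original script.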

Finally, client-k3s-controller.crt and client-kube-proxy.crt must be copied from /var/lib/rancher/k3s/server/tls to /var/lib/rancher/k3s/agent.
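That copy step might look like the sketch below (hypothetical helper, default k3s paths assumed). Because the renewal script reuses each certificate's existing private key, only the `.crt` files need copying; any previous agent copy is kept as a timestamped backup.

```shell
#!/bin/bash
# sync_agent_certs (hypothetical helper): copy the renewed controller and
# kube-proxy client certs from the server TLS dir into the agent dir.
#   $1 source dir (default /var/lib/rancher/k3s/server/tls)
#   $2 destination dir (default /var/lib/rancher/k3s/agent)
sync_agent_certs() {
  local src="${1:-/var/lib/rancher/k3s/server/tls}"
  local dst="${2:-/var/lib/rancher/k3s/agent}"
  local name
  for name in client-k3s-controller.crt client-kube-proxy.crt; do
    # Back up the existing agent copy, if any, before overwriting it
    [ -e "$dst/$name" ] && cp "$dst/$name" "$dst/$name.bak.$(date +%s)"
    cp "$src/$name" "$dst/$name" || return 1
  done
}
```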

(Note that replacing only the client-admin cert, as in sraillard's post, is enough for basic kubectl functions such as creating and deleting pods. However, pod creation may get stuck at "Pending", and kubectl logs and kubectl exec require other certs as well. k3s runs some daemons / cron jobs that periodically manage the cluster, and these will spam errors until all the certs are replaced. This is why all the certs must be replaced, as shown above.)

@rancher-max
Contributor

Most of this has already been validated per @mdrahman-suse. When he returns from PTO he can update the status and steps for individual scenarios. One more scenario is being tested now by @est-suse.

@mdrahman-suse

Validation of this feature is complete for the scope below (though not limited to it); additional testing will continue as part of validating #7081.
The docs provided a good resource for feature usage.
Tests:

  • Example script usability
  • Cert rotation on new cluster (single node, HA setup)
  • Cert rotation on upgraded cluster (single node, HA setup)
  • rotate-ca command flag usages
  • Error scenarios
