diff --git a/docs/cluster_spec.md b/docs/cluster_spec.md
index 53bdd7eae5538..b5783744eaf5b 100644
--- a/docs/cluster_spec.md
+++ b/docs/cluster_spec.md
@@ -508,7 +508,7 @@ To prepare the customized client-ca file on master nodes, the user can either us
 In the case that the user would use a customized client-ca file, it is common that the kubernetes CA (`/srv/kubernetes/ca/crt`) need to be appended to the end of the client-ca file. One way to append the ca.crt to the end of the customized client-ca file is to write an [kop-hook](https://kops.sigs.k8s.io/cluster_spec/#hooks) to do the append logic.
 
-Kops will have [CA rotation](https://kops.sigs.k8s.io/rotate-secrets/) feature soon, which would refresh the kubernetes cert files, including the ca.crt. If a customized client-ca file is used, when kops cert rotation happens, the user is responsible to update the ca.crt in the customized client-ca file. The refresh ca.crt logic can also be achieved by writing a kops hook.
+kOps has a [CA rotation](operations/rotate-secrets.md) feature, which refreshes the Kubernetes certificate files, including the ca.crt. If a customized client-ca file is used, the user is responsible for updating the ca.crt in the customized client-ca file when kOps certificate rotation happens. Refreshing the ca.crt can also be achieved by writing a kOps hook.
 
 See also [Kubernetes certificates](https://kubernetes.io/docs/concepts/cluster-administration/certificates/)
diff --git a/docs/operations/rotate-secrets.md b/docs/operations/rotate-secrets.md
new file mode 100644
index 0000000000000..296e05251f6be
--- /dev/null
+++ b/docs/operations/rotate-secrets.md
@@ -0,0 +1,248 @@
+# How to rotate all secrets / credentials
+
+There are two types of credentials managed by kOps:
+
+* "secrets" are symmetric credentials.
+
+* "keypairs" are pairs of X.509 certificates and their corresponding private keys.
+  The exceptions are "service-account" keypairs, which are stored as
+  certificate and private key pairs, but do not use any part of the certificates
+  other than the public keys.
+
+  Keypairs are grouped into named "keysets", according to their use. For example,
+  the "kubernetes-ca" keyset is used for the cluster's Kubernetes general CA.
+  Each keyset has a single primary keypair, which is the one whose private key
+  is used. The remaining, secondary keypairs are either trusted or distrusted.
+  The trusted keypairs, including the primary keypair, have their certificates
+  included in relevant trust stores.
+
+## Rotating keypairs
+
+{{ kops_feature_table(kops_added_default='1.22') }}
+
+You may gracefully rotate the keypairs of keysets that are either Certificate Authorities
+or "service-account" by performing the following procedure. Other keypairs will be
+automatically reissued by a non-dryrun `kops update cluster` when their issuing
+CA is rotated.
+
+### Create and stage new keypairs
+
+Create a new keypair for each keyset that you are going to rotate.
+Then update the cluster and perform a rolling update.
+To stage all rotatable keysets, run:
+
+```shell
+kops create keypair all
+kops update cluster --yes
+kops rolling-update cluster --yes
+```
+
+#### Rollback procedure:
+
+A failure at this stage is unlikely. To roll back this change:
+
+* Use `kops get keypairs` to get the IDs of the newly created keypairs.
+* Then use `kops distrust keypair` to distrust each of them by keyset and ID.
+* Then use `kops update cluster --yes`.
+* Then use `kops rolling-update cluster --yes`.
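+
+As an illustration, suppose `kops get keypairs` shows that the staged `kubernetes-ca`
+keypair was assigned the ID `6986354184403674591125125344` (a placeholder; use the
+actual ID from the command output). The rollback might then look like:
+
+```shell
+# List all keypairs; the newly staged ones have the most recent issue dates
+kops get keypairs
+
+# Distrust the staged keypair by keyset name and ID (placeholder ID shown)
+kops distrust keypair kubernetes-ca 6986354184403674591125125344
+
+# Apply the change and roll the nodes so the certificate is removed from trust stores
+kops update cluster --yes
+kops rolling-update cluster --yes
+```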
+
+### Export and distribute new kubeconfig certificate-authority-data
+
+If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
+you are not using a load balancer for the Kubernetes API with its own separate
+certificate, export a new kubeconfig with the new CA certificate
+included in the `certificate-authority-data` field for the cluster:
+
+```shell
+kops export kubecfg
+```
+
+Distribute the new `certificate-authority-data` to all clients of that cluster's
+Kubernetes API.
+
+#### Rollback procedure:
+
+To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.
+
+### Promote the new keypairs
+
+Promote the new keypairs to primary with:
+
+```shell
+kops promote keypair all
+kops update cluster --yes
+kops rolling-update cluster --force --yes
+```
+
+As of the writing of this document, rolling-update will not necessarily identify all
+relevant nodes as needing update, so it should be invoked with the `--force` flag.
+
+#### Rollback procedure:
+
+The most likely failure at this stage would be a client of the Kubernetes API that
+did not get the new `certificate-authority-data` and thus does not trust the
+new TLS server certificate.
+
+To roll back this change:
+
+* Use `kops get keypairs` to get the IDs of the previous primary keypairs,
+  most likely identified by their issue dates.
+* Then use `kops promote keypair` to promote each of them by keyset and ID.
+* Then use `kops update cluster --yes`.
+* Then use `kops rolling-update cluster --force --yes`.
+
+### Export and distribute new kubeconfig admin credentials
+
+If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
+have kubeconfigs with cluster admin credentials, export new kubeconfigs
+with new admin credentials for the cluster:
+
+```shell
+kops export kubecfg --admin=DURATION
+```
+
+where `DURATION` is the desired lifetime of the admin credential.
+
+Distribute the new credentials to all clients that require them.
+
+#### Rollback procedure:
+
+To roll back this change, distribute the previous kubeconfig admin credentials.
+
+### Distrust the previous keypairs
+
+Remove trust in the previous keypairs with:
+
+```shell
+kops distrust keypair all
+kops update cluster --yes
+kops rolling-update cluster --yes
+```
+
+#### Rollback procedure:
+
+The most likely failure at this stage would be a client of the Kubernetes API that
+is still using a credential issued by the previous keypair.
+
+To roll back this change:
+
+* Use `kops get keypairs --distrusted` to get the IDs of the previously trusted keypairs,
+  most likely identified by their distrust dates.
+* Then use `kops trust keypair` to trust each of them by keyset and ID.
+* Then use `kops update cluster --yes`.
+* Then use `kops rolling-update cluster --force --yes`.
+
+### Export and distribute new kubeconfig certificate-authority-data
+
+If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
+you are not using a load balancer for the Kubernetes API with its own separate
+certificate, export a new kubeconfig with the previous CA certificate
+removed from the `certificate-authority-data` field for the cluster:
+
+```shell
+kops export kubecfg
+```
+
+Distribute the new `certificate-authority-data` to all clients of that cluster's
+Kubernetes API.
+
+#### Rollback procedure:
+
+To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.
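+
+With all steps complete, the whole rotation, end to end, is roughly the following
+sketch (pausing after each phase to distribute updated kubeconfigs and to let the
+cluster validate):
+
+```shell
+# Phase 1: stage new keypairs alongside the current ones
+kops create keypair all
+kops update cluster --yes
+kops rolling-update cluster --yes
+kops export kubecfg    # distribute the new certificate-authority-data
+
+# Phase 2: make the staged keypairs primary
+kops promote keypair all
+kops update cluster --yes
+kops rolling-update cluster --force --yes
+kops export kubecfg --admin=24h    # reissue admin credentials; 24h is an example lifetime
+
+# Phase 3: remove trust in the previous keypairs
+kops distrust keypair all
+kops update cluster --yes
+kops rolling-update cluster --yes
+kops export kubecfg    # distribute certificate-authority-data without the old CA
+```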
+
+## Rotating encryptionconfig
+
+See [the Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/#rotating-a-decryption-key)
+for information on how to gracefully rotate keys in the encryptionconfig.
+
+Use `kops create secret encryptionconfig --force` to update the encryptionconfig secret.
+Following that, use `kops update cluster --yes` and `kops rolling-update cluster --yes`.
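+
+As an illustration, suppose secrets are encrypted with the `aescbc` provider and a new
+key named `key2` has been generated (names and key material below are placeholders).
+Following the Kubernetes rotation procedure, the new key is first added as a *second*
+entry, so that every kube-apiserver can decrypt data written with either key:
+
+```yaml
+apiVersion: apiserver.config.k8s.io/v1
+kind: EncryptionConfiguration
+resources:
+  - resources:
+      - secrets
+    providers:
+      - aescbc:
+          keys:
+            - name: key1  # current key, still used for encryption at this step
+              secret: <base64-encoded 32-byte key>
+            - name: key2  # new key, available for decryption after the rollout
+              secret: <base64-encoded 32-byte key>
+      - identity: {}
+```
+
+After this has been rolled out, move `key2` to the front of the list so it is used for
+encryption, roll out again, and rewrite existing secrets as described in the Kubernetes
+documentation; only then remove `key1`. Each change to the file is applied the same way,
+for example (using `encryptionconfig.yaml` as the file name):
+
+```shell
+kops create secret encryptionconfig -f encryptionconfig.yaml --force
+kops update cluster --yes
+kops rolling-update cluster --yes
+```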
+
+## Rotating other secrets
+
+[TODO: cilium_encryptionconfig, dockerconfig, weave_encryptionconfig]
+
+## Legacy procedure
+
+The following is the procedure to rotate secrets and keypairs in kOps versions
+prior to 1.22.
+
+**This is a disruptive procedure.**
+
+### Delete all secrets
+
+Delete all secrets & keypairs that kOps is holding:
+
+```shell
+kops get secrets | grep '^Secret' | awk '{print $2}' | xargs -I {} kops delete secret secret {}
+
+kops get secrets | grep '^Keypair' | awk '{print $2}' | xargs -I {} kops delete secret keypair {}
+```
+
+### Recreate all secrets
+
+Now run `kops update` to regenerate the secrets & keypairs:
+
+```shell
+kops update cluster
+kops update cluster --yes
+```
+
+kOps may fail to recreate all the keys on the first try. If you get errors about the key for 'ca' not being found, run `kops update cluster --yes` once more.
+
+### Force cluster to use new secrets
+
+Now you will have to remove the etcd certificates from every master.
+
+Find all the master IPs. One easy way of doing that is by running:
+
+```shell
+kops toolbox dump
+```
+
+Then SSH into each master and run:
+
+```shell
+sudo find /mnt/ -name server.* | xargs -I {} sudo rm {}
+sudo find /mnt/ -name me.* | xargs -I {} sudo rm {}
+```
+
+You need to reboot every node (using a rolling update). You have to use `--cloudonly` because the keypair no longer matches:
+
+```shell
+kops rolling-update cluster --cloudonly --force --yes
+```
+
+Re-export kubecfg with the new settings:
+
+```shell
+kops export kubecfg
+```
+
+### Recreate all service accounts
+
+Now the service account tokens need to be regenerated inside the cluster.
+
+Run `kops toolbox dump` and find a master IP.
+
+Then `ssh admin@${IP}` and run this to delete all the service account tokens:
+
+```shell
+# Delete all service account tokens in all namespaces
+NS=`kubectl get namespaces -o 'jsonpath={.items[*].metadata.name}'`
+for i in ${NS}; do kubectl get secrets --namespace=${i} --no-headers | grep "kubernetes.io/service-account-token" | awk '{print $1}' | xargs -I {} kubectl delete secret --namespace=$i {}; done
+
+# Allow for new secrets to be created
+sleep 60
+
+# Bounce all pods to make use of the new service tokens
+pkill -f kube-controller-manager
+kubectl delete pods --all --all-namespaces
+```
+
+### Verify the cluster is back up
+
+The last command from the previous section will take some time. Meanwhile, you can run validation to watch the cluster gradually come back online:
+
+```shell
+kops validate cluster --wait 10m
+```
diff --git a/docs/releases/1.22-NOTES.md b/docs/releases/1.22-NOTES.md
index f4807ec28acf4..001f876a8fa25 100644
--- a/docs/releases/1.22-NOTES.md
+++ b/docs/releases/1.22-NOTES.md
@@ -28,6 +28,9 @@ spec:
 This feature may be temporarily disabled by turning off the `TerraformManagedFiles`
 feature flag using `export KOPS_FEATURE_FLAGS="-TerraformManagedFiles"`.
 
+* kOps now implements graceful rotation of its Certificate Authorities and the service
+  account signing key. See the documentation on [How to rotate all secrets / credentials](../operations/rotate-secrets.md).
+
 * New clusters running Kubernetes 1.22 will have AWS EBS CSI driver enabled by default.
 
 # Breaking changes
diff --git a/docs/rotate-secrets.md b/docs/rotate-secrets.md
deleted file mode 100644
index 1f9618cb3bfb2..0000000000000
--- a/docs/rotate-secrets.md
+++ /dev/null
@@ -1,81 +0,0 @@
-# How to rotate all secrets / credentials
-
-**This is a disruptive procedure.**
-
-## Delete all secrets
-
-Delete all secrets & keypairs that kOps is holding:
-
-```shell
-kops get secrets | grep '^Secret' | awk '{print $2}' | xargs -I {} kops delete secret secret {}
-
-kops get secrets | grep '^Keypair' | awk '{print $2}' | xargs -I {} kops delete secret keypair {}
-```
-
-## Recreate all secrets
-
-Now run `kops update` to regenerate the secrets & keypairs.
-```
-kops update cluster
-kops update cluster --yes
-```
-
-kOps may fail to recreate all the keys on first try. If you get errors about ca key for 'ca' not being found, run `kops update cluster --yes` once more.
-
-## Force cluster to use new secrets
-
-Now you will have to remove the etcd certificates from every master.
-
-Find all the master IPs. One easy way of doing that is running
-
-```
-kops toolbox dump
-```
-
-Then SSH into each node and run
-
-```
-sudo find /mnt/ -name server.* | xargs -I {} sudo rm {}
-sudo find /mnt/ -name me.* | xargs -I {} sudo rm {}
-```
-
-You need to reboot every node (using a rolling-update). You have to use `--cloudonly` because the keypair no longer matches.
-
-```
-kops rolling-update cluster --cloudonly --force --yes
-```
-
-Re-export kubecfg with new settings:
-
-```
-kops export kubecfg
-```
-
-## Recreate all service accounts
-
-Now the service account tokens will need to be regenerated inside the cluster:
-
-`kops toolbox dump` and find a master IP
-
-Then `ssh admin@${IP}` and run this to delete all the service account tokens:
-
-```shell
-# Delete all service account tokens in all namespaces
-NS=`kubectl get namespaces -o 'jsonpath={.items[*].metadata.name}'`
-for i in ${NS}; do kubectl get secrets --namespace=${i} --no-headers | grep "kubernetes.io/service-account-token" | awk '{print $1}' | xargs -I {} kubectl delete secret --namespace=$i {}; done
-
-# Allow for new secrets to be created
-sleep 60
-
-# Bounce all pods to make use of the new service tokens
-pkill -f kube-controller-manager
-kubectl delete pods --all --all-namespaces
-```
-
-## Verify the cluster is back up
-
-The last command from the previous section will take some time. Meanwhile you can check validation to see the cluster gradually coming back online.
-
-```
-kops validate cluster --wait 10m
-```
diff --git a/mkdocs.yml b/mkdocs.yml
index d22af404dc4a5..da8f63413bf53 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -85,6 +85,7 @@ nav:
     - GPU setup: "gpu.md"
     - Label management: "labels.md"
     - Secret management: "secrets.md"
+    - Rotate Secrets: "operations/rotate-secrets.md"
     - Service Account Token Volume: "operations/service_account_token_volumes.md"
     - Moving from a Single Master to Multiple HA Masters: "single-to-multi-master.md"
     - Running kOps in a CI environment: "continuous_integration.md"
@@ -131,7 +132,6 @@ nav:
     - Egress Proxy: "http_proxy.md"
     - Node Authorization: "node_authorization.md"
     - Node Resource Allocation: "node_resource_handling.md"
-    - Rotate Secrets: "rotate-secrets.md"
     - Terraform: "terraform.md"
     - Authentication: "authentication.md"
   - Contributing: