Add documentation for keypair rotation
johngmyers committed Jul 10, 2021
1 parent 0bc0004 commit 2da21ab
Showing 5 changed files with 254 additions and 83 deletions.
2 changes: 1 addition & 1 deletion docs/cluster_spec.md
@@ -508,7 +508,7 @@ To prepare the customized client-ca file on master nodes, the user can either us

In the case that the user would use a customized client-ca file, it is common that the Kubernetes CA (`/srv/kubernetes/ca/crt`) needs to be appended to the end of the client-ca file. One way to append the ca.crt to the end of the customized client-ca file is to write a [kops hook](https://kops.sigs.k8s.io/cluster_spec/#hooks) to do the append logic.

Kops will have [CA rotation](https://kops.sigs.k8s.io/rotate-secrets/) feature soon, which would refresh the kubernetes cert files, including the ca.crt. If a customized client-ca file is used, when kops cert rotation happens, the user is responsible to update the ca.crt in the customized client-ca file. The refresh ca.crt logic can also be achieved by writing a kops hook.
Kops has a [CA rotation](operations/rotate-secrets.md) feature, which refreshes the Kubernetes certificate files, including the ca.crt. If a customized client-ca file is used, when kOps cert rotation happens, the user is responsible for updating the ca.crt in the customized client-ca file. The refresh ca.crt logic can also be achieved by writing a kops hook.

See also [Kubernetes certificates](https://kubernetes.io/docs/concepts/cluster-administration/certificates/)

249 changes: 249 additions & 0 deletions docs/operations/rotate-secrets.md
@@ -0,0 +1,249 @@
# How to rotate all secrets / credentials

There are two types of credentials managed by kOps:

* "secrets" are symmetric credentials.

* "keypairs" are pairs of X.509 certificates and their corresponding private keys.
The exceptions are "service-account" keypairs, which are stored as
certificate and private key pairs, but do not use any part of the certificates
other than the public keys.

Keypairs are grouped into named "keysets", according to their use. For example,
the "kubernetes-ca" keyset is used for the cluster's Kubernetes general CA.
Each keyset has a single primary keypair, which is the one whose private key
is used. The remaining, secondary keypairs are either trusted or distrusted.
The trusted keypairs, including the primary keypair, have their certificates
included in relevant trust stores.

## Rotating keypairs

{{ kops_feature_table(kops_added_default='1.22') }}

You may gracefully rotate keypairs of keysets that are either Certificate Authorities
or are "service-account" by performing the following procedure. Other keypairs will be
automatically reissued by a non-dryrun `kops update cluster` when their issuing
CA is rotated.

### Create and stage new keypair

Create a new keypair for each keyset that you are going to rotate.
Then update the cluster and perform a rolling update.
To stage all rotatable keysets, run:

```shell
kops create keypair all
kops update cluster --yes
kops rolling-update cluster --yes
```
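
If only some keysets are being rotated, the same staging step can be scripted per keyset. The following is a minimal sketch, assuming that `kops create keypair` accepts a keyset name in place of `all`. The `run` and `stage_keypairs` functions and the `DRY_RUN` variable are hypothetical helpers, not part of kOps. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
set -eu

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stage a new keypair for each named keyset, then apply and roll the cluster.
stage_keypairs() {
  local keyset
  for keyset in "$@"; do
    run kops create keypair "$keyset"
  done
  run kops update cluster --yes
  run kops rolling-update cluster --yes
}

# Example: stage_keypairs kubernetes-ca service-account
```

Printing first makes it possible to review the exact command sequence; setting `DRY_RUN=0` runs it for real.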

#### Rollback procedure:

A failure at this stage is unlikely. To roll back this change:

* Use `kops get keypairs` to get the IDs of the newly created keypairs.
* Then use `kops distrust keypair` to distrust each of them by keyset and ID.
* Then use `kops update cluster --yes`
* Then use `kops rolling-update cluster --yes`

### Export and distribute new kubeconfig certificate-authority-data

If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
you are not using a load balancer for the Kubernetes API with its own separate
certificate, export a new kubeconfig with the new CA certificate
included in the `certificate-authority-data` field for the cluster:

```shell
kops export kubecfg
```

Distribute the new `certificate-authority-data` to all clients of that cluster's
Kubernetes API.
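
Before distributing the kubeconfig, it can be worth checking that its CA bundle holds both the previous and the new CA certificate. This is a minimal sketch, assuming the bundle is base64-encoded PEM (as in a standard kubeconfig) and that the cluster of interest is the first entry in the file. `count_ca_certs` is a hypothetical helper name.

```shell
# Count the PEM certificates in a base64-encoded CA bundle read from stdin.
count_ca_certs() {
  base64 --decode | grep -c -- '-----BEGIN CERTIFICATE-----' || true
}

# Example (assumes the cluster is the first entry in the kubeconfig):
#   kubectl config view --raw \
#     -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
#     | count_ca_certs
```

At this stage one would expect a count of two for a rotated Kubernetes general CA: the previous certificate and the newly staged one.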

#### Rollback procedure:

To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.

### Promote the new keypairs

Promote the new keypairs to primary with:

```shell
kops promote keypair all
kops update cluster --yes
kops rolling-update cluster --force --yes
```

As of the writing of this document, rolling-update will not necessarily identify all
relevant nodes as needing an update, so it should be invoked with the `--force` flag.

#### Rollback procedure:

The most likely failure at this stage would be a client of the Kubernetes API that
did not get the new `certificate-authority-data` and thus does not trust the
new TLS server certificate.

To roll back this change:

* Use `kops get keypairs` to get the IDs of the previous primary keypairs,
most likely by identifying the issue dates.
* Then use `kops promote keypair` to promote each of them by keyset and ID.
* Then use `kops update cluster --yes`
* Then use `kops rolling-update cluster --force --yes`
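
The rollback steps above can be sketched as a small script. This assumes `kops promote keypair` takes the keyset name followed by the keypair ID, as the list describes. `print_promote_rollback` is a hypothetical helper, and the ID in the example is a placeholder, not a real value. The function only prints the commands so that they can be reviewed before being run.

```shell
# Print the commands that re-promote previous primary keypairs.
# Each argument is a KEYSET:ID pair, with the ID taken from `kops get keypairs`.
print_promote_rollback() {
  local pair
  for pair in "$@"; do
    # ${pair%%:*} is the keyset name, ${pair#*:} is the keypair ID.
    printf 'kops promote keypair %s %s\n' "${pair%%:*}" "${pair#*:}"
  done
  printf '%s\n' 'kops update cluster --yes'
  printf '%s\n' 'kops rolling-update cluster --force --yes'
}

# Example with a placeholder ID:
#   print_promote_rollback kubernetes-ca:PREVIOUS_ID
```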

### Export and distribute new kubeconfig admin credentials

If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
have kubeconfigs with cluster admin credentials, export new kubeconfigs
with new admin credentials for the cluster:

```shell
kops export kubecfg --admin=DURATION
```

where `DURATION` is the desired lifetime of the admin credential.

Distribute the new credentials to all clients that require them.
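
To confirm that a distributed kubeconfig carries a fresh admin credential, the expiry of its client certificate can be inspected. This is a sketch under the assumption that the admin credential is stored as a base64-encoded PEM client certificate in the `client-certificate-data` field of the first user entry. `cert_enddate` is a hypothetical helper and requires `openssl`.

```shell
# Print the notAfter date of a base64-encoded PEM certificate read from stdin.
cert_enddate() {
  base64 --decode | openssl x509 -noout -enddate
}

# Example (assumes the admin user is the first entry in the kubeconfig):
#   kubectl config view --raw \
#     -o jsonpath='{.users[0].user.client-certificate-data}' \
#     | cert_enddate
```

The reported date should be roughly `DURATION` after the time of export.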

#### Rollback procedure:

To roll back this change, distribute the previous kubeconfig admin credentials.

### Distrust the previous keypairs

Remove trust in the previous keypairs with:

```shell
kops distrust keypair all
kops update cluster --yes
kops rolling-update cluster --yes
```

#### Rollback procedure:

The most likely failure at this stage would be a client of the Kubernetes API that
is still using a credential issued by the previous keypair.

To roll back this change:

* Use `kops get keypairs --distrusted` to get the IDs of the previously trusted keypairs,
most likely by identifying the distrust dates.
* Then use `kops trust keypair` to trust each of them by keyset and ID.
[TODO - implement this command]
* Then use `kops update cluster --yes`
* Then use `kops rolling-update cluster --force --yes`

### Export and distribute new kubeconfig certificate-authority-data

If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
you are not using a load balancer for the Kubernetes API with its own separate
certificate, export a new kubeconfig with the previous CA certificate
removed from the `certificate-authority-data` field for the cluster:

```shell
kops export kubecfg
```

Distribute the new `certificate-authority-data` to all clients of that cluster's
Kubernetes API.

#### Rollback procedure:

To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.

## Rotating encryptionconfig

See [the Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/#rotating-a-decryption-key)
for information on how to gracefully rotate keys in the encryptionconfig.

Use `kops create secret encryptionconfig --force` to update the encryptionconfig secret.
Following that, use `kops update cluster --yes` and `kops rolling-update cluster --yes`.
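
The new key material has to be generated before the encryptionconfig is updated. This is a minimal sketch, assuming a provider that takes a base64-encoded 32-byte key (such as `aescbc` in the Kubernetes documentation linked above). `new_encryption_key` is a hypothetical helper name.

```shell
# Generate a random 32-byte key and print it base64-encoded on one line.
new_encryption_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
  echo
}

# Example: new_encryption_key
```

The Kubernetes documentation describes where the new key goes in the provider's `keys` list at each stage of a graceful rotation.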

## Rotating other secrets

[TODO: cilium_encryptionconfig, dockerconfig, weave_encryptionconfig]

## Legacy procedure

The following is the procedure to rotate secrets and keypairs in kOps versions
prior to 1.22.

**This is a disruptive procedure.**

### Delete all secrets

Delete all secrets & keypairs that kOps is holding:

```shell
kops get secrets | grep '^Secret' | awk '{print $2}' | xargs -I {} kops delete secret secret {}

kops get secrets | grep '^Keypair' | awk '{print $2}' | xargs -I {} kops delete secret keypair {}
```

### Recreate all secrets

Now run `kops update cluster` to regenerate the secrets & keypairs.

```shell
# Preview the changes (dry run)
kops update cluster
# Apply the changes
kops update cluster --yes
```

kOps may fail to recreate all the keys on the first try. If you get errors about the CA key for 'ca' not being found, run `kops update cluster --yes` once more.

### Force cluster to use new secrets

Now you will have to remove the etcd certificates from every master.

Find all the master IPs. One easy way of doing that is to run:

```shell
kops toolbox dump
```

Then SSH into each master and run:

```shell
# Quote the patterns so the shell does not expand them before find sees them
sudo find /mnt/ -name 'server.*' | xargs -I {} sudo rm {}
sudo find /mnt/ -name 'me.*' | xargs -I {} sudo rm {}
```

You need to reboot every node (using a rolling-update). You have to use `--cloudonly` because the keypair no longer matches.

```shell
kops rolling-update cluster --cloudonly --force --yes
```

Re-export kubecfg with new settings:

```shell
kops export kubecfg
```

### Recreate all service accounts

Now the service account tokens will need to be regenerated inside the cluster:

Run `kops toolbox dump` and find a master IP.

Then `ssh admin@${IP}` and run the following to delete all the service account tokens:

```shell
# Delete all service account tokens in all namespaces
NS=$(kubectl get namespaces -o 'jsonpath={.items[*].metadata.name}')
for i in ${NS}; do
  kubectl get secrets --namespace="${i}" --no-headers \
    | grep "kubernetes.io/service-account-token" \
    | awk '{print $1}' \
    | xargs -I {} kubectl delete secret --namespace="${i}" {}
done

# Allow for new secrets to be created
sleep 60

# Bounce all pods to make use of the new service tokens
pkill -f kube-controller-manager
kubectl delete pods --all --all-namespaces
```

### Verify the cluster is back up

The last command from the previous section will take some time. Meanwhile you can check validation to see the cluster gradually coming back online.

```shell
kops validate cluster --wait 10m
```
3 changes: 3 additions & 0 deletions docs/releases/1.22-NOTES.md
@@ -28,6 +28,9 @@ spec:
This feature may be temporarily disabled by turning off the `TerraformManagedFiles` feature flag
using `export KOPS_FEATURE_FLAGS="-TerraformManagedFiles"`.

* kOps now implements graceful rotation of its Certificate Authorities and the service
account signing key. See the documentation on [How to rotate all secrets / credentials](../operations/rotate-secrets.md)

* New clusters running Kubernetes 1.22 will have AWS EBS CSI driver enabled by default.

# Breaking changes
81 changes: 0 additions & 81 deletions docs/rotate-secrets.md

This file was deleted.

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -85,6 +85,7 @@ nav:
- GPU setup: "gpu.md"
- Label management: "labels.md"
- Secret management: "secrets.md"
- Rotate Secrets: "operations/rotate-secrets.md"
- Service Account Token Volume: "operations/service_account_token_volumes.md"
- Moving from a Single Master to Multiple HA Masters: "single-to-multi-master.md"
- Running kOps in a CI environment: "continuous_integration.md"
@@ -131,7 +132,6 @@ nav:
- Egress Proxy: "http_proxy.md"
- Node Authorization: "node_authorization.md"
- Node Resource Allocation: "node_resource_handling.md"
- Rotate Secrets: "rotate-secrets.md"
- Terraform: "terraform.md"
- Authentication: "authentication.md"
- Contributing:
