
[Deprecated] Proposal: Encrypt secrets (and potentially others) at rest #454

Closed

Conversation

andrewsykim
Member

@andrewsykim andrewsykim commented Mar 14, 2017

Deprecated in favour of #607

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 14, 2017
@smarterclayton smarterclayton self-assigned this Mar 14, 2017
@smarterclayton
Contributor

Nit: break lines at 100 or 120 char to make line reviewing easier.

@smarterclayton smarterclayton changed the title Encryption Proposal Proposal: Encrypt secrets (and potentially others) at rest Mar 15, 2017

## Abstract

The scope of this proposal is to ensure that resources can be encrypted at the datastore layer. Encrypting data at REST and via third party vendors is a desired feature but outside the scope of this proposal. Encryption will be optional for any resource but we suspect it will be used for the Secret resource in most cases.


I'm confused by this line. Usually I would read "encryption at rest" as data encrypted in the datastore, when it's on disk. But you've uppercased 'rest' and said it won't be supported. Are you referring to some network level encryption that won't be supported?

It also says that encryption via third party vendors is outside of scope, but if I understand the rest of the doc, it's only the implementation of an encryption provider for third party vendors that's outside of scope. Is my understanding correct?

Member Author


Yes, just the implementation is outside the scope. When we develop the built in provider, it should be behind a pluggable interface.

I think my understanding of "encryption at rest" may be wrong here. My understanding of "encryption at rest" is that HTTP calls between a client and the server will have secrets encrypted. This requires a lot more work since the kubelet now needs to have the encryption keys as well. For this proposal we meant that the secret stored in etcd will be encrypted. I'll make sure to clarify that.


I think you can say encryption over the network is out of scope. API users should be using TLS connections. The kubelet can do that and there is work in flight to improve it.

Contributor


Yes, we have an end to end encrypted communication story today.


How encryption keys are delivered to the machine running the Kubernetes apiserver is of relevance - we assume that the encryption at rest pattern is loosely coupled to how those keys are delivered and secured on disk.

In general, full disk encryption of the volumes storing etcd data is preferred - this proposal focuses on scenarios where additional protection is desired against malicious parties gaining read access to the etcd API or a running etcd instance without access to memory of the etcd process.


Also protects against access to etcd backups allowing access to privileged data.

## High level design
Before a resource is written to etcd and before it is read, an encryption provider will take the plaintext data and encrypt it.
These providers will be able to be created and turned on depending on the user's needs or requirements and will adhere to an encryption interface.
This interface will provide the abstraction to allow various encryption mechanisms to be implemented, as well as for the method of encryption to be rotated over time.


Does this mean an API mechanism for rotating the key, analogous to how there is a Vault API to rotate the key? (https://www.vaultproject.io/api/secret/transit/index.html#rotate-key)

Contributor


Possibly not API mechanism. At a minimum all at rest data stored with some encryption patterns MUST support key rotation.


  1. does this mean that every secret could potentially be encrypted with a different provider?
  2. with regards to rotation and encryption mechanism ... if this is not triggered from the API, is it at least exposed on the object? (e.g. this key has been in use for 35d, it is a 128-bit AES key, and is encrypted using AES-GCM)
  3. how do you ensure that key rotation is indeed implemented / supported?


Automated key rotation might be desirable but IMO not necessary. What we did in swarm was piggyback on transitions from an "unlocked" cluster to a "locked" cluster, rotating both the KEK and the DEK (and doing a snapshot w/ the new DEK). This allows users to force a full rotation of keys/re-encryption of data, as a byproduct of the operation that will need it the most (transitioning from a state where a KEK is on disk to a state where the KEK only lives in memory)

These providers will be able to be created and turned on depending on the user's needs or requirements and will adhere to an encryption interface.
This interface will provide the abstraction to allow various encryption mechanisms to be implemented, as well as for the method of encryption to be rotated over time.
It should be possible for the distribution of keys to the apiserver to be separated out (injected in by a higher level security process) or to be directly
requested by the provider implementation. For the first iteration, a default provider will be developed and will run as part of the kube-apiserver.


For the first iteration, a default provider that handles encryption in-process using a locally stored key will be developed

Is that what you have in mind?

Member


+1, let's resolve how the key is going to get delivered for the default implementation. Are we only supporting a file on disk? What do we do if it is missing or the wrong key? Should we support passing it in "manually" somehow? If so, will the cluster still work while we're waiting on the key?

For the near term, we identify a single encryption provider out of the box:

### AES-GCM provider
The simplest possible provider is an AES-GCM encrypter/decrypter using AEAD, where we create a unique nonce on each new write to etcd, use that as the IV for AES-GCM of the value (the JSON or protobuf data) along with a set of authenticated data to create the ciphertext, and then on decryption use the nonce and the authenticated data to decode. The keys come from configuration on the local disk (potentially decrypted at startup time using a stronger password protected key, or in the future from an alternative source).


What is authenticated data?

Contributor


Authenticated data is part of the AEAD definition and ensures that someone with access to ciphertext can't trick the server into using the value in a different context. https://en.wikipedia.org/wiki/Authenticated_encryption

The AD we are proposing is the etcd key - that means an attacker with access to etcd can't take a cipher text from one key, put it under another key (under the attacker's control) and then see the contents in the API (where the apiserver decodes it for that user).
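For illustration, a minimal sketch of that scheme using Go's standard crypto/cipher AEAD interface, with the etcd key path as the additional authenticated data (function and package names here are mine, not from the proposal):

```go
package encryption

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"io"
)

// seal encrypts plaintext with AES-GCM, binding the ciphertext to the etcd
// key path via the additional authenticated data, so a value copied under a
// different etcd key fails to decrypt.
func seal(key, plaintext, etcdKeyPath []byte) (nonce, ciphertext []byte, err error) {
	block, err := aes.NewCipher(key) // 32-byte key => AES-256
	if err != nil {
		return nil, nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, aead.NonceSize()) // unique nonce per write, used as the IV
	if _, err = io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, nil, err
	}
	return nonce, aead.Seal(nil, nonce, plaintext, etcdKeyPath), nil
}

// open decrypts, and fails if either the ciphertext or the authenticated
// data (the etcd key path) differs from what was sealed.
func open(key, nonce, ciphertext, etcdKeyPath []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return aead.Open(nil, nonce, ciphertext, etcdKeyPath)
}
```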


that sounds like sensible and secure defaults for a "default" provider 👍

Member


Implementation started here:
kubernetes/kubernetes#41939

Member


I know this question is coming late to the game, but was encrypting the whole etcd database ever considered? I think it would make attacks like this harder and delivers similar security benefits. Docker seems to do it at this level:
https://docs.docker.com/engine/swarm/swarm_manager_locking/


I believe this came up before in discussion, but the overhead was thought to be too much to encrypt everything.

We also discussed at one point storing the database on an encrypted volume, but that doesn't fit all deployment scenarios and breaks many setups.


Quick question, if etcd had encryption-at-rest support, would this be necessary? Also, if this is a normal question asked by new people like me :-) , could we add a section to the document explaining why that direction was not taken?


We haven't found it to be too much overhead in Swarm. Also greatly simplifies the model to just encrypt everything, as there are definitely other important items in the store that can get the benefit of the authenticity as well as encryption.


To enable encryption a user will issue a PUT to an endpoint such as `/rotate`. If this is the first time this API has ever been called the API server
will generate a key (unencrypted DEK), encrypt it with the KEK in slot 1, and encrypt all secrets with the DEK. If this is the (N+1)-th time the API
has been called, the API server will encrypt the DEK with the KEK in slot N+1 and do a compare-and-swap on the DEK stored in etcd.


How many DEKs will there be? One per data item, or one per database?

Contributor


One per database was my initial thinking.
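To make the slot mechanics concrete, here is a rough Go sketch of the envelope pattern under discussion - one DEK per database, wrapped by the current KEK slot. Names and types are illustrative, not from the proposal:

```go
// storedDEK is a hypothetical record kept in etcd: the DEK encrypts resource
// data, and is itself stored encrypted (wrapped) by whichever KEK slot is
// current. Rotating the KEK only re-wraps this record; the bulk data does
// not need to be re-encrypted.
type storedDEK struct {
	KEKSlot    int    // which KEK slot wrapped this DEK
	WrappedDEK []byte // DEK encrypted under the KEK in KEKSlot
}

// rotateKEK re-wraps the DEK under the next KEK slot; the caller then does a
// compare-and-swap on the DEK record in etcd so concurrent apiservers cannot
// clobber each other.
func rotateKEK(old storedDEK, unwrap, wrap func([]byte) ([]byte, error)) (storedDEK, error) {
	dek, err := unwrap(old.WrappedDEK) // decrypt with KEK in old.KEKSlot
	if err != nil {
		return storedDEK{}, err
	}
	wrapped, err := wrap(dek) // encrypt with KEK in slot old.KEKSlot+1
	if err != nil {
		return storedDEK{}, err
	}
	return storedDEK{KEKSlot: old.KEKSlot + 1, WrappedDEK: wrapped}, nil
}
```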

rotate API endpoint eventually. To account for failure scenarios during key rotation, the old and new DEK will be stored in etcd during the rotation.

## Master Configuration
In order to enable encryption, a user must first create a KEK DB file and tell the API server to use it with --key-encryption-key-db-path=/path/to/kekdb/file. The file will be a simple YAML file that lists all of the keys:


What about not making this flag specific to a database? The database is an implementation detail that may only be relevant to the default encryption provider. If the flag is just something like --encryption-provider=/path/to/encryption-provider-config.yml then for the default implementation config file can include the database configuration, but for an implementation backed by Vault, the config file would contain something like a host:port and some authentication credentials.

Member Author


Any objections to having the encryption provider config file also contain the resources to encrypt?

Member Author


How's something like this?

```yaml
kind: EncryptionProvider
version: crypto/v1alpha
metadata:
  name: kube-secrets-encryption
spec:
  provider: default
  resources:
   - v1/Secrets
  keys:
   - foo
   - bar
   - baz
```


I'm not sure on this one. It might make sense for the resources to encrypt to be outside the encryption provider. Would putting the resources inside the encryption provider mean the encryption provider would have to be configured for all resources so that it could decide if it should or shouldn't encrypt?

* A local HSM implementation that retrieves the keys from the secure enclave prior to reusing the AES-GCM implementation (initialization of keys only)
* Exchanging a local temporary token for the actual decryption tokens from a networked secret vault
* Decrypting the AES-256 keys from disk using asymmetric encryption combined with a user input password


* Sending the data over the network to a key management system for encryption and decryption (Google KMS, Amazon KMS, Hashicorp Vault w/ Transit backend)

* Decrypting the AES-256 keys from disk using asymmetric encryption combined with a user input password

### Backwards Compatibility
Once a user encrypts any resource in etcd, they are locked to that Kubernetes version and higher unless they choose to manually decrypt that resource in etcd. This will be discouraged. It will be highly recommended that users verify that their Kubernetes cluster is on a stable version before enabling encryption.


Isn't this a general thing about transitioning encryption providers? Other than the basic enabling an encryption provider on an existing cluster, there doesn't seem to be any discussion of support for transitioning encryption providers.

Contributor


I think this requires a section on these two topics:

  1. How rotation in storage happens
  2. How encryption is enabled

Ideally 1 and 2 should follow the same pattern (adding a new key is a lot like turning on encryption at first).


It should be easy to introduce new variations on the basic provider - such as those that require a password via STDIN on startup to decrypt the keys on disk, or wish to exchange a temporary token on disk for the actual keys from a third party security server.

We expect that the provider implementation may therefore be layered or composed and that as much as possible providers should be separated behind interfaces.


What do you mean layered or composed? Like, multiple encryption providers at the same time?

Member Author


Multiple encryption providers at the same time only for migration purposes.


So are you envisioning a chain, something like:

REST Request -> ... -> Encryption Provider 1 -> Encryption Provider 2 -> etcd

Or more like something (where only one of the encryption providers is invoked):

REST Request -> ... -> (Encryption Provider 1 | Encryption Provider 2) -> etcd


I'm guessing the second, but this needs to be highlighted in the document.

# Encryption
Member


Hi @andrewsykim, this is an interesting proposal. I'm doing some work to figure out the direction for secrets on the google side this quarter, it would be great to discuss this over VC with @smarterclayton and @jcbsmpsn.

I'm particularly interested in the use cases driving building this into kubernetes vs. taking advantage of the various external stores. Especially given there's a key management system for every cloud, and ones like vault that are already brokers for many different authn/z types and storage backends. If we could easily authenticate to a store that already handles key storage, encryption, rotation, and logging, would we still build it into kubernetes?

Member Author


You raise some good questions, I do believe there's been some talk around this already in google docs.

It's true that most clouds support key management systems out of the box, which is why we want the encryption interface to be pluggable for those use cases. @smarterclayton may have stronger/better reasons than I do on this, but I had thought we want to support a built-in solution because:

  1. We need to support this for bare metal setups
  2. Using third party vendors like Vault would require support for the storage backends it depends on.
  3. Using third party tools like Vault means any pods that require Secrets heavily depend on the availability of those vendors. This adds operational overhead for cluster admins.

For 2) and 3) I can see Kubernetes supporting a highly available cluster add-on for Vault that would work in most setups (similar to kube-dns, not required but highly recommended and a highly available setup is provided by the kubernetes community).

For 2) etcd is an optional storage backend for Vault so we may not have to worry about adding an additional storage backend, but there are some issues we should address if we go down that path (e.g. Vault putting too much load on the etcd servers could make the masters unavailable).

Would love to discuss this with you in more details, feel free to ping me on kubernetes slack to setup a time :)

Member


Generally I think the direction should be to make it as easy as possible to use a pluggable external secret store, and kubernetes should avoid re-implementing features, and accruing maintenance cost, of those stores.

Adding encryption and associated key rotation as described here is a step down this re-implementation path. Probably followed by very detailed audit logging, followed by fancier authn/authz features. We need to draw a line under how far we want to travel down this path. If we want to include this, we should be explicit that it's as far as we're going re-inventing the wheel.

There's some danger that adding encryption here will tip some people into using k8s secrets when really the risk profile of what they are storing means they should be moving to an external store that has stronger audit trail guarantees and various other security features. Larger organizations will probably be motivated to use external stores regardless, because they are already storing other secrets there and just want to centralize on one system.

Re plugging in other stores, I think the right way to do this is to give pods (and perhaps nodes) identities they can use to talk to the stores directly, but this is out of scope for this proposal.

Re HA: Even if you are air-gapped and running your own vault installation you can make it HA with some work. The kubernetes piece of availability might be providing some flexibility in allowing nodes to access and cache HA secrets vs. the normal least-privilege path where only the pod can retrieve the secret. But this is also out of scope here.

I'm definitely still interested in meeting to talk about this.

Contributor


Secrets are so fundamental to applications, and the diversity of key stores so broad, that we need to be able to do two things:

  1. Store secrets and help distribute them to pods, in a basic and secure fashion
  2. Enable deep integration with sophisticated key store, management, and audit solutions

So those are the 80% and the 20%. The 80% has to work - the 20% has to be correct. It has to be possible to transition from 1 to 2. The items on the secrets roadmap roughly talk about enabling the remainder of the 80% and the transition to the 20%.


I have a big interest in this as well, dealing with secrets. Happy to join in on your discussions @andrewsykim @destijl @jcbsmpsn

In discussing with @smarterclayton, as he mentions in the next comment, we wanted to find a way to initially encrypt things, then deal with more complex integrations (i.e. secret management) further down the line.

Member


Thanks, I think we're all in agreement to get this done and we only have a few weeks left before 1.7 code freeze.

```go
out, err := aead.Open(nil, nonce, cipherText, authenticatedData)
```

## Key Generation & Rotation
Contributor


I think this section is somewhat optional to accomplish the primary goal.

  1. store details encrypted
  2. enable rotation of keys
  3. make keys easy to manage

I don't think 3 is required to accomplish 1-2 - I'd prefer to see the proposal reflect that (focus on the requirements for rotation, then talk about future work to reduce the complexity). Rotation requires a set of core patterns (behavior in the storage) and then should have clear layers for key management. Key management is not core.

Contributor


So:

  1. How to do rotation
  2. How to do so with keys passed in externally
  3. How to do so with a managed set of internal keys
  4. How to do so with an external KMS correctly and safely.

I'm most concerned with 1-2 and the implications of 4.

Member Author


Would taking an API driven approach for key rotation be part of "3. make keys easy to manage" or "2. enable rotation of keys"?

Member Author


i.e. are we okay with users having to manually restart api servers after adding a new KEK?

Member Author


My impression from the last meeting was that we're okay with users having to restart servers themselves to reduce the complexity of encryption. Just making sure I understood correctly.

Contributor


I think in stage 1, restarting services themselves is the best approach, followed by layering more powerful options down the road.


Restating so I understand: key rotation will be implemented, probably accomplished by modifying config files and restarting the API server to pick up changes. Making it easy/pleasant is future work.


Is this process meant for rotating the KEK only? Or also for DEKs?
I agree that for rotating KEKs, a restart first is fine.

@philips
Contributor

philips commented Apr 10, 2017

Overall I think this proposal looks good. Although, I did review an early draft and help with the rotation/rekey stuff :)

Would love another set of eyes from @ericchiang or @lpabon

KEKs should never be stored in etcd and in most cases should be on the same volume as the apiserver. This provides consistency for the apiserver but still protects against compromises of the etcd volume.

In order to take an API driven approach for key rotation, new API objects will be defined:
* Key Encryption Key (KEK) - key used to unlock the Data Encryption Key. Stored on API server nodes.


Instead of 'Stored on API server nodes' how about 'Stored on API server nodes or remotely depending on the encryption provider implementation.'

I think for HSM and Google/Amazon KMS encryption providers, the HSM/KMS will hold the KEK.


To enable encryption a user will issue a PUT to an endpoint such as `/rotate`. If this is the first time this API has ever been called the API server
will generate a key (unencrypted DEK), encrypt it with the KEK in slot 1, and encrypt all secrets with the DEK. If this is the (N+1)-th time the API
has been called, the API server will encrypt the DEK with the KEK in slot N+1 and do a compare-and-swap on the DEK stored in etcd. There will be one DEK per database.


Perhaps this is already what you are thinking and I've just misunderstood, but can you outline the procedure for rotating the DEK as well, not just the procedure for re-encrypting the same DEK with the new KEK?

Or is that what you mean in the next paragraph where it talks about the new and the old DEK. That there will be multiple DEKs stored in etcd (and I don't mean the same DEK encrypted with different KEKs).

In multi-tenant Kube clusters secrets tend to have the highest load factor (there are 20-40 resource types per namespace, but most resources
only have 1 instance where secrets might have 3-9 instances across 10k namespaces). Writes are uncommon, creates usually happen
only when a namespace is created, and reads are somewhat common.


In the meeting @smarterclayton indicated that there are two secrets access patterns that are most common. I think it was getting and listing. Can we include that so that alternate encryption provider implementers can see why keeping most encryption/decryption operations in process in the API server is preferable?

This must be done within the Kubernetes storage interfaces - we will introduce a new API to the Kube storage layer that transforms the serialized object
into the desired at-rest form and provides hints as to whether no-op updates should still persist (when key rotation is in effect).
```go
// ValueTransformer allows a string value to be transformed before being read from or written to the underlying store. The methods
```

I guess you mean byte slice value instead of string value?

Contributor


Yeah, originally was strings in etcd2. Now byte slice.


It should be possible to separate the mechanism by which we encrypt the data in etcd from the mechanism whereby the keys to perform that mechanism are loaded onto the appropriate master. The provider MAY perform dynamic retrieval of keys from a hardware security module, retrieve a set of keys from a cloud-native security provider, or prompt the user for a password to decrypt the encryption keys from rest on disk, or some combination of the above.

During encryption, only a single provider is required. During decryption, multiple providers or keys may be in use (when migrating from an older version of a provider, or when rotating keys), and thus the ValueTransformer implementation must be able to delegate to the appropriate provider.

do you imagine on-the-fly decryption and re-encryption during key rotation, triggered when a resource is accessed?

Contributor


Only when writing a new version of the resource. The design of the API today is that we migrate at rest content by triggering a no-op PUT - if the encoded data is != the underlying data in etcd we would reencrypt. In the write path, both the API layer (the plaintext) and the cipher process can signal that a write is required, so either content changes or a new configured encryption key can be used to ensure that a write is performed.
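A rough sketch of how that write-on-stale signal might look in the read path (illustrative only; `noopUpdate` is a hypothetical hook, and ValueTransformer is as sketched later in the thread):

```go
// readAndMaybeMigrate decrypts a stored value and, if the transformer reports
// the stored form is stale (e.g. sealed with an old key), triggers a no-op
// update so the object is re-written - and therefore re-encrypted - with the
// currently configured write key.
func readAndMaybeMigrate(raw []byte, t ValueTransformer, noopUpdate func() error) ([]byte, error) {
	plaintext, stale, err := t.TransformFromStorage(raw)
	if err != nil {
		return nil, err
	}
	if stale {
		// Best-effort migration; the PUT path re-runs TransformToStorage.
		if err := noopUpdate(); err != nil {
			return nil, err
		}
	}
	return plaintext, nil
}
```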


Each encryption provider will have a unique string identifier to ensure versioning of contents on disk and to allow future schemes to be replaced.

It should be possible to separate the mechanism by which we encrypt the data in etcd from the mechanism whereby the keys to perform that mechanism are loaded onto the appropriate master. The provider MAY perform dynamic retrieval of keys from a hardware security module, retrieve a set of keys from a cloud-native security provider, or prompt the user for a password to decrypt the encryption keys from rest on disk, or some combination of the above.

How would you see prompt for a password being used?


With swarm, either you call an API to unlock the swarm w/ the corresponding key, or the user provides it:

```
➜  ~ docker swarm update --autolock=true
Swarm updated.
To unlock a swarm manager after it restarts, run the `docker swarm unlock`
command and provide the following key:

    SWMKEY-1-G5+GnzpF6hlgsakXUjPV0Pf4XsXgBoa9CLEQmn2mpOE

Please remember to store this key in a password manager, since without it you
will not be able to restart the manager.
[ RESTART Docker ]
➜  ~ docker swarm unlock
Please enter unlock key:
```

```go
// must be able to undo the transformation caused by the other.
type ValueTransformer interface {
	// TransformFromStorage may transform the provided data from its underlying storage representation or return an error.
	// Stale is true if the object on disk is stale and a write to etcd should be issued, even if the contents of the object
```

Do you mind describing stale state a little more? Maybe an example section? How would the implementation determine such state?

Contributor


stale usually means "the data is encrypted with an older primary key"


ah, thanks
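Assembling the fragments of the quoted hunk, the interface under discussion looks roughly like the following sketch (the exact signatures were not final at this point):

```go
// ValueTransformer transforms values on their way to and from storage. The
// two methods must be able to undo each other's transformation.
type ValueTransformer interface {
	// TransformFromStorage transforms data read from storage back into its
	// plaintext form. stale is true when the stored form was produced by an
	// older key or provider, signalling that a write should be issued even
	// if the object itself is unchanged.
	TransformFromStorage(data []byte) (out []byte, stale bool, err error)
	// TransformToStorage transforms plaintext data into its at-rest form,
	// e.g. by encrypting it with the current write key.
	TransformToStorage(data []byte) (out []byte, err error)
}
```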



The provider will be assigned a versioned identifier to uniquely pair the implementation with the data at rest, such as “k8s-aes-gcm-v1”. Any implementation that attempts to decode data associated with this provider id must follow a known structure and apply a specific algorithm.

The provider would take a set of keys and unique key identifiers from the command line, with the key values stored on disk. One key is identified as the write key; all others are used to decrypt data from previous keys. Keys must be rotated more often than every 2^32 writes.

Could "from the command line" be expanded? Seems vague.


Why 2^32? And are you referring to the soft block limit or to the nonce limit? IIRC, nonce limit is dependent on the size of the payload being encrypted.

If we're using 96-bit IVs we should have guarantees with a higher nonce limit than 2^32, but we should probably ping lvh to see what he thinks.
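For reference, the birthday-bound arithmetic usually behind limits like this (a back-of-envelope sketch, not from the proposal): with random 96-bit nonces, the probability of any nonce collision among q encryptions under a single key is approximately

$$
P_{\text{collision}} \;\approx\; \frac{q^{2}}{2\cdot 2^{96}},
\qquad q = 2^{32} \;\Rightarrow\; P_{\text{collision}} \approx \frac{2^{64}}{2^{97}} = 2^{-33}
$$

so capping writes per key at 2^32 keeps the random-nonce collision probability below roughly 2^-33; counter-based nonces avoid the birthday bound entirely, which may explain why different write-ups quote different limits.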


## Key Generation & Rotation

To account for easy key rotation, an additional layer of abstraction is introduced where the keys used for resource encryption are encrypted by another set of keys (Key Encryption Keys, aka KEKs).

This seems like a great idea for any application. Does it need to be part of Kubernetes? Can it be its own pkg and be included by Kubernetes?

```go
}
```

To enable encryption a user will issue a PUT to an endpoint such as `/rotate`. If this is the first time this API has ever been called the API server

This is probably one of the key usability definitions in this paper. Could this be expanded in its own section?

rotate API endpoint eventually. To account for failure scenarios during key rotation, the old and new DEK will be stored in etcd during the rotation.

## Master Configuration
In order to enable encryption, a user must first create a KEK DB file and tell the API server to use it with --key-encryption-key-db-path=/path/to/kekdb/file. The file will be a simple YAML file that lists all of the keys:

When I think user I think the user of Kubernetes. I think user here means the administrator of Kubernetes, right?



@smarterclayton
Contributor

Etcd at rest storage would not defend against a client that gained read access to etcd.


The provider would take a set of keys and unique key identifiers from the command line, with the key values stored on disk. One key is identified as the write key; all others are used to decrypt data from previous keys. Keys must be rotated more often than every 2^32 writes.

The provider should use the recommended Go defaults for all crypto settings unless otherwise noted. We should use AES-256 keys (32 bytes).


The user will also need to specify the encryption provider and the resources to encrypt as follows:


This is overly complicated. k8s admins shouldn't have to choose which resources to encrypt. Either we should pick which resources should be encrypted or they should all be encrypted.


Allowing sensitive data to be encrypted adheres to best practices as well as other requirements such as HIPAA.

How encryption keys are delivered to the machine running the Kubernetes apiserver is of relevance - we assume that the encryption at rest pattern is loosely coupled to how those keys are delivered and secured on disk.


Could use a similar pattern to Swarm secrets for lock/unlock of the cluster: https://docs.docker.com/engine/swarm/swarm_manager_locking/ -- this pattern really helps for having an option to keep the key-encrypting keys protected.


Building at least the UX for this in the first version allows for expansion to something like secret sharing / key splitting a la https://github.com/hashicorp/vault and https://github.com/cloudflare/redoctober


## Abstract

The scope of this proposal is to ensure that resources can be encrypted at the datastore layer. Encrypting data over the network and via third party vendors is a desired feature but outside the scope of this proposal. There is future work to be done to enable end to end encryption; until then, clients of the kubernetes API should be using TLS connections. Encryption will be optional for any resource, but we suspect it will be used for the Secret resource in most cases.


I think we should really strive to have the secret resource come encrypted by default instead of being optional. IMHO "we suspect it will be used for the Secret resource in most cases" isn't good enough.


How encryption keys are delivered to the machine running the Kubernetes apiserver is of relevance - we assume that the encryption at rest pattern is loosely coupled to how those keys are delivered and secured on disk.

In general, full disk encryption of the volumes storing etcd data is preferred - this proposal focuses on scenarios where additional protection is desired against malicious parties gaining read access to the etcd API or its backups or a running etcd instance without access to memory of the etcd process.

A quick comment here that the recommendation should probably include encrypted swap (no swap is also common in production deployments).

These providers will be able to be created and turned on depending on the user's needs or requirements and will adhere to an encryption interface.
This interface will provide the abstraction to allow various encryption mechanisms to be implemented, as well as for the method of encryption to be rotated over time.
It should be possible for the distribution of keys to the apiserver to be separated out (injected in by a higher level security process) or to be directly
requested by the provider implementation. For the first iteration, a default provider that handles encryption in-process using a locally stored key will be developed.


What are the plans on the default mode? locally stored key by default?

When the storage layer of Kubernetes is initialized for some resource, an implementation of this interface that manages encryption will be passed down. Other resources can use a no-op provider by default.

## Encryption Provider
An encryption provider implements the ValueTransformer interface. Out of the box this proposal will implement encryption using a standard AES-GCM performing AEAD, using the standard Go library for AES-GCM, with the key configuration provided at process startup.


I recommend not using AES-GCM, and using XSalsa20 and Poly1305 instead. The software implementation of the latter on general-purpose CPUs is faster, and general implementations of GCM are vulnerable to cache-timing attacks. Also, the higher level interface of https://godoc.org/golang.org/x/crypto/nacl/secretbox is probably what we want to use. A lot less room for error with just Open() and Seal()
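For comparison, a minimal sketch of the secretbox API being suggested here (golang.org/x/crypto/nacl/secretbox; the helper names and package name are mine):

```go
package encryption

import (
	"crypto/rand"
	"io"

	"golang.org/x/crypto/nacl/secretbox"
)

// seal encrypts and authenticates msg with XSalsa20-Poly1305; the 24-byte
// nonce is prepended to the returned box so open can recover it.
func seal(key *[32]byte, msg []byte) ([]byte, error) {
	var nonce [24]byte
	if _, err := io.ReadFull(rand.Reader, nonce[:]); err != nil {
		return nil, err
	}
	return secretbox.Seal(nonce[:], msg, &nonce, key), nil
}

// open verifies and decrypts a box produced by seal; ok is false if the
// ciphertext was tampered with or the key is wrong.
func open(key *[32]byte, box []byte) (msg []byte, ok bool) {
	if len(box) < 24 {
		return nil, false
	}
	var nonce [24]byte
	copy(nonce[:], box[:24])
	return secretbox.Open(nil, box[24:], &nonce, key)
}
```

Note that secretbox has no additional-data parameter, so the etcd-key binding discussed earlier in this thread would need to be layered on separately (e.g. by mixing the key path into the message or deriving a per-key key).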

## Encryption Provider
An encryption provider implements the ValueTransformer interface. Out of the box this proposal will implement encryption using a standard AES-GCM performing AEAD, using the standard Go library for AES-GCM, with the key configuration provided at process startup.

Other encryption or key distribution implementations are possible, such as AWS KMS, Google Cloud KMS, or Hashicorp Vault, depending on the environment the cluster is deployed into and its capabilities.

Not sure if we ever want to have a remote encryption API for every write due to performance reasons. The right thing to do here might be to have the Data Encryption Key encrypted at rest by a Key Encrypting Key, and ask for the decryption at boot from the remote system (KMS, etc).

@destijl
Member

destijl commented May 8, 2017

Hi all, Andrew mentioned he didn't have time to devote to this PR and he was happy for us to take it over to move it forward. So I've split out this content into a new PR:
#607

so we can get it to resolution. Andrew thanks for the help and your understanding here. If you could modify the description to mark this as deprecated and point to the new one that would be appreciated.

Thanks,
Greg

@andrewsykim andrewsykim changed the title Proposal: Encrypt secrets (and potentially others) at rest [Deprecated] Proposal: Encrypt secrets (and potentially others) at rest May 11, 2017
@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 15, 2017
@php-coder
Contributor

Let's close it.

