Client Side Encrypted Snapshot Repositories #41910
This concerns the encryption of snapshot data before it leaves the nodes.
Amazon and Azure, which support client side encryption, allow the keys to be managed by the client (us) or by their Key Management Service (Vault-like). They both use the Envelope Encryption method: each blob is individually AES-256 encrypted with a locally generated random key, and this key (the Data/Content Encryption Key, or (D/C)EK) is in turn encrypted with a Master Key (locally or by the Vault service) and then stored alongside the blob in its metadata. Envelope encryption facilitates Master Key rotation because only the small (D/C)EK has to be re-encrypted, rather than the complete blob.
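To make the envelope scheme concrete, here is a minimal sketch using only the JDK's `javax.crypto`. It is not what the AWS or Azure SDKs actually do internally, nor a proposed Elasticsearch implementation. The class `EnvelopeSketch`, its `EncryptedBlob` container, and the choice of AES/GCM for the blob and `AESWrap` for the key are all assumptions made for illustration; the discussion above only fixes AES-256 and the envelope structure.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration of envelope encryption; all names are made up.
final class EnvelopeSketch {
    // What would be stored: the ciphertext plus the metadata kept alongside it.
    static final class EncryptedBlob {
        final byte[] wrappedDek;  // DEK encrypted under the Master Key
        final byte[] iv;          // per-blob GCM nonce
        final byte[] ciphertext;  // blob content encrypted under the DEK

        EncryptedBlob(byte[] wrappedDek, byte[] iv, byte[] ciphertext) {
            this.wrappedDek = wrappedDek;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a random AES-256 key (used here for both DEKs and Master Keys).
    public static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts one blob with a fresh DEK, then wraps the DEK with the Master Key.
    public static EncryptedBlob encrypt(SecretKey masterKey, byte[] plaintext) {
        try {
            SecretKey dek = newAesKey();
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher data = Cipher.getInstance("AES/GCM/NoPadding");
            data.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            byte[] ciphertext = data.doFinal(plaintext);
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, masterKey);
            return new EncryptedBlob(wrap.wrap(dek), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unwraps the DEK with the Master Key, then decrypts the blob.
    public static byte[] decrypt(SecretKey masterKey, EncryptedBlob blob) {
        try {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, masterKey);
            SecretKey dek = (SecretKey) unwrap.unwrap(blob.wrappedDek, "AES", Cipher.SECRET_KEY);
            Cipher data = Cipher.getInstance("AES/GCM/NoPadding");
            data.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, blob.iv));
            return data.doFinal(blob.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Master Key rotation: only the small wrapped DEK is re-encrypted;
    // the blob ciphertext and nonce are reused untouched.
    public static EncryptedBlob rotate(SecretKey oldMaster, SecretKey newMaster, EncryptedBlob blob) {
        try {
            Cipher unwrap = Cipher.getInstance("AESWrap");
            unwrap.init(Cipher.UNWRAP_MODE, oldMaster);
            SecretKey dek = (SecretKey) unwrap.unwrap(blob.wrappedDek, "AES", Cipher.SECRET_KEY);
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, newMaster);
            return new EncryptedBlob(wrap.wrap(dek), blob.iv, blob.ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that `rotate` never touches the blob content, which is the property that makes rotation cheap: for a repository this would mean rewriting per-blob metadata rather than re-uploading every blob.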
On the ES side we discussed having a single fixed URN key handler at the repository settings level.
I believe this is the rough picture of the puzzle that we need to put together.
We oscillated between implementation alternatives, and I will lay out the one which I think is preferable. Whatever solution we initially implement, given that the Master Key identifier is a URN, we can multiplex multiple implementations for the same repository type.
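As a rough sketch of what multiplexing on the URN could look like, the registry below dispatches on the URN's namespace identifier (the part after `urn:`) to a handler that resolves the Master Key. Everything here is hypothetical: the class `KeyHandlerRegistry`, the handler signature, and the example namespaces `keystore` and `kms` are assumptions for illustration, not existing Elasticsearch settings or APIs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry mapping a URN namespace to a Master Key handler.
final class KeyHandlerRegistry {
    // namespace identifier -> function resolving the namespace-specific part to key bytes
    private final Map<String, Function<String, byte[]>> handlers = new HashMap<>();

    // Registers one implementation, e.g. "keystore" (local) or "kms" (remote).
    public void register(String namespace, Function<String, byte[]> handler) {
        handlers.put(namespace.toLowerCase(Locale.ROOT), handler);
    }

    // Parses "urn:<namespace>:<specific part>" and delegates to the matching handler.
    public byte[] resolve(String urn) {
        String[] parts = urn.split(":", 3);
        if (parts.length != 3 || !parts[0].equalsIgnoreCase("urn")
                || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("not a URN: " + urn);
        }
        // URN namespace identifiers are case-insensitive.
        Function<String, byte[]> handler = handlers.get(parts[1].toLowerCase(Locale.ROOT));
        if (handler == null) {
            throw new IllegalArgumentException("no key handler for namespace: " + parts[1]);
        }
        return handler.apply(parts[2]);
    }
}
```

With something like this, a repository setting holding `urn:keystore:...` and another holding `urn:kms:...` could coexist for the same repository type, and a new key source would only need a new handler registration.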
We mirror the Envelope Encryption algorithm, employed by Amazon and Azure, at the
In the opposite corner, there could be this alternative:
We discussed this today in our weekly team meeting, but ran over time weighing the alternatives.
We did agree, however, that we don't need to support moving snapshots between repositories.
I would like to kindly ask the distributed team for any input.
what do you mean by
It seems to me for the first option we could "simply" pass a secure setting for the current encryption key to
That said, I like the first option much better than doing some SDK-specific thing just for S3. In the end it is probably less maintenance effort long term, since relying completely on the SDKs' implementations puts us at the mercy of whatever changes happen to them. Plus, as you point out, working with the SDKs only would be tricky to test and would not cover the FS repository.
I would point out one thing though (sorry if this was already discussed, just ignore this if it was :)):
The snapshot mechanism uses blob names as part of its logic somewhat extensively. Even if we client side encrypt every blob, we'd still be leaking the following information:
Not sure if that's a compliance problem, but that would certainly be something that would be challenging to not leak via the blob names.
That's all I have for now. Happy to help review your work though :)
Yes, that's the first option I was trying to describe. What I mean when I say we duplicate code is that the "crypto logic" (the envelope encryption, AES algorithm, all that) will most likely be very similar (on purpose) to what the SDK already does.
I think that's a very thoughtful observation, and that it should definitely get in the docs. I don't believe there are regulations for that, and we are not aiming for a specific compliance target, but I'm no expert either. Maybe @joshbressers is more knowledgeable in this regard? I propose we clearly acknowledge this limitation in the docs and act on it only if we get specific requests.
Glad to hear! Thank you!
Ideally we don't want to leak any metadata, but I know sometimes it's unavoidable.
We probably won't run afoul of any compliance standards here. We could see some interest from certain sensitive customers, but generally their concern revolves around leaking names more than this sort of metadata.
I think it would be preferable to implement this ourselves and not rely on the blob-store libraries to do it.
Ultimately, we need this for multiple repository types, and we could use the cloud SDKs for it, but we would still need to build & verify it for each provider, which wouldn't gain us very much over just building it ourselves.