
Encrypted blob store repository - take I #46170

Conversation

albertzaharovits
Contributor

@albertzaharovits albertzaharovits commented Aug 30, 2019

This is the first iteration of the client-side encrypted snapshot repository. I would much appreciate confirmation from @elastic/es-distributed about the approach in the code (I have specific questions), and I also have some questions about the next iteration.

To create the encrypted repository use the following command:

curl -X PUT "localhost:9200/_snapshot/my_fs_encrypted_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "encrypted",
  "settings": {
    "delegate_type": "fs",
    "password": "password",
    "location": "/tmp/encrypted_snapshot"
  }
}
'
  • delegate_type can be any type backed by a BlobStoreRepository: fs, url, azure, gcs, s3.
  • password will eventually be stored in the keystore, but it's easier to pass it like this for now

Q1:
How do you feel about the approach of EncryptedRepository extending BlobStoreRepository and also containing a BlobStoreRepository? Especially regarding the lifecycle methods:

    @Override
    protected void doStart() {
        this.delegatedRepository.start();
        super.doStart();
    }

    @Override
    protected void doStop() {
        super.doStop();
        this.delegatedRepository.stop();
    }

    @Override
    protected void doClose() {
        super.doClose();
        this.delegatedRepository.close();
    }

Q2:
How do you feel about having the encryption metadata (right now containing only the encrypted data encryption key) as a separate blob inside a blob container named after the master encryption key? Having a different blob for every data key helps with rotation in the presence of the eventual consistency issue (rotating keys creates a new file) and avoids partial updates of blobs.
But it does incur one extra call round trip to write the metadata blob - can we overlap the read/write ops? Can two writeBlob or readBlob calls run simultaneously on the same BlobStoreRepository?

In addition, I have further questions for the follow-up implementation:

To guard against chosen-ciphertext attacks (manipulations of the ciphertext resulting in predictable manipulations of the plaintext), which AES schemes are vulnerable to (in varying degrees, depending on the mode), it is preferable that we use authenticated encryption. This produces an "authentication tag" (an HMAC over the ciphertext) that we must verify before "trusting" the decryption result. For that we have to buffer the ciphertext before decrypting it, and release the plaintext only after the tag validation succeeds. We can buffer in memory or spool to disk in a temporary file. Which approach would you favor? I would lean towards spooling to disk. (This implementation does in-memory buffering, which is disastrous performance-wise.)
And a related question: given that we have to buffer (in memory or on disk), what chunk size should we use out of the box? A smaller chunk size requires smaller temporary buffers but increases the API call count.

No key rotation considerations in this iteration.
This could be a separate plugin or part of the x-pack module (as it currently is); I don't think it matters for now.

Relates: #41910

@albertzaharovits albertzaharovits added >feature :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Security/Security Security issues without another label labels Aug 30, 2019
@albertzaharovits albertzaharovits self-assigned this Aug 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-security

Member

@original-brownbear original-brownbear left a comment


Thanks @albertzaharovits, I will take a proper look as soon as I find some time for it (next week, I'm afraid).
In general the approach looks fine and like what we discussed in the past. I found one issue with the blob sizes that we'll need to address, though.

}
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
Member

This seems wrong (and like it'll potentially be a problem!). The blobSize for the purposes of cloud stores like S3 must be the exact number of bytes that will be written. This currently works with the FsRepository because it doesn't use the size here, but it will write partial data with S3 and the like (since the encrypted bytes are more numerous than the unencrypted ones).

Can we guess the size up front here in some form (via padding magic or so ... I'm admittedly not that knowledgeable here)?

Contributor Author

Great catch! We can compute the ciphertext length from the padding and mode we use. It's not complicated. I will make the changes.
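Since AES/GCM applies no padding, the encrypted size can indeed be computed up front. A minimal sketch of the computation (hypothetical names, not code from this PR):

```java
// Hypothetical sketch: AES/GCM uses no padding, so the ciphertext length is
// fully determined by the plaintext length plus the fixed-size (16-byte)
// authentication tag that the cipher appends at the end.
class GcmBlobSizes {
    static final long GCM_TAG_LENGTH_BYTES = 16;

    // size of the encrypted blob that will actually be written
    static long encryptedBlobSize(long plaintextSize) {
        return plaintextSize + GCM_TAG_LENGTH_BYTES;
    }

    // size to report back for an encrypted blob (e.g. from listBlobs())
    static long decryptedBlobSize(long ciphertextSize) {
        return ciphertextSize - GCM_TAG_LENGTH_BYTES;
    }
}
```

The same arithmetic, inverted, covers the listBlobs() metadata case discussed further down.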


@Override
public Map<String, BlobMetaData> listBlobs() throws IOException {
return this.delegatedBlobContainer.listBlobs();
Member

This won't work so easily. We need the correct metadata to be returned for each blob here (i.e. the correct decrypted size; right now it returns the encrypted size).

Contributor Author

Yes, I'll make the length computation in this case as well. Thanks!

@original-brownbear
Member

As far as Q1 is concerned:

How do you feel about the approach of EncryptedRepository extending BlobStoreRepository and also containing a BlobStoreRepository?

I think this is fine. In the end, currently neither doStart nor doStop really does much of anything in production code for any implementation ... the optics of wrapping those methods might not be great, but I can't see a practical issue with it, so it's fine with me.

But it does incur one extra call round trip to write the metadata blob - can we overlap the read/write ops? Can two writeBlob or readBlob calls run simultaneously on the same BlobStoreRepository?

Yea sure, we can concurrently read and write to the repository; we do it all the time by using the SNAPSHOT threadpool to parallelize uploads. Just make sure not to block in whatever you code up and you're good to (and likely should :)) parallelize these operations.

To guard against chosen-ciphertext attacks (manipulations of the ciphertext resulting in predictable manipulations of the plaintext), which AES schemes are vulnerable to (in varying degrees, depending on the mode), it is preferable that we use authenticated encryption. This produces an "authentication tag" (an HMAC over the ciphertext) that we must verify before "trusting" the decryption result. For that we have to buffer the ciphertext before decrypting it, and release the plaintext only after the tag validation succeeds. We can buffer in memory or spool to disk in a temporary file. Which approach would you favor? I would lean towards spooling to disk. (This implementation does in-memory buffering, which is disastrous performance-wise.)
And a related question: given that we have to buffer (in memory or on disk), what chunk size should we use out of the box? A smaller chunk size requires smaller temporary buffers but increases the API call count.

I might lack some knowledge here, but why does a smaller chunk size in terms of the HMAC validation lead to more API calls? Can't we still write all the blocks for one unencrypted file (that are separately validated) into a single file and verify them as we stream them back to the client?

@albertzaharovits
Contributor Author

albertzaharovits commented Sep 2, 2019

Thanks a lot for the very fast feedback @original-brownbear !

Yea sure we can concurrently read and write to the repository, we do it all the time by using the SNAPSHOT threadpool to parallelize uploads. Just make sure to not block in whatever things you code up and you're good to (and likely should ) parallelize these operations.

I now see the example of BlobStoreRepository#snapshotFile being called on the SNAPSHOT threadpool, thanks!
But I worry that all the calls are done using the same client of the same BlobStore instance.

To be clearer, for example:

        @Override
        public void writeBlob(String blobName, InputStream inputStream, long blobSize, boolean failIfAlreadyExists) throws IOException {
            // first write the (small) encryption metadata blob ...
            try (InputStream stream = new ByteArrayInputStream(wrappedDataEncryptionKey)) {
                this.encryptionMetadataBlobContainer.writeBlob(blobName, stream, wrappedDataEncryptionKey.length, failIfAlreadyExists);
            }
            // ... then the encrypted data blob itself
            this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
        }

Calling writeBlob on encryptionMetadataBlobContainer and delegatedBlobContainer boils down to the same BlobStore instance and cloud client.

The metadata blob is small, therefore forking onto a threadpool might be unnecessary (because the calls are eventually serialized) and even detrimental to performance.

I will try using the threadpool and compare against no thread pool, we'll see!

why does a smaller chunk size in terms of the HMAC validation lead to more API calls? Can't we still write all the blocks for one unencrypted file (that are separately validated) into a single file and verify them as we stream them back to the client?

I think you're suggesting that a single blob be a concatenation of several "encryption call instances". I think this option is still on the table; there is nothing in the crypto precluding it. But the code would be more complex (we would have to implement the streaming ourselves, to recognize "segment" boundaries), while the AES algorithm already works in blocks (and is "online", i.e. streaming) and the BlobStoreRepository already has the blob notion. Introducing our own intermediate blocks/segments (i.e. encryption calls) would work, but it would surely be inelegant and add complexity to the crypto code, which I would abstain from. But it's still on the table and I'll think about it, thanks!

@original-brownbear
Member

@albertzaharovits npnp :)

But I worry that all the calls are done using the same client of the same BlobStore instance.

This is not an issue, the existing code already assumes these are threadsafe :)

The metadata blob is small, therefore forking onto a threadpool might be unnecessary (because the calls are eventually serialized) and even detrimental to performance.

I think it very much depends on the situation here. Keep in mind that on cloud-backed repositories we can routinely see something like 100ms of latency for a write.
For operations that write a large number of blobs I would agree that parallelizing the write of a blob and its metadata doesn't get you much, because you can only do so many writes in parallel anyway.
But for operations that only write a small number of blobs (all the things master does) it would nicely lower the overall time that e.g. finalizing a snapshot takes.

That said, I wouldn't invest too much time into this. It might not be possible to parallelize non-segment blob writes at the moment without an API change anyway. Probably something to optimize in the future by moving the API of Repository to something async for all operations first.

@original-brownbear
Member

@albertzaharovits ah sorry forgot one of my comments :)

On the whole HMAC topic:

Can't we simply verify this via the InputStream we create from an encrypted blob? As we stream we can keep calculating the checksum/hash (forgive me for not knowing the correct terminology here :)) and then when we read the last byte (we can wrap the original stream in some way to "read 1 byte ahead" or so to make this work nicely) we check if it matches with expectations and throw if it doesn't?
That wouldn't require any buffering would it?
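The wrapping stream described here could look roughly like this (a hypothetical sketch, with HmacSHA256 standing in for whatever MAC is eventually chosen; names are illustrative, not from this PR):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;

// Feed every byte read into an HMAC; once EOF is reached, compare against
// the expected tag and throw if they differ.
class HmacVerifyingInputStream extends FilterInputStream {
    private final Mac mac;
    private final byte[] expectedTag;
    private boolean verified = false;

    HmacVerifyingInputStream(InputStream in, byte[] key, byte[] expectedTag) throws GeneralSecurityException {
        super(in);
        this.mac = Mac.getInstance("HmacSHA256");
        this.mac.init(new SecretKeySpec(key, "HmacSHA256"));
        this.expectedTag = expectedTag;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            mac.update((byte) b);
        } else {
            verifyAtEof();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n >= 0) {
            mac.update(buf, off, n);
        } else {
            verifyAtEof();
        }
        return n;
    }

    private void verifyAtEof() throws IOException {
        if (verified) {
            return;
        }
        verified = true;
        // constant-time comparison of the computed MAC against the stored tag
        if (!MessageDigest.isEqual(mac.doFinal(), expectedTag)) {
            throw new IOException("HMAC mismatch: blob is corrupt or was tampered with");
        }
    }
}
```

The caveat is that by the time the tag check throws, earlier plaintext bytes have already been streamed to the caller, so the caller must be prepared to discard partial output on an IOException.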

@albertzaharovits
Contributor Author

albertzaharovits commented Sep 2, 2019

Can't we simply verify this via the InputStream we create from an encrypted blob? As we stream we can keep calculating the checksum/hash (forgive me for not knowing the correct terminology here :)) and then when we read the last byte (we can wrap the original stream in some way to "read 1 byte ahead" or so to make this work nicely) we check if it matches with expectations and throw if it doesn't?
That wouldn't require any buffering would it?

Yes, this is a very good option, and it is a close option 2.

For popularity/practicality reasons we have converged on AES encryption in the GCM mode. This mode has MAC calculation built in, it is supposed to be hardware accelerated (if the JVM JIT compiles the code that does the decryption, and if using the SunJCE crypto provider), and we have guidance on usage errors to avoid in the NIST SP 800-38D specification.
Unfortunately this same specification recommends against "interpreting" decrypted ciphertext before validating the authentication tag, which is the reason for the internal buffering of the GaloisCounterMode cipher from SunJCE. As you pointed out in a separate discussion, we can be assured in FileRestoreContext that a partial blob with an IOException at the end, when verifying the authentication tag, will not get "interpreted" (this makes redundant the temporary spooling file for decrypted ciphertext that I am asking about). But there is no way to turn off the internal buffering in the GaloisCounterMode cipher in the SunJCE provider.

Without resorting to buffering we have two options, both of which I will explore to gauge the performance:

  • Use a GCM implementation from the Bouncy Castle Crypto provider that does not employ the internal buffering.
  • Use a different mode, CBC, which does not offer authentication, and wrap an HMAC computation around the ciphertext to obtain the authentication feature.

GCM is uber popular and it minimizes the code we have to write that deals with crypto so I am favoring it as a first option. A close second is your suggestion to do our own HMAC on a ciphertext from an AES mode that does not do authentication.

}
Cipher cipher = Cipher.getInstance(ENCRYPTION_MODE, BC_PROV);
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, ivParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize + GCM_TAG_BYTES_LENGTH,
Contributor

CipherInputStream does not support marking (i.e. CipherInputStream.markSupported() returns false) which means that failed requests to S3 / GCS can't be auto-retried.
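The limitation can be demonstrated with a few lines against the JDK alone (a sketch; the class and method names are mine):

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// CipherInputStream unconditionally reports mark/reset as unsupported,
// even when the wrapped stream supports it, so the cloud SDKs cannot
// rewind it to retry a failed upload.
class MarkSupportDemo {
    // returns { source.markSupported(), wrapped.markSupported() }
    static boolean[] check() throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(new byte[16], "AES"),
                new GCMParameterSpec(128, new byte[12]));
        InputStream source = new ByteArrayInputStream(new byte[32]);
        InputStream encrypted = new CipherInputStream(source, cipher);
        return new boolean[] { source.markSupported(), encrypted.markSupported() };
    }
}
```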

Member

Urgh, good find ... I guess we could create our own cipher stream that supports it, but then we'd either have to buffer the encrypted data to disk or re-encrypt the whole thing on retry, right? (Maybe there's a better way that I don't see?)

Contributor

@tvernum tvernum Sep 9, 2019

If those repositories (blob stores) only require resetting back to the beginning of the stream (to retry the whole request) then we could probably handle it with re-encryption.
But if we want to mark an arbitrary point in the stream and reset to it, then re-encryption probably isn't going to be an option, because you can't reset a Cipher to an arbitrary point, and that means any checksums will be thrown off because the Cipher will think it has processed more data than was actually written to the blob store.

Member

But if we want to mark an arbitrary point in the stream and reset to it, then re-encryption probably isn't going to be an option, because you can't reset a Cipher to an arbitrary point, and that means any checksums will be thrown off because the Cipher will think it has processed more data than was actually written to the blob store.

My thinking was to simply reset back to 0 to get a fresh Cipher, but then just dump the bytes up to whatever point x we actually want to reset to, and only start producing bytes from x onward: a CPU-expensive but IO-cheap way of implementing a reset to an arbitrary point. Wouldn't that work?

Contributor Author

@albertzaharovits albertzaharovits Sep 12, 2019

Thanks for raising this point @ywelsch !

I understand that this is a problem for upload (hence encryption). Decryption is the CipherInputStream wrapping the cloud client's stream so it should not be affected. Please correct me if I'm mistaken.

I looked over the S3 SDK sources and I believe AwsChunkedEncodingInputStream is where an input stream which does not support mark/reset gets memory-buffered during an upload. This chunking is recommended for buffers larger than 100MB. Note that this chunking is different from the chunking we do at the blob level; this is chunking done inside the SDK during the multipart upload. The chunk buffers are generally small, only 128 KB by default. Hence I believe general mark/reset support is required (not only from the beginning).

Assuming that renouncing the chunked-upload-and-retry by the cloud SDK library is a last resort option, we're left with four other choices:

  1. Buffer the ciphertext in memory. This incurs 128 KB of memory overhead, and the client loses the ability to control the chunk size (at the SDK level).
  2. Buffer the ciphertext on disk in a spool file. Disk IO performance impact to be determined.
  3. Use AES/CTR with HmacSha512. This encryption scheme is able to produce a mark/reset-able CipherInputStream (there is such an implementation in the BC library). This gets us coding the decryption and authentication separately, exposing us to some pitfalls, but I am confident we can straighten them all out in reviews.
  4. AES/GCM can be translated to an AES/CTR scheme, therefore we might seek the plaintext and redo the encryption upon a reset, using the CTR cipher specifically configured for the seek position. The code will be nasty, but we can test that rewound and re-encrypted streams are identical to the original AES/GCM ciphertext.
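The seekability that options 3 and 4 rely on can be sketched with the JDK's own AES/CTR cipher: the keystream for byte offset pos depends only on the counter block pos / 16, so a fresh cipher can resume from an arbitrary offset by bumping the counter in the IV and discarding pos % 16 keystream bytes. All names below are illustrative, not from this PR:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;

// Hypothetical sketch of seeking an AES/CTR stream to an arbitrary offset.
class CtrSeek {
    // return the 16-byte counter block for the given block index
    static byte[] ivAtBlock(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv).add(BigInteger.valueOf(blockIndex)).toByteArray();
        byte[] out = new byte[16];
        int n = Math.min(raw.length, 16);
        System.arraycopy(raw, raw.length - n, out, 16 - n, n);
        return out;
    }

    // encrypt plaintext[pos..] as if the stream had been reset to pos
    static byte[] encryptFrom(byte[] key, byte[] iv, byte[] plaintext, int pos) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(ivAtBlock(iv, pos / 16)));
        c.update(new byte[pos % 16]); // burn the keystream for the partial block
        return c.doFinal(plaintext, pos, plaintext.length - pos);
    }
}
```

Re-encrypting from an arbitrary pos this way produces bytes identical to the tail of the original ciphertext, which is exactly the property a mark/reset-able CipherInputStream would need.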

From my pure engineering perspective I would pick option 3. The code implemented with BC should be very neat and the caveats of doing the MAC yourself are manageable. I don't like buffering given how much has been invested in developing seek-able streams all over the codebase (hence discounting 1 and 2), and option 4 is a bit too complex. That being said, I think option 4, when it finally works, will be easier to "prove" correct (because we can test ciphertext equality). Also, on-disk spooling, if feasible, would give us great leeway in terms of future configurability of the encryption plugin; maybe spooling is required by other features too?

I am curious what's your thinking on this @original-brownbear @ywelsch @tvernum .
In the mean time I will try to implement option 3 using BouncyCastle to see if there's a noticeable performance impact.

Member

I would agree that 3 seems like the cleanest option here.
But as I said above, I think 4 is fine too. The situation of having to reset the stream and retry the upload of a chunk should be a relatively rare occurrence (obviously this depends on the specific networking situation, but even then the time spent on retries should be small relative to the overall snapshot time).
=> Assuming 3 doesn't work, I think it'd be better to spend some CPU here and go with 4 than to introduce the complications of spooling to disk (2., which raises a number of open questions in regards to disk usage IMO) or restricting upload chunk sizing flexibility (1.).

@albertzaharovits
Contributor Author

@bizybot I think you can push the code that moves this plugin under the x-pack directory.
Given that the code sits in a few self contained files, moving them around to another directory should carry @original-brownbear 's comments. Worst case we can revert the move, and open another PR if the comments get in an unusable state.

@albertzaharovits albertzaharovits changed the base branch from master to client-side-encrypted-snapshot-repos October 9, 2019 23:48
@albertzaharovits albertzaharovits marked this pull request as ready for review October 9, 2019 23:51
@albertzaharovits
Contributor Author

I am going to merge this into a feature branch, although it did not pass reviews.

Given the excellent feedback that I've already got, I know what needs to change (make the CipherInputStream support mark), but I don't want to hold this PR up until I get there; I prefer to unblock @bizybot instead.

I hope the follow-up PRs that address the issues raised here will not be perturbed by me merging this, since most of the changes here are new code (it won't strain the reviewer with changes upon changes).

@albertzaharovits albertzaharovits merged commit ef2f8aa into elastic:client-side-encrypted-snapshot-repos Oct 10, 2019
@jmlrt
Member

jmlrt commented Oct 11, 2019

@albertzaharovits fyi elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample had been running for 37 hours on ubuntu-16.04 CI workers, and /var/lib/jenkins/jobs/elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample/configurations/axis-os/ubuntu-16.04-packaging/builds/183/build-logs.log used 245GB on elasticsearch-ci master.

I had to cancel the job and delete the build as part of https://github.com/elastic/infra/issues/15235.

@jmlrt
Member

jmlrt commented Oct 11, 2019

@albertzaharovits elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample has now been running for 37 hours and is taking up 131GB on centos-7.

root@elasticsearch-ci-master-blue:/var/lib/jenkins/jobs/elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample/configurations/axis-os/centos-7-packaging/builds/183# du -sh *
131G    build-logs.log
12K     build-logs.yml
4.0K    build-results.yml
496K    changelog.xml
1.5G    log
4.0K    meta.json
4.0K    module
16K     registry

I won't stop the job and delete log files yet if you want to investigate.

@albertzaharovits
Contributor Author

@jmlrt please cancel the job and delete any files. This PR has been merged into a feature branch and did not pass all compilation tests, i.e. I'm not interested in the results of those jobs. I will fix the issues in later PRs.

albertzaharovits added a commit that referenced this pull request Jan 6, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates #48221 , #46170
albertzaharovits added a commit that referenced this pull request Nov 25, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates #48221 , #46170
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Nov 30, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates elastic#48221 , elastic#46170