[sda-download] Implement random access in encrypted files #696

dbampalikis · 2024-02-28T08:40:12Z

As an sda-user
I want to be able to download specific parts of an encrypted file
In order to be able to get only the region I am interested in

The service currently allows downloading specific byte ranges of unencrypted files, but for encrypted files this is only possible for byte ranges that start at the beginning of the file. We need to support arbitrary byte ranges of encrypted files to cover the htsget use case.

A/C

A PR that allows for downloading specific ranges of encrypted files

pontus · 2024-02-28T15:19:31Z

Assuming we want to avoid re-encrypting potentially large amounts of data, this should use the mechanism the crypt4gh file format provides for exactly this purpose.

In short, each file/data stream is split into 64 KiB (65536-byte) blocks that are encrypted separately. These blocks are also the smallest unit of decryption, since each block carries its own MAC.

This means that to send logical bytes 65535-65536 (0-based), one would need to send the re-encrypted header and the first two data blocks (two blocks of 65536 bytes of data each, plus the per-block crypt4gh overhead). As the receiver only wants those two bytes, the header would also need a data edit list instructing it to throw away bytes 0-65534 and 65537-131071.

So the header re-encryption service needs to be able to accept a data edit list to put in the header.

Currently, I think there's only the chacha20_ietf_poly1305 cipher, so a fixed encrypted block size of 65564 bytes (a 12-byte nonce, 65536 bytes of data and a 16-byte MAC) can be used, but it might make sense to have a function in the crypt4gh library that takes a header and returns the block size (or similar).
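A hypothetical version of such a function, keyed on the data encryption method instead of a full header, might look like the Go sketch below. It assumes that method 0 denotes chacha20_ietf_poly1305, as in the Crypt4GH specification, and rejects anything else; the function and constant names are illustrative only and do not come from the crypt4gh library.

```go
package main

import "fmt"

const (
	plaintextSegmentSize = 65536 // plaintext bytes per segment
	chachaNonceSize      = 12    // nonce stored in front of each encrypted segment
	poly1305TagSize      = 16    // MAC stored after each encrypted segment
)

// Data encryption method identifier for chacha20_ietf_poly1305 (assumed to be 0).
const methodChacha20IetfPoly1305 uint32 = 0

// encryptedSegmentSize returns the on-disk size of one full encrypted segment
// for the given data encryption method (hypothetical helper).
func encryptedSegmentSize(method uint32) (int64, error) {
	switch method {
	case methodChacha20IetfPoly1305:
		return plaintextSegmentSize + chachaNonceSize + poly1305TagSize, nil
	default:
		return 0, fmt.Errorf("unsupported data encryption method %d", method)
	}
}

func main() {
	size, err := encryptedSegmentSize(methodChacha20IetfPoly1305)
	if err != nil {
		panic(err)
	}
	fmt.Println(size) // 65564
}
```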

pontus · 2024-02-28T15:29:42Z

For both the unencrypted and the encrypted data-out case, there is also a performance reason not to request the entire object from the archive and return only the wanted part, but rather to request only the range actually needed.

For the encrypted case, this is fairly simple: the S3 download client could pass a Range covering the bytes wanted.
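As a minimal Go sketch, the Range value for a plaintext request can be derived from the segment arithmetic. `rangeHeader` is a hypothetical helper; it assumes 65564-byte encrypted segments and a known `dataOffset` at which the first segment starts in the stored object (0 if the archived object holds only the data segments).

```go
package main

import (
	"errors"
	"fmt"
)

const (
	segmentSize          int64 = 65536 // plaintext bytes per segment
	encryptedSegmentSize int64 = 65564 // 12-byte nonce + 65536 bytes + 16-byte MAC
)

// rangeHeader returns an HTTP/S3 Range value covering every encrypted segment
// that holds part of the inclusive plaintext range [start, end].
func rangeHeader(start, end, dataOffset int64) (string, error) {
	if start < 0 || end < start || dataOffset < 0 {
		return "", errors.New("invalid range")
	}
	first := start / segmentSize
	last := end / segmentSize
	from := dataOffset + first*encryptedSegmentSize
	// Inclusive end; if it lies past the end of the object, HTTP range
	// semantics return the bytes up to the end of the object instead.
	to := dataOffset + (last+1)*encryptedSegmentSize - 1
	return fmt.Sprintf("bytes=%d-%d", from, to), nil
}

func main() {
	r, err := rangeHeader(65535, 65536, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(r) // bytes=0-131127
}
```

The resulting string could be passed as the `Range` field of an S3 `GetObjectInput`; the decrypted segments would then be trimmed with a data edit list as described earlier.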

The question is whether we would prefer unified handling for the unencrypted and encrypted cases.

For the unencrypted case, it might make sense to have a reader that maps each call to Read to an S3 call that is essentially managed synchronously, or something similar.

MalinAhlberg · 2024-03-05T15:24:15Z

When decrypting a partial file, the resulting file size should be what was originally asked for, not more. That is, the extra data sent to reach the next data block boundary should be removed using a data edit list. See #695 (comment)

This was referenced Feb 28, 2024:

- ReEncryptHeader in headers could accept a dataeditlist to override whatever is currently in the header. neicnordic/crypt4gh#124 (Closed)
- [WIP] Add support for random access to backend objects, not fetching entire objects. #703 (Closed)

MalinAhlberg mentioned this issue Mar 4, 2024:

- htsget download endpoints #695 (Merged)

MalinAhlberg mentioned this issue Apr 26, 2024:

- Feat/htsget seekable #831 (Merged)

MalinAhlberg closed this as completed in #831 Apr 29, 2024
