[WIP] [RFC] add cryptographic hash to seekable format #2737

Closed
30 changes: 20 additions & 10 deletions contrib/seekable_format/zstd_seekable_compression_format.md
@@ -14,7 +14,7 @@ are clearly marked.
Distribution of this document is unlimited.

### Version
-0.1.0 (11/04/17)
+0.2.0 (31/07/21)

## Introduction
This document defines a format for compressed data to be stored so that subranges of the data can be efficiently decompressed without requiring the entire document to be decompressed.
@@ -78,26 +78,31 @@ A bitfield describing the format of the seek table.

| Bit number | Field name |
| ---------- | ---------- |
-| 7 | `Checksum_Flag` |
-| 6-2 | `Reserved_Bits` |
+| 7 | `XXH64_Checksum_Flag` |
+| 6 | `SHA512-256_Checksum_Flag`|
+| 5-2 | `Reserved_Bits` |
| 1-0 | `Unused_Bits` |

-While only `Checksum_Flag` currently exists, there are 7 other bits in this field that can be used for future changes to the format,
+While only `Checksum_Flag` currently exists, there are 6 other bits in this field that can be used for future changes to the format,
for example the addition of inline dictionaries.

-__`Checksum_Flag`__
+__`XXH64_Checksum_Flag`__

If the checksum flag is set, each of the seek table entries contains a 4 byte checksum of the uncompressed data contained in its frame.

+__`SHA512-256_Checksum_Flag`__
+
+If the checksum flag is set, each of the seek table entries contains a 32 byte SHA-512/256 checksum of the uncompressed data contained in its frame.
+
`Reserved_Bits` are not currently used but may be used in the future for breaking changes, so a compliant decoder should ensure they are set to 0. `Unused_Bits` may be used in the future for non-breaking changes, so a compliant decoder should not interpret these bits.
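
The bit layout proposed in this diff can be illustrated with a small decoder. This is only a sketch, not code from the zstd repository: the function name `parse_descriptor` and the constant names are hypothetical, and it assumes the descriptor is a single byte whose bit 7 is the most significant bit, as the table above suggests.

```python
# Hypothetical constants mirroring the proposed Seek_Table_Descriptor table.
XXH64_CHECKSUM_FLAG = 1 << 7        # bit 7
SHA512_256_CHECKSUM_FLAG = 1 << 6   # bit 6 (added by this proposal)
RESERVED_MASK = 0b0011_1100         # bits 5-2, must be 0
# Bits 1-0 are Unused_Bits and are deliberately never inspected.


def parse_descriptor(descriptor: int) -> dict:
    """Decode a one-byte descriptor into its two checksum flags."""
    if not 0 <= descriptor <= 0xFF:
        raise ValueError("descriptor must fit in one byte")
    if descriptor & RESERVED_MASK:
        # Reserved bits signal a breaking change this decoder cannot handle.
        raise ValueError("Reserved_Bits must be 0")
    return {
        "xxh64": bool(descriptor & XXH64_CHECKSUM_FLAG),
        "sha512_256": bool(descriptor & SHA512_256_CHECKSUM_FLAG),
    }
```

Rejecting non-zero `Reserved_Bits` while ignoring `Unused_Bits` follows the paragraph above: the former may carry breaking changes, the latter may not.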

#### __`Seek_Table_Entries`__

`Seek_Table_Entries` consists of `Number_Of_Frames` (one for each frame in the data, not including the seek table frame) entries of the following form, in sequence:

-|`Compressed_Size`|`Decompressed_Size`|`[Checksum]`|
-|-----------------|-------------------|------------|
-| 4 bytes | 4 bytes | 4 bytes |
+|`Compressed_Size`|`Decompressed_Size`|`[XXH64_Checksum]`|`[SHA512-256_Checksum]`|
+|-----------------|-------------------|------------------|-----------------------|
+| 4 bytes | 4 bytes | 4 bytes | 32 bytes |

__`Compressed_Size`__

@@ -108,9 +113,14 @@ __`Decompressed_Size`__

The size of the decompressed data contained in the frame. For skippable or otherwise empty frames, this value is 0.

-__`Checksum`__
+__`XXH64_Checksum`__
+
+Only present if `XXH64_Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the least significant 32 bits of the XXH64 digest of the uncompressed data, stored in little-endian format.
+
+__`SHA512-256_Checksum`__

-Only present if `Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the least significant 32 bits of the XXH64 digest of the uncompressed data, stored in little-endian format.
+Only present if `SHA512-256_Checksum_Flag` is set in the `Seek_Table_Descriptor`. Value : the 256 bits of the SHA-512/256 digest of the uncompressed data, stored in little-endian format.

## Version Changes
- 0.1.0: initial version
+- 0.2.0: add cryptographic content hash
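
To make the proposed entry layout concrete, here is a minimal parsing sketch. It is not taken from the zstd sources: `SeekEntry`, `entry_size`, and `parse_entries` are hypothetical names. It assumes the two size fields are little-endian 32-bit integers, which this diff does not state explicitly, and that when both flags are set the 4-byte XXH64 field precedes the 32-byte SHA-512/256 field, as in the table. The SHA-512/256 field is returned as the raw 32 stored bytes with no reordering.

```python
import struct
from typing import List, NamedTuple, Optional


class SeekEntry(NamedTuple):
    compressed_size: int
    decompressed_size: int
    xxh64_low32: Optional[int]    # None when XXH64_Checksum_Flag is clear
    sha512_256: Optional[bytes]   # None when SHA512-256_Checksum_Flag is clear


def entry_size(has_xxh64: bool, has_sha: bool) -> int:
    """Bytes per entry: 8 for the sizes, plus 4 and/or 32 for checksums."""
    return 8 + (4 if has_xxh64 else 0) + (32 if has_sha else 0)


def parse_entries(buf: bytes, number_of_frames: int,
                  has_xxh64: bool, has_sha: bool) -> List[SeekEntry]:
    size = entry_size(has_xxh64, has_sha)
    if len(buf) != size * number_of_frames:
        raise ValueError("buffer length does not match Number_Of_Frames")
    entries = []
    for i in range(number_of_frames):
        off = i * size
        # Assumption: both size fields are little-endian unsigned 32-bit.
        compressed, decompressed = struct.unpack_from("<II", buf, off)
        off += 8
        xxh = None
        if has_xxh64:
            (xxh,) = struct.unpack_from("<I", buf, off)
            off += 4
        sha = bytes(buf[off:off + 32]) if has_sha else None
        entries.append(SeekEntry(compressed, decompressed, xxh, sha))
    return entries
```

With both flags set each entry grows from 12 to 44 bytes, so a table for N frames costs 44·N bytes before the footer.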