Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No check if Reserved of Symbol_Compression_Modes is 0 #3821

Closed
aimuz opened this issue Nov 15, 2023 · 8 comments · Fixed by #3840
Closed

No check if Reserved of Symbol_Compression_Modes is 0 #3821

aimuz opened this issue Nov 15, 2023 · 8 comments · Fixed by #3840
Assignees

Comments

@aimuz
Copy link
Contributor

aimuz commented Nov 15, 2023

Describe the bug

According to the https://datatracker.ietf.org/doc/html/rfc8878#name-symbol_compression_modes description, Reserved should have to be 0, in which case the following file should not be able to be extracted

test-diverge-855c1b40.zst.zip

Using educational_decoder gives the expected error, but using the zstd binary does not.

use educational_decoder

./harness test-diverge-855c1b40.zst test-diverge-855c1b40
Error: Corruption detected while decompressing

use zstd

zstd -d test-diverge-855c1b40.zst
test-diverge-855c1b40.zst: 102400 bytes

To Reproduce
Steps to reproduce the behavior:

  1. Downloads data '...'
  2. Run '...' with flags '...'
  3. Scroll up on the log to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots and charts
If applicable, add screenshots and charts to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. Mac] Mac
  • Version [e.g. 22] *** Zstandard CLI (64-bit) v1.5.5, by Yann Collet ***
  • Compiler [e.g. gcc]
  • Flags [e.g. O2]
  • Other relevant hardware specs [e.g. Dual-core]
  • Build system [e.g. Makefile]

Additional context
Add any other context about the problem here.

@terrelln
Copy link
Contributor

We don't guarantee to reject all possible invalid frames. However this is an easy case, and would make making use of this bit in the future easier. So we should definitely do this.

@aimuz
Copy link
Contributor Author

aimuz commented Nov 23, 2023

So the implementation here for zstd should ignore this bit check, is that right?

@Cyan4973
Copy link
Contributor

There are 2 "special" bits in the frame header :

  • reserved : this one must be set to zero, and it should be checked by the decoder.
  • unused : this one should not be checked by the decoder. It's allowed to be either 1 or 0, to signal an optional property not required for compliant decoding. The reference encoder sets it to zero.

I was almost certain that the reserved bit is checked for the value zero, though I could be wrong, or something could have changed inadvertently. Alternatively, there is a potential risk of confusion between those 2 bits.

@klauspost
Copy link

@Cyan4973 What I think you are referring to is the Frame_Header_Descriptor. These are the two reserved bits in the Sequences_Section_Header on compressed blocks.

@terrelln
Copy link
Contributor

Yeah, if these bits are non-zero, the frame is invalid. A compliant decoder may reject this frame. Unlike the unused bit in the Frame_header_Descriptor, which a compliant decoder must ignore.

We will update the standard decoder to reject frames that set these bits.

@klauspost
Copy link

@terrelln @Cyan4973 Maybe having an unused bit would be a benefit, since it potentially could be used for a non-breaking change.

For example having a bit that would signify "this block does not use repeat offsets from previous block", could be a very useful "additional information" bit - and since it isn't a breaking change older versions could just ignore it.

I don't want to de-rail on whether this specific bit should be added here. Just that having an ignored bit could be valuable. It would however be changing the behavior currently specified in the documentation.

@Cyan4973
Copy link
Contributor

For example having a bit that would signify "this block does not use repeat offsets from previous block", could be a very useful "additional information" bit - and since it isn't a breaking change older versions could just ignore it.

That's indeed one of the scenarios considered for the unused bit.

But note that we have only reserved space for this bit at frame level.
At the time the specification was written, the idea of defining such bits at block level was not present.

Btw, it's a reasonably good idea,
but unfortunately, now that the format has been crystallised into an RFC,
it gives no room for such re-interpretation.
If one would try to introduce such a format change today,
I would be highly concerned about breaking compatibility with deployed decoders following the published specification.
Ecosystem stability is worth preserving.

@aimuz
Copy link
Contributor Author

aimuz commented Nov 28, 2023

I tried to submit a patch, I'm not familiar with C. I hope it works!

@Cyan4973 Cyan4973 self-assigned this Mar 5, 2024
klauspost added a commit to klauspost/compress that referenced this issue Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants