Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent spurious InvalidDistanceCode errors #26

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

fintelia
Copy link
Contributor

Normally in a deflate bitstream all distance codes are valid. However, there are two cases where distance codes can be invalid:

  1. If there is only one distance with non-zero code length, it is assigned code 0 and code 1 is invalid.
  2. If all distance code lengths are zero, then back references cannot appear in the bitstream and all distance codes are invalid. Notably, however, it is permissible for length codes to be defined in the block header as long as they never appear in the bitstream itself.

Case (2) turns out to be a problem if we don't specifically account for it...

This is because fdeflate uses the bits it has read so far concatenated with padding bits when decoding the next symbol. This way it doesn't need a full 8-bytes of input data if the next symbol is only a couple bits (and so it can handle the final symbols in a bitstream). After a symbol is read, we check whether that symbol used any of padding bits and bail if so. If padding bits were used then the resulting symbol won't necessarily match what is actually present in the bitstream, but because huffman codes are prefix codes, whatever symbol comes next must require more bits than we have so far.

Normally we try to read a full length-distance pair of symbols before checking whether enough bits are available. But unfortunately, the check for valid distance symbols came before the check for there being enough bits, leading to the spurious error.

This PR moves the check earlier, and also adds a small optimization: The distance code slowpath is only used for distance codes that are 10+ bits, so before taking the slowpath for decoding a distance symbol we first check whether we have at least 10 bits beyond the number required for the length symbol

@fintelia fintelia linked an issue Sep 17, 2024 that may be closed by this pull request
@anforowicz
Copy link
Contributor

Thanks! I can confirm that this commit fixes the new test_input_chunking_sensitivity_example1 test case from #24.

@fintelia fintelia merged commit a26acce into image-rs:main Sep 17, 2024
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths
Huffman tree - one where the length of encoding for symbol 256
(256 = end of block) is zero (meaning that such symbol never
appears in the compressed stream).  This behavior is consistent
with `zlib` - see the comment here:
madler/zlib#82 (comment)

This commit also adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol`
  (regression test covering the product code changes from this commit)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (belated regression test for an issue fixed earlier in
  image-rs#26
* `inflate_bytewise3` fuzzer (used to generate inputs for both of the
  unit tests above;  temporarily disabled until known fuzzing issues are
  fixed which is tracked in
  image-rs#25)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths
Huffman tree - one where the length of encoding for symbol 256
(256 = end of block) is zero (meaning that such symbol never
appears in the compressed stream).  This behavior is consistent
with `zlib` - see the comment here:
madler/zlib#82 (comment)

This commit also adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol`
  (regression test covering the product code changes from this commit)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (belated regression test for an issue fixed earlier in
  image-rs#26
* `inflate_bytewise3` fuzzer (used to generate inputs for both of the
  unit tests above;  temporarily disabled until known fuzzing issues are
  fixed which is tracked in
  image-rs#25)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths
Huffman tree - one where the length of encoding for symbol 256
(256 = end of block) is zero (meaning that such symbol never
appears in the compressed stream).  This behavior is consistent
with `zlib` - see the comment here:
madler/zlib#82 (comment)

This commit also adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol`
  (regression test covering the product code changes from this commit)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (belated regression test for an issue fixed earlier in
  image-rs#26
* `inflate_bytewise3` fuzzer (used to generate inputs for both of the
  unit tests above;  temporarily disabled until known fuzzing issues are
  fixed which is tracked in
  image-rs#25)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths
Huffman tree - one where the length of encoding for symbol 256
(256 = end of block) is zero (meaning that such symbol never
appears in the compressed stream).  This behavior is consistent
with `zlib` - see the comment here:
madler/zlib#82 (comment)

This commit also adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol`
  (regression test covering the product code changes from this commit)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (belated regression test for an issue fixed earlier in
  image-rs#26
* `inflate_bytewise3` fuzzer (used to generate inputs for both of the
  unit tests above;  temporarily disabled until known fuzzing issues are
  fixed which is tracked in
  image-rs#25)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths
Huffman tree - one where the length of encoding for symbol 256
(256 = end of block) is zero (meaning that such symbol never
appears in the compressed stream).  This behavior is consistent
with `zlib` - see the comment here:
madler/zlib#82 (comment)

This commit also adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1`
  and `..._example2` (regression tests covering the product code changes
  from this commit)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (belated regression test for an issue fixed earlier in
  image-rs#26
* `inflate_bytewise3` fuzzer (used to generate inputs for both of the
  unit tests above;  temporarily disabled until known fuzzing issues are
  fixed which is tracked in
  image-rs#25)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 19, 2024
Explicitly reject Huffman trees with no symbol for end-of-block.
This commit adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1`
  and `..._example2` (regression tests covering the issue fixed by
  image-rs#27)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (regression test for the issue fixed by
  image-rs#26
* `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of
  the unit tests above + minimize the input for the other one)
anforowicz added a commit to anforowicz/fdeflate that referenced this pull request Sep 19, 2024
This commit adds regression tests:

* `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1`
  and `..._example2` (regression tests covering the issue fixed by
  image-rs#27)
* `test_input_chunking_sensitivity_when_handling_distance_codes`
  (regression test for the issue fixed by
  image-rs#26
* `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of
  the unit tests above + minimize the input for the other one)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Different results depending on how the input is "chunked"
2 participants