Prevent spurious InvalidDistanceCode errors #26

fintelia · 2024-09-17T02:42:43Z

Normally in a deflate bitstream all distance codes are valid. However, there are two cases where distance codes can be invalid:

If there is only one distance with non-zero code length, it is assigned code 0 and code 1 is invalid.
If all distance code lengths are zero, then back references cannot appear in the bitstream and all distance codes are invalid. Notably, however, it is permissible for length codes to be defined in the block header as long as they never appear in the bitstream itself.

Case (2) turns out to be a problem if we don't specifically account for it...

This is because fdeflate uses the bits it has read so far concatenated with padding bits when decoding the next symbol. This way it doesn't need a full 8-bytes of input data if the next symbol is only a couple bits (and so it can handle the final symbols in a bitstream). After a symbol is read, we check whether that symbol used any of padding bits and bail if so. If padding bits were used then the resulting symbol won't necessarily match what is actually present in the bitstream, but because huffman codes are prefix codes, whatever symbol comes next must require more bits than we have so far.

Normally we try to read a full length-distance pair of symbols before checking whether enough bits are available. But unfortunately, the check for valid distance symbols came before the check for there being enough bits, leading to the spurious error.

This PR moves the check earlier, and also adds a small optimization: The distance code slowpath is only used for distance codes that are 10+ bits, so before taking the slowpath for decoding a distance symbol we first check whether we have at least 10 bits beyond the number required for the length symbol

anforowicz · 2024-09-17T18:18:35Z

Thanks! I can confirm that this commit fixes the new test_input_chunking_sensitivity_example1 test case from #24.

This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol` (regression test covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)

This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)

Explicitly reject Huffman trees with no symbol for end-of-block. This commit adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the issue fixed by image-rs#27) * `test_input_chunking_sensitivity_when_handling_distance_codes` (regression test for the issue fixed by image-rs#26 * `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of the unit tests above + minimize the input for the other one)

This commit adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the issue fixed by image-rs#27) * `test_input_chunking_sensitivity_when_handling_distance_codes` (regression test for the issue fixed by image-rs#26 * `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of the unit tests above + minimize the input for the other one)

Prevent InvalidDistanceCode spurious errors

8c8fec0

fintelia linked an issue Sep 17, 2024 that may be closed by this pull request

Different results depending on how the input is "chunked" #25

Closed

fintelia merged commit a26acce into image-rs:main Sep 17, 2024

anforowicz mentioned this pull request Sep 18, 2024

Regression tests for input chunking sensitivity. #28

Merged

fintelia mentioned this pull request Sep 19, 2024

Different results depending on how the input is "chunked" #25

Closed

anforowicz mentioned this pull request Sep 19, 2024

Need to investigate OSSFuzz failures image-rs/image-png#497

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent spurious InvalidDistanceCode errors #26

Prevent spurious InvalidDistanceCode errors #26

fintelia commented Sep 17, 2024

anforowicz commented Sep 17, 2024

Prevent spurious InvalidDistanceCode errors #26

Prevent spurious InvalidDistanceCode errors #26

Conversation

fintelia commented Sep 17, 2024

anforowicz commented Sep 17, 2024