-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent spurious InvalidDistanceCode errors #26
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Thanks! I can confirm that this commit fixes the new |
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol` (regression test covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol` (regression test covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol` (regression test covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol` (regression test covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 18, 2024
This commit adds explicit handling of an invalid literal lengths Huffman tree - one where the length of encoding for symbol 256 (256 = end of block) is zero (meaning that such symbol never appears in the compressed stream). This behavior is consistent with `zlib` - see the comment here: madler/zlib#82 (comment) This commit also adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the product code changes from this commit) * `test_input_chunking_sensitivity_when_handling_distance_codes` (belated regression test for an issue fixed earlier in image-rs#26 * `inflate_bytewise3` fuzzer (used to generate inputs for both of the unit tests above; temporarily disabled until known fuzzing issues are fixed which is tracked in image-rs#25)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 19, 2024
Explicitly reject Huffman trees with no symbol for end-of-block. This commit adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the issue fixed by image-rs#27) * `test_input_chunking_sensitivity_when_handling_distance_codes` (regression test for the issue fixed by image-rs#26 * `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of the unit tests above + minimize the input for the other one)
anforowicz
added a commit
to anforowicz/fdeflate
that referenced
this pull request
Sep 19, 2024
This commit adds regression tests: * `test_input_chunking_sensitivity_when_no_end_of_block_symbol_example1` and `..._example2` (regression tests covering the issue fixed by image-rs#27) * `test_input_chunking_sensitivity_when_handling_distance_codes` (regression test for the issue fixed by image-rs#26 * `inflate_bytewise3` fuzzer (used to discover inputs for the first 2 of the unit tests above + minimize the input for the other one)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Normally in a deflate bitstream all distance codes are valid. However, there are two cases where distance codes can be invalid:
0
and code1
is invalid.Case (2) turns out to be a problem if we don't specifically account for it...
This is because fdeflate uses the bits it has read so far concatenated with padding bits when decoding the next symbol. This way it doesn't need a full 8-bytes of input data if the next symbol is only a couple bits (and so it can handle the final symbols in a bitstream). After a symbol is read, we check whether that symbol used any of padding bits and bail if so. If padding bits were used then the resulting symbol won't necessarily match what is actually present in the bitstream, but because huffman codes are prefix codes, whatever symbol comes next must require more bits than we have so far.
Normally we try to read a full length-distance pair of symbols before checking whether enough bits are available. But unfortunately, the check for valid distance symbols came before the check for there being enough bits, leading to the spurious error.
This PR moves the check earlier, and also adds a small optimization: The distance code slowpath is only used for distance codes that are 10+ bits, so before taking the slowpath for decoding a distance symbol we first check whether we have at least 10 bits beyond the number required for the length symbol