
fix root cause of #3416 (#3419)

Merged (2 commits, Jan 13, 2023)
Conversation

@Cyan4973 (Contributor) commented on Jan 11, 2023:

A minor update in 5434de0 changed a `<=` into a `<`, and as an indirect consequence allowed a compression attempt on literals when there are only 6 literals to compress (the previous effective limit was 7 literals).

This is not in itself a problem, as the threshold is merely a heuristic, but it surfaced a bug that had always been there and was simply never triggered due to the previous limit. This bug would make the literal compressor believe that all literals are the same symbol; in the exact case where nbLiterals==6, combined with a pretty wild combination of several other exceptional conditions, this conclusion could be false, resulting in data corruption.

Replaced the blind heuristic with an actual test of all limit cases. Even if the threshold is changed again in the future, the detection of Repeated Literal mode will remain reliable.

fix #3416
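
The helper referenced in the diff below performs that actual test. As a rough sketch, assuming a straightforward implementation (the real allBytesIdentical() in the zstd sources may differ in detail):

```c
#include <stddef.h>  /* size_t */
#include <assert.h>

/* Sketch: returns 1 iff every byte of src[0..srcSize-1] equals src[0].
 * Illustrative only; the actual zstd helper may differ. */
static int allBytesIdentical(const void* src, size_t srcSize)
{
    const unsigned char* const bytes = (const unsigned char*)src;
    size_t n;
    assert(srcSize >= 1);
    for (n = 1; n < srcSize; n++) {
        if (bytes[n] != bytes[0]) return 0;
    }
    return 1;
}
```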

@terrelln (Contributor) left a comment:

Can you please add a test case that would trigger this bug?

The fix LGTM

```c
        ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
        return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSize);
    }
    if ((srcSize >= 8) || allBytesIdentical(src, srcSize)) {
```

Please add a comment as to why this check is necessary.

@Cyan4973 (Author) replied:

done
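
For illustration, here is one way the check could be annotated; this is a hedged sketch only, since the exact comment text added in the follow-up commit is not shown here:

```c
    /* Hedged sketch, not the verbatim commit: the entropy stage can report
     * that all literals are a single repeated symbol, which selects the
     * cheaper RLE encoding. Per the PR description, for very small inputs
     * (fewer than 8 literals) that report can be wrong under rare
     * conditions, so the bytes are verified explicitly before committing
     * to RLE mode. */
    if ((srcSize >= 8) || allBytesIdentical(src, srcSize)) {
        ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
        return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSize);
    }
```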

@Cyan4973 (Author) commented:

> Can you please add a test case that would trigger this bug?
>
> The fix LGTM

This is a really hard test to define.
We would probably need some golden files, both for the content and the dictionary.
None of them is naturally exported from the fuzzer (only the sequence of bits which generates those artifacts).

@terrelln (Contributor) commented:

> This is a really hard test to define.
> We would probably need some golden files, both for the content and the dictionary.
> None of them is naturally exported from the fuzzer (only the sequence of bits which generates those artifacts).

Ok, it'd be great if we had it, but if it is too hard, we have the fuzzers, and we know that they will quickly catch any regressions.
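
For reference, a minimal sketch of the kind of dictionary round-trip check such fuzzers perform, written here against the public one-shot zstd API (the actual dictionary_stream_round_trip fuzzer drives the streaming API with fuzzer-generated content and dictionaries, so this is only an approximation):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

/* Hypothetical helper: compress `src` with a dictionary, decompress it
 * back, and verify the round trip is lossless. */
static void roundTripWithDict(const void* src, size_t srcSize,
                              const void* dict, size_t dictSize, int level)
{
    size_t const cCapacity = ZSTD_compressBound(srcSize);
    void* const cBuf = malloc(cCapacity);
    void* const rBuf = malloc(srcSize ? srcSize : 1);
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    assert(cBuf && rBuf && cctx && dctx);

    {   size_t const cSize = ZSTD_compress_usingDict(cctx, cBuf, cCapacity,
                                    src, srcSize, dict, dictSize, level);
        size_t dSize;
        assert(!ZSTD_isError(cSize));
        dSize = ZSTD_decompress_usingDict(dctx, rBuf, srcSize,
                                          cBuf, cSize, dict, dictSize);
        assert(!ZSTD_isError(dSize));
        assert(dSize == srcSize);
        /* a corruption like #3416 would be detected here */
        assert(!memcmp(src, rBuf, srcSize));
    }

    ZSTD_freeCCtx(cctx); ZSTD_freeDCtx(dctx);
    free(cBuf); free(rBuf);
}
```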

Successfully merging this pull request may close these issues:

dictionary_stream_round_trip corruption (#3416)