Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct and clarify repcode offset history logic #3127

Merged
merged 3 commits into from
May 12, 2022

Conversation

embg
Copy link
Contributor

@embg embg commented May 10, 2022

Summary

In zstd, repcode offsets are passed to each block from the previous Compressed_Block. The compression and decompression side need to maintain identical repcode offset histories to prevent data corruption.

My last PR (#3114) introduced a bug in ZSTD_compressBlock_fast_extDict which caused those histories to fall out of sync. This bug was found by OSS fuzz, which was able to trigger data corruption by encoding a repcode match using an incorrect repcode offset (passed incorrectly from a previous block).

This PR fixes that bug and goes further, fixing a latent issue in the existing code (pre-#3114) which @terrelln and I discovered while addressing the fuzzer bug.

The latent bug

The existing offsetSaved logic in fast, doublefast, and lazy noDict breaks the repcode offset history in a more subtle way than #3114. The value of offset_1 which is passed to the next block is always correct, but offset_2 can be incorrect if both offsets are invalid going into the block and no matches are found. This is because there is only one offsetSaved variable, but two offsets to save.

Luckily, this cannot produce data corruption due to the specifics of how offset_2 is used in those matchfinders. Still, I have been able to construct an input for streaming compression (without any dictionary) which passes an incorrect offset_2 from the second-to-last block to the final block.

Even without producing real corruption, this is undesirable; the correctness of the repcode offset history logic shouldn't depend on contingent factors regarding how offset_2 is used in practice.

Solutions

This PR addresses the above problems in three ways:

  • Adds repcode offset history preservation to fast_extDict, fixing the immediate OSS fuzz issue.
  • Removes offsetSaved logic from the DMS matchfinders, since they don't actually use it. (Probably because in zstd, the whole dictionary is considered to be in the window if any byte of it is in the window).
  • Splits offsetSaved into two variables offsetSaved1 and offsetSaved2 with rotation from 1 -> 2 when necessary.

@Cyan4973
Copy link
Contributor

Great explanation @embg !

Any performance measurement, to check if the changes produce any impact ?

@embg
Copy link
Contributor Author

embg commented May 10, 2022

Great explanation @embg !

Any performance measurement, to check if the changes produce any impact ?

I figured performance measurement isn't necessary since I only touched code that executes at the very top and very bottom of the matchfinders (outside the hot loops).

@Cyan4973
Copy link
Contributor

I figured performance measurement isn't necessary since I only touched code that executes at the very top and very bottom of the matchfinders (outside the hot loops).

You would be surprised !
Sometimes, modifications that seem to be locked outside of the main loop
actually messes up register allocation inside the main loop,
resulting in measurable impact where none was expected.

@embg
Copy link
Contributor Author

embg commented May 10, 2022

I figured performance measurement isn't necessary since I only touched code that executes at the very top and very bottom of the matchfinders (outside the hot loops).

You would be surprised ! Sometimes, modifications that seem to be locked outside of the main loop actually messes up register allocation inside the main loop, resulting in measurable impact where none was expected.

Understood -- would zstd -b1 -e9 silesia.tar be sufficient here?

@Cyan4973
Copy link
Contributor

would zstd -b1 -e9 silesia.tar be sufficient here?

Yes

@embg
Copy link
Contributor Author

embg commented May 10, 2022

@Cyan4973 Measurements look good! Mostly identical speeds, a few changes < 1% in both directions at various levels. Seems like noise to me since the changes go both ways, in some cases for levels which use the same matchfinder (e.g. levels 1 and 2 with clang12).

Measurements for gcc11 and clang12

[embg@dev3100.frc1 ~/zstd] $BENCHMARK_PY run -- ./zstd_22875e_gcc11 -b1 -e9 ../silesia.tar
 1#silesia.tar       : 211975168 ->  73410303 (x2.888),  180.4 MB/s,  618.0 MB/s
 2#silesia.tar       : 211975168 ->  69490292 (x3.050),  149.2 MB/s,  559.2 MB/s
 3#silesia.tar       : 211975168 ->  66508266 (x3.187),  110.5 MB/s,  537.0 MB/s
 4#silesia.tar       : 211975168 ->  65313180 (x3.246),   96.6 MB/s,  522.7 MB/s
 5#silesia.tar       : 211975168 ->  62967059 (x3.366),   56.1 MB/s,  527.1 MB/s
 6#silesia.tar       : 211975168 ->  61468353 (x3.449),   39.9 MB/s,  557.4 MB/s
 7#silesia.tar       : 211975168 ->  60446005 (x3.507),   33.8 MB/s,  559.0 MB/s
 8#silesia.tar       : 211975168 ->  59919804 (x3.538),   26.8 MB/s,  573.2 MB/s
 9#silesia.tar       : 211975168 ->  59288102 (x3.575),   24.9 MB/s,  562.2 MB/s
[embg@dev3100.frc1 ~/zstd] $BENCHMARK_PY run -- ./zstd_8bf32d_gcc11 -b1 -e9 ../silesia.tar
 1#silesia.tar       : 211975168 ->  73410303 (x2.888),  180.9 MB/s,  617.8 MB/s
 2#silesia.tar       : 211975168 ->  69490292 (x3.050),  148.6 MB/s,  561.1 MB/s
 3#silesia.tar       : 211975168 ->  66508266 (x3.187),  110.2 MB/s,  538.0 MB/s
 4#silesia.tar       : 211975168 ->  65313180 (x3.246),   96.6 MB/s,  524.0 MB/s
 5#silesia.tar       : 211975168 ->  62967059 (x3.366),   56.7 MB/s,  527.4 MB/s
 6#silesia.tar       : 211975168 ->  61468353 (x3.449),   40.0 MB/s,  558.2 MB/s
 7#silesia.tar       : 211975168 ->  60446005 (x3.507),   34.0 MB/s,  560.1 MB/s
 8#silesia.tar       : 211975168 ->  59919804 (x3.538),   26.9 MB/s,  574.7 MB/s
 9#silesia.tar       : 211975168 ->  59288102 (x3.575),   25.2 MB/s,  563.3 MB/s
[embg@dev3100.frc1 ~/zstd] $BENCHMARK_PY run -- ./zstd_22875e_clang12 -b1 -e9 ../silesia.tar
 1#silesia.tar       : 211975168 ->  73410303 (x2.888),  181.4 MB/s,  601.0 MB/s
 2#silesia.tar       : 211975168 ->  69490292 (x3.050),  152.8 MB/s,  540.0 MB/s
 3#silesia.tar       : 211975168 ->  66508266 (x3.187),  108.7 MB/s,  517.0 MB/s
 4#silesia.tar       : 211975168 ->  65313180 (x3.246),   95.4 MB/s,  502.4 MB/s
 5#silesia.tar       : 211975168 ->  62967059 (x3.366),   58.4 MB/s,  505.5 MB/s
 6#silesia.tar       : 211975168 ->  61468353 (x3.449),   41.3 MB/s,  536.7 MB/s
 7#silesia.tar       : 211975168 ->  60446005 (x3.507),   34.7 MB/s,  536.9 MB/s
 8#silesia.tar       : 211975168 ->  59919804 (x3.538),   27.0 MB/s,  552.6 MB/s
 9#silesia.tar       : 211975168 ->  59288102 (x3.575),   25.5 MB/s,  541.7 MB/s
[embg@dev3100.frc1 ~/zstd] $BENCHMARK_PY run -- ./zstd_8bf32d_clang12 -b1 -e9 ../silesia.tar
 1#silesia.tar       : 211975168 ->  73410303 (x2.888),  182.7 MB/s,  601.4 MB/s
 2#silesia.tar       : 211975168 ->  69490292 (x3.050),  150.9 MB/s,  540.0 MB/s
 3#silesia.tar       : 211975168 ->  66508266 (x3.187),  110.7 MB/s,  519.6 MB/s
 4#silesia.tar       : 211975168 ->  65313180 (x3.246),   95.5 MB/s,  502.9 MB/s
 5#silesia.tar       : 211975168 ->  62967059 (x3.366),   58.0 MB/s,  503.3 MB/s
 6#silesia.tar       : 211975168 ->  61468353 (x3.449),   41.3 MB/s,  538.1 MB/s
 7#silesia.tar       : 211975168 ->  60446005 (x3.507),   34.6 MB/s,  537.1 MB/s
 8#silesia.tar       : 211975168 ->  59919804 (x3.538),   27.2 MB/s,  554.1 MB/s
 9#silesia.tar       : 211975168 ->  59288102 (x3.575),   25.4 MB/s,  536.8 MB/s

lib/compress/zstd_double_fast.c Outdated Show resolved Hide resolved
lib/compress/zstd_double_fast.c Outdated Show resolved Hide resolved
lib/compress/zstd_double_fast.c Outdated Show resolved Hide resolved
@embg embg merged commit f349d18 into facebook:dev May 12, 2022
@embg embg deleted the repcode_history branch May 12, 2022 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants