Nondeterministic compression with ZSTD_compressCCtx
#1241
Thanks for the detailed report, @jblazquez, and the precise reproduction instructions. Note: I confirm reproduction of the issue on a Linux Ubuntu VM.
Small comment:
I removed …
Here is my suspicion: position …

In most cases, it should not matter much. If there is a match starting at position …

Thing is, later on, as a context is reused, the starting position of the following data blob to compress is no longer …

Since the first byte is no longer at position …

I'll check in more detail if that's what happens.

Edit: just checked, and it's indeed what happens. 2 matches start from the beginning of the sample. They are detected starting from position …
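The exact positions in the comment above did not survive in this copy, so what follows is only a toy model of the suspected mechanism, not zstd's match finder. It assumes, for illustration, that a stored index of 0 means "empty slot" and that a reused context places the next input at a non-zero starting index. The function count_matches and its base parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MINMATCH 4

/* Toy match scan over src[0..n). Each position i is recorded under its
 * absolute index base + i, and a stored value of 0 means "empty slot"
 * (the ASSUMPTION being illustrated, not zstd's actual code).
 * Returns how many positions found a verified MINMATCH-byte match against
 * the most recent earlier position starting with the same byte.
 * Indexing by a single byte keeps the toy traceable by hand; real match
 * finders hash several bytes. */
static unsigned count_matches(const unsigned char *src, size_t n, uint32_t base)
{
    uint32_t last[256] = {0};  /* 0 doubles as "nothing stored" */
    unsigned found = 0;

    assert(src != NULL || n == 0);
    for (size_t i = 0; i + MINMATCH <= n; i++) {
        uint32_t cand = last[src[i]];
        if (cand != 0) {  /* a position stored as 0 is invisible here */
            size_t j = (size_t)(cand - base);  /* absolute -> local offset */
            if (memcmp(src + j, src + i, MINMATCH) == 0)
                found++;
        }
        last[src[i]] = base + (uint32_t)i;
    }
    return found;
}
```

With the input "abcdXabcdYabcd", count_matches(s, 14, 0) finds 1 match while count_matches(s, 14, 100) finds 2: the repeat of "abcd" at offset 5 can refer back to offset 0 only when offset 0 is stored under a non-zero index. Same bytes, same parameters, different result, purely because of where the data starts.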
It appears that since commit 9d65a5c, compressing the same input data twice in a row with the same compression context (even if the context is reset between compressions) produces different outputs at certain compression levels. We were relying on the guarantees that @Cyan4973 described in #999 and assumed that zstd would output binary-identical compressed bitstreams in this scenario.
Are we misunderstanding ZSTD_compressCCtx and thinking that it wouldn't reuse any state between invocations when that's not guaranteed?

Steps to repro
We're only able to repro this easily on macOS (10.13), but when we discovered the problem, the data had been compressed by Windows and Linux versions of zstd, so the problem doesn't appear to be platform-specific.
Save the following code to a file called test.c:

Now build against zstd v1.3.5:
The following output will be seen:
As you can see, every level from 9 onwards results in different compressed output the second time.
This didn't happen back in v1.3.3:
It started happening to some extent with commit 9d65a5c:
Workaround
For now, we've switched to using ZSTD_compress, which does result in deterministic outputs in this scenario.
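The report's test.c and its output are not reproduced in this copy. As a hedged, zstd-free stand-in for that kind of check, the sketch below compresses one buffer twice through the same context and compares the bytes. The compress_fn type, same_output_twice, toy_ctx, toy_compress and toy_compress_fresh are all hypothetical: toy_compress deliberately leaks a call counter into its output to mimic state carried across calls, and toy_compress_fresh mimics the workaround of giving every call its own fresh context (ZSTD_compress takes no caller-supplied context).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as ZSTD_compressCCtx, but with an opaque context pointer so
 * the check does not depend on libzstd. Returns bytes written, or any value
 * larger than dst_cap to signal an error.
 * With libzstd (not compiled here), a wrapper could be:
 *   static size_t zstd_cctx_fn(void *ctx, void *dst, size_t cap,
 *                              const void *src, size_t n, int level)
 *   {
 *       size_t r = ZSTD_compressCCtx((ZSTD_CCtx *)ctx, dst, cap, src, n, level);
 *       return ZSTD_isError(r) ? (size_t)-1 : r;
 *   }
 */
typedef size_t (*compress_fn)(void *ctx, void *dst, size_t dst_cap,
                              const void *src, size_t src_size, int level);

/* Compress src twice through the same ctx and compare the two outputs.
 * Returns 1 if byte-identical, 0 if they differ, -1 on any error. */
static int same_output_twice(compress_fn fn, void *ctx, const void *src,
                             size_t src_size, size_t dst_cap, int level)
{
    assert(fn != NULL);
    unsigned char *a = malloc(dst_cap);
    unsigned char *b = malloc(dst_cap);
    int result = -1;
    if (a != NULL && b != NULL) {
        size_t na = fn(ctx, a, dst_cap, src, src_size, level);
        size_t nb = fn(ctx, b, dst_cap, src, src_size, level);
        if (na <= dst_cap && nb <= dst_cap)  /* never memcmp past the buffers */
            result = (na == nb && memcmp(a, b, na) == 0);
    }
    free(a);
    free(b);
    return result;
}

/* Toy context that carries state from one call into the next. */
typedef struct { unsigned calls; } toy_ctx;

static size_t toy_compress(void *ctx, void *dst, size_t dst_cap,
                           const void *src, size_t src_size, int level)
{
    toy_ctx *c = ctx;
    (void)level;
    if (src_size + 1 > dst_cap)
        return (size_t)-1;                    /* error: buffer too small */
    memcpy(dst, src, src_size);               /* "compressed" payload */
    ((unsigned char *)dst)[src_size] = (unsigned char)c->calls++; /* leaked state */
    return src_size + 1;
}

/* Workaround pattern: each call starts from a freshly zeroed context. */
static size_t toy_compress_fresh(void *unused, void *dst, size_t dst_cap,
                                 const void *src, size_t src_size, int level)
{
    toy_ctx fresh = {0};
    (void)unused;
    return toy_compress(&fresh, dst, dst_cap, src, src_size, level);
}
```

Against the toy, same_output_twice returns 0 for toy_compress with a reused context and 1 for toy_compress_fresh. Pointing it at a wrapper around ZSTD_compressCCtx, like the one in the comment inside the block, and looping over compression levels would be one way to look for the level-9-and-up behavior described above, with the program linked against libzstd.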