Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix long offset resolution #3460

Conversation

daniellerozenblit
Copy link
Contributor

@daniellerozenblit daniellerozenblit commented Jan 27, 2023

We currently determine whether or not longOffsets are present in a set of input sequences by checking if the windowLog is larger than STREAM_ACCUMULATOR_MIN.

This does not account for inputs with a dictionary, where it is legal for offsets to be larger than the window size. In these cases, we don't properly account for large offsets when flushing our bit accumulator in ZSTD_encodeSequences_body().

This PR introduces a largeOffsets field in the ZSTD_symbolEncodingTypeStats_t which records whether or not large offsets are present in the sequences. This value of this field is finalized in ZSTD_seqToCodes().

I ran some benchmarks for zstd level 1 & 3 and there does not appear to be any measurable change in compression speed.

lib/common/zstd_internal.h Outdated Show resolved Hide resolved
@daniellerozenblit daniellerozenblit force-pushed the fix-long-offsets-resolution-pointer branch from fb9e376 to e10676a Compare January 27, 2023 19:40
@daniellerozenblit daniellerozenblit force-pushed the fix-long-offsets-resolution-pointer branch from e10676a to 9e4c66b Compare January 27, 2023 20:04
Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I will also benchmark this PR to make sure I also don't see regressions.

Thanks for adding the test! We might need to try to reduce the memory usage of the test a little bit, otherwise it could be flaky in 32-bit builds, because we could run out of address space.

tests/zstreamtest.c Outdated Show resolved Hide resolved
tests/zstreamtest.c Show resolved Hide resolved
daniellerozenblit and others added 3 commits January 27, 2023 16:58
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
…ellerozenblit/zstd into fix-long-offsets-resolution-pointer
size_t const kNbSequences = 4;
ZSTD_Sequence* sequences = malloc(sizeof(ZSTD_Sequence) * kNbSequences);
void* const checkBuf = malloc(srcSize);
const size_t largeDictSize = 1 << 25;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, you got it to reproduce with 1 << 25!

In that case we don't need to hide it behind bigTests. But I still like having one 32-bit test in CI that does run with --big-tests. That should help improve our 32-bit coverage a bit, without making our users tests flaky.

@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 30, 2023 14:28
Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@daniellerozenblit daniellerozenblit merged commit 0017663 into facebook:dev Jan 30, 2023
@daniellerozenblit daniellerozenblit deleted the fix-long-offsets-resolution-pointer branch March 6, 2023 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants