Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converge sumtype (offset | repcode) numeric representation towards offBase #2965

Merged
merged 3 commits into from
Jan 24, 2022

Conversation

Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented Dec 30, 2021

This completes #2962 , by ensuring that the numeric representation of the sumtype (offset | repcode) used at the interface of ZSTD_storeSeq() (and elsewhere) is the same as offBase, as stored in the seqStore.
This skips one transient numeric representation, hopefully simplifying the pipeline.

It's a good opportunity to rename all these transformations to better reflect their position in the pipeline, aka :
offset | repcode => offBase => offCode + offBits
following the model previously established for matchLength.

This PR does not impact the compressed output.
The first commit left a subtle minor difference in btlazy2 parser, resulting in rare minor differences in compressed output. This was later fixed in the last commit, so now there is no difference, even for btlazy2.

Also :
fixed PLATFORM_POSIX_VERSION within zstdseek_decompress.c.
This macro was used there without being defined.
This triggered warning / errors on my workstation during tests.

directly at ZSTD_storeSeq() interface.

In the process, remove ZSTD_REP_MOVE.

This makes it possible, in future commits,
to update and effectively simplify the naming scheme
to properly label the updated processing pipeline :
offset | repcode => offBase => offCode + offBits
subtle dependency on sumtype numeric representation
@Cyan4973
Copy link
Contributor Author

ping
this should be ready for merge, now that v1.5.2 is published

@Cyan4973
Copy link
Contributor Author

ping 2

@@ -23,13 +23,64 @@
# endif
#endif

/* ************************************************************
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intended? It isn't mentioned in the summary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,
PLATFORM_POSIX_VERSION is used a bit below
but it's not defined anywhere,
so it's silently == 0 by default, always,
or with the right set of compiler flags, it generates a warning, or an error.

Hence, I define it here, using the same definition as in platform.h.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird... I've been refreshing this page, but GitHub didn't show me this comment until now.

Can you also mention this in the PR summary please? Then the PR LGTM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@terrelln
Copy link
Contributor

terrelln commented Jan 24, 2022

Can you please also mention what the first commit, "fixed minor compression difference in btlazy2" is fixing?

It isn't clear what the problem is. And I didn't realize that this PR had a functional change until I looked at that commit title, I assumed it was all refactoring.

@Cyan4973
Copy link
Contributor Author

Cyan4973 commented Jan 24, 2022

Can you please also mention what the first commit, "fixed minor compression difference in btlazy2 " is fixing?

It isn't clear what the problem is. And I didn't realize that this PR had a functional change until I looked at that commit title, I assumed it was all refactoring.

The PR, as a whole, is "just" a refactoring.
But the first commit introduced a very subtle compression difference in btlazy2 mode, at line 330.
It's an off-by-one difference in the way cost of offsets are evaluated.
It barely ever triggers, but with billions of attempts, it did happen to change the selection outcome from time to time.
The last commit fixes that, so that the PR as a whole does generate exactly the same compressed outcome as before, including for btlazy2.

@terrelln
Copy link
Contributor

Can you add that to the PR summary please?

@Cyan4973
Copy link
Contributor Author

Can you add that to the PR summary please?

done

@Cyan4973 Cyan4973 merged commit cc7d23b into dev Jan 24, 2022
@Cyan4973 Cyan4973 deleted the offbase branch January 13, 2023 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants