
Fix sequence validation and seqStore bounds check #3439

Merged

Conversation

daniellerozenblit
Contributor

@daniellerozenblit daniellerozenblit commented Jan 20, 2023

While working on the Sequence Compression API fuzzer, I found bugs in ZSTD_validateSequence() and in ZSTD_copySequencesToSeqStoreExplicitBlockDelim() that allow one additional seqDef to be written to the seqStore, past the end of its allocated memory.

Bug 1

The validation method currently only checks that the matchLength of each sequence is at least the global minimum match length of 3. This is not sufficient when the minMatch set in the cctx is larger than the actual minimum match length of the provided sequences.

In most cases, this is fine. However, we currently allocate room for fewer sequences when minMatch > 3 than when minMatch = 3. This means that ZSTD_validateSequence() can accept sequences that do not fit within the memory limits implied by the cctx's minMatch.

Fix

Fix: See changes in ZSTD_validateSequence()

We now loosely check that match lengths respect the cctx's minMatch: when minMatch is greater than 3, we check that every match length is greater than 3 (i.e. at least 4). This is enough to guarantee that maxNbSeq is large enough to fit all sequences.

Bug 2

ZSTD_validateSequence() is optional, so we need to protect against these cases even when it is not enabled. We have asserts that catch overwrites of the seqStore, but these are not compiled into production builds. There is also a check before ZSTD_storeSeq() to ensure that we have not already passed the memory limit. However, this check still allows us to write one additional sequence past maxNbSeq before returning an error.

As a result, when only one sequence is written past the given memory limit, the overflow goes undetected and the data still roundtrips without error.

Fix

Fix: See changes in ZSTD_copySequencesToSeqStoreExplicitBlockDelim() and ZSTD_copySequencesToSeqStoreNoBlockDelim(). I have changed these checks to return an error if we have already reached maxNbSeq, so that we do not attempt to write an additional sequence.

@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 20, 2023 18:51
@Cyan4973
Contributor

I presume there is no change in performance when sequence validation is not enabled?

@embg embg self-assigned this Jan 20, 2023
Contributor

@embg embg left a comment


Great job finding and fixing these bugs in the sequence compression API!

Looks good, I just have a couple minor comments.

lib/compress/zstd_compress.c (outdated, resolved)
tests/zstreamtest.c (resolved)
tests/zstreamtest.c (resolved)
@daniellerozenblit
Contributor Author

I presume there is no change in performance when sequence validation is not enabled?

I ran some benchmarks and there doesn't appear to be any change in performance.

Contributor

@embg embg left a comment


LGTM!

@@ -53,6 +53,7 @@ const char* ERR_getErrorString(ERR_enum code)
case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong";
case PREFIX(srcBuffer_wrong): return "Source buffer is wrong";
case PREFIX(externalMatchFinder_failed): return "External matchfinder returned an error code";
case PREFIX(invalid_external_sequences): return "External matchfinder returned an error code";
Contributor


Can you please change this error message to something like "External sequences are not valid"?

@@ -53,6 +53,7 @@ const char* ERR_getErrorString(ERR_enum code)
case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong";
case PREFIX(srcBuffer_wrong): return "Source buffer is wrong";
case PREFIX(externalMatchFinder_failed): return "External matchfinder returned an error code";
case PREFIX(invalid_external_sequences): return "External sequences are not valid";
Contributor


minor: even error codes have a naming scheme!
Looking at existing ones, the pattern is typically topic_qualifier.

For this case, it would be something like:
externalSequences_invalid.


4 participants