
[core/zip+lzma] Properly account for header size #14523

Merged
merged 4 commits into root-project:master from core-zip on Feb 8, 2024

Conversation

@hahnjo (Member) commented Feb 2, 2024

The compression algorithms only see the buffers without the header, so the sizes have to be adjusted accordingly.
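For illustration, a minimal sketch of the pattern the fix establishes (the helper name and the zlib-based body are mine, not the literal patch; kHeaderSize is ROOT's 9-byte block header, as in the RZip snippets quoted later in this thread):

   #include <zlib.h>

   // Hypothetical helper illustrating the fix: the caller writes a 9-byte
   // ROOT block header into tgt[0..8], so the compressor may only be offered
   // the space *behind* the header, i.e. tgtsize - kHeaderSize bytes.
   constexpr int kHeaderSize = 9;

   bool CompressAfterHeader(char *tgt, int tgtsize, const char *src, int srcsize)
   {
      if (tgtsize <= kHeaderSize)
         return false; // no room for any payload behind the header
      uLongf outLen = static_cast<uLongf>(tgtsize - kHeaderSize); // not tgtsize!
      int rc = compress2(reinterpret_cast<Bytef *>(tgt + kHeaderSize), &outLen,
                         reinterpret_cast<const Bytef *>(src),
                         static_cast<uLong>(srcsize), Z_DEFAULT_COMPRESSION);
      // on success, the caller fills in the 9-byte header with outLen etc.
      return rc == Z_OK;
   }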

Fixes #14508

FYI @Dr15Jones

@phsft-bot (Collaborator) commented:

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@hahnjo (Member, Author) commented Feb 2, 2024

IMHO this is quite bad: to how many ROOT versions should we backport this? The problem has basically been hiding there ever since commit 4b54256 in 2011!

github-actions bot commented Feb 2, 2024

Test Results

    10 files      10 suites   1d 22h 34m 10s ⏱️
 2 497 tests  2 495 ✅   0 💤 2 ❌
23 869 runs  23 593 ✅ 272 💤 4 ❌

For more details on these failures, see this check.

Results for commit 45e09b2.

♻️ This comment has been updated with latest results.

@phsft-bot (Collaborator) commented:

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@hahnjo changed the title from "[core/lzma] Properly account for kHeaderSize" to "[core/zip+lzma] Properly account for header size" on Feb 2, 2024
@hahnjo (Member, Author) commented Feb 2, 2024

So @jblomer naively asked "what about ZLIB", and it turns out to be equally wrong... I also added a test that at least catches the compression side of things. For decompression it's harder: it's not clear how to check whether the library read more bytes than it should have, without it simply failing with decompression errors.
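A sketch of what such a compression-side test can look like (the helper is hypothetical and assumes the classic R__zip entry point with its (cxlevel, srcsize, src, tgtsize, tgt, irep) convention):

   #include <cassert>
   #include <vector>

   extern "C" void R__zip(int cxlevel, int *srcsize, char *src, int *tgtsize, char *tgt, int *irep);

   // Hand the compressor a target buffer whose advertised size is exactly the
   // source size (the worst case RNTuple hit) and place canary bytes right
   // behind it; any out-of-bounds write flips a canary.
   void CheckCompressionStaysInBounds(char *src, int srcsize, int cxlevel)
   {
      const int claimed = srcsize; // no slack, unlike TKey/TBasket
      int tgtsize = claimed;
      std::vector<char> tgt(static_cast<size_t>(claimed) + 16, '\xAB');
      int irep = 0;
      R__zip(cxlevel, &srcsize, src, &tgtsize, tgt.data(), &irep);
      for (int i = 0; i < 16; ++i)
         assert(tgt[static_cast<size_t>(claimed) + i] == '\xAB'); // canaries intact
   }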

@hahnjo (Member, Author) commented Feb 2, 2024

After more investigation, it seems that all existing code paths in TKey.cxx, TBufferXML.cxx, TMessage.cxx, and TBasket.cxx allocate a buffer that is slightly larger, so it's probably not as critical a problem for the non-RNTuple case...

Review thread on core/zip/src/RZip.cxx (outdated, resolved)
@phsft-bot (Collaborator) commented:

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@pcanal (Member) left a comment:

(pending review)

@hahnjo (Member, Author) commented Feb 5, 2024

To reiterate why we "only" need to fix gzip and lzma: The other compression algorithms already do this,

if (cxlevel >= 4) {
   returnStatus = LZ4_compress_HC(src, &tgt[kHeaderSize], *srcsize, *tgtsize - kHeaderSize, cxlevel);
} else {
   returnStatus = LZ4_compress_default(src, &tgt[kHeaderSize], *srcsize, *tgtsize - kHeaderSize);
}

root/core/zip/src/RZip.cxx, lines 145 to 148 at e8545f7:

   state.out_buf = tgt;
   state.out_size = (unsigned) (*tgtsize);
   state.out_offset = HDRSIZE;
   state.R__window_size = 0L;

(that's the very original code with the old compression algorithm; it uses an offset which is correct by construction)

size_t retval = ZSTD_compressCCtx(fCtx.get(),
&tgt[kHeaderSize], static_cast<size_t>(*tgtsize - kHeaderSize),
src, static_cast<size_t>(*srcsize),
2*cxlevel);
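
For completeness, here is the same pattern expressed against liblzma's streaming API, roughly the shape of what the LZMA fix amounts to (a sketch of the idea under my assumptions, not the verbatim patch; the helper name and encoder settings are mine):

   #include <lzma.h>

   bool LzmaCompressAfterHeader(char *tgt, int tgtsize, const char *src, int srcsize)
   {
      constexpr int kHeaderSize = 9;
      lzma_stream stream = LZMA_STREAM_INIT;
      if (lzma_easy_encoder(&stream, 6 /* preset */, LZMA_CHECK_CRC32) != LZMA_OK)
         return false;
      stream.next_in = reinterpret_cast<const uint8_t *>(src);
      stream.avail_in = static_cast<size_t>(srcsize);
      stream.next_out = reinterpret_cast<uint8_t *>(&tgt[kHeaderSize]); // skip the header...
      stream.avail_out = static_cast<size_t>(tgtsize - kHeaderSize);    // ...and shrink the size
      lzma_ret ret = lzma_code(&stream, LZMA_FINISH);
      lzma_end(&stream);
      return ret == LZMA_STREAM_END; // everything fit within the advertised limit
   }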

@pcanal (Member) commented Feb 6, 2024

Note that the problem appeared (or was uncovered) only 'recently': it was "introduced" by e052b58, "[ntuple] RPageSinkBuf: Always seal before CommitCluster" (prior to that commit, valgrind is silent).

@hahnjo (Member, Author) commented Feb 6, 2024

Somewhat; that commit made it more likely for regular users to run into the problem. Essentially, it moved code around to always seal pages in CommitPage. Before, that was only done when implicit MT was enabled, and indeed the original reproducer fails in ROOT 6.30 when preceded by a call to ROOT::EnableImplicitMT(). In my understanding, that's also what CMSSW does, so for them it doesn't make a difference. (In fact, the problematic pattern of allocating exactly as many bytes as the uncompressed page holds can be traced back to commits 1ea8447 and 88bd1f0 at the very beginning of RPageSinkBuf's history.)

@hahnjo (Member, Author) commented Feb 7, 2024

ping @pcanal, it would be really good to have this fix in; my understanding is that it blocks CMS RNTuple work...

Review thread on core/zip/test/ZipTest.cxx (outdated, resolved)
Commit messages:

R__unzipZLIB is already properly subtracting it from srcsize.

lzma_code must only see the buffers without the header, so the sizes
have to be adjusted accordingly.

Fixes root-project#14508

In practice, the target size is greater than or equal to the source size
in most cases for ROOT, but add this additional correctness check so the
inputs can be fuzzed in the next commit.

This would have caught the bugs fixed by any of the previous three commits.
@phsft-bot (Collaborator) commented:

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@pcanal (Member) commented Feb 7, 2024

To reiterate on why we "only" need to fix gzip and lzma: The other compression algorithms already do this,

Indeed. The diff was made less obvious because:

  • ZLIB decompression is already doing the right thing.
  • ZLIB and LZMA use a struct to pass the configuration rather than function arguments, so the code pattern is slightly different.

it seems that all existing code paths in TKey.cxx, TBufferXML.cxx, TMessage.cxx, and TBasket.cxx allocate a buffer that is slightly larger, so it's probably not as critical a problem

Right, the allocation is done as

      Int_t buflen = TMath::Max(512, fKeylen + fObjlen + 9*nbuffers + 28); // add 28 bytes in case object is placed in a deleted gap

and used via

          char *bufcur = &fBuffer[fKeylen];

so the only extra is 9*nbuffers + 28, which reduces the risk of writing past the end since the size is larger than fObjlen + kHeaderSize. But that leaves two additional questions (see the worked numbers after this list):

  • why are those added?
  • why doesn't RNTuple need it?
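
For concreteness, the worked numbers for the single-buffer case (nbuffers == 1, kHeaderSize == 9); the breakdown is my reading of the allocation quoted above, not something spelled out in the code:

   buflen                        = fKeylen + fObjlen + 9*1 + 28
   space after &fBuffer[fKeylen] = fObjlen + 9 + 28
   space actually needed         = kHeaderSize (9) + compressed payload, usually <= fObjlen

So the 9 happens to cover the block header that the buggy code forgot to subtract, and the +28 is pure slack, which is why TTree writing rarely tripped over the bug.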

01bb696 hints that the compression engines were seen as writing past the end ... it is plausible, since the prior delta was 9*nbuffers + 8 with nbuffers==0 being the common case. (In hindsight, this commit was not investigated long enough and needed a test.)

The 9*nbuffers is meant to be for the keys and is now inaccurate (most algorithms have a 9-byte header, but for lz4 we seemingly have 73). This part is missing from the RNTuple usage. The consequence is that on a dataset that is not compressible, TTree might use a bit more space (header + barely compressed size) than RNTuple (uncompressed size, which might be less than header + barely compressed size).

This of course assumes that the compression algorithms strictly respect the limits given (it would be a serious security risk if not).

The 8 is commented as "8 bytes in case object is placed in a deleted gap" (the 20 was seemingly added to work around the bug fixed here), and it is not clear to me (the 'deleted gap' is most likely talking about space 'freed' inside a ROOT file).

@pcanal (Member) left a comment:

LGTM. Thanks. This patch needs to be backported to as many older releases as possible, as it can lead to a memory over-write even in the case of TTree (the compression is told the memory area is larger than it actually is, and unless the compression algorithm stops before it has over-inflated the object by 28+9 bytes, it might still happen).

As a side note, the extra size given by TKey and TBasket should probably be removed (modulo understanding why there was a +8 "in case object is placed in a deleted gap").

@pcanal (Member) commented Feb 7, 2024

(modulo understanding why there was a +8 "in case object is placed in a deleted gap")

I am now guessing that this was a micro-optimization to better manage the memory. We should also consider removing it.

@hahnjo hahnjo merged commit 73d8c3d into root-project:master Feb 8, 2024
13 of 15 checks passed
@hahnjo hahnjo deleted the core-zip branch February 8, 2024 07:31
@hahnjo (Member, Author) commented Feb 8, 2024

01bb696 hints that the compression engines were seen as writing past the end ... it is plausible, since the prior delta was 9*nbuffers + 8 with nbuffers==0 being the common case. (In hindsight, this commit was not investigated long enough and needed a test.)

I think nbuffers >= 1 in all cases, so we should always have 9 additional bytes beyond what we tell R__zipMultipleAlgorithm.

This of course assumes that the compression algorithms strictly respect the limits given (it would be a serious security risk if not).

Yes, we have to operate under that assumption.

This patch needs to be backported to as many older releases as possible, as it can lead to a memory over-write even in the case of TTree (the compression is told the memory area is larger than it actually is, and unless the compression algorithm stops before it has over-inflated the object by 28+9 bytes, it might still happen).

Yes, I think the compression algorithms stop at the buffer sizes we give them. Unless I'm missing something, this means only RNTuple was affected by this and TTree was fine because of the slightly larger buffers? For now, I've opened backports for 6.30 (#14624), 6.28 (#14625), and 6.26 (#14626). If we find that TTree is also affected, we can (and have to) open more backports.

As a side note, the extra size given by TKey and TBasket should probably be removed (modulo understanding why there was a +8 "in case object is placed in a deleted gap").

Ok, we can try (in master). We have to be careful, though; I don't want to introduce more memory errors when writing TTrees...

Successfully merging this pull request may close these issues.

Using LZMA compression with RNTupleWriter leads to memory corruption
4 participants