[core/zip+lzma] Properly account for header size #14523
Conversation
IMHO this is quite bad - to how many ROOT versions should we backport this? This problem has basically been hiding there all the way since commit 4b54256 in 2011!
Test Results: 10 files, 10 suites, ⏱️ 1d 22h 34m 10s. For more details on these failures, see this check. Results for commit 45e09b2. ♻️ This comment has been updated with latest results.
So @jblomer naively asked "what about ZLIB", and it turns out to be equally wrong... I also added a test that at least catches the compression side of things. For the decompression, it's a bit harder because it's not clear how to check if the library read more bytes than it should have (without it running into errors because of decompression errors).
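A sketch of the kind of test described above: place canary bytes after the target buffer and check that the compressor never touches them. This is an assumption about the test's shape, not the actual test from the PR; the real test would call the ROOT compression routines, whereas `FakeCompress` below is a stand-in that respects the advertised limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a compression engine (hypothetical): a correct engine
// writes at most tgtsize bytes into tgt.
inline void FakeCompress(char *tgt, std::size_t tgtsize,
                         const char *src, std::size_t srcsize) {
    std::memcpy(tgt, src, srcsize < tgtsize ? srcsize : tgtsize);
}

// Returns true if all canary bytes after the payload area survived,
// i.e. the engine did not write past the buffer it was given.
inline bool CanariesIntact() {
    std::vector<char> buf(64 + 8, '\xAB');  // 64-byte target + 8 canary bytes
    std::vector<char> src(64, 0);
    FakeCompress(buf.data(), 64, src.data(), src.size());
    for (int i = 0; i < 8; ++i)
        if (buf[64 + i] != '\xAB')
            return false;
    return true;
}
```

A buggy engine that believes the buffer includes the header bytes would overwrite the canaries and fail this check.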
After more investigation, it seems that all existing code paths in …
(pending review)
To reiterate on why we "only" need to fix gzip and lzma: the other compression algorithms already do this.
- Lines 54 to 58 in e8545f7
- Lines 145 to 148 in e8545f7 (that's the very original code with the old compression algorithm; it uses an offset which is correct by construction)
- root/core/zstd/src/ZipZSTD.cxx, lines 32 to 35 in e8545f7
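The invariant those snippets enforce can be sketched as follows. The naming and layout here are assumptions for illustration (ROOT compression blocks start with a 9-byte header), not code from the referenced files:

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: each ROOT compression block starts with a 9-byte header
// (2-byte algorithm tag, a method byte, and two 3-byte size fields), and
// the engine must only ever see the region that follows it.
constexpr std::size_t kHeaderSize = 9;

struct EngineView {
    char *data;
    std::size_t size;
};

// Correct by construction: the view handed to the compression engine
// excludes the header, so the engine cannot overrun the real buffer.
inline EngineView PayloadView(char *tgt, std::size_t tgtsize) {
    assert(tgtsize > kHeaderSize && "target must at least hold the header");
    return {tgt + kHeaderSize, tgtsize - kHeaderSize};
}
```

The bug being fixed is exactly the absence of this subtraction in the gzip and lzma paths: the engine was handed the full `tgtsize` while writing after the header.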
Note that the problem appeared (and/or was uncovered) "recently" and was "introduced" by e052b58: [ntuple] RPageSinkBuf: Always seal before CommitCluster (prior to this commit, valgrind is silent).
Somewhat: that commit made it more likely for regular users to run into the problem. Essentially, the commit moved code around to always seal pages in …
ping @pcanal it would be really good to have the fix in, my understanding is that it blocks CMS RNTuple work... |
R__unzipZLIB is already properly subtracting it from srcsize.
lzma_code must only see the buffers without the header, so the sizes have to be adjusted accordingly. Fixes root-project#14508
In practice, the target size is greater than or equal to the source size in most cases for ROOT, but add this additional correctness check so that the inputs can be fuzzed in the next commit.
This would have found any of the previous three commits.
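A minimal sketch of the header bookkeeping these commits adjust. The 3-byte little-endian size fields are an assumption about the block layout for illustration, not taken from the diff:

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: the compressed and uncompressed sizes are stored as
// 3-byte little-endian fields inside the 9-byte block header.
inline void EncodeSize3(unsigned char *p, std::size_t n) {
    p[0] = static_cast<unsigned char>(n & 0xff);
    p[1] = static_cast<unsigned char>((n >> 8) & 0xff);
    p[2] = static_cast<unsigned char>((n >> 16) & 0xff);
}

inline std::size_t DecodeSize3(const unsigned char *p) {
    return p[0] | (static_cast<std::size_t>(p[1]) << 8)
                | (static_cast<std::size_t>(p[2]) << 16);
}
```

Because these fields live in the first 9 bytes, any engine call that does not skip them both corrupts the header and risks writing past the end of the real payload area.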
Starting build on |
Indeed. The diff was made less obvious because:
Right, the allocation is done:
and used via
so the only extra is
01bb696 hints that the compression engine was seen as writing past the end ... it is plausible, since the prior delta was `9*nbuffers + 8`. This of course assumes that the compression algorithm strictly respects the limit it is given (it would be a serious security risk if not).
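The over-allocation described above can be written down as a small helper. This is a sketch of the historical sizing per the `9*nbuffers + 8` delta mentioned in the comment, with an assumed name:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the historical TKey/TBasket output sizing described above:
// the output area is 9 bytes per compression block plus 8 spare bytes
// larger than the payload. This slack masked the header-accounting bug
// for TTree, because an engine spilling a few bytes past its nominal
// limit still landed inside the allocation.
inline std::size_t OutputAllocation(std::size_t payload, std::size_t nbuffers) {
    return payload + 9 * nbuffers + 8;
}
```

RNTuple buffers, allocated without this slack, exposed the bug directly, which is consistent with valgrind only complaining after e052b58.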
LGTM, thanks. This patch needs to be backported to as many older releases as possible, as it can lead to a memory overwrite even in the case of TTree (the compression engine is told the memory area is larger than it actually is, and unless the algorithm stops before it has over-inflated the object by 28+9 bytes, it might still happen).
As a side note, the extra size given by TKey and TBasket should probably be removed (modulo understanding why there was a +8 "in case object is placed in a deleted gap").
I am now guessing that this was a micro-optimization to better manage the memory. We should also consider removing it.
I think
Yes, we have to operate under that assumption.
Yes, I think the compression algorithms stop at the buffer sizes we give them. Unless I'm missing something, this means only RNTuple was affected by this and TTree was fine because of the slightly larger buffers? For now, I've opened backports for 6.30 (#14624), 6.28 (#14625), and 6.26 (#14626). If we find that TTree is also affected, we can (and have to) open more backports.
Ok, we can try (in …
The compression algorithms only see the buffers without the header, so the sizes have to be adjusted accordingly.
Fixes #14508
FYI @Dr15Jones