Revert choice to change default Snapshot TTree compression settings#21753
Open
vepadulano wants to merge 1 commit intoroot-project:masterfrom
Open
Revert choice to change default Snapshot TTree compression settings#21753vepadulano wants to merge 1 commit intoroot-project:masterfrom
vepadulano wants to merge 1 commit intoroot-project:masterfrom
Conversation
e1242ab to
c6a05cd
Compare
root-project@61088a3 made the deliberate choice to change the default compression settings when calling Snapshot with TTree output format from 101 to 505. This choice was the result of internal discussion within the team, based on the empirical evidence available up to that point that showed ZSTD outperforming ZLIB on all metrics for the TTree datasets (as well as for RNTuple datasets). This commit proposes to revert that choice based on new evidence, summarised at https://github.com/vepadulano/ttree-lossless-compression-studies. The main takeaway message from that study is that TTree datasets with branches of type ROOT::RVec where many (if not all) of the collections are empty are compressed better with ZLIB than with ZSTD. Being this case actually quite relevant, as most datasets are made of branches with collection types and as the result of analysis steps these collections may be skimmed quite drastically, there is enough motivation to move the default compression settings for TTree back to 101. This commit changes the default RSnapshotOptions values for compression settings respectively to 'kUndefined' and '0' for the compression algorithm and the compression level. When the 'kUndefined' compression algorithm is used, Snapshot will behave differently depending on the output format: the settings will be 101 for TTree and 505 for RNTuple. Add one test per respective output format to check the default values are respected.
c6a05cd to
2a688e1
Compare
Test Results 22 files 22 suites 3d 5h 39m 46s ⏱️ For more details on these failures, see this check. Results for commit 2a688e1. ♻️ This comment has been updated with latest results. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
61088a3 made the deliberate choice to change the default compression settings when calling Snapshot with TTree output format from 101 to 505. This choice was the result of internal discussion within the team, based on the empirical evidence available up to that point that showed ZSTD outperforming ZLIB on all metrics for the TTree datasets (as well as for RNTuple datasets).
This commit proposes to revert that choice based on new evidence, summarised at https://github.com/vepadulano/ttree-lossless-compression-studies. The main takeaway message from that study is that TTree datasets with branches of type ROOT::RVec where many (if not all) of the collections are empty are compressed better with ZLIB than with ZSTD. Being this case actually quite relevant, as most datasets are made of branches with collection types and as the result of analysis steps these collections may be skimmed quite drastically, there is enough motivation to move the default compression settings for TTree back to 101.
This commit changes the default RSnapshotOptions values for compression settings respectively to 'kUndefined' and '0' for the compression algorithm and the compression level. When the 'kUndefined' compression algorithm is used, Snapshot will behave differently depending on the output format: the settings will be 101 for TTree and 505 for RNTuple.
Add one test per respective output format to check the default values are respected.
Note
This PR is motivated by https://root-forum.cern.ch/t/large-changes-in-branch-sizes-in-later-builds-of-root/64753