Revert choice to change default Snapshot TTree compression settings #21753

Open
vepadulano wants to merge 1 commit into root-project:master from vepadulano:rdf-revert-snapshot-default-ttree-comp

@vepadulano
Member

61088a3 made the deliberate choice to change the default compression settings when calling Snapshot with TTree output format from 101 to 505. This choice was the result of internal discussion within the team, based on the empirical evidence available up to that point that showed ZSTD outperforming ZLIB on all metrics for the TTree datasets (as well as for RNTuple datasets).

This commit proposes to revert that choice based on new evidence, summarised at https://github.com/vepadulano/ttree-lossless-compression-studies. The main takeaway from that study is that TTree datasets with branches of type ROOT::RVec in which many (if not all) of the collections are empty compress better with ZLIB than with ZSTD. This case is quite relevant: most datasets are made of branches with collection types, and analysis steps may skim these collections quite drastically. That is enough motivation to move the default compression settings for TTree back to 101.

This commit changes the default RSnapshotOptions compression settings to 'kUndefined' for the compression algorithm and '0' for the compression level. When the 'kUndefined' compression algorithm is used, Snapshot behaves differently depending on the output format: the settings will be 101 for TTree and 505 for RNTuple.

Add one test per output format to check that the default values are respected.
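
The format-dependent default described above could be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the types `Algo`, `OutputFormat`, `SnapshotOptionsSketch` and the functions `PackSettings` and `ResolveCompressionSettings` are hypothetical names introduced here, and the enum values are assumptions chosen so that the packed settings (100 * algorithm + level) come out as 101 for ZLIB level 1 and 505 for ZSTD level 5.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the ROOT types.
// The numeric values are assumptions chosen to reproduce 101 and 505.
enum class Algo { kZLIB = 1, kZSTD = 5, kUndefined = 6 };
enum class OutputFormat { kTTree, kRNTuple };

// Mirrors the idea of the new defaults: algorithm 'kUndefined', level '0'.
struct SnapshotOptionsSketch {
   Algo algorithm = Algo::kUndefined;
   int level = 0;
};

// Packed compression settings: 100 * algorithm + level.
constexpr int PackSettings(Algo algo, int level)
{
   return static_cast<int>(algo) * 100 + level;
}

// If the user left the algorithm undefined, pick a default per output format;
// otherwise honour the explicit choice.
inline int ResolveCompressionSettings(const SnapshotOptionsSketch &opts, OutputFormat format)
{
   if (opts.algorithm == Algo::kUndefined) {
      return format == OutputFormat::kTTree ? PackSettings(Algo::kZLIB, 1)  // 101
                                            : PackSettings(Algo::kZSTD, 5); // 505
   }
   assert(opts.level >= 0 && opts.level <= 99); // level must fit in two digits
   return PackSettings(opts.algorithm, opts.level);
}
```

With default-constructed options, the sketch resolves to 101 for the TTree format and 505 for the RNTuple format, while an explicit algorithm and level are passed through unchanged. Using a sentinel value such as 'kUndefined' is what lets the resolver tell "the user did not choose" apart from "the user explicitly chose the same values as a default".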

Note

This PR is motivated by https://root-forum.cern.ch/t/large-changes-in-branch-sizes-in-later-builds-of-root/64753

@vepadulano vepadulano self-assigned this Mar 31, 2026
@vepadulano vepadulano force-pushed the rdf-revert-snapshot-default-ttree-comp branch from e1242ab to c6a05cd Compare March 31, 2026 12:21
Contributor

@enirolf enirolf left a comment

LGTM, thanks!

@vepadulano vepadulano force-pushed the rdf-revert-snapshot-default-ttree-comp branch from c6a05cd to 2a688e1 Compare March 31, 2026 14:12
@vepadulano vepadulano requested a review from enirolf March 31, 2026 14:12
@github-actions

github-actions bot commented Mar 31, 2026

Test Results

    22 files      22 suites   3d 5h 39m 46s ⏱️
 3 833 tests  3 831 ✅  1 💤 1 ❌
75 674 runs  75 655 ✅ 18 💤 1 ❌

For more details on these failures, see this check.

Results for commit 2a688e1.

♻️ This comment has been updated with latest results.

@vepadulano vepadulano added the "clean build" (Ask CI to do non-incremental build on PR) label Apr 1, 2026
@vepadulano vepadulano closed this Apr 1, 2026
@vepadulano vepadulano reopened this Apr 1, 2026
Labels

clean build Ask CI to do non-incremental build on PR in:RDataFrame

2 participants