- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.4k
          [df] Add RNTupleWriteOptions to RSnapshotOptions and add format-compatibility checks
          #19976
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| Test Results    21 files      21 suites   3d 17h 49m 34s ⏱️ For more details on these failures, see this check. Results for commit 52895b9. ♻️ This comment has been updated with latest results. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| // compression settings in fNTupleWriteOpts have not been changed, and the compression algorithm or level in fOptions | ||
| // have. | ||
| if (fOptions.fNTupleWriteOpts.GetCompression() == RCompressionSetting::EDefaults::kUseGeneralPurpose && | ||
| (fOptions.fCompressionAlgorithm != RCompressionSetting::EAlgorithm::kZLIB || | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that leaves the possibility that a user explicitly sets fCompression... to what happens to be the current default, but then is surprised because RNTuple doesn't pick it up but continue to use its zstd default. I don't have a good solution though, maybe that's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. Perhaps to avoid situations like these it might be better to also warn users for fCompression..., and require them to be set through fNTupleWriteOpts only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the merger of #20030, the behavior is now such that fCompression[...] will be propagated to fNTupleWriteOpts if and only if in there it has not changed from the default. I think that should (reasonably) remove possible ambiguities regarding what compression is used.
12b5c24    to
    cb535da      
    Compare
  
    cb535da    to
    2ae5d46      
    Compare
  
    N.B., compression settings that have been set directly through the snapshot options are propagated to the RNTuple write options, provided that they haven't been set there already.
... and warn users when an option has been set that has no effect on the chosen output format.
2ae5d46    to
    52895b9      
    Compare
  
    | After discussing offline with @vepadulano, we concluded that in the end it makes more sense to throw an exception in case either of  | 
| #ifndef ROOT_RSNAPSHOTOPTIONS | ||
| #define ROOT_RSNAPSHOTOPTIONS | ||
|  | ||
| #include "ROOT/RNTupleWriteOptions.hxx" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for bringing this up only now. By looking at the code more closely, I am reminded that a lot of work has been put into removing all header inclusions in RDataFrame of other parts of ROOT I/O (namely TTree, TChain, TFile and related headers). This now introduces another header inclusion which is logically similar to including something from TTree, and I'm afraid it goes against the rest of the work that was done. I don't have yet a clear idea on how to overcome this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two ways I can see:
- Pass the RNTupleWriteOptions via std::unique_ptrand default tonullptr.
- Unpack the various options from RNTupleWriteOptions flat into RSnapshotOptions, and then only internally construct the RNTupleWriteOptions in the .cxx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second option would have the slight advantage of removing the need for the checks of compression settings, we just wouldn't create another compression setting option in the RSnapshotOptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for bringing this up. I agree that the second option seems more favorable. That would also keep the struct as simple as possible. Perhaps it's not necessary to add all write options for the time being, but only the ones we expect are more commonly tweaked, but on the other hand it also doesn't cost much more effort to just add them all at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle looks ok to me. I think I would prefer for the moment an internal function in RDF or RNTuple that compares the RNTuple write options. I'm not sure how useful this is as a general API, and once it is in we are bound to maintain it.
For snapshotting to RNTuple, users can now pass an
RNTupleWriteOptionstoRSnapshotOptionsto configure the output RNTuple. Compression settings that have been set directly throughRSnapshotOptions::fCompressionAlgorithmandRSnapshotOptions::fCompressionLevelare propagated toRSnapshotOptions::fNTupleWriteOpts, provided that they haven't been set there already as well.In addition, a check has been added to warn users when they set options that are specific to one output format, but
RSnapshotOptions.fOutputFormathas been set to use the other.Closes #19784.