[RF] RooDataSet conversion to TTree fails for large datasets (bytecount too large) #12710

Closed
vlisovsk opened this issue Apr 25, 2023 · 5 comments

Comments

@vlisovsk

Describe the bug

I have a RooDataSet with roughly 100 million events.
I try to convert it to a TTree with:

RooDataSet::setDefaultStorageType(RooAbsData::Tree);
const TTree* tree8 = mydataw_z.GetClonedTree();

This works well for smaller datasets, but for my huge dataset I get the following error:

Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Error in <TBufferFile::WriteByteCount>: bytecount too large (more than 1073741822)
Error in <TBufferFile::CheckByteCount>: object of class TObjArray read too many bytes: 1600049012 instead of 526307188
Warning in <TBufferFile::CheckByteCount>: TObjArray::Streamer() not in sync with data, fix Streamer()
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TList, offset=4216542 pointer will be 0
Error in <TExMap::Remove>: key 2005325560 not found at 371
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TObject, offset=2005325560 pointer will be 0
Error in <TExMap::Remove>: key 364036325 not found at 135
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TList, offset=364036325 pointer will be 0
Error in <TExMap::Remove>: key 1061746196 not found at 217
Warning in <TBufferFile::CheckObject>: reference to an unavailable class, pointers of that type will be 0
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TObject, offset=1171537962 pointer will be 0
Error in <TExMap::Remove>: key 586225820 not found at 447
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TObject, offset=586225820 pointer will be 0
Error in <TExMap::Remove>: key 423978783 not found at 84
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TList, offset=423978783 pointer will be 0
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TObject, offset=4207193 pointer will be 0
Error in <TExMap::Remove>: key 839042239 not found at 12
Warning in <TBufferFile::CheckObject>: reference to object of unavailable class TBranchRef, offset=839042239 pointer will be 0
Error in <TBufferFile::CheckByteCount>: object of class TTree read too many bytes: 883788679 instead of 526307551
Warning in <TBufferFile::CheckByteCount>: TTree::Streamer() not in sync with data, fix Streamer()

 *** Break *** segmentation violation
root.exe(60607,0x102138580) malloc: Incorrect checksum for freed object 0x15c1e1a00: probably modified after being freed.
Corrupt value: 0x0
root.exe(60607,0x102138580) malloc: *** set a breakpoint in malloc_error_break to debug

Expected behavior

The tree should be written to file without failures.

To Reproduce

I have prepared a rather minimal example: https://cernbox.cern.ch/s/jhiOyKZJN89I3Hq
It is based on the RooFit example because my actual use case is saving the result of the sPlot into a TTree, but the sPlot is not the issue here.

Setup

  1. ROOT version: 6.28/02
  2. Operating system: macOS (but the same happens on lxplus)
  3. How you obtained ROOT: macports (but the same happens on lxplus).

Additional context

I would be glad to learn whether there is a more efficient way of saving a RooDataSet into a TTree. I looked for a conversion from RooDataSet to RDataFrame but found only the inverse operation.

@vlisovsk vlisovsk added the bug label Apr 25, 2023
@guitargeek guitargeek self-assigned this Apr 25, 2023
@guitargeek guitargeek changed the title from "RooDataSet conversion to TTree fails for large datasets (bytecount too large)" to "[RF] RooDataSet conversion to TTree fails for large datasets (bytecount too large)" Apr 25, 2023
@guitargeek
Contributor

Thanks for reporting this!

It will take some time to get a useful answer, because this seems to be quite a fundamental problem. It was already discussed here, with no solution:
https://sft.its.cern.ch/jira/browse/ROOT-10686

Thanks for setting up a reproducer. This will help me to figure out what can be done.

@dpiparo
Member

dpiparo commented Jan 29, 2024

I believe this is another incarnation of the same limitation encountered here: https://its.cern.ch/jira/projects/ROOT/issues/ROOT-10450

@ferdymercury
Collaborator

ferdymercury commented Jan 31, 2024

@dpiparo
Member

dpiparo commented Feb 5, 2024

The reference issue for this on GitHub is #6734.

@dpiparo dpiparo closed this as completed Feb 5, 2024

github-actions bot commented Feb 6, 2024

Hi @dpiparo, @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

@ferdymercury ferdymercury added this to Issues in Fixed in: not applicable via automation Feb 6, 2024