Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retain tree name through TChain with RNTupleImporter #13088

Closed
bernhardmgruber opened this issue Jun 23, 2023 · 13 comments
Closed

Retain tree name through TChain with RNTupleImporter #13088

bernhardmgruber opened this issue Jun 23, 2023 · 13 comments
Assignees

Comments

@bernhardmgruber
Copy link
Contributor

Explain what you would like to see improved and how.

The other day someone helpful suggested this way of creating an artifically enlarged ntuple file for testing:

TChain chain;
for (int i = 0; i < 25; i++) chain.Add("B2HHH~zstd.root?#DecayTree"); // TTree file
ROOT::Experimental::RNTupleImporter::Create(&chain, "B2HHH~zstd~25x.ntuple")->Import();

Afterwards however, I was quite lost because the tree inside the result file was no longer called "DecayTree", but "ROOT::Experimental::RNTuple". I understand that if you mix multiple files with a TChain there is no unique name to pick. But if there was a unique name, I think it would be great to carry it over to the resulting file!

ROOT version

ROOT Version: 6.29/01
Built for linuxx8664gcc on Jun 22 2023, 16:45:00
From heads/master@v6-29-01-1712-g1d65f2ecf4

Installation method

built from source

Operating system

Linux

Additional context

No response

@bernhardmgruber bernhardmgruber changed the title Retain tree name through TChain Retain tree name through TChain with RNTupleImporter Jun 23, 2023
@vepadulano
Copy link
Member

Hi @bernhardmgruber , thank you for reporting this!

I understand the problem, we need to make sure we give a meaningful name to the RNTuple. Regarding your specific example above, I have a doubt about the design. Since the input is a TChain with no name, there is no evident unique name to give to the imported RNTuple. If the TChain had a name then that would be the correct name to assign. But in this case, we could:

  1. Get the name of the first tree in the first file of the chain, and assign that irrespective of the names of the other trees in the chain.
  2. Check that all trees of the chain have the same name (the first one), and assign that. If this is not the case, fallback to assigning the first name anyway ?
  3. Invent a new name for the imported RNTuple which will always be applied in case the user is importing from an un-named TChain. This could be anything from something simple like MyRNTuple to something more unique and complicated like R__IMPORTED_RNTUPLE.

Thoughts? Also inviting @pcanal and @jblomer to comment

@enirolf
Copy link
Contributor

enirolf commented Jun 26, 2023

Hello! I think in the case where the provided TChain does not have a name, picking the name of the first tree in the chain would be a sensible choice (provided this is well-documented, of course).

In any case, if a different name is desired, one can always call RNTupleImporter::SetNTupleName to change the name of the target RNTuple.

@bernhardmgruber
Copy link
Contributor Author

If the TChain had a name then that would be the correct name to assign.

Yes.
(I didn't know TChain's could have a name. I thought they were conceptually just concatenations of trees)

  1. Get the name of the first tree in the first file of the chain, and assign that irrespective of the names of the other trees in the chain.

That sounds too deliberate to me.

  1. Check that all trees of the chain have the same name (the first one), and assign that.

I think this would be the only sensible solution. But I don't kow how big TChains can get and whether such a check would become expensive at some point.

If this is not the case, fallback to assigning the first name anyway ?

Again, I think this is too deliberate.

  1. Invent a new name for the imported RNTuple which will always be applied in case the user is importing from an un-named TChain. This could be anything from something simple like MyRNTuple to something more unique and complicated like R__IMPORTED_RNTUPLE.

That is the the status quo, as observed by me. The name picked is ROOT::Experimental::RNTuple.

@vepadulano
Copy link
Member

But I don't kow how big TChains can get and whether such a check would become expensive at some point.

Yes, sorry I should have given a bit more context. A TChain could also have O(10^3) files, in extreme cases O(10^4).

Personally, I agree with @enirolf as I still think getting the first name is not that bad of an idea. It's surely not super elegant, but I believe it is at least better than giving it an arbitrary name like it's done currently. Maybe a compromise could be:

  1. Check how many files the TChain has.
  2. If it's less than THRESHOLD, then run the check that all files have the same tree name. The THRESHOLD number should probably be decided after a few performance benchmarks. If the files don't have all the same tree, I still think we should give the first tree name (and maybe issue a warning that we're doing so).
  3. If it's more, just take the first tree name (with a warning).

@bernhardmgruber
Copy link
Contributor Author

Alright, than maybe really just take the first tree's name. The algorithm for chosing a name involving a threshold may also be surprising to some users. Imagine relying on the TChan picking the first tree's name and then adding a bunch of more files and suddenly the name changes :)

@enirolf
Copy link
Contributor

enirolf commented Jun 28, 2023

One more thing to add, it is also possible to create a named TChain and have ROOT only the trees with that name:

TChain chain("DecayTree");
for (int i = 0; i < 25; i++) chain.Add("B2HHH~zstd.root"); // TTree file

This will of course only work when all trees you want to chain have the same name so we should still do something for the other cases (or the case you described in the original post), but just wanted to point out that this is an option (and in this case, the importer will use the original name).

@pcanal
Copy link
Member

pcanal commented Jun 28, 2023

One more thing to add, it is also possible to create a named TChain

Indeed. And this is the most common use case.

and have ROOT only the trees with that name:

Technically this is inaccurate. TChain::Add will default to using the chain name but it can be overriden:

TChain chain("DecayTree");
for (int i = 0; i < 25; i++) chain.Add("B2HHH~zstd.root"); // TTree file
for (int i = 0; i < 25; i++) chain.Add("B2HHH~zstd-2024.root?#BetterTreeName"); // TTree file

@pcanal
Copy link
Member

pcanal commented Jun 28, 2023

I think that we need to add a signature of ROOT::Experimental::RNTupleImporter::Create that let the user over-ride the default name (what ever it is).

For the default behavior I see only 2 sensible behavior:

  • Use the user provided name, if none, use the TChain name, if not set, use the first TTree and leave it at that.
    or
  • Use the user provided name, if none, use the TChain name, if not set, use the first TTree. In the later case (i.e. we guessed the name) throw an exception if encountering a different name.

Note that the 2nd behavior might actually be the prefer one ... depending on what we do if/when we encounter a TTree which a different structure. (I.e. we reject TTree with a different structure, we ought to reject the name. If we accept TTree with a different structure, we ought to ignore the name).

hadd for example let some differences slides (like added branches being ignored) and use the first file as the decider of the structure.

@bernhardmgruber
Copy link
Contributor Author

For completeness, RNTupleImporter already allows to supply a tree name:

auto imp = ROOT::Experimental::RNTupleImporter::Create(&chain, "B2HHH~zstd~10x.ntuple");
imp->SetNTupleName(“DecayTree”);
imp->Import();

My issue is just about an improvement of user experience.

@github-actions
Copy link

Hi @enirolf, @vepadulano,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

1 similar comment
@github-actions
Copy link

Hi @enirolf, @vepadulano,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

@github-actions
Copy link

github-actions bot commented Aug 1, 2023

Hi @enirolf, @vepadulano,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

1 similar comment
@github-actions
Copy link

github-actions bot commented Aug 2, 2023

Hi @enirolf, @vepadulano,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

@vepadulano vepadulano added this to Issues in Fixed in 6.30/00 via automation Aug 2, 2023
maksgraczyk pushed a commit to maksgraczyk/root that referenced this issue Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

4 participants