Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor tweak to regex to fix #74. #77

Merged
merged 1 commit into from
Aug 25, 2020

Conversation

jpivarski
Copy link
Member

@tamasgal and @bernd1995, this fixes #74 and therefore provides a good model for JuliaHEP/UnROOT.jl#9. The information about a branch's dimensions are stored in the TLeaf's fTitle, and that needs to be parsed by a regex. The example @bernd1995 created had no leaf names:

tree_arr -> Branch("6dVec",&vec,"[6]/D");
tree_arr -> Branch("2x3Mat",&mat,"[2][3]/D");

as opposed to

tree_str -> Branch("2x3mat",&mat,"el[6]/D");

(where the leaf name is "el"), which I guess is legal, so the regex has to allow for that.

This CI is going to fail until I fix scikit-hep/awkward#416, which @tamasgal pointed out in another issue.

@tamasgal
Copy link
Contributor

Perfect, thanks for the follow-up!

@jpivarski
Copy link
Member Author

Ah—they passed because CI isn't testing Dask. That's okay; I'm going to be removing the Dask experiment, anyway. (It puts restrictions on Awkward Arrays that I complained about in scikit-hep/awkward#350.)

@jpivarski jpivarski merged commit 3827f96 into master Aug 25, 2020
@jpivarski jpivarski deleted the jpivarski/tweak-regex-for-issue-74 branch August 25, 2020 14:55
@jpivarski
Copy link
Member Author

which I guess is legal, so the regex has to allow for that.

I just tried it when working on another issue, and ROOT does complain:

root [8] tree->Branch("stuffo", &stuffo, "/L")
Warning in <TBranch::TBranch>: No name was given to the leaf number '0' in the leaflist of the branch 'stuffo'.
(TBranch *) 0x562798faa940

so it's not 100% allowed.

@tamasgal
Copy link
Contributor

OK interesting 😉 Do you plan to change it back or show a warning? At this very moment I don't see any danger in the relaxed regex but maybe encouraging a proper protocol is the better way, meaning that a warning/error is probably more appropriate than silently accepting it 🤔

@jpivarski
Copy link
Member Author

No, ROOT's warning is in the creation of these nameless leaves, but since ROOT doesn't forbid it, there can be files out there that have this feature. We therefore need to be able to read it.

This regex was one consequence of that; there might be others. If, for instance, we identify leaves as fields by name, we'd have to introduce substitute names. (I haven't checked to see whether ROOT already does that at creation-time.)

@tamasgal
Copy link
Contributor

Yep I see, you're right!

@briederer
Copy link

This is indeed very interesting.
I played around a little bit to look at the different responses by ROOT (v6.18/04).

root [1] TTree *tree = new TTree("nums","some numbers");
root [2] float f = 1;
root [3] tree -> Branch("float",&f,"/F");
Warning in <TBranch::TBranch>: No name was given to the leaf number '0' in the leaflist of the branch 'float'.
root [4] tree -> Branch("float2",&f,"float2/F");
root [5] float g[1] = {1};
root [8] tree -> Branch("float_arr",&g,"[1]/F");
root [9] tree -> Branch("float_arr2",&g,"float2[1]/F");
root [10] tree -> Print()
******************************************************************************
*Tree    :nums      : some numbers                                           *
*Entries :        1 : Total =            2926 bytes  File  Size =          0 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :float     : /F                                                     *
*Entries :        1 : Total  Size=        652 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :float2    : float2/F                                               *
*Entries :        1 : Total  Size=        655 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :float_arr : [1]/F                                                  *
*Entries :        1 : Total  Size=        652 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :float_arr2 : float2[1]/F                                           *
*Entries :        1 : Total  Size=        673 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
root [11] tree -> Scan("*","","colsize=28");
****************************************************************************************************************************************
*    Row   *        float.float.__noname0 *                       float2 *         float_arr.float_arr. * float_arr2.float_arr2.float2 *
****************************************************************************************************************************************
*        0 *                            1 *                            1 *                            1 *                            1 *
****************************************************************************************************************************************

So ROOT only throws a warning when using nameless leaves for basic variables (i.e. non-arrays) like in the Branch float. However, when using unnamed arrays like in float_arr it seems to be no problem.
I didn't notice this until now, because usually when I am saving basic variables I use only tree->Branch("branchname",&var) and ROOT translates this automatically to tree->Branch("branchname",&var,"branchname/typeof(var)")

This regex was one consequence of that; there might be others. If, for instance, we identify leaves as fields by name, we'd have to introduce substitute names. (I haven't checked to see whether ROOT already does that at creation-time.)

Concerning the naming of the leaves this seems to behave in a very interesting behavior. While in tree->Print(); it looks like that the unnamed leave Branch (*Br 0 :float : /F *) does not have something like a 'default'-name assigned to it. However, in tree->Scan(); it seems like that there is some substructure to that. I haven't had time yet to look more into that but it thought already this small insight might be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to parse higher-dimensional (basic) arrays in leaves
3 participants