Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Add support to write Jagged Arrays to TTrees #477

Merged
merged 10 commits into from
May 1, 2020
Merged

Conversation

reikdas
Copy link
Collaborator

@reikdas reikdas commented Apr 26, 2020

Fixes #354

The user interface is described in the tests. In particular to be noted is the new dependence flag. For eg - https://github.com/scikit-hep/uproot/compare/write-jagged#diff-5f81591ed2ee25a817de2554fd35f608R1861

Writing Jagged Arrays of the int8 datatype is a little bit tricky and does not work yet - I am facing some difficulties in even getting the numbers out of a file created by ROOT. (This is for when using ROOT to read the values. uproot can read out the values correctly, even for the Jagged Arrays of type int8 written by uproot) Also, since uproot reads the data correctly, that makes it trickier to debug.

Weirdly, the travis test for reading a jagged array branch of type >i8 created by uproot, using ROOT fails - https://travis-ci.org/github/scikit-hep/uproot/jobs/679767938#L1994. The weird part is that it always passes on my local system(ROOT master) whereas it always fails on Travis(ROOT from conda). I would really appreciate any help in debugging this.

An update to the README should probably accompany this once the user interface has been finalised.

@reikdas reikdas requested a review from jpivarski April 26, 2020 16:49
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of argument name changes.

The int8 thing is weird. It only seems to affect certain Python versions, which is rather disturbing. Can you try installing two Python versions with and without the bug using conda environments (e.g. 3.6 and 3.7 or 3.7 and 3.8). Be sure to use ROOT from conda in both cases, since a ROOT installation mismatched to Python version should not work.

uproot/write/objects/TTree.py Outdated Show resolved Hide resolved
uproot/write/objects/TTree.py Outdated Show resolved Hide resolved
uproot/write/compress.py Outdated Show resolved Hide resolved
@jpivarski
Copy link
Member

@HDembinski, as long as you're not writing jagged arrays of int8, this should be ready to test.

Note that the argument names are likely to change. I suggested that "dependence" should become "lengths"; do you have other suggestions?

(In Awkward0, this quantity is called "counts", but that's changing in Awkward1 to avoid confusion with the reducer "count". The Awkward1 name for this, "ak.num", might be a little confusing out of context as a "num" argument because what's needed here is the name of the branch with the number of values in each entry, which singular "num" doesn't suggest. It's okay for "ak.num" because that has an "axis" argument.)

@reikdas
Copy link
Collaborator Author

reikdas commented Apr 26, 2020

The int8 thing is weird. It only seems to affect certain Python versions, which is rather disturbing. Can you try installing two Python versions with and without the bug using conda environments (e.g. 3.6 and 3.7 or 3.7 and 3.8). Be sure to use ROOT from conda in both cases, since a ROOT installation mismatched to Python version should not work.

@jpivarski
Just to clarify - the int8 Jagged Array branch is tricky to implement and I have not done so yet(and there are no tests currently for reading them via ROOT).
On the other hand, I have implemented >i8 Jagged Array branch writing and that works (on my local system running ROOT master). But the test validating this fails on Travis.
When you say that you noticed "it only seems to affect certain Python versions", is that because you saw that the Python 3.5 and Python 3.7 tests pass? The Python 3.5 and Python 3.7 tests pass because the writing tests do not run on those instances. I think for some reason ROOT is unable to be installed for those Python versions(strange because I am able to setup ROOT from conda with Python 3.7).

To debug this I set up a virtual environment running ROOT installed via conda for Python 3.7. But here I found that ROOT is unable to properly read TTrees containing >i8 type Jagged Array branches written by ROOT itself. This means there are no errors in uproot's creation of the >i8 type Jagged Array branch.
There are 2 possibilities for this behaviour -

  1. ROOT in conda has a bug (maybe it is on some commit where such a bug was introduced in ROOT)
  2. I am creating the file in a weird way that somehow manages to work for the ROOT commit I am on.

Here is how I created the file -

#include "TFile.h"
#include "TTree.h"
#include "TBranch.h"

void jagged3() {
    TFile *f;
    f = new TFile("jagged.root", "RECREATE");
    f->SetCompressionLevel(0);
    TTree *t = new TTree("t", "");
    Int_t n = 1;
    t->Branch("n", &n, "n/I");
    int64_t branch[100];
    int64_t count = 0;
    t->Branch("branch", branch, "branch[n]/L");
    for (int i=0; i<3; i++) {
        for (int j=0; j<n; j++) {
            branch[j] = count;
            count = count + 1;
        }
        t->Fill();
        n = n + 1;
    }
    t->Write();
    f->Close();
}

Reading it via my local system ROOT (master) for Python 3.8 -

[reik@reik-msi ~]$ python
Python 3.8.2 (default, Apr  8 2020, 14:31:25) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> f = ROOT.TFile.Open("jagged.root")
>>> tree = f.Get("t")
>>> for i, event in enumerate(tree):
...     print([x for x in event.branch])
... 
[0]
[1, 2]
[3, 4, 5]
>>> quit()

Reading it on my virtual machine running ROOT (from conda) for Python 3.7 -

(base) reik@reik-VirtualBox:~$ python
Python 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ROOT
>>> f = ROOT.TFile.Open("jagged.root")
>>> tree = f.Get("t")
>>> for i, event in enumerate(tree):
...     print([x for x in event.branch])
... 
[0]
[1]
[3]

@jpivarski
Copy link
Member

I can reproduce this. (My primary ROOT version comes from Conda.) Also, it's not the file—I could do the SetBranchAddress thing in C++ and all the data were there—it's the Python interface.

Note that we can do this:

>>> import ROOT
>>> f = ROOT.TFile.Open("jagged.root")
>>> tree = f.Get("t")
>>> tree.Scan()
***********************************************
*    Row   * Instance *         n *    branch *
***********************************************
*        0 *        0 *         1 *         0 *
*        1 *        0 *         2 *         1 *
*        1 *        1 *         2 *         2 *
*        2 *        0 *         3 *         3 *
*        2 *        1 *         3 *         4 *
*        2 *        2 *         3 *         5 *
***********************************************
6

The data are definitely there. I'd ask @chrisburr about ROOT in Conda, but this is such a low-level manifestation of whatever the problem might be that I doubt the error could be discovered this way. I'll keep thinking about what would be a way to delve more deeply.

Also, I misunderstood earlier; I thought you had been saying that 8-bit integers (i.e. char) wasn't working—it's 8-byte integers (i.e. int64_t). Strangely, SetBranchAddress doesn't like me calling 8-byte integers an int64_t: it wants Long64_t (ROOT's own typedef). I don't know if that's related.

Sometimes, there can be 32-bit vs 64-bit issues across platforms because of how the 32-bit → 64-bit transition happened about 10 years ago. I still have to treat Windows as a special case because of how NumPy works under Windows.

The main issue here, though, is that the "count" seems to always be 1 in PyROOT. The issue isn't with the branch branch; it's with the n branch...

@reikdas reikdas force-pushed the write-jagged branch 3 times, most recently from 1736305 to 3f68423 Compare May 1, 2020 14:39
@reikdas
Copy link
Collaborator Author

reikdas commented May 1, 2020

I have replaced the earlier >i8 test with one that does not give an error in Travis.

@jpivarski
Copy link
Member

Are you done making changes to this PR, and would it produce an error message for any data types that aren't correctly covered ("correct" = uproot writes it, ROOT reads it)? If so, I'll merge it.

Thanks!

@reikdas
Copy link
Collaborator Author

reikdas commented May 1, 2020

I am done with the PR :) and yes, unsupported types will raise a NotImplementedError - https://github.com/scikit-hep/uproot/pull/477/files#diff-f040211c30d67567ed553c981f8bdabaR62

@jpivarski jpivarski merged commit 3719117 into master May 1, 2020
@jpivarski jpivarski deleted the write-jagged branch May 1, 2020 16:42
[10, 11, 12]])

with uproot.recreate(filename, compression=None) as f:
f["t"] = uproot.newtree({"branch": uproot.newbranch(numpy.dtype(">i4"), counter="n")})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename "counter" to "size"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened PR #481

@HDembinski
Copy link
Member

Thank you for adding this, this is a huge step forward for those of us who want to use uproot also to write hierarchical trees.

@HDembinski, as long as you're not writing jagged arrays of int8, this should be ready to test.

I looked into the tests, seems all good. Do you want me to try it out anyway?

Regarding int8, I actually have a branch of that type, it stores bit flags. I may be able to work around this. I don't have a solution for this issue, but I may be able to share some insight what could be the problem... The issues you found could be related to the some messy details regarding int8, uint8, char, signed char, and unsigned char. For one, ROOT has trouble distinguishing char from int8, when doing TTree::Scan, it prints int8 as char, even if the type of the branch is int8.

Apart from that, depending on the system, int8 may be just an alias to char, while on others these are two distinct types for the compiler. The different char types are somewhat special in C and C++ https://stackoverflow.com/questions/75191/what-is-an-unsigned-char
https://stackoverflow.com/questions/16138237/when-is-uint8-t-%E2%89%A0-unsigned-char

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write TTrees with Jagged Array data
3 participants