## uproot overview

uproot reads ROOT files using only Python and NumPy.

   * Without a C++ layer, there are no memory ownership issues between C++ and Python.
   * Different design: instead of delivering event objects, uproot delivers columns of data as (jagged) arrays.
   * Slow Python execution is not a bottleneck, because data in ROOT files are already laid out as (jagged) arrays; uproot only needs to cast them as NumPy arrays.
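
As a rough illustration of the last two points, here is a minimal NumPy-only sketch (not uproot's actual code) of how a jagged column can be held as one flat array plus per-event offsets. The byte payload, the `counts` values, and the helper `event` are all made up for the example.

```python
import numpy as np

# Hypothetical raw payload: six big-endian float32 values, as they might sit in a file.
raw = np.array([1.5, 2.5, 3.0, 4.0, 5.0, 6.5], dtype=">f4").tobytes()

# "Casting" reinterprets the bytes in place; there is no Python loop over values.
content = np.frombuffer(raw, dtype=">f4")

# Hypothetical per-event counts: event 0 has 2 values, event 1 has none, event 2 has 4.
counts = np.array([2, 0, 4])
offsets = np.concatenate(([0], np.cumsum(counts)))   # boundaries: 0, 2, 2, 6

def event(i):
    """Return the values of event i as a view into the flat content (hypothetical helper)."""
    return content[offsets[i]:offsets[i + 1]]

# Per-event sums computed from the flat array, without looping over events.
running = np.concatenate(([0.0], np.cumsum(content, dtype="float64")))
sums = running[offsets[1:]] - running[offsets[:-1]]

print(event(0), event(1), event(2))
print(sums)
```

The point of the layout is that any per-event quantity can be computed with array operations on `content` and `offsets`, so the cost of the Python interpreter is paid once per array rather than once per event.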

_(Disclosure: I'm the author of uproot.)_

In uproot, files, directories within files, and TTrees/TBranches behave like Python dicts.

In [2]:
import uproot
file = uproot.open("http://scikit-hep.org/uproot/examples/Event.root")
file.keys()

[b'ProcessID0;1', b'htime;1', b'T;1', b'hstat;1']

In [3]:
file["ProcessID0"]

<TProcessID b'ProcessID0' at 0x7f35c07ad8d0>

In [4]:
file["htime"]

<b'TH1F' b'htime' 0x7f35c07ab188>

In [7]:
tree = file["T"]
tree

<TTree b'T' at 0x7f35c0765780>

In [21]:
tree.keys()   # allkeys() also lists nested branches

[b'event']

To get a sense of what a TTree contains, use `show`.

In [22]:
tree.show()

event                      TStreamerInfo              None
TObject                    TStreamerInfo              None
fUniqueID                  TStreamerBasicType         asdtype('>u4')
fBits                      TStreamerBasicType         asdtype('>u4')

fType[20]                  TStreamerBasicType         asdtype("('i1', (20,))")
fEventName                 TStreamerBasicType         asstring(4)
fNtrack                    TStreamerBasicType         asdtype('>i4')
fNseg                      TStreamerBasicType         asdtype('>i4')
fNvertex                   TStreamerBasicType         asdtype('>u4')
fFlag                      TStreamerBasicType         asdtype('>u4')
fTemperature               TStreamerBasicType         asdtype('>f4', 'float64')
fMeasures[10]              TStreamerBasicType         asdtype("('>i4', (10,))")
fMatrix[4][4]              TStreamerBasicType         asdtype("('>f4', (4, 4))", "('<f8', (4, 4))")
fClosestDistance           TStreamerBasicPointer      None
fEv

To read a (jagged) array, call `array` for a single branch or `arrays` for several branches at once; `arrays` returns a dict keyed by branch name.

In [26]:
tree["fTracks.fMass2"].array()

<JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f35c06c75c0>

In [27]:
tree.array("fTracks.fMass2")

<JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f35bc427710>

In [28]:
tree.arrays(["fTracks.fMass2", "fTracks.fCharge"])

{b'fTracks.fMass2': <JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f35bc55d940>,
 b'fTracks.fCharge': <JaggedArray [[1.0 1.0 1.0 ... 1.0 1.0 0.0] [1.0 0.0 0.0 ... 0.0 1.0 -1.0] [-1.0 1.0 1.0 ... -1.0 1.0 1.0] ... [1.0 1.0 1.0 ... 0.0 -1.0 1.0] [0.0 0.0 1.0 ... 1.0 0.0 1.0] [1.0 -1.0 0.0 ... 0.0 0.0 1.0]] at 0x7f35bc55d5f8>}

## Interpretations

The translation from ROOT data to an array is given by the branch's `interpretation` (if it has one).

In [29]:
tree["fNtrack"].interpretation

asdtype('>i4')

In [30]:
tree["fTemperature"].interpretation

asdtype('>f4', 'float64')

In [31]:
tree["fMatrix[4][4]"].interpretation

asdtype("('>f4', (4, 4))", "('<f8', (4, 4))")

In [32]:
tree["fTracks.fMass2"].interpretation

asjagged(asfloat16(0.0, 0.0, 8, dtype([('exponent', 'u1'), ('mantissa', '>u2')]), dtype('float32')))

In [33]:
tree["fTracks.fCharge"].interpretation

asjagged(asdouble32(-1.0, 1.0, 2, dtype('>u4'), dtype('float64')))

In [40]:
tree["fH"].interpretation

asgenobj(TH1F)
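
To make the two-argument `asdtype` interpretations above more concrete, here is a NumPy-only sketch of what I mean by a "from" dtype and a "to" dtype: the bytes are viewed as big-endian `float32` and then converted to `float64`. This is an illustration under assumed data (the `source` values are invented), not uproot's implementation.

```python
import numpy as np

# Hypothetical on-disk bytes: two entries, each a 4x4 matrix of big-endian float32.
source = np.arange(32, dtype=">f4").reshape(2, 4, 4)
raw = source.tobytes()

# Step 1: view the bytes with the "from" dtype ('>f4') and restore the per-entry shape.
as_stored = np.frombuffer(raw, dtype=">f4").reshape(-1, 4, 4)

# Step 2: convert to the "to" dtype ('<f8'), which is what the user would receive.
as_delivered = as_stored.astype("<f8")

print(as_stored.dtype, as_stored.shape)
print(as_delivered.dtype, as_delivered.shape)
```

Step 1 costs nothing beyond wrapping the buffer; only step 2 copies data, and it does so in compiled NumPy code.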

If a branch has no `interpretation`, it can't be read as an array. Either it's a no-data branch (it exists only for structure) or uproot doesn't support its data type yet.

In [47]:
print(tree["fTracks.fPointValue"].interpretation)   # as of April 2019, this one has no interpretation

None


The bytes can still be read with the `uproot.asdebug` interpretation, and even divided along entry boundaries, but we don't yet know how to turn those bytes into a meaningful array.

In [48]:
uproot.asdebug

asjagged(asdtype('uint8'))

In [49]:
tree["fTracks.fPointValue"].array(uproot.asdebug)

<JaggedArray [[1 85 85 ... 170 170 170] [0 1 85 ... 85 85 85] [0 1 85 ... 85 85 0] ... [0 1 85 ... 170 170 170] [0 0 1 ... 170 170 170] [1 85 85 ... 170 170 170]] at 0x7f35c072bb70>

If the desired arrays are too big to load into memory, you can restrict the read to a range of entries with `entrystart` and `entrystop` (uproot only reads as many baskets as it needs to).
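
A sketch of why an entry range limits the work: if each basket covers a contiguous run of entries, only the baskets overlapping the requested range have to be read. The basket layout below (`basket_edges`) and the function `baskets_needed` are hypothetical, chosen only to illustrate the idea; they are not part of uproot.

```python
from bisect import bisect_left, bisect_right

# Hypothetical layout: 1000 entries stored in 5 baskets of 200 entries each.
# basket_edges[i] is the first entry of basket i; the last value is the total.
basket_edges = [0, 200, 400, 600, 800, 1000]

def baskets_needed(entrystart, entrystop, edges=basket_edges):
    """Return indexes of baskets overlapping the half-open range [entrystart, entrystop)."""
    if entrystop <= entrystart:
        return []
    first = bisect_right(edges, entrystart) - 1   # basket containing entrystart
    last = bisect_left(edges, entrystop) - 1      # basket containing entrystop - 1
    return list(range(first, last + 1))

print(baskets_needed(600, 800))    # range aligned with one basket
print(baskets_needed(650, 850))    # range straddling a basket boundary
```

With this assumed layout, a range that lines up with basket boundaries touches a single basket, while a range that straddles a boundary touches two; everything outside those baskets is never decompressed.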

In [41]:
tree.numentries

1000

In [42]:
tree["fMatrix[4][4]"].numbaskets

5

In [43]:
tree["fMatrix[4][4]"].array(entrystart=600, entrystop=800)

array([[[-1.48842251,  0.22893755,  0.73481667,  0.        ],
        [-0.81405222,  0.32352245,  3.30477071,  0.        ],
        [ 1.07104862,  1.1267221 ,  1.35107005,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       [[ 0.72956365,  0.36551571, -0.92489249,  0.        ],
        [-0.08615459,  0.70054299,  0.84200442,  0.        ],
        [-0.27986413,  1.12187469,  2.70830393,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       [[ 1.50197184,  0.69582516, -0.31665999,  0.        ],
        [-0.70513338,  1.77296209,  1.1221925 ,  0.        ],
        [-1.39828825,  0.46608233,  3.8889432 ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       ...,

       [[-0.44362187,  0.12667903,  0.78897899,  0.        ],
        [-0.55786729,  0.78306931,  1.66213036,  0.        ],
        [ 1.32299483,  4.01059055,  5.46123695,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.

Typically, you'd use this in a loop over chunks of entries. In the body of the loop, you put your 