# Accessing TTree data with PyROOT (and others)

Very likely, most of your data are in ROOT TTrees. Here are various ways of getting the data out.

In [None]:
import ROOT

In [None]:
# Note: use TFile::Open(...) rather than the TFile constructor; Open can also read remote (e.g. http://) files
file = ROOT.TFile.Open("http://scikit-hep.org/uproot/examples/HZZ.root")
tree = file.Get("events")

In [None]:
for event in tree:
    print(event.MET_px, event.MET_py)

That worked, but it would be uncomfortable on large datasets, because an expression like `event.MET_px` does a lot of processing _in each event_ to produce a single number. The same reasons to prefer Numpy over Python for loops apply here, to an even greater degree.
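The difference can be seen without ROOT in the following minimal sketch on synthetic data. `px` and `py` are made-up stand-ins for the `MET_px` and `MET_py` branches, and `met_loop` and `met_vectorized` are hypothetical helper names. Both helpers compute the magnitude of the missing transverse momentum. One works an entry at a time in Python and the other uses a single Numpy expression. The results agree. Timings vary by machine, but the vectorized version is typically much faster because its per-entry work happens in compiled code.

```python
import math
import timeit

import numpy as np

# Synthetic stand-ins for the MET_px and MET_py branches (not real data)
rng = np.random.default_rng(seed=42)
px = rng.normal(0.0, 20.0, size=100_000)
py = rng.normal(0.0, 20.0, size=100_000)

def met_loop(px, py):
    # One Python-level operation per entry, like a PyROOT event loop
    out = []
    for x, y in zip(px, py):
        out.append(math.sqrt(x * x + y * y))
    return np.array(out)

def met_vectorized(px, py):
    # A single expression whose loop over entries runs in compiled code
    return np.sqrt(px**2 + py**2)

loop_result = met_loop(px, py)
vec_result = met_vectorized(px, py)
print(np.allclose(loop_result, vec_result))  # True

print("loop:      ", timeit.timeit(lambda: met_loop(px, py), number=3))
print("vectorized:", timeit.timeit(lambda: met_vectorized(px, py), number=3))
```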

Fortunately, ROOT now has methods to access TTree data as Numpy arrays.

In [None]:
tree.AsMatrix(["MET_px", "MET_py"])

The labels (`MET_px`, `MET_py`) are missing, but here's a way to get them:

In [None]:
data, labels = tree.AsMatrix(["MET_px", "MET_py"], return_labels=True)
labels

And use them:

In [None]:
# flatten the (entries x columns) array, then reinterpret each row as one record with named fields
recarray = data.reshape(-1).view([(x, data.dtype) for x in labels])
recarray

In [None]:
recarray["MET_px"]

In [None]:
recarray["MET_py"]
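The `reshape(-1).view(...)` trick can be tried on a small hand-made array. In this sketch, `data` is an invented stand-in for the output of `AsMatrix`. Flattening the rows and viewing the buffer with a compound dtype turns every `len(labels)` consecutive values into one record with named fields. When `data` is C-contiguous, as here, nothing is copied. Writing through a field name therefore changes the original array.

```python
import numpy as np

# A small stand-in for the (entries x columns) array that AsMatrix returns
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
labels = ["MET_px", "MET_py"]

# Flatten the rows, then reinterpret each run of len(labels) values
# as one record with named fields (a view: no data are copied here)
records = data.reshape(-1).view([(name, data.dtype) for name in labels])

print(records.shape)      # (3,)
print(records["MET_px"])  # [1. 3. 5.]
print(records["MET_py"])  # [2. 4. 6.]

# Because it is a view, writing through a field name changes the original array
records["MET_px"][0] = 10.0
print(data[0, 0])         # 10.0
```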

Another way to use them: as a Pandas DataFrame.

In [None]:
import pandas
pandas.DataFrame(data, columns=labels)

## Nested structure

Not all data have one value per event/entry.

In [None]:
for event in tree:
    print([x for x in event.Muon_Px])

Variable-length data like this have no direct Numpy analogue (a Numpy array must be rectangular), so `AsMatrix` can't read them.
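The sketch below shows why no rectangular array fits, using made-up muon values with a different number of muons per event. Numpy refuses to build a float array from them. The closest plain-Numpy container is a one-dimensional array of generic Python objects.

```python
import numpy as np

# Made-up muon px values: a different number of muons in each event
muon_px = [[-52.9, 37.7], [-0.8], [], [48.9, 0.8, -12.3]]

counts = [len(event) for event in muon_px]
print(counts)  # [2, 1, 0, 3]

# There is no (events x muons) shape that fits all of these...
try:
    np.array(muon_px, dtype=np.float64)
except ValueError as err:
    print("no rectangular float array:", err)

# ...so the best a plain Numpy array can do is hold one object per event
as_objects = np.empty(len(muon_px), dtype=object)
for i, event in enumerate(muon_px):
    as_objects[i] = np.array(event, dtype=np.float64)

print(as_objects.shape, as_objects.dtype)  # (4,) object
```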

In [None]:
# fails: Muon_Px has a variable number of values per entry
tree.AsMatrix(["Muon_Px"])

ROOT doesn't (yet) have a method for this, but two Python packages, root_numpy and uproot, do.

In [None]:
import root_numpy
jaggedarray1 = root_numpy.tree2array(tree, ["Muon_Px"])
jaggedarray1

This is a Numpy array of Python objects, each of which is a small Numpy array. These small arrays are scattered in memory, and the outer array can't be sliced as a multidimensional object.

In [None]:
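# if the per-event arrays were packed contiguously, event 1's data would start right where event 0's data ends,
# i.e. the first number (address difference) would equal the second (size of event 0's array in bytes)
jaggedarray1[1][0].ctypes.data - jaggedarray1[0][0].ctypes.data, jaggedarray1[0][0].nbytes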

In [None]:
# fails: the outer array is one-dimensional, so there is no second axis to slice
jaggedarray1[:, 0]
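The same behavior can be reproduced with plain Numpy. The following sketch builds a hypothetical array of separately allocated per-event arrays, similar in spirit to root_numpy's output. Each element has its own memory buffer, and `[:, 0]` is not available. Taking the first value of every event needs a Python-level loop.

```python
import numpy as np

# Hypothetical per-event arrays, each allocated separately
events = np.empty(3, dtype=object)
events[0] = np.array([1.0, 2.0])
events[1] = np.array([3.0])
events[2] = np.array([4.0, 5.0, 6.0])

# Each element is its own small array with its own memory buffer
print([e.ctypes.data for e in events])

# Multidimensional slicing is not available: the outer array is 1-D
try:
    events[:, 0]
except IndexError as err:
    print("IndexError:", err)

# Taking the first value of every event needs a Python-level loop
first = np.array([e[0] for e in events])
print(first)  # [1. 3. 4.]
```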

In uproot, these "jagged arrays" are a built-in type.

In [None]:
import uproot
tree2 = uproot.open("http://scikit-hep.org/uproot/examples/HZZ.root")["events"]
jaggedarray2 = tree2.array("Muon_Px")
jaggedarray2

This object is built out of large, contiguous Numpy arrays and supports multidimensional slicing.

In [None]:
print(jaggedarray2.offsets)
print(jaggedarray2.content)

In [None]:
# Px of the first muon in each of the first 10 events
jaggedarray2[:10, 0]
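The `offsets`/`content` pair printed above can be illustrated with plain Numpy. This sketch shows the general offsets-and-content idea, not uproot's actual implementation, and the values are made up. All values sit in one flat `content` array, and `offsets` marks where each event starts. An event is then a cheap slice. "First element of every event" becomes a single fancy-indexing call, provided no event is empty.

```python
import numpy as np

# Made-up values stored the "jagged array" way:
# one flat content array plus offsets marking where each event starts
content = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
counts = np.array([2, 1, 3])                        # values per event
offsets = np.concatenate([[0], np.cumsum(counts)])  # [0 2 3 6]

def event(i):
    # Event i is a slice (a view) of the contiguous content array
    return content[offsets[i]:offsets[i + 1]]

print(event(2))  # [4. 5. 6.]

# The equivalent of jagged[:, 0] without a Python loop over events.
# This indexing is only valid if every event has at least one value.
assert np.all(counts > 0)
starts = offsets[:-1]
print(content[starts])  # [1. 3. 4.]
```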

## Arbitrary objects

PyROOT should be able to read any kind of object, as it's a front end for ROOT.

In [None]:
file2 = ROOT.TFile.Open("http://scikit-hep.org/uproot/examples/HZZ-objects.root")
tree3 = file2.Get("events")

In [None]:
for event in tree3:
    print(event.muonp4, [x.Pt() for x in event.muonp4])

Non-numeric data can't be read with root_numpy.

In [None]:
# fails: muonp4 holds objects, not numbers
root_numpy.tree2array(tree3, ["muonp4"])

But uproot can read some object types as well, including user-defined classes.

In [None]:
tree4 = uproot.open("http://scikit-hep.org/uproot/examples/HZZ-objects.root")["events"]
jaggedarray3 = tree4.array("muonp4")
jaggedarray3

In [None]:
jaggedarray3.pt

In [None]:
jaggedarray3[:10].mass
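Properties like `.pt` and `.mass` above are computed for every muon at once. A rough sketch of why that can be fast: suppose, hypothetically, that the muons' four-momentum components are stored as flat arrays sharing one `offsets` array. Per-muon quantities are then single Numpy expressions over the flat arrays, and the unchanged offsets keep the results grouped by event. This illustrates the idea only. It is not how uproot implements its Lorentz-vector arrays, and the numbers are invented.

```python
import numpy as np

# Hypothetical flat four-momentum components for 4 muons in 3 events;
# offsets [0, 2, 3, 4] put muons 0-1 in event 0, muon 2 in event 1, muon 3 in event 2
offsets = np.array([0, 2, 3, 4])
px = np.array([3.0, 1.0, 0.0, 6.0])
py = np.array([4.0, 2.0, 0.0, 8.0])
pz = np.array([0.0, 2.0, 3.0, 0.0])
E = np.array([5.0, 5.0, 5.0, 10.0])

# Per-muon quantities for all muons at once, with no loop over events
pt = np.hypot(px, py)
m2 = E**2 - (px**2 + py**2 + pz**2)
mass = np.sqrt(np.maximum(m2, 0.0))  # clip small negative values from rounding

print(mass)  # [0. 4. 4. 0.]

# The results share the original offsets, so they stay grouped by event.
# For example, the leading (largest) pt in each event; reduceat assumes no event is empty.
leading_pt = np.maximum.reduceat(pt, offsets[:-1])
print(leading_pt)  # [ 5.  0. 10.]
```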

## Performance

PyROOT for loops should only be used for small datasets or quick tests. root_numpy is compiled code and generally faster, but its "array of arrays" structure slows it down on jagged data. uproot is generally the fastest, since it relies on streamlined Numpy calls.
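One way to see the cost of the "array of arrays" structure: any whole-dataset operation on it has to visit each small array from Python. The sketch below uses synthetic data and a hypothetical helper, `to_offsets_content`. The helper packs such an object array into the contiguous offsets/content layout in one Python-level pass. The sketch then compares a sum over the packed content with a per-event sum over the object array. Timings will vary by machine.

```python
import timeit

import numpy as np

def to_offsets_content(objarr):
    """Hypothetical helper: pack a 1-D object array of 1-D float arrays into
    (offsets, content), the contiguous layout used for jagged arrays."""
    counts = np.array([len(a) for a in objarr], dtype=np.int64)  # one Python-level pass
    offsets = np.concatenate([[0], np.cumsum(counts)])
    content = np.concatenate(list(objarr)) if len(objarr) > 0 else np.empty(0)
    return offsets, content

# Synthetic array of arrays: 0 to 3 random values per event
rng = np.random.default_rng(seed=1)
n_events = 10_000
objarr = np.empty(n_events, dtype=object)
for i in range(n_events):
    objarr[i] = rng.normal(size=rng.integers(0, 4))

offsets, content = to_offsets_content(objarr)

# The same total, computed two ways
total_packed = content.sum()                    # one Numpy call
total_objects = sum(a.sum() for a in objarr)    # one Python-level call per event
print(np.isclose(total_packed, total_objects))  # True

print("packed: ", timeit.timeit(lambda: content.sum(), number=10))
print("objects:", timeit.timeit(lambda: sum(a.sum() for a in objarr), number=10))
```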

In [None]:
%%timeit
for event in tree:
    [x for x in event.Muon_Px]

In [None]:
%%timeit
root_numpy.tree2array(tree, ["Muon_Px"])

In [None]:
%%timeit
tree2.array("Muon_Px")

## Summary

Which method you choose depends on whether ROOT is available, what types of data you need to read, and the size of your dataset.

<table style="font-size: 22pt; margin-top: 50px">
    <tr style="font-weight: bold"><td>Method</td><td>Relationship to ROOT</td><td>Data types</td><td>Performance</td></tr>
    <tr><td>PyROOT for loop</td><td>part of ROOT</td><td>any</td><td>slow</td></tr>
    <tr><td>PyROOT AsMatrix</td><td>part of ROOT</td><td>one number per entry</td><td>fast</td></tr>
    <tr><td>root_numpy</td><td>compiled against a specific ROOT installation</td><td>number(s) per entry</td><td>fast for one number per entry</td></tr>
    <tr><td>uproot</td><td>independent implementation (no ROOT needed)</td><td>number(s) per entry, most objects</td><td>fast</td></tr>
</table>