<br><br><br><br><br>

# Getting ROOT data into Numpy

<br><br><br><br><br><br>

<span style="font-size: 24px; margin-left: 230px"><b>Method 1: PyROOT</b></span>

<br><br><br><br><br><br>

In [2]:
import ROOT

# Note: this is the static TFile::Open(...), which can open remote URLs; the plain TFile constructor cannot
file = ROOT.TFile.Open("http://scikit-hep.org/uproot/examples/HZZ.root")
tree = file.Get("events")

for i, event in enumerate(tree):
    print(event.MET_px, event.MET_py)
    if i > 10:
        break

5.912771224975586 2.5636332035064697
24.76520347595215 -16.349109649658203
-25.78508758544922 16.237131118774414
8.619895935058594 -22.78654670715332
5.393138885498047 -1.3100523948669434
-3.7594752311706543 -19.417020797729492
23.962148666381836 -9.049156188964844
-57.533348083496094 -20.48767852783203
42.416194915771484 -94.35086059570312
-1.9144694805145264 -23.96303367614746
19.710058212280273 4.645508766174316
-35.538055419921875 -14.753822326660156


In [3]:
# That worked, but it would be too slow for large datasets.
# PyROOT does a lot of processing each time it is invoked, so we don't want to invoke it inside a loop over events.

# However, there's an alternative that goes straight to Numpy:
tree.AsMatrix(["MET_px", "MET_py"])

array([[  5.91277122,   2.5636332 ],
       [ 24.76520348, -16.34910965],
       [-25.78508759,  16.23713112],
       ...,
       [ 18.10164642,  50.29071808],
       [ 79.87519073, -52.35145187],
       [ 19.71374893,  -3.59541821]])
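
The array returned by `AsMatrix` has one row per event and one column per requested branch, so whole-column operations can replace the event loop. A minimal sketch, assuming only NumPy and using the first three rows of the output above as stand-in data (the names `matrix`, `met_px`, `met_py`, and `met` are illustrative, not part of PyROOT):

```python
import numpy as np

# Stand-in for tree.AsMatrix(["MET_px", "MET_py"]): first three rows of the output above.
matrix = np.array([[  5.91277122,   2.5636332 ],
                   [ 24.76520348, -16.34910965],
                   [-25.78508759,  16.23713112]])

# Each column is one branch; slicing gives a 1D array with one value per event.
met_px = matrix[:, 0]
met_py = matrix[:, 1]

# Magnitude of the (MET_px, MET_py) vector for all events at once, with no Python loop.
met = np.hypot(met_px, met_py)

print(matrix.shape)  # (number of events, number of branches)
print(met)
```

The same column slicing applies to the full array, since only the number of rows changes.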

In [5]:
# If you need the branch names, you can get them with return_labels=True

data, labels = tree.AsMatrix(["MET_px", "MET_py"], return_labels=True)
print(f"labels = {labels}")

import pandas
pandas.DataFrame(data, columns=labels)

labels = ['MET_px', 'MET_py']


|   | MET_px | MET_py |
|---|--------|--------|
| 0 | 5.912771 | 2.563633 |
| 1 | 24.765203 | -16.349110 |
| 2 | -25.785088 | 16.237131 |
| 3 | 8.619896 | -22.786547 |
| 4 | 5.393139 | -1.310052 |
| 5 | -3.759475 | -19.417021 |
| 6 | 23.962149 | -9.049156 |
| 7 | -57.533348 | -20.487679 |
| 8 | 42.416195 | -94.350861 |
| 9 | -1.914469 | -23.963034 |


In [7]:
# If your data have more than one value per event, PyROOT can handle that...

for i, event in enumerate(tree):
    print(len(event.Muon_Px), "muons:", [x for x in event.Muon_Px])
    if i > 10:
        break

2 muons: [-52.89945602416992, 37.7377815246582]
1 muons: [-0.8164593577384949]
2 muons: [48.987831115722656, 0.8275666832923889]
2 muons: [22.08833122253418, 76.6919174194336]
2 muons: [45.171321868896484, 39.75095748901367]
2 muons: [9.228110313415527, -5.793715000152588]
2 muons: [12.538717269897461, 29.541839599609375]
1 muons: [34.883758544921875]
2 muons: [-53.16697311401367, 11.491869926452637]
2 muons: [-67.01485443115234, -18.118755340576172]
2 muons: [15.983028411865234, 34.68440628051758]
2 muons: [-70.51190948486328, -38.028743743896484]


In [8]:
# ...but tree.AsMatrix cannot.

tree.AsMatrix(["Muon_Px"])

array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]])

Error in <TTreeReaderValueBase::GetBranchDataType()>: Must use TTreeReaderArray to read branch Muon_Px: it contains an array or a collection.
Error in <TTreeReaderValueBase::CreateProxy()>: The branch Muon_Px contains data of type {UNDETERMINED TYPE}, which does not have a dictionary.
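
The failure comes down to shape: a matrix needs the same number of values in every row, but the number of muons varies from event to event. One common way to hold such data in NumPy is a flat array of all values plus an array of offsets marking where each event starts. This sketch is an illustration, not what PyROOT or root_numpy do internally. The values are the first three events printed above; `events`, `counts`, `offsets`, `flat`, and `muons_in_event` are hypothetical names.

```python
import numpy as np

# First three events of Muon_Px from the loop output above (2, 1, and 2 muons).
events = [[-52.89945602416992, 37.7377815246582],
          [-0.8164593577384949],
          [48.987831115722656, 0.8275666832923889]]

# Number of muons in each event.
counts = np.array([len(e) for e in events])

# offsets[i] is where event i starts in the flat array; offsets[i + 1] is where it ends.
offsets = np.concatenate([[0], np.cumsum(counts)])

# All muon values, back to back, in a single 1D array.
flat = np.concatenate([np.asarray(e, dtype=np.float64) for e in events])

def muons_in_event(i):
    """Return the slice of `flat` that belongs to event i."""
    return flat[offsets[i]:offsets[i + 1]]

print(counts)
print(offsets)
print(muons_in_event(1))
```

Both `flat` and `offsets` are ordinary rectangular arrays, so vectorized operations still apply to the muon values even though the per-event lists have different lengths.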


<img src="img/03-coming-soon-1.png" width="93%">

<img src="img/03-coming-soon-2.png" width="93%">

<br><br><br><br><br><br>

<span style="font-size: 24px; margin-left: 230px"><b>Method 2: root_numpy</b></span>

<br><br><br><br><br><br>

In [11]:
# root_numpy is built on top of C++ ROOT, not PyROOT
import root_numpy

# it can extract Numpy arrays from a PyROOT tree
root_numpy.tree2array(tree, ["MET_px", "MET_py"])

# or directly from a filename/treename
root_numpy.root2array("http://scikit-hep.org/uproot/examples/HZZ.root",
                      "events",
                      ["MET_px", "MET_py"])

array([(  5.912771,   2.5636332), ( 24.765203, -16.34911  ),
       (-25.785088,  16.237131 ), ..., ( 18.101646,  50.290718 ),
       ( 79.87519 , -52.35145  ), ( 19.713749,  -3.5954182)],
      dtype=[('MET_px', '<f4'), ('MET_py', '<f4')])

root_numpy uses the same tricks as `TTree::Draw` to loop over events with minimal overhead.

<img src="img/root-numpy-fast.png" width="100%">
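
Unlike `AsMatrix`, the root_numpy result shown above is a NumPy structured array: one record per event, with fields named after the branches (see the `dtype` in the output). A minimal sketch of working with that layout, assuming only NumPy and using three records copied from the output as stand-in data (`records`, `met_px`, and `plain` are illustrative names):

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Stand-in for the root2array result: three records with the same dtype as the output above.
records = np.array([(5.912771, 2.5636332),
                    (24.765203, -16.34911),
                    (-25.785088, 16.237131)],
                   dtype=[("MET_px", "<f4"), ("MET_py", "<f4")])

# Field access by branch name returns a 1D array with one value per event.
met_px = records["MET_px"]

# Convert to a plain 2D array (events x branches), the layout AsMatrix returns.
plain = structured_to_unstructured(records)

print(records.dtype.names)
print(plain.shape)
```

The conversion is straightforward here because both fields are `float32`; fields of different types would be cast to a common type.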