root_numpy is a Python extension module that provides an efficient interface between ROOT and NumPy. root_numpy's internals are written in compiled C++, so it can handle large amounts of data much faster than equivalent pure Python implementations.
With your ROOT data in NumPy form, you can make use of NumPy's broad library, including fancy indexing, slicing, broadcasting, random sampling, sorting, shape transformations, linear algebra operations, and more. See NumPy's introductory tutorial to get started. NumPy is the fundamental library of the scientific Python ecosystem, and using NumPy arrays opens up many possibilities beyond what ROOT offers. Convert your TTrees into NumPy arrays and use SciPy for numerical integration and optimization, matplotlib for plotting, pandas for data analysis, statsmodels for statistical modelling, scikit-learn for machine learning, and perform quick exploratory analysis in interactive environments such as IPython and its popular notebook interface.
At the core of root_numpy are powerful and flexible functions for converting ROOT TTrees into NumPy recarrays or structured arrays, as well as for converting NumPy arrays back into ROOT TTrees. root_numpy can convert branches of strings and basic types such as bool, int, float, double, etc., as well as variable-length and fixed-length multidimensional arrays and 1D or 2D vectors of basic types and strings. root_numpy can also create columns in the output array that are expressions involving the TTree branches (e.g. 'vect.Pt() / 1000'), much like TTree::Draw().
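The difference between the two output types can be seen with NumPy alone. The sketch below builds a small structured array by hand, standing in for what root2array might return for a tree with branches x, y and z (the values are made up for illustration), and then views it as a record array. A structured array is indexed by field name, while a record array additionally exposes each field as an attribute.

```python
import numpy as np

# Hand-built structured array standing in for the output of root2array;
# the field names mirror branches x, y, z and the values are illustrative only.
arr = np.array(
    [(1.0, 2.0, -1), (3.0, 4.0, 5), (5.0, 6.0, 2)],
    dtype=[('x', 'f8'), ('y', 'f8'), ('z', 'i4')],
)

# Structured arrays are indexed by field name (this returns a view)
xs = arr['x']

# A record array additionally exposes each field as an attribute
rec = arr.view(np.recarray)
ys = rec.y

# Both share the same underlying memory, so no data is copied
rec.x[0] = 10.0
print(arr['x'][0])  # 10.0
```

Because view() does not copy, switching between the two forms is cheap even for large arrays.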
For example, get a NumPy structured or record array from a TTree (copy and paste the following examples into your Python prompt):
import ROOT
from root_numpy import root2array, root2rec, tree2rec
from root_numpy.testdata import get_filepath
filename = get_filepath('test.root')
# Convert a TTree in a ROOT file into a NumPy structured array
arr = root2array(filename, 'tree')
# The TTree name is always optional if there is only one TTree in the file
# Convert a TTree in a ROOT file into a NumPy record array
rec = root2rec(filename, 'tree')
# Get the TTree from the ROOT file
rfile = ROOT.TFile(filename)
intree = rfile.Get('tree')
# Convert the TTree into a NumPy record array
rec = tree2rec(intree)
Include specific branches or expressions and only entries passing a selection:
rec = tree2rec(intree,
branches=['x', 'y', 'sqrt(y)', 'TMath::Landau(x)', 'cos(x)*sin(y)'],
selection='z > 0',
start=0, stop=10, step=2)
The above conversion creates an array with five columns computed from the branches x and y. It loops over only the first ten entries in the tree, visiting every second entry, and keeps only those entries where z is greater than zero.
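To make these semantics concrete, here is a NumPy-only sketch of the equivalent steps, assuming a structured array `full` that already holds every entry of the tree (its contents are made up for illustration). It first takes the entries visited by start=0, stop=10, step=2, then applies the cut z > 0, and finally computes expression columns. TMath::Landau has no direct NumPy counterpart, so that column is omitted. This only illustrates the behavior; root_numpy itself evaluates the selection and expressions in compiled code while reading the tree.

```python
import numpy as np

# Stand-in for every entry of the tree; values are illustrative only
rng = np.random.default_rng(42)
n = 20
full = np.zeros(n, dtype=[('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
full['x'] = rng.normal(size=n)
full['y'] = rng.uniform(0.0, 4.0, size=n)  # non-negative so sqrt(y) is defined
full['z'] = np.where(np.arange(n) % 3 == 0, -1.0, 1.0)

# start=0, stop=10, step=2: visit entries 0, 2, 4, 6, 8
visited = full[0:10:2]

# selection='z > 0': keep only visited entries that pass the cut
selected = visited[visited['z'] > 0]

# Expression columns are computed for each selected entry
out = np.zeros(len(selected), dtype=[('x', 'f8'), ('y', 'f8'),
                                     ('sqrt_y', 'f8'), ('cos_x_sin_y', 'f8')])
out['x'] = selected['x']
out['y'] = selected['y']
out['sqrt_y'] = np.sqrt(selected['y'])
out['cos_x_sin_y'] = np.cos(selected['x']) * np.sin(selected['y'])
```

Here entries 0 and 6 fail the cut, so only entries 2, 4 and 8 survive.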
Now convert our array back into a TTree:
from root_numpy import array2tree, array2root
# Rename the fields
rec.dtype.names = ('x', 'y', 'sqrt_y', 'landau_x', 'cos_x_sin_y')
# Convert the NumPy record array into a TTree
tree = array2tree(rec, name='tree')
# Dump directly into a ROOT file without using PyROOT
array2root(rec, 'selected_tree.root', 'tree')
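The renaming step matters because fields created from expressions are named after the expression strings, such as 'sqrt(y)', which are awkward to use as branch names. Assigning to dtype.names renames every field in place. As an alternative, NumPy's numpy.lib.recfunctions.rename_fields takes a mapping, renames only the listed fields, and returns a view with the new dtype while the original array keeps its names. A sketch on a small hand-built array (the data is illustrative only):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Field names as they might come out of an expression-based conversion
# (illustrative data only)
arr = np.zeros(3, dtype=[('x', 'f8'), ('sqrt(y)', 'f8'), ('cos(x)*sin(y)', 'f8')])

# rename_fields maps only the listed names and returns a view
# with the new dtype; the original array keeps its field names
renamed = rfn.rename_fields(arr, {'sqrt(y)': 'sqrt_y',
                                  'cos(x)*sin(y)': 'cos_x_sin_y'})
print(renamed.dtype.names)  # ('x', 'sqrt_y', 'cos_x_sin_y')
```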
root_numpy also provides a function for filling a ROOT histogram from a NumPy array:
from ROOT import TH2D, TCanvas
from root_numpy import fill_hist
import numpy as np
# Fill a ROOT histogram from a NumPy array
hist = TH2D('name', 'title', 20, -3, 3, 20, -3, 3)
fill_hist(hist, np.random.randn(1000000, 2))
canvas = TCanvas()
hist.Draw('LEGO2')
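For comparison, the same 20 x 20 binning over [-3, 3] can be reproduced in pure NumPy with np.histogram2d. One behavioral difference worth knowing: a ROOT histogram keeps entries outside the axis range in its underflow and overflow bins, whereas np.histogram2d simply drops them, so the in-range contents should be comparable while the totals may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000000, 2))

# 20 x 20 bins over [-3, 3] on both axes, matching the TH2D above
counts, xedges, yedges = np.histogram2d(
    data[:, 0], data[:, 1], bins=20, range=[[-3, 3], [-3, 3]]
)

# Entries outside [-3, 3] on either axis are dropped by histogram2d,
# whereas a ROOT histogram would place them in underflow/overflow bins
in_range = np.all(np.abs(data) <= 3, axis=1)
```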
and a function for creating a random NumPy array by sampling a ROOT function or histogram:
from ROOT import TF2, TH1D
from root_numpy import random_sample
# Sample a ROOT function
func = TF2('func', 'sin(x)*sin(y)/(x*y)')
arr = random_sample(func, 1000000)
# Sample a ROOT histogram
hist = TH1D('hist', 'hist', 10, -3, 3)
hist.FillRandom('gaus')
arr = random_sample(hist, 1000000)
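The idea behind sampling a histogram is easy to sketch in NumPy: pick a bin with probability proportional to its content, then draw a value uniformly within that bin. This is an illustrative sketch of the general technique, not root_numpy's implementation; the helper sample_hist is hypothetical, and the bin contents are computed from a Gaussian shape rather than taken from a ROOT histogram.

```python
import numpy as np

def sample_hist(contents, edges, n, rng):
    """Hypothetical helper: draw n values from a 1D histogram."""
    contents = np.asarray(contents, dtype=float)
    probs = contents / contents.sum()
    # Choose a bin for each sample, weighted by its bin content
    bins = rng.choice(len(contents), size=n, p=probs)
    # Place each sample uniformly within its chosen bin
    low = edges[bins]
    width = edges[bins + 1] - low
    return low + width * rng.random(n)

rng = np.random.default_rng(1)
edges = np.linspace(-3, 3, 11)  # 10 bins over [-3, 3], as in the TH1D above
centers = 0.5 * (edges[:-1] + edges[1:])
contents = np.exp(-0.5 * centers**2)  # Gaussian-shaped bin contents
samples = sample_hist(contents, edges, 1000000, rng)
```

Within each bin the samples are flat, so the result reproduces the histogram's binned shape rather than the smooth function it was filled from.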
Also see the root2hdf5 script in the rootpy package, which uses root_numpy and PyTables to convert all TTrees in a ROOT file into HDF5 format.