This repository has been archived by the owner on Jan 7, 2023. It is now read-only.

Merge pull request #264 from ndawe/master
Docs: include benchmark plot for tree2array
ndawe committed Jul 20, 2016
2 parents 38b683b + 88c46da commit 8022e9e
Showing 8 changed files with 223 additions and 80 deletions.
82 changes: 3 additions & 79 deletions README.rst
@@ -1,6 +1,6 @@
.. -*- mode: rst -*-
`[see full documentation] <http://rootpy.github.com/root_numpy/>`_
`[see the full documentation] <http://rootpy.github.com/root_numpy/>`_

root_numpy: The interface between ROOT and NumPy
================================================
@@ -46,83 +46,7 @@ of basic types and strings. root_numpy can also create columns in the output
array that are expressions involving the TTree branches similar to
``TTree::Draw()``.

For example, get a structured NumPy array from a TTree (copy and paste the
following examples into your Python prompt):
Did we mention that root_numpy is fast?

.. code-block:: python
.. image:: benchmarks/bench_tree2array.png

   from root_numpy import root2array, tree2array
   from root_numpy.testdata import get_filepath
   filename = get_filepath('test.root')
   # Convert a TTree in a ROOT file into a NumPy structured array
   arr = root2array(filename, 'tree')
   # The TTree name is always optional if there is only one TTree in the file
   # Or first get the TTree from the ROOT file
   import ROOT
   rfile = ROOT.TFile(filename)
   intree = rfile.Get('tree')
   # and convert the TTree into an array
   array = tree2array(intree)

Include specific branches or expressions and only entries passing a selection:

.. code-block:: python

   array = tree2array(intree,
       branches=['x', 'y', 'sqrt(y)', 'TMath::Landau(x)', 'cos(x)*sin(y)'],
       selection='z > 0',
       start=0, stop=10, step=2)

The above conversion creates an array with five columns from the branches
x and y where z is greater than zero and only looping on the first ten entries
in the tree while skipping every second entry.

Now convert our array back into a TTree:

.. code-block:: python

   from root_numpy import array2tree, array2root
   # Rename the fields
   array.dtype.names = ('x', 'y', 'sqrt_y', 'landau_x', 'cos_x_sin_y')
   # Convert the NumPy array into a TTree
   tree = array2tree(array, name='tree')
   # Or write directly into a ROOT file without using PyROOT
   array2root(array, 'selected_tree.root', 'tree')

root_numpy also provides a function for filling a ROOT histogram from a NumPy
array:

.. code-block:: python

   from ROOT import TH2D
   from root_numpy import fill_hist
   import numpy as np
   # Fill a ROOT histogram from a NumPy array
   hist = TH2D('name', 'title', 20, -3, 3, 20, -3, 3)
   fill_hist(hist, np.random.randn(1000000, 2))
   hist.Draw('LEGO2')

and a function for creating a random NumPy array by sampling a ROOT function
or histogram:

.. code-block:: python

   from ROOT import TF2, TH1D
   from root_numpy import random_sample
   # Sample a ROOT function
   func = TF2('func', 'sin(x)*sin(y)/(x*y)')
   arr = random_sample(func, 1000000)
   # Sample a ROOT histogram
   hist = TH1D('hist', 'hist', 10, -3, 3)
   hist.FillRandom('gaus')
   arr = random_sample(hist, 1000000)
1 change: 1 addition & 0 deletions benchmarks/.gitignore
@@ -0,0 +1 @@
*.pkl
Binary file added benchmarks/bench_tree2array.png
90 changes: 90 additions & 0 deletions benchmarks/bench_tree2array.py
@@ -0,0 +1,90 @@
from rootpy.io import TemporaryFile
import rootpy
from root_numpy import array2tree
import numpy as np
import uuid
import random
import string
import timeit
import pickle
import platform
import matplotlib.pyplot as plt

with open('hardware.pkl', 'r') as pkl:
    info = pickle.load(pkl)

# construct system hardware information string
hardware = '{cpu}\nStorage: {hdd}\nROOT-{root} Python-{python} NumPy-{numpy}'.format(
    cpu=info['CPU'], hdd=info['HDD'],
    root=rootpy.ROOT_VERSION, python=platform.python_version(),
    numpy=np.__version__)

rfile = TemporaryFile()

def randomword(length):
    return ''.join(random.choice(string.lowercase) for i in range(length))

def make_tree(entries, branches=10, dtype=np.double):
    dtype = np.dtype([(randomword(20), dtype) for idx in range(branches)])
    array = np.zeros(entries, dtype=dtype)
    return array2tree(array, name=uuid.uuid4().hex)

# time vs entries
num_entries = np.logspace(1, 7, 20, dtype=np.int)
root_numpy_times = []
root_times = []
for entries in num_entries:
    print(entries)
    iterations = 20 if entries < 1e5 else 4
    tree = make_tree(entries, branches=1)
    branchname = tree.GetListOfBranches()[0].GetName()
    root_numpy_times.append(
        min(timeit.Timer('tree2array(tree)',
                         setup='from root_numpy import tree2array; from __main__ import tree').repeat(3, iterations)) / iterations)
    root_times.append(
        min(timeit.Timer('draw("{0}", "", "goff")'.format(branchname),
                         setup='from __main__ import tree; draw = tree.Draw').repeat(3, iterations)) / iterations)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

ax1.plot(num_entries, root_numpy_times, '-o', label='root_numpy.tree2array()', linewidth=1.5)
ax1.plot(num_entries, root_times, '--o', label='ROOT.TTree.Draw()', linewidth=1.5)
ax1.set_xscale("log", nonposx='clip')
ax1.set_yscale("log", nonposx='clip')
ax1.legend(loc='lower right', frameon=False, fontsize=12)
ax1.set_ylabel('time [s]')
ax1.set_xlabel('number of entries')
ax1.text(0.05, 0.95, 'tree contains a single branch',
         verticalalignment='top', horizontalalignment='left',
         transform=ax1.transAxes, fontsize=12)
ax1.text(0.05, 0.85, hardware,
         verticalalignment='top', horizontalalignment='left',
         transform=ax1.transAxes, fontsize=10)

# time vs branches
num_branches = np.linspace(1, 10, 10, dtype=np.int)
root_numpy_times = []
root_times = []
for branches in num_branches:
    print(branches)
    tree = make_tree(1000000, branches=branches)
    branchnames = [branch.GetName() for branch in tree.GetListOfBranches()]
    branchname = ':'.join(branchnames)
    root_numpy_times.append(
        min(timeit.Timer('tree2array(tree)',
                         setup='from root_numpy import tree2array; from __main__ import tree').repeat(3, 3)) / 3)
    root_times.append(
        min(timeit.Timer('draw("{0}", "", "goff candle")'.format(branchname),
                         setup='from __main__ import tree; draw = tree.Draw').repeat(3, 3)) / 3)

ax2.plot(num_branches, root_numpy_times, '-o', label='root_numpy.tree2array()', linewidth=1.5)
ax2.plot(num_branches, root_times, '--o', label='ROOT.TTree.Draw()', linewidth=1.5)
ax2.legend(loc='lower right', frameon=False, fontsize=12)
ax2.set_ylabel('time [s]')
ax2.set_xlabel('number of branches')
ax2.text(0.05, 0.95, 'tree contains 1M entries per branch',
         verticalalignment='top', horizontalalignment='left',
         transform=ax2.transAxes, fontsize=12)

fig.tight_layout()
fig.savefig('bench_tree2array.png', transparent=True)
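Review note on the timing pattern in bench_tree2array.py: each measurement repeats a timed batch three times, keeps the fastest batch, and divides by the number of iterations to get a per-call time. The sketch below reproduces that pattern with the standard library only, so it can be tried without ROOT installed. The helper name `best_time_per_call` and the `sum(data)` workload are illustrative assumptions, not part of the commit.

```python
import timeit


def best_time_per_call(stmt, repeats=3, iterations=20, namespace=None):
    """Return the fastest observed per-call time, in seconds.

    The statement is run `iterations` times per batch and the batch is
    repeated `repeats` times. Taking the minimum batch time reduces the
    influence of unrelated system activity on the result.
    """
    timer = timeit.Timer(stmt, globals=namespace)
    batch_totals = timer.repeat(repeats, iterations)
    return min(batch_totals) / iterations


# Hypothetical stand-in workload; the benchmark times tree2array(tree) instead.
data = list(range(1000))
per_call = best_time_per_call("sum(data)", namespace={"data": data})
print("sum(data): {0:.3e} s per call".format(per_call))
```

Passing the namespace through `globals` avoids the `from __main__ import tree` setup string, which only works when the script is run as the main module.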
42 changes: 42 additions & 0 deletions benchmarks/sysinfo.py
@@ -0,0 +1,42 @@
"""
Get system hardware information
http://stackoverflow.com/a/4194146/1002176
"""
import cpuinfo
import sys, os, fcntl, struct
import pickle

if os.geteuid() > 0:
    print("ERROR: Must be root to use")
    sys.exit(1)

with open(sys.argv[1], "rb") as fd:
    # tediously derived from the monster struct defined in <hdreg.h>
    # see comment at end of file to verify
    hd_driveid_format_str = "@ 10H 20s 3H 8s 40s 2B H 2B H 4B 6H 2B I 36H I Q 152H"
    # Also from <hdreg.h>
    HDIO_GET_IDENTITY = 0x030d
    # How big a buffer do we need?
    sizeof_hd_driveid = struct.calcsize(hd_driveid_format_str)

    # ensure our format string is the correct size
    # 512 is extracted using sizeof(struct hd_id) in the c code
    assert sizeof_hd_driveid == 512

    # Call native function
    buf = fcntl.ioctl(fd, HDIO_GET_IDENTITY, " " * sizeof_hd_driveid)
    fields = struct.unpack(hd_driveid_format_str, buf)
    hdd = fields[15].strip()

cpu = cpuinfo.get_cpu_info()['brand']

print(cpu)
print("Hard Drive Model: {0}".format(hdd))

info = {
    'CPU': cpu,
    'HDD': hdd,
}

with open('hardware.pkl', 'w') as pkl:
    pickle.dump(info, pkl)
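Review note on the struct layout in sysinfo.py: the script asserts that the format string describes 512 bytes and reads the drive model from `fields[15]`. Both can be checked without root access or a real device by unpacking a synthetic buffer. This is a minimal sketch; the constant names and the model string are made up for illustration, and the 512-byte size assumes the native alignment found on common Linux platforms.

```python
import struct

# Format string copied from sysinfo.py.
HD_DRIVEID_FORMAT = "@ 10H 20s 3H 8s 40s 2B H 2B H 4B 6H 2B I 36H I Q 152H"

# 10H occupies indices 0-9, 20s is index 10, 3H is 11-13, 8s is 14,
# so the 40-byte model string is field 15.
MODEL_FIELD_INDEX = 15
# Byte offset of the model field: everything that precedes it in the format.
MODEL_OFFSET = struct.calcsize("@ 10H 20s 3H 8s")


def parse_model(buf):
    """Unpack a drive-identity buffer and return the stripped model field."""
    fields = struct.unpack(HD_DRIVEID_FORMAT, buf)
    return fields[MODEL_FIELD_INDEX].strip()


size = struct.calcsize(HD_DRIVEID_FORMAT)

# Build a zero-filled stand-in for the ioctl result and plant a
# hypothetical, space-padded model name at the expected offset.
fake = bytearray(size)
model = b"HYPOTHETICAL DISK MODEL 1000"
fake[MODEL_OFFSET:MODEL_OFFSET + 40] = model.ljust(40)

print(size, parse_model(bytes(fake)))
```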
1 change: 1 addition & 0 deletions docs/benchmarks
85 changes: 85 additions & 0 deletions docs/start.rst
@@ -47,6 +47,91 @@ root_numpy should now be ready to use::
dtype=[('n_int', '<i4'), ('f_float', '<f4'), ('d_double', '<f8')])


A Quick Tutorial
================

For example, get a structured NumPy array from a TTree (copy and paste the
following examples into your Python prompt):

.. code-block:: python

   from root_numpy import root2array, tree2array
   from root_numpy.testdata import get_filepath
   filename = get_filepath('test.root')
   # Convert a TTree in a ROOT file into a NumPy structured array
   arr = root2array(filename, 'tree')
   # The TTree name is always optional if there is only one TTree in the file
   # Or first get the TTree from the ROOT file
   import ROOT
   rfile = ROOT.TFile(filename)
   intree = rfile.Get('tree')
   # and convert the TTree into an array
   array = tree2array(intree)

Include specific branches or expressions and only entries passing a selection:

.. code-block:: python

   array = tree2array(intree,
       branches=['x', 'y', 'sqrt(y)', 'TMath::Landau(x)', 'cos(x)*sin(y)'],
       selection='z > 0',
       start=0, stop=10, step=2)

The above conversion creates an array with five columns from the branches
x and y where z is greater than zero and only looping on the first ten entries
in the tree while skipping every second entry.

Now convert our array back into a TTree:

.. code-block:: python

   from root_numpy import array2tree, array2root
   # Rename the fields
   array.dtype.names = ('x', 'y', 'sqrt_y', 'landau_x', 'cos_x_sin_y')
   # Convert the NumPy array into a TTree
   tree = array2tree(array, name='tree')
   # Or write directly into a ROOT file without using PyROOT
   array2root(array, 'selected_tree.root', 'tree')

root_numpy also provides a function for filling a ROOT histogram from a NumPy
array:

.. code-block:: python

   from ROOT import TH2D
   from root_numpy import fill_hist
   import numpy as np
   # Fill a ROOT histogram from a NumPy array
   hist = TH2D('name', 'title', 20, -3, 3, 20, -3, 3)
   fill_hist(hist, np.random.randn(1000000, 2))
   hist.Draw('LEGO2')

and a function for creating a random NumPy array by sampling a ROOT function
or histogram:

.. code-block:: python

   from ROOT import TF2, TH1D
   from root_numpy import random_sample
   # Sample a ROOT function
   func = TF2('func', 'sin(x)*sin(y)/(x*y)')
   arr = random_sample(func, 1000000)
   # Sample a ROOT histogram
   hist = TH1D('hist', 'hist', 10, -3, 3)
   hist.FillRandom('gaus')
   arr = random_sample(hist, 1000000)

Have Questions or Found a Bug?
==============================

2 changes: 1 addition & 1 deletion root_numpy/info.py
@@ -6,5 +6,5 @@
|_| \___/ \___/ \__|___|_| |_|\__,_|_| |_| |_| .__/ \__, | {0}
|_____| |_| |___/
"""
__version__ = '4.5.1.dev0'
__version__ = '4.5.2.dev0'
__doc__ = __doc__.format(__version__) # pylint:disable=redefined-builtin
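Review note on the tutorial text moved into docs/start.rst: the paragraph describing `start=0, stop=10, step=2` together with `selection='z > 0'` can be read as "walk entries 0, 2, 4, 6, 8 and keep those with z > 0". The pure-Python sketch below illustrates that reading on made-up data. It is only a model of the semantics as the tutorial describes them, not root_numpy's implementation, and the `select_entries` helper and the sample entries are hypothetical.

```python
import math


def select_entries(entries, selection, start=None, stop=None, step=None):
    """Yield entries from the sliced range that pass the selection.

    The slice is applied first (which entries are looped over), then the
    selection decides which of those entries are kept.
    """
    for entry in entries[slice(start, stop, step)]:
        if selection(entry):
            yield entry


# Hypothetical tree contents: z is positive only for every fourth entry.
entries = [
    {"x": float(i), "y": float(i * i), "z": 1.0 if i % 4 == 0 else -1.0}
    for i in range(20)
]

# Three output columns: two plain branches and one expression, sqrt(y).
rows = [
    (e["x"], e["y"], math.sqrt(e["y"]))
    for e in select_entries(entries, lambda e: e["z"] > 0,
                            start=0, stop=10, step=2)
]
print(rows)
```

Entries 0, 2, 4, 6 and 8 are visited, and only 0, 4 and 8 pass the selection, so the result has three rows even though ten entries fall inside the start/stop range.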
