## uproot overview

Uproot is a pure Python + Numpy reader of ROOT files.

   * Without a C++ layer, there are no memory ownership issues between C++ and Python.
   * Different design: instead of delivering event objects, uproot delivers columns of data as (jagged) arrays.
   * Not hampered by slow Python execution: data in ROOT files are already laid out as (jagged) arrays, so they only need to be cast as Numpy arrays.
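
To make the "(jagged) array" idea concrete, here is a minimal Numpy-only sketch of one common layout: a flat `content` array plus an `offsets` array marking where each event begins and ends. The values and the helper `event` are hypothetical, chosen only for illustration; this is not uproot's internal code.

```python
import numpy

# Hypothetical data: 3 events holding 2, 0, and 3 values respectively.
content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])   # all values, stored contiguously
offsets = numpy.array([0, 2, 2, 5])                # event i spans content[offsets[i]:offsets[i + 1]]

def event(i):
    """Return the values of event i as a slice of the flat content."""
    return content[offsets[i]:offsets[i + 1]]

# Number of values in each event, computed without a Python loop.
counts = numpy.diff(offsets)

# An elementwise operation acts on the flat content; the offsets are unchanged.
squared = content**2
```

Because the per-event structure lives entirely in `offsets`, elementwise math can run on `content` at compiled speed regardless of how many events there are.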

_(Disclosure: I'm the author of uproot.)_

In uproot, files, directories within files, and TTrees/TBranches behave like Python dicts.

In [None]:
import uproot
file = uproot.open("http://scikit-hep.org/uproot/examples/Event.root")
file.keys()

In [None]:
file["ProcessID0"]

In [None]:
file["htime"]

In [None]:
tree = file["T"]
tree

In [None]:
tree.keys()   # top-level branches only; allkeys() also lists nested branches

To get a sense of what a TTree contains, use `show`.

In [None]:
tree.show()

To read a (jagged) array, call `array` or `arrays`.

In [None]:
tree["fTracks.fMass2"].array()

In [None]:
tree.array("fTracks.fMass2")

In [None]:
tree.arrays(["fTracks.fMass2", "fTracks.fCharge"])

## Interpretations

The translation from ROOT data to an array is given by the branch's `interpretation` (if it has one).

In [None]:
tree["fNtrack"].interpretation

In [None]:
tree["fTemperature"].interpretation

In [None]:
tree["fMatrix[4][4]"].interpretation

In [None]:
tree["fTracks.fMass2"].interpretation

In [None]:
tree["fTracks.fCharge"].interpretation

In [None]:
tree["fH"].interpretation
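
As a rough picture of what a fixed-width interpretation does, the sketch below views a block of bytes as a Numpy array with a big-endian dtype (ROOT serializes numbers big-endian). The payload here is made up, and this is only an illustration of the idea, not uproot's implementation.

```python
import numpy

# Hypothetical raw payload: three 32-bit integers serialized big-endian.
raw = numpy.array([3, 7, 11], dtype=">i4").tobytes()

# "Interpreting" the bytes means viewing them with the right dtype;
# there is no per-item Python loop.
as_big_endian = numpy.frombuffer(raw, dtype=">i4")

# Optionally convert to a native-order integer array for downstream work.
as_native = as_big_endian.astype(numpy.int32)
```
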

If a branch has no `interpretation`, it can't be read. Either it's a no-data branch (it exists just for structure) or uproot doesn't yet know how to interpret its type.

In [None]:
print(tree["fTracks.fPointValue"].interpretation)   # as of April 2019, this one has no interpretation

The bytes can be read and even divided along entry boundaries, but we don't yet know how to turn the bytes into an array.

In [None]:
uproot.asdebug

In [None]:
tree["fTracks.fPointValue"].array(uproot.asdebug)

Complex classes are generated based on the ROOT file's self-describing streamers, but they aren't necessarily fast to read (more Python than Numpy).

In [None]:
tree["fH"].interpretation

In [None]:
histograms = tree["fH"].array()
histograms

In [None]:
histograms[0].__dict__

## Fitting into memory constraints

Restricting the range of entries avoids reading too many baskets (chunks on disk).

In [None]:
tree.numentries

In [None]:
tree["fMatrix[4][4]"].numbaskets

In [None]:
tree["fMatrix[4][4]"].array(entrystart=600, entrystop=800)
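
Why a restricted entry range saves work can be sketched as follows: given where each basket starts, only the baskets that overlap the requested range need to be read. The basket boundaries in `edges` and the helper `baskets_needed` are hypothetical, not taken from the file above or from uproot's API.

```python
import numpy

# Hypothetical basket boundaries: basket i holds entries [edges[i], edges[i + 1]).
edges = numpy.array([0, 250, 500, 750, 1000])

def baskets_needed(entrystart, entrystop):
    """Return the indexes of baskets overlapping [entrystart, entrystop)."""
    # Last basket whose first entry is <= entrystart.
    first = int(numpy.searchsorted(edges, entrystart, side="right")) - 1
    # Last basket whose first entry is < entrystop.
    last = int(numpy.searchsorted(edges, entrystop, side="left")) - 1
    return list(range(first, last + 1))

needed = baskets_needed(600, 800)   # with these edges: baskets 2 and 3
```

With these assumed boundaries, a request for entries 600 to 800 touches two of the four baskets, so half of the branch's data is never read or decompressed.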

Typically, you'd want to read a chunk of entries from all interesting branches, do some work, then move on to the next chunk: use `iterate`.

In [None]:
import numpy
for arrays in tree.iterate(["fTracks.fPx", "fTracks.fPy"], entrysteps=300):
    mag = numpy.sqrt(arrays[b"fTracks.fPx"]**2 + arrays[b"fTracks.fPy"]**2)
    print(len(mag), mag[0][0])
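
The chunking behind a fixed `entrysteps` can be pictured with the plain-Python sketch below, which yields consecutive entry ranges of at most `entrysteps` entries. The generator `entry_ranges` and the total of 1000 entries are assumptions for illustration; uproot's own stepping logic may differ in detail.

```python
def entry_ranges(numentries, entrysteps):
    """Yield (entrystart, entrystop) pairs covering [0, numentries) in fixed-size steps."""
    for start in range(0, numentries, entrysteps):
        # The final chunk is shorter when entrysteps does not divide numentries.
        yield start, min(start + entrysteps, numentries)

# Assumed total of 1000 entries, split into steps of 300.
ranges = list(entry_ranges(1000, 300))
```

Only one chunk's worth of arrays needs to be in memory at a time, which is what keeps the working set bounded.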

The equivalent for a set of files is `uproot.iterate`: supply the file names (wildcards are allowed) and the tree name.

In [None]:
# no wildcards for XRootD and HTTP
filenames = ["http://scikit-hep.org/uproot/examples/HZZ" + x + ".root" for x in ["", "-zlib", "-lz4", "-lzma"]]
for arrays in uproot.iterate(filenames, "events", ["Muon_Px", "Muon_Py"]):
    mag = numpy.sqrt(arrays[b"Muon_Px"]**2 + arrays[b"Muon_Py"]**2)
    print(len(mag), mag[1][0])

## Caching

If you 

## Parallel processing

## Lazy evaluation

## Dask (distributed computing)