## uproot overview

Uproot is a pure Python + Numpy reader of ROOT files.

   * Without a C++ layer, there are no memory ownership issues between C++ and Python.
   * Different design: instead of delivering event objects, uproot delivers columns of data as arrays.
   * Not hampered by slow Python execution because data in ROOT files are laid out as arrays: they just need to be cast as Numpy arrays.

_(Disclaimer: I'm the author of uproot.)_

In uproot, files, directories within files, and TTrees/TBranches behave like Python dicts.

In [1]:
import uproot
file = uproot.open("http://scikit-hep.org/uproot/examples/Event.root")
file.keys()

[b'ProcessID0;1', b'htime;1', b'T;1', b'hstat;1']

In [10]:
file["ProcessID0"]   # , file["ProcessID0"]._fTitle

<TProcessID b'ProcessID0' at 0x7f1add182358>

In [7]:
file["htime"].show()

                   0                                                     0.38739
                   +-----------------------------------------------------------+
[-inf, 0) 0.021839 |***                                                        |
[0, 1)    0.33352  |***************************************************        |
[1, 2)    0.30403  |**********************************************             |
[2, 3)    0.32452  |*************************************************          |
[3, 4)    0.35097  |*****************************************************      |
[4, 5)    0.36894  |********************************************************   |
[5, 6)    0.30728  |***********************************************            |
[6, 7)    0.30681  |***********************************************            |
[7, 8)    0.34156  |****************************************************       |
[8, 9)    0.16151  |*************************                                  |
[9, 10)   0        |        

In [8]:
tree = file["T"]
tree

<TTree b'T' at 0x7f1add1bd7f0>

In [9]:
tree.keys()   # top-level branches only; allkeys() lists nested branches too

[b'event']

To get a sense of what a TTree contains, use `show`.

In [11]:
tree.show()

event                      TStreamerInfo              None
TObject                    TStreamerInfo              None
fUniqueID                  TStreamerBasicType         asdtype('>u4')
fBits                      TStreamerBasicType         asdtype('>u4')

fType[20]                  TStreamerBasicType         asdtype("('i1', (20,))")
fEventName                 TStreamerBasicType         asstring(4)
fNtrack                    TStreamerBasicType         asdtype('>i4')
fNseg                      TStreamerBasicType         asdtype('>i4')
fNvertex                   TStreamerBasicType         asdtype('>u4')
fFlag                      TStreamerBasicType         asdtype('>u4')
fTemperature               TStreamerBasicType         asdtype('>f4', 'float64')
fMeasures[10]              TStreamerBasicType         asdtype("('>i4', (10,))")
fMatrix[4][4]              TStreamerBasicType         asdtype("('>f4', (4, 4))", "('<f8', (4, 4))")
fClosestDistance           TStreamerBasicPointer      None
fEv

To read a (jagged) array, call `array` or `arrays`.

In [12]:
tree["fTracks.fMass2"].array()

<JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f1add132e80>

In [13]:
tree.array("fTracks.fMass2")

<JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f1add107dd8>

In [14]:
tree.arrays(["fTracks.fMass2", "fTracks.fCharge"])

{b'fTracks.fMass2': <JaggedArray [[4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625] ... [4.5 4.5 4.5 ... 4.5 4.5 4.5] [4.5 4.5 4.5 ... 4.5 4.5 4.5] [8.90625 8.90625 8.90625 ... 8.90625 8.90625 8.90625]] at 0x7f1add182080>,
 b'fTracks.fCharge': <JaggedArray [[1.0 1.0 1.0 ... 1.0 1.0 0.0] [1.0 0.0 0.0 ... 0.0 1.0 -1.0] [-1.0 1.0 1.0 ... -1.0 1.0 1.0] ... [1.0 1.0 1.0 ... 0.0 -1.0 1.0] [0.0 0.0 1.0 ... 1.0 0.0 1.0] [1.0 -1.0 0.0 ... 0.0 0.0 1.0]] at 0x7f1add182240>}
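
The `JaggedArray` outputs above hold a different number of values for each entry. A useful mental model for such data is one flat content array plus offsets that mark where each entry starts and stops. The sketch below uses plain Python lists and made-up values; the names `content`, `offsets`, and `entry` are illustrative and are not uproot's API.

```python
# Illustrative jagged layout: three entries holding 3, 0, and 2 values.
content = [4.5, 4.5, 8.90625, 4.5, 8.90625]   # all values, stored back to back
offsets = [0, 3, 3, 5]                        # entry i spans offsets[i]:offsets[i + 1]

def entry(i):
    """Return the values that belong to entry i as a list."""
    return content[offsets[i]:offsets[i + 1]]

# Number of values per entry, and the nested view rebuilt from the flat data.
counts = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
nested = [entry(i) for i in range(len(counts))]
```

Because the values sit in one contiguous block, an operation on every value can run over `content` directly without looping over entries.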

## Interpretations

The translation from ROOT data to an array is given by the branch's `interpretation` (if it has one).

In [15]:
tree["fNtrack"].interpretation

asdtype('>i4')

In [16]:
tree["fTemperature"].interpretation

asdtype('>f4', 'float64')
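
In `asdtype('>f4', 'float64')`, the first argument describes the bytes on disk (big-endian 4-byte floats) and the second describes the type of the array you get back. As a rough illustration of that conversion, here is a sketch that uses only the standard library's `struct` module instead of uproot or Numpy; the byte contents are made up.

```python
import struct

# Three big-endian 32-bit floats, as they might sit in a file (made-up values).
raw = struct.pack(">3f", 1.5, -2.25, 1000.0)

# ">f" reads each 4-byte group as a big-endian float32. Python floats are 64-bit,
# so the unpacked values are already widened, much like the 'float64' output type.
values = [x for (x,) in struct.iter_unpack(">f", raw)]
```

With Numpy, the analogous step would be `numpy.frombuffer(raw, dtype=">f4").astype("float64")`.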

In [17]:
tree["fMatrix[4][4]"].interpretation

asdtype("('>f4', (4, 4))", "('<f8', (4, 4))")

In [18]:
tree["fTracks.fMass2"].interpretation

asjagged(asfloat16(0.0, 0.0, 8, dtype([('exponent', 'u1'), ('mantissa', '>u2')]), dtype('float32')))

In [19]:
tree["fTracks.fCharge"].interpretation

asjagged(asdouble32(-1.0, 1.0, 2, dtype('>u4'), dtype('float64')))

In [20]:
tree["fH"].interpretation

asgenobj(TH1F)

If a branch has no `interpretation`, it can't be read. Either it's a no-data branch (it exists just for structure) or uproot does not yet know how to interpret its data type.

In [21]:
print(tree["fTracks.fPointValue"].interpretation)   # as of April 2019, this one has no interpretation

None


The bytes can be read and even divided along entry boundaries, but we don't yet know how to turn the bytes into an array.

In [22]:
uproot.asdebug

asjagged(asdtype('uint8'))

In [23]:
tree["fTracks.fPointValue"].array(uproot.asdebug)

<JaggedArray [[1 85 85 ... 170 170 170] [0 1 85 ... 85 85 85] [0 1 85 ... 85 85 0] ... [0 1 85 ... 170 170 170] [0 0 1 ... 170 170 170] [1 85 85 ... 170 170 170]] at 0x7f1ad0292240>

Complex classes are generated based on the ROOT file's self-describing streamers, but they aren't necessarily fast to read (more Python than Numpy).

In [24]:
tree["fH"].interpretation

asgenobj(TH1F)

In [25]:
histograms = tree["fH"].array()
histograms

<ObjectArray [<b'TH1F' b'hstat' 0x7f1add108bd8> <b'TH1F' b'hstat' 0x7f1add108278> <b'TH1F' b'hstat' 0x7f1add126138> ... <b'TH1F' b'hstat' 0x7f1ad01fc818> <b'TH1F' b'hstat' 0x7f1ad0322ae8> <b'TH1F' b'hstat' 0x7f1add1264a8>] at 0x7f1add11fc50>

In [27]:
[x for x in dir(histograms[0]) if x.startswith("_f")]

['_fBarOffset',
 '_fBarWidth',
 '_fBinStatErrOpt',
 '_fBuffer',
 '_fBufferSize',
 '_fContour',
 '_fEntries',
 '_fFillColor',
 '_fFillStyle',
 '_fFunctions',
 '_fLineColor',
 '_fLineStyle',
 '_fLineWidth',
 '_fMarkerColor',
 '_fMarkerSize',
 '_fMarkerStyle',
 '_fMaximum',
 '_fMinimum',
 '_fName',
 '_fNcells',
 '_fNormFactor',
 '_fOption',
 '_fSumw2',
 '_fTitle',
 '_fTsumw',
 '_fTsumw2',
 '_fTsumwx',
 '_fTsumwx2',
 '_fXaxis',
 '_fYaxis',
 '_fZaxis',
 '_fields',
 '_format',
 '_format1',
 '_format2',
 '_format3',
 '_format4']

## Fitting into memory constraints

Restricting the range of entries avoids reading too many baskets (chunks on disk).

In [28]:
tree.numentries

1000

In [29]:
tree["fMatrix[4][4]"].numbaskets

5
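
To see why an entry range limits how much is read, consider which baskets overlap the requested range. The sketch below assumes five equally sized baskets of 200 entries each. The real basket boundaries of this branch are not shown here, so `basket_starts` is purely hypothetical, as is the helper `baskets_for_range`.

```python
def baskets_for_range(basket_starts, numentries, entrystart, entrystop):
    """Return indices of baskets containing any entry in [entrystart, entrystop)."""
    # Each basket stops where the next one starts; the last stops at numentries.
    stops = basket_starts[1:] + [numentries]
    return [i for i, (start, stop) in enumerate(zip(basket_starts, stops))
            if start < entrystop and stop > entrystart]

basket_starts = [0, 200, 400, 600, 800]   # hypothetical boundaries for 1000 entries

# A range aligned with one basket touches only that basket.
needed = baskets_for_range(basket_starts, 1000, 600, 800)

# Starting 50 entries earlier pulls in the previous basket as well.
needed_wide = baskets_for_range(basket_starts, 1000, 550, 800)
```

Under these assumed boundaries, the first request touches one basket out of five and the second touches two.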

In [30]:
tree["fMatrix[4][4]"].array(entrystart=600, entrystop=800)

array([[[-1.48842251,  0.22893755,  0.73481667,  0.        ],
        [-0.81405222,  0.32352245,  3.30477071,  0.        ],
        [ 1.07104862,  1.1267221 ,  1.35107005,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       [[ 0.72956365,  0.36551571, -0.92489249,  0.        ],
        [-0.08615459,  0.70054299,  0.84200442,  0.        ],
        [-0.27986413,  1.12187469,  2.70830393,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       [[ 1.50197184,  0.69582516, -0.31665999,  0.        ],
        [-0.70513338,  1.77296209,  1.1221925 ,  0.        ],
        [-1.39828825,  0.46608233,  3.8889432 ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ]],

       ...,

       [[-0.44362187,  0.12667903,  0.78897899,  0.        ],
        [-0.55786729,  0.78306931,  1.66213036,  0.        ],
        [ 1.32299483,  4.01059055,  5.46123695,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.

Typically, you'd want to read a chunk of entries from all interesting branches, do some work, then move on to the next chunk: use `iterate`.

In [31]:
import numpy
for arrays in tree.iterate(["fTracks.fPx", "fTracks.fPy"], entrysteps=300):
    mag = numpy.sqrt(arrays[b"fTracks.fPx"]**2 + arrays[b"fTracks.fPy"]**2)
    print(len(mag), mag[0][0])

300 2.1687002
300 1.9124396
300 0.6829921
100 0.81746
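
The lengths printed above (300, 300, 300, 100) follow from cutting 1000 entries into steps of 300. A minimal sketch of that chunking, with a hypothetical helper `entry_ranges` that is not part of uproot:

```python
def entry_ranges(numentries, entrysteps):
    """Yield (start, stop) pairs covering range(numentries) in fixed-size steps."""
    for start in range(0, numentries, entrysteps):
        # The last chunk is shorter when entrysteps does not divide numentries.
        yield start, min(start + entrysteps, numentries)

ranges = list(entry_ranges(1000, 300))
lengths = [stop - start for start, stop in ranges]
```

Only one chunk's worth of arrays needs to be in memory at a time, which is the point of iterating.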


To do the same across a set of files, use `uproot.iterate`, supplying the file names (wildcards are allowed) and the tree name.

In [32]:
# no wildcards for XRootD and HTTP
filenames = ["http://scikit-hep.org/uproot/examples/HZZ" + x + ".root" for x in ["", "-zlib", "-lz4", "-lzma"]]
for arrays in uproot.iterate(filenames, "events", ["Muon_Px", "Muon_Py"]):
    mag = numpy.sqrt(arrays[b"Muon_Px"]**2 + arrays[b"Muon_Py"]**2)
    print(len(mag), mag[1][0])

2231 24.417913
190 38.49425
2231 24.417913
190 38.49425
2231 24.417913
190 38.49425
2231 24.417913
190 38.49425


## Encodings, outputtypes, and Pandas

In the previous examples, `tree.arrays` returns a dict of arrays. Branch names have no encoding, so the keys of these dicts are bytestrings (a little annoying in Python 3). Here are some things you can do about that.

In [33]:
arrays = tree.arrays(["fTracks.fPx", "fTracks.fPy"], namedecode="utf-8")
arrays

{'fTracks.fPx': <JaggedArray [[0.8419713973999023 -0.7517185211181641 -1.2051571607589722 ... 0.13719533383846283 -0.6462298035621643 -0.9517862200737] [1.604776382446289 -1.0758386850357056 -1.3932744264602661 ... 0.22056634724140167 -0.6132166981697083 -0.6215075850486755] [-0.07160589098930359 0.6618200540542603 -1.2717599868774414 ... 1.976452350616455 0.5097638368606567 -1.3322412967681885] ... [1.3563624620437622 -0.5129222869873047 -0.2272467464208603 ... 0.1329408884048462 0.047294143587350845 0.37788328528404236] [-0.15198972821235657 0.8554615378379822 0.5253331065177917 ... -0.47042855620384216 -1.096403956413269 1.0178945064544678] [-0.5756134986877441 1.0133289098739624 2.032027006149292 ... 1.3431921005249023 0.6243281364440918 -0.4790661036968231]] at 0x7f1ad02a29e8>,
 'fTracks.fPy': <JaggedArray [[-1.998585820198059 -1.0374469757080078 0.43163737654685974 ... -0.6108384728431702 1.6723541021347046 -0.5587738156318665] [-1.4657866954803467 0.9896608591079712 -0.403286218
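
If you already hold a dict with bytestring keys, the keys can also be decoded after the fact with ordinary Python. A small sketch, with short lists standing in for the arrays:

```python
# Placeholder dict shaped like a result with bytestring keys;
# the lists stand in for the arrays.
arrays_bytes = {b"fNtrack": [600, 604, 603], b"fNseg": [6000, 6029, 6019]}

# Decode each key; the values are passed through untouched.
arrays_str = {name.decode("utf-8"): array for name, array in arrays_bytes.items()}
```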

In [34]:
px, py = tree.arrays(["fTracks.fPx", "fTracks.fPy"], outputtype=tuple)
print(px)
print(py)

[[0.8419713973999023 -0.7517185211181641 -1.2051571607589722 ... 0.13719533383846283 -0.6462298035621643 -0.9517862200737] [1.604776382446289 -1.0758386850357056 -1.3932744264602661 ... 0.22056634724140167 -0.6132166981697083 -0.6215075850486755] [-0.07160589098930359 0.6618200540542603 -1.2717599868774414 ... 1.976452350616455 0.5097638368606567 -1.3322412967681885] ... [1.3563624620437622 -0.5129222869873047 -0.2272467464208603 ... 0.1329408884048462 0.047294143587350845 0.37788328528404236] [-0.15198972821235657 0.8554615378379822 0.5253331065177917 ... -0.47042855620384216 -1.096403956413269 1.0178945064544678] [-0.5756134986877441 1.0133289098739624 2.032027006149292 ... 1.3431921005249023 0.6243281364440918 -0.4790661036968231]]
[[-1.998585820198059 -1.0374469757080078 0.43163737654685974 ... -0.6108384728431702 1.6723541021347046 -0.5587738156318665] [-1.4657866954803467 0.9896608591079712 -0.4032862186431885 ... -1.4356743097305298 0.013640280812978745 -1.5964593887329102] [-0.

In [35]:
import collections
arrays = tree.arrays(["fNtrack", "fNseg", "fNvertex"], outputtype=collections.namedtuple)
print(arrays.fNtrack[:5], arrays.fNseg[:5], arrays.fNvertex[:5])

[600 604 603 594 595] [6000 6029 6019 5923 5949] [19 13 14  6 14]


In [39]:
import pandas
tree.arrays(["fTracks.fP*"], outputtype=pandas.DataFrame)  # , flatten=True)

| entry | fTracks.fPx | fTracks.fPy | fTracks.fPz |
|---|---|---|---|
| 0 | [0.8419714, -0.7517185, -1.2051572, -0.427116,... | [-1.9985858, -1.037447, 0.43163738, 1.1485726,... | [2.1687002, 1.2811624, 1.2801229, 1.2254171, 0... |
| 1 | [1.6047764, -1.0758387, -1.3932744, 1.3841597,... | [-1.4657867, 0.98966086, -0.40328622, 2.375182... | [2.1734393, 1.4617994, 1.4504666, 2.7490704, 1... |
| 2 | [-0.07160589, 0.66182005, -1.27176, 0.1688395,... | [-0.96637154, 0.63050216, 0.80289334, -1.44528... | [0.96902084, 0.91407806, 1.5039984, 1.4551162,... |
| 3 | [-0.14304866, -0.72015625, -0.26054904, -1.597... | [-0.52942014, -0.42783213, 0.13375106, -0.6989... | [0.5484055, 0.83765465, 0.29287395, 1.7436261,... |
| 4 | [-0.35032487, 0.25517753, 0.81522655, 3.515412... | [-0.36952665, -0.07806556, 0.8490566, 0.163935... | [0.50919294, 0.26685166, 1.1770691, 3.519233, ... |
| 5 | [0.9705175, -0.2339002, -0.50332534, 0.3029228... | [0.3593279, -0.03988636, -0.0053454647, -1.029... | [1.0349014, 0.23727669, 0.5033537, 1.0735381, ... |
| 6 | [-1.2503744, 0.060332876, -1.322312, -0.742683... | [-0.89010864, -0.13585812, 0.27518225, 0.98975... | [1.5348387, 0.14865223, 1.3506422, 1.2374127, ... |
| 7 | [-1.0393323, -0.8376865, 0.23049998, -1.224901... | [-0.47842094, -1.1893959, -1.1487722, 0.992551... | [1.1441582, 1.4547788, 1.1716689, 1.5765604, 1... |
| 8 | [1.8042065, -1.4835553, -0.48568684, -0.218936... | [-0.3811606, 0.27416706, -0.33988178, -0.40725... | [1.8440294, 1.5086763, 0.59279954, 0.46237206,... |
| 9 | [-1.0214751, -1.0191854, -1.0693055, -0.221288... | [0.23385774, -1.3407346, -0.39071348, 0.367159... | [1.0479031, 1.6841342, 1.1384513, 0.42868945, ... |


If you're outputting to Pandas, you probably want to `namedecode` and `flatten`, so for convenience there are the `tree.pandas.df` and `tree.pandas.iterate` methods and the `uproot.pandas.iterate` function.

In [40]:
filenames = "http://scikit-hep.org/uproot/examples/HZZ.root"
for df in uproot.pandas.iterate(filenames, "events", ["MET_p*", "Muon_P*"], entrysteps=1000):
    print(df)

                   MET_px     MET_py     Muon_Px     Muon_Py     Muon_Pz
entry subentry                                                          
0     0          5.912771   2.563633  -52.899456  -11.654672   -8.160793
      1          5.912771   2.563633   37.737782    0.693474  -11.307582
1     0         24.765203 -16.349110   -0.816459  -24.404259   20.199968
2     0        -25.785088  16.237131   48.987831  -21.723139   11.168285
      1        -25.785088  16.237131    0.827567   29.800508   36.965191
3     0          8.619896 -22.786547   22.088331  -85.835464  403.848450
      1          8.619896 -22.786547   76.691917  -13.956494  335.094208
4     0          5.393139  -1.310052   45.171322   67.248787  -89.695732
      1          5.393139  -1.310052   39.750957   25.403667   20.115053
5     0         -3.759475 -19.417021    9.228110   40.554379  -14.642164
      1         -3.759475 -19.417021   -5.793715  -30.295189   42.954376
6     0         23.962149  -9.049156   12.538717  -
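
The `entry`/`subentry` index above comes from flattening: each value of a jagged branch gets its own row, and per-entry values such as `MET_px` are repeated on every row of that entry. Here is a pure-Python sketch of that reshaping, using the first three entries printed above. The helper `flatten_rows` is hypothetical. The output above does not show how entries without muons are treated, so this sketch simply emits no rows for them.

```python
def flatten_rows(flat, jagged):
    """Return (entry, subentry, flat_value, jagged_value) rows."""
    rows = []
    for entry, (scalar, values) in enumerate(zip(flat, jagged)):
        # One row per jagged value; the per-entry scalar is repeated on each row.
        for subentry, value in enumerate(values):
            rows.append((entry, subentry, scalar, value))
    return rows

met_px = [5.912771, 24.765203, -25.785088]                                # one per entry
muon_px = [[-52.899456, 37.737782], [-0.816459], [48.987831, 0.827567]]   # jagged
rows = flatten_rows(met_px, muon_px)
```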

## Caching

Uproot does not cache the arrays that you read (except raw data in HTTP and XRootD transfers). If you pass through the same data more than once, it might pay to cache it.

Any dict-like object may be used as a cache. The simplest case is a real dict, which keeps everything forever.

In [41]:
cache = {}
tree.arrays("fH", cache=cache)
list(cache.keys())

['AAE/Inm8OqIR6bsCAwGowL7v;T;fH;_variable(asjagged(asdtype(Lu1(),Lu1())),TH1F);0-1000']

In [42]:
tree.arrays("fH", cache=cache)   # gets it from the dict, not the file

{b'fH': <ObjectArray [<b'TH1F' b'hstat' 0x7f19a40ac958> <b'TH1F' b'hstat' 0x7f19a40ac4f8> <b'TH1F' b'hstat' 0x7f19a40aca48> ... <b'TH1F' b'hstat' 0x7f19a40accc8> <b'TH1F' b'hstat' 0x7f19a40aca48> <b'TH1F' b'hstat' 0x7f19a40ac958>] at 0x7f19a409acf8>}

To put an upper limit on memory use, use an `ArrayCache`, which evicts the least recently accessed arrays first.

In [43]:
cache = uproot.cache.ArrayCache(limitbytes=1024**3)   # 1 GiB
tree.arrays("fH", cache=cache)
list(cache.keys())

['AAE/Inm8OqIR6bsCAwGowL7v;T;fH;_variable(asjagged(asdtype(Lu1(),Lu1())),TH1F);0-1000']
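
To make "dict-like object with an eviction policy" concrete, here is a minimal sketch of a byte-limited, least-recently-accessed cache built on the standard library. `TinyLRUCache` is a hypothetical stand-in and is not how `ArrayCache` is implemented; it measures size with `len(value)`, which only suits `bytes`-like values.

```python
from collections import OrderedDict
from collections.abc import MutableMapping

class TinyLRUCache(MutableMapping):
    """Hypothetical dict-like cache that evicts the least recently accessed items."""

    def __init__(self, limitbytes):
        self.limitbytes = limitbytes
        self._data = OrderedDict()   # oldest access first, newest last
        self._used = 0

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # mark as most recently accessed
        return value

    def __setitem__(self, key, value):
        if key in self._data:
            self._used -= len(self._data.pop(key))
        self._data[key] = value
        self._used += len(value)
        # Evict from the old end until under the limit, always keeping the newest item.
        while self._used > self.limitbytes and len(self._data) > 1:
            _, evicted = self._data.popitem(last=False)
            self._used -= len(evicted)

    def __delitem__(self, key):
        self._used -= len(self._data.pop(key))

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

cache = TinyLRUCache(limitbytes=10)
cache["a"] = b"12345"
cache["b"] = b"12345"
cache["a"]                 # touch "a", so "b" becomes the eviction candidate
cache["c"] = b"12345"      # 15 bytes would exceed the limit, so "b" is dropped
remaining = list(cache)
```

Any object with this mapping interface could in principle be passed where a dict is accepted, which is what makes the `cache` argument flexible.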

## Parallel processing

In rare cases (e.g. when reading is dominated by LZMA decompression), it can be advantageous to read the data in parallel. If your job is dominated by processing rather than reading, just split it up into independent jobs.

In [44]:
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(4)    # split work into 4 threads

arrays = tree.arrays(["fTracks.fP*"], executor=executor, blocking=False)
arrays

<function uproot.tree.TTreeMethods.arrays.<locals>.wait()>

The optional `blocking=False` argument means "return a `wait` function." Reading and decompressing continue while you do other things; calling the function returns the result, waiting if necessary.

In [45]:
arrays()

{b'fTracks.fPx': <JaggedArray [[0.8419713973999023 -0.7517185211181641 -1.2051571607589722 ... 0.13719533383846283 -0.6462298035621643 -0.9517862200737] [1.604776382446289 -1.0758386850357056 -1.3932744264602661 ... 0.22056634724140167 -0.6132166981697083 -0.6215075850486755] [-0.07160589098930359 0.6618200540542603 -1.2717599868774414 ... 1.976452350616455 0.5097638368606567 -1.3322412967681885] ... [1.3563624620437622 -0.5129222869873047 -0.2272467464208603 ... 0.1329408884048462 0.047294143587350845 0.37788328528404236] [-0.15198972821235657 0.8554615378379822 0.5253331065177917 ... -0.47042855620384216 -1.096403956413269 1.0178945064544678] [-0.5756134986877441 1.0133289098739624 2.032027006149292 ... 1.3431921005249023 0.6243281364440918 -0.4790661036968231]] at 0x7f19845590f0>,
 b'fTracks.fPy': <JaggedArray [[-1.998585820198059 -1.0374469757080078 0.43163737654685974 ... -0.6108384728431702 1.6723541021347046 -0.5587738156318665] [-1.4657866954803467 0.9896608591079712 -0.4032862
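
The "return a `wait` function" pattern is easy to reproduce with `concurrent.futures`. In this sketch, `read_nonblocking` and the toy tasks are hypothetical; they only illustrate the control flow and are not uproot's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def read_nonblocking(tasks, executor):
    """Submit every task immediately and return a function that collects the results."""
    futures = {name: executor.submit(task) for name, task in tasks.items()}

    def wait():
        # result() blocks only if the corresponding task has not finished yet.
        return {name: future.result() for name, future in futures.items()}

    return wait

tasks = {"x": lambda: sum(range(10)), "y": lambda: max(3, 7)}   # stand-ins for reads
with ThreadPoolExecutor(4) as executor:
    wait = read_nonblocking(tasks, executor)
    # ... other work could happen here while the tasks run ...
    result = wait()
```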

## Lazy evaluation

Another common pattern is lazy evaluation: get an array-like object and only read/decompress when you access it. If you supply an `ArrayCache`, you can also limit its memory use.

In [46]:
arrays = tree.lazyarrays(["fTracks.fP*"], namedecode="utf-8")
arrays

{'fTracks.fPx': <LazyArray 'fTracks.fPx' at 7f1984559320>,
 'fTracks.fPy': <LazyArray 'fTracks.fPy' at 7f1984562f98>,
 'fTracks.fPz': <LazyArray 'fTracks.fPz' at 7f1984562f60>}

In [47]:
arrays["fTracks.fPx"][:10]   # now it reads from the first basket

array([array([ 8.4197140e-01, -7.5171852e-01, -1.2051572e+00, -4.2711601e-01,
        4.8118811e-03,  8.6452603e-01, -3.7308547e-01,  9.5344178e-02,
       -8.9896351e-01, -2.3916152e+00, -1.0216956e-01, -4.4974676e-01,
        2.1311782e-01, -5.5862206e-01, -2.9856989e-01,  5.8155060e-01,
       -2.3101001e+00, -1.3299326e+00,  1.2919817e+00,  6.6530263e-01,
       -5.6052423e-01,  9.3061823e-01,  9.2848825e-01, -2.8553912e-01,
       -7.9315138e-01,  2.7008659e-01, -6.1491287e-01,  2.7956313e-01,
        1.5337652e+00, -4.0093550e-01, -7.5571841e-01, -3.6551660e-01,
        1.1955417e+00,  8.5257524e-01,  4.4540587e-01,  1.4535275e-01,
        2.7430198e+00, -1.7702219e+00, -7.9603024e-02, -3.2680574e-01,
       -1.2933043e+00, -4.7682145e-01,  1.5263202e+00,  9.0846038e-01,
       -1.8990219e+00, -3.5483268e-01, -2.0330288e+00, -1.2489893e+00,
        5.5240005e-01, -7.2525185e-01,  9.9672735e-01, -3.4164926e-01,
        5.8170229e-02,  1.8998255e+00, -3.8737425e-01, -4.7478389e-02,

In [48]:
arrays["fTracks.fPx"][-10:]   # now it reads from the last basket

array([array([-1.20912433e+00,  3.19892354e-02, -1.31226468e+00,  7.47482121e-01,
       -1.11453080e+00, -3.81207407e-01, -1.29521573e+00, -7.15954229e-02,
        1.99054015e+00,  5.00470877e-01,  2.68100142e-01, -5.48259795e-01,
       -4.18161154e-01,  2.24478111e-01,  4.97956187e-01, -9.16865945e-01,
        6.06663108e-01,  2.00942874e+00, -1.13250935e+00,  8.85253727e-01,
       -6.88559353e-01,  6.94867432e-01, -1.89073741e+00, -2.28581831e-01,
        1.37090400e-01,  7.56101087e-02,  5.63866019e-01, -1.58229515e-01,
       -3.31018895e-01, -2.27573514e+00, -3.96576613e-01, -2.08722568e+00,
        2.43174389e-01, -7.06911087e-02, -6.19469061e-02,  4.20583665e-01,
        6.79735005e-01, -2.27353752e-01,  5.50895393e-01, -9.02295470e-01,
       -1.13958704e+00, -1.88105679e+00,  1.13187647e+00,  1.30694747e+00,
        1.40683675e+00,  2.16116711e-01,  2.25246940e-02, -7.68851817e-01,
       -1.37073743e+00,  1.99914336e+00,  1.07138693e+00, -1.71644723e+00,
       -3.91767204
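
The read-on-access behaviour can be approximated in a few lines. `LazyChunks` below is a hypothetical stand-in, not uproot's `LazyArray`. It assumes equally sized chunks and records which ones have been loaded.

```python
class LazyChunks:
    """Hypothetical lazy sequence: a chunk is loaded only when one of its items is accessed."""

    def __init__(self, loaders, chunksize):
        self.loaders = loaders        # one zero-argument function per chunk
        self.chunksize = chunksize    # every chunk is assumed to have this length
        self.loaded = {}              # chunk index -> loaded list

    def __len__(self):
        return len(self.loaders) * self.chunksize

    def _chunk(self, i):
        if i not in self.loaded:
            self.loaded[i] = self.loaders[i]()   # the expensive read happens here
        return self.loaded[i]

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        chunk, offset = divmod(index, self.chunksize)
        return self._chunk(chunk)[offset]

# Three chunks of four items each: 0..3, 4..7, 8..11.
lazy = LazyChunks([lambda i=i: list(range(4 * i, 4 * i + 4)) for i in range(3)], 4)

first = lazy[:2]
loaded_after_first = sorted(lazy.loaded)   # only the first chunk so far
last = lazy[-2:]
loaded_after_last = sorted(lazy.loaded)    # the middle chunk is still untouched
```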

## Dask (parallel processing)

Dask is a parallel processing framework based on lazy evaluation. Similar functions produce Dask arrays and Dask DataFrames.

In [49]:
filenames = "http://scikit-hep.org/uproot/examples/HZZ.root"
arrays = uproot.daskarrays(filenames, "events", ["MET_p*", "Muon_P*"])
arrays

{b'MET_px': dask.array<array, shape=(2421,), dtype=float32, chunksize=(2421,)>,
 b'MET_py': dask.array<array, shape=(2421,), dtype=float32, chunksize=(2421,)>,
 b'Muon_Px': dask.array<array, shape=(2421,), dtype=object, chunksize=(2231,)>,
 b'Muon_Py': dask.array<array, shape=(2421,), dtype=object, chunksize=(2231,)>,
 b'Muon_Pz': dask.array<array, shape=(2421,), dtype=object, chunksize=(2231,)>}

In [50]:
df = uproot.daskframe(filenames, "events", ["MET_p*", "Muon_P*"])
df

| npartitions=2 | MET_px | MET_py | Muon_Px | Muon_Py | Muon_Pz |
|---|---|---|---|---|---|
| 0 | float32 | float32 | object | object | object |
| 2231 | ... | ... | ... | ... | ... |
| 2420 | ... | ... | ... | ... | ... |
