# uproot 3 features

This notebook is primarily about what you can do in uproot 3 that you couldn't do in uproot 2. See the [main tutorial](https://mybinder.org/v2/gh/scikit-hep/uproot/master?filepath=binder%2Ftutorial.ipynb) for a basic introduction to uproot.

The main differences are:

   * **more modularization:** uproot is concerned purely with file I/O; functions for interacting with histograms and physics objects like TLorentzVectors have been moved to [uproot-methods](https://github.com/scikit-hep/uproot-methods). Nested data (jagged arrays) have moved to the new [awkward-array](https://github.com/scikit-hep/awkward-array) library. Also, uproot no longer has its own implementation of dict-like caches; instead, we use the third-party [cachetools](https://cachetools.readthedocs.io/en/latest/) library. As a user, this means that you can use whichever combination of versions you need: e.g. bleeding-edge object methods with a stable file I/O.
   * **jagged and object array operations:** now when you read data with structure or class definitions, you can perform Numpy-like operations on them. More on that below.
   * **writing ROOT files:** basic support for writing ROOT files has begun, with more on the way.

Let's get started!

In [35]:
import math
import numpy
import uproot
events = uproot.open("../tests/samples/HZZ-objects.root")["events"]

## Object arrays

Since uproot 2, you have been able to read TTree branches describing class objects (defined by ROOT's streamers). This includes STL collections, TVectors, and even histograms, should you want to store histograms in a TTree. However, reading them meant leaving the high-speed Numpy world for slow pure-Python code. In uproot 3, fixed-size objects are interpreted as `awkward.ObjectArrays`, which do most operations in a vectorized way across the fields of the data.

In [2]:
MET = events.array("MET")
MET

<ArrayMethods [TVector2(5.9128, 2.5636) TVector2(24.765, -16.349) TVector2(-25.785, 16.237) ... TVector2(18.102, 50.291), TVector2(79.875, -52.351), TVector2(19.714, -3.5954)] at 7f1f182baef0>

The array above appears to contain `TVector2` objects. However, it only creates `TVector2` objects on demand (such as in the print-out). What we have actually loaded is the individual fields (`fX` and `fY`) as separate Numpy arrays.

In [6]:
MET.columns

['fX', 'fY']

In [7]:
MET["fX"]

array([  5.91277122,  24.76520348, -25.78508759, ...,  18.10164642,
        79.87519073,  19.71374893])

In [8]:
MET["fY"]

array([  2.5636332 , -16.34910965,  16.23713112, ...,  50.29071808,
       -52.35145187,  -3.59541821])
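To make the "fields as separate arrays, objects on demand" idea concrete, here is a minimal plain-Python sketch. It is not how awkward-array is implemented: `Vec2` and `Vec2Columns` are hypothetical names invented for illustration, and ordinary lists stand in for Numpy arrays.

```python
from collections import namedtuple

# Hypothetical stand-in for a TVector2-like object; not uproot's class.
Vec2 = namedtuple("Vec2", ["x", "y"])


class Vec2Columns:
    """Keeps one list per field and builds Vec2 objects only when indexed."""

    def __init__(self, xs, ys):
        if len(xs) != len(ys):
            raise ValueError("field columns must have equal length")
        self.columns = {"fX": list(xs), "fY": list(ys)}

    def __len__(self):
        return len(self.columns["fX"])

    def __getitem__(self, key):
        # A string key returns a whole column; an integer key builds one object.
        if isinstance(key, str):
            return self.columns[key]
        return Vec2(self.columns["fX"][key], self.columns["fY"][key])


# Rounded values copied from the print-out above.
met = Vec2Columns([5.9128, 24.765, -25.785], [2.5636, -16.349, 16.237])
third = met[2]  # a Vec2 is created here, not when the columns were stored
```

Nothing object-like exists until `met[2]` is evaluated, which is why whole-column operations can skip object creation entirely.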

Although you can pull out any element as a `TVector2` Python object (creating it on demand),

In [37]:
MET[2], math.sqrt(MET[2].x**2 + MET[2].y**2)

(TVector2(-25.785, 16.237), 30.471546871754967)

or (gasp!) iterate over them in a Python for loop,

In [36]:
for i, met in enumerate(MET):
    print(met, math.sqrt(met.x**2 + met.y**2))
    if i > 10:
        break

TVector2(5.9128, 2.5636) 6.444616261735072
TVector2(24.765, -16.349) 29.67505163503274
TVector2(-25.785, 16.237) 30.471546871754967
TVector2(8.6199, -22.787) 24.362457116812326
TVector2(5.3931, -1.3101) 5.5499715598881885
TVector2(-3.7595, -19.417) 19.777622472715098
TVector2(23.962, -9.0492) 25.61389850143991
TVector2(-57.533, -20.488) 61.072343275467276
TVector2(42.416, -94.351) 103.44669393597131
TVector2(-1.9145, -23.963) 24.039388019581505
TVector2(19.71, 4.6455) 20.250114726294257
TVector2(-35.538, -14.754) 38.47893782676562


it's much faster to do operations across the whole array.

In [38]:
numpy.sqrt(MET.x**2 + MET.y**2)

array([ 6.44461626, 29.67505164, 30.47154687, ..., 53.4492837 ,
       95.50246389, 20.03893533])

Physics objects like `TVector2` have many methods, which are computed in Python if applied to a Python object but computed in Numpy if applied to a whole array.

In [14]:
MET[2].phi()

2.5796134389921948

In [15]:
MET.phi()

array([ 0.40911176, -0.58348763,  2.57961344, ...,  1.22529377,
       -0.58017296, -0.1803985 ])

Numpy operations apply elementwise to this array of `TVector2`, just as they would to Numpy arrays. Addition and multiplication are handled in physically meaningful ways.

In [21]:
MET + MET     # or numpy.add(MET, MET)

<ArrayMethods [TVector2(11.826, 5.1273) TVector2(49.53, -32.698) TVector2(-51.57, 32.474) ... TVector2(36.203, 100.58), TVector2(159.75, -104.7), TVector2(39.427, -7.1908)] at 7f1efd133128>

In [18]:
MET * 2       # same output

<ArrayMethods [TVector2(11.826, 5.1273) TVector2(49.53, -32.698) TVector2(-51.57, 32.474) ... TVector2(36.203, 100.58), TVector2(159.75, -104.7), TVector2(39.427, -7.1908)] at 7f1efd69dba8>

In [20]:
try:
    MET + 2
except TypeError:
    print("You can't add scalars to vectors (or multiply vectors by vectors).")

You can't add scalars to vectors (or multiply vectors by vectors).
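One way such rules could be expressed is with Python's operator hooks, as in the sketch below. This is only an illustration under our own assumptions, not the uproot-methods source: the class name `Vec2Columns` is hypothetical and plain lists replace Numpy arrays.

```python
class Vec2Columns:
    """Hypothetical column-wise 2D vector array with vector-style arithmetic."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def __add__(self, other):
        # Only vector + vector is meaningful; anything else is rejected.
        if not isinstance(other, Vec2Columns):
            return NotImplemented
        return Vec2Columns([a + b for a, b in zip(self.xs, other.xs)],
                           [a + b for a, b in zip(self.ys, other.ys)])

    def __mul__(self, scalar):
        # Only vector * scalar is meaningful; vector * vector is rejected.
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vec2Columns([scalar * a for a in self.xs],
                           [scalar * a for a in self.ys])


met = Vec2Columns([1.0, 2.0], [3.0, -4.0])
summed = met + met
doubled = met * 2

try:
    met + 2
    scalar_add_failed = False
except TypeError:
    scalar_add_failed = True
```

Returning `NotImplemented` lets Python raise the `TypeError` itself once it finds that neither operand can handle the operation.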


In [22]:
MET.dot(MET)

array([  41.53307876,  880.60868954,  928.51516876, ..., 2856.8259281 ,
       9120.72060822,  401.55892909])

In [23]:
MET.mag()

array([ 6.44461626, 29.67505164, 30.47154687, ..., 53.4492837 ,
       95.50246389, 20.03893533])

In [24]:
MET.delta_phi(MET)

array([0., 0., 0., ..., 0., 0., 0.])

In [29]:
MET.isparallel(MET)

array([ True,  True,  True, ...,  True,  True,  True])
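For reference, the quantities above reduce to simple formulas over the `x` and `y` columns. The sketch below spells them out in plain Python; the function signatures and the choice to wrap `delta_phi` into [-pi, pi) are our assumptions for illustration and may differ from what uproot-methods actually does.

```python
import math


def dot(xs1, ys1, xs2, ys2):
    """Elementwise dot product of two column-wise 2D vector arrays."""
    return [x1 * x2 + y1 * y2 for x1, y1, x2, y2 in zip(xs1, ys1, xs2, ys2)]


def mag(xs, ys):
    """Elementwise magnitude, sqrt(x**2 + y**2)."""
    return [math.hypot(x, y) for x, y in zip(xs, ys)]


def delta_phi(xs1, ys1, xs2, ys2):
    """Elementwise phi1 - phi2, wrapped into [-pi, pi) (assumed convention)."""
    out = []
    for x1, y1, x2, y2 in zip(xs1, ys1, xs2, ys2):
        diff = math.atan2(y1, x1) - math.atan2(y2, x2)
        out.append((diff + math.pi) % (2 * math.pi) - math.pi)
    return out


xs, ys = [3.0, 0.0], [4.0, 1.0]
self_dot = dot(xs, ys, xs, ys)
lengths = mag(xs, ys)
self_dphi = delta_phi(xs, ys, xs, ys)
```

As in the cells above, the dot product of an array with itself is the squared magnitude, and its `delta_phi` with itself is zero everywhere.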

## Jagged array operations

Just as in uproot 2, array-valued and `std::vector`-valued branches are presented as `JaggedArrays`. Unlike uproot 2, these are now being developed in an external library, awkward-array, with a rich set of operations that extend Numpy's built-in rules for broadcasting and indexing arrays.

In [27]:
muoniso = events.array("muoniso")
muoniso

<JaggedArray [[4.2001534 2.1510613] [2.1880474] [1.4128217 3.3835042] ... [3.7629452], [0.5508107], [0.]] at 7f1efd133710>

Just as with object arrays, this is not an array of thousands of one- and two-element arrays; it is stored efficiently as contiguous arrays that generate subarrays on demand.

In [40]:
muoniso[2]

array([1.4128217, 3.3835042], dtype=float32)

In [43]:
for i, event_muoniso in enumerate(muoniso):
    for particle_muoniso in event_muoniso:
        print(particle_muoniso, end="\t")
    print()
    if i > 10:
        break

4.2001534	2.1510613	
2.1880474	
1.4128217	3.3835042	
2.7284882	0.5522966	
0.0	0.8563976	
0.0	1.4929442	
0.6231756	0.0	
2.4025257	
0.0	0.0	
0.0	1.7698176	
2.0015755	0.6041591	
0.0	0.76338214	


In [44]:
muoniso.content

array([4.2001534, 2.1510613, 2.1880474, ..., 3.7629452, 0.5508107,
       0.       ], dtype=float32)

In [46]:
muoniso.counts

array([2, 1, 2, ..., 1, 1, 1])
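The relationship between `content`, `counts`, and the subarrays can be sketched in a few lines of plain Python. This is a toy model under our own assumptions, not awkward-array's implementation: `JaggedSketch` and its `offsets` attribute are hypothetical names, and lists stand in for Numpy arrays.

```python
from itertools import accumulate


class JaggedSketch:
    """Toy jagged array: one flat content list plus per-event counts."""

    def __init__(self, content, counts):
        if sum(counts) != len(content):
            raise ValueError("counts must add up to len(content)")
        self.content = list(content)
        self.counts = list(counts)
        # offsets[i] is where event i starts; offsets[i + 1] is where it ends.
        self.offsets = [0] + list(accumulate(self.counts))

    def __len__(self):
        return len(self.counts)

    def __getitem__(self, i):
        if not 0 <= i < len(self.counts):
            raise IndexError(i)
        # The subarray is sliced out of the flat content only when requested.
        return self.content[self.offsets[i]:self.offsets[i + 1]]


# First three events from the print-out above.
muoniso = JaggedSketch(
    [4.2001534, 2.1510613, 2.1880474, 1.4128217, 3.3835042], [2, 1, 2]
)
third_event = muoniso[2]
```

Only the flat list and the counts are stored; each per-event list is cut out of `content` at the moment it is indexed.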

What's new is that you can manipulate jagged arrays without resorting to for loops. This makes for more succinct code (a few characters, rather than the indented body of a for loop), but it's also much faster because it is implemented in Numpy (vectorized; contiguous memory access).

In the following, we multiply each muon isolation variable by the muon charge (just an example, not physically meaningful) and maintain the structure of which muon belongs to which event.

In [48]:
muoniso * events.array("muonq")

<JaggedArray [[ 4.20015335 -2.1510613 ] [2.18804741] [ 1.41282165 -3.38350415] ... [-3.76294518], [-0.55081069], [-0.]] at 7f1efd09afd0>

Naturally, the jagged structure of the two arrays must match.

In [50]:
try:
    muoniso + events.array("electroniso")
except IndexError:
    print("Not all events have the same number of electrons as muons.")

Not all events have the same number of electrons as muons.
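Conceptually, an elementwise operation on two jagged arrays only has to touch the flat contents, provided that the per-event counts agree. The helper below is a hedged plain-Python sketch of that idea: `jagged_multiply` is a hypothetical function, and raising `IndexError` on a mismatch simply mirrors the exception caught in the cell above.

```python
def jagged_multiply(content_a, counts_a, content_b, counts_b):
    """Multiply two jagged arrays elementwise, keeping the event structure."""
    if list(counts_a) != list(counts_b):
        raise IndexError("jagged arrays have different counts per event")
    # With identical structure, the flat contents line up one-to-one.
    product = [a * b for a, b in zip(content_a, content_b)]
    return product, list(counts_a)


# Two events holding 2 and 1 muons; made-up values chosen for exact arithmetic.
iso_content, iso_counts = [4.0, 2.0, 2.5], [2, 1]
q_content, q_counts = [1, -1, 1], [2, 1]
product, product_counts = jagged_multiply(iso_content, iso_counts, q_content, q_counts)

try:
    jagged_multiply(iso_content, iso_counts, [0.5, 0.5, 0.5], [1, 2])
    mismatch_detected = False
except IndexError:
    mismatch_detected = True
```

The result reuses the input counts unchanged, which is how the "which muon belongs to which event" structure survives the operation.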


But if you operate on a jagged array and a 