## Awkward arrays: jaggedness and more

In [04-ttree-data-pyroot.ipynb](04-ttree-data-pyroot.ipynb), we saw some examples of jagged and object data. Uproot uses a package called "awkward" to deal with them.

This section focuses on the various kinds of awkward arrays and what you can do with them, including making them less awkward by converting them into pure Numpy arrays.

Everything that comes out of uproot is a Numpy array, a `JaggedArray`, a `Table`, an `ObjectArray`, or some combination.

In [6]:
import uproot
a = uproot.open("http://scikit-hep.org/uproot/examples/HZZ.root")["events"].array("Muon_Px")
a

<JaggedArray [[-52.89945602416992 37.7377815246582] [-0.8164593577384949] [48.987831115722656 0.8275666832923889] ... [-29.756786346435547] [1.1418697834014893] [23.913206100463867]] at 0x703898643ef0>

In [12]:
type(a)

awkward.array.jagged.JaggedArray

In [13]:
type(a.content)

numpy.ndarray

In [7]:
b = uproot.open("http://scikit-hep.org/uproot/examples/HZZ-objects.root")["events"].array("muonp4")
b

<JaggedArrayMethods [[TLorentzVector(-52.899, -11.655, -8.1608, 54.779) TLorentzVector(37.738, 0.69347, -11.308, 39.402)] [TLorentzVector(-0.81646, -24.404, 20.2, 31.69)] [TLorentzVector(48.988, -21.723, 11.168, 54.74) TLorentzVector(0.82757, 29.801, 36.965, 47.489)] ... [TLorentzVector(-29.757, -15.304, -52.664, 62.395)] [TLorentzVector(1.1419, 63.61, 162.18, 174.21)] [TLorentzVector(23.913, -35.665, 54.719, 69.556)]] at 0x7038985c26a0>

In [14]:
type(b)

awkward.array.objects.JaggedArrayMethods

In [15]:
type(b.content)

awkward.array.objects.ObjectArrayMethods

In [16]:
type(b.content.content)

awkward.array.table.Table

In [18]:
type(b.content.content.contents["fX"])

numpy.ndarray

If ROOT managed to "split" the objects into columns, then the data are in a columnar state: each attribute is represented by its own contiguous array.

In [19]:
b.content.content.contents["fX"]

array([-52.89945602,  37.73778152,  -0.81645936, ..., -29.75678635,
         1.14186978,  23.9132061 ])

In [20]:
b.content.content.contents["fY"]

array([-11.65467167,   0.69347358, -24.40425873, ..., -15.30385876,
        63.60956955, -35.66507721])
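As a small illustration of why the columnar state is convenient, a derived quantity can be computed for all muons at once from whole columns. This is only a sketch: the `px` and `py` values are the first three muons copied (rounded) from the `TLorentzVector` output above, and the choice of transverse momentum as the derived quantity is an assumption made for the example.

```python
import numpy

# First three muon momenta, rounded, copied from the TLorentzVector output above.
px = numpy.array([-52.899, 37.738, -0.81646])
py = numpy.array([-11.655, 0.69347, -24.404])

# One vectorized expression over whole columns; no Python loop over muons.
pt = numpy.sqrt(px**2 + py**2)
print(pt)
```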

Even if the data are "unsplit," they're still accessible as an `ObjectArray`: a bag of bytes plus a Python class that interprets them.

In [23]:
c = uproot.open("http://scikit-hep.org/uproot/examples/Event.root")["T"].array("fH")
c

<ObjectArray [<b'TH1F' b'hstat' 0x70389b138c78> <b'TH1F' b'hstat' 0x70389b138958> <b'TH1F' b'hstat' 0x70389832ed68> ... <b'TH1F' b'hstat' 0x703898356c28> <b'TH1F' b'hstat' 0x703898356e08> <b'TH1F' b'hstat' 0x70389832e958>] at 0x7038981ef710>

In [25]:
c.content     # bag of bytes

<JaggedArray [[64 0 3 ... 0 0 0] [64 0 3 ... 0 0 0] [64 0 3 ... 0 0 0] ... [64 0 3 ... 0 0 0] [64 0 3 ... 0 0 0] [64 0 3 ... 0 0 0]] at 0x7038981eff98>

In [26]:
c.generator   # interpretation class

TH1F

In [31]:
c[500].show()

                0                                                           12.6
                +--------------------------------------------------------------+
[-inf, 0)    0  |                                                              |
[0, 0.01)    10 |*************************************************             |
[0.01, 0.02) 12 |***********************************************************   |
[0.02, 0.03) 5  |*************************                                     |
[0.03, 0.04) 6  |******************************                                |
[0.04, 0.05) 5  |*************************                                     |
[0.05, 0.06) 8  |***************************************                       |
[0.06, 0.07) 7  |**********************************                            |
[0.07, 0.08) 5  |*************************                                     |
[0.08, 0.09) 4  |********************                                          |
[0.09, 0.1)  2  |********** 

A `JaggedArray` is a list of unequal-sized sublists, encoded as a contiguous array of `content` divided up by an array of `offsets`.

In [32]:
import awkward
x = awkward.fromiter([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
x

<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x70389820d1d0>

In [33]:
x.content

array([1.1, 2.2, 3.3, 4.4, 5.5])

In [34]:
x.offsets

array([0, 3, 3, 5])
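To make the encoding concrete, here is a minimal sketch that rebuilds the sublists from plain Numpy arrays holding the same `content` and `offsets` values shown above. It does not use awkward at all; the variable names `sublists` and `counts` are just for illustration.

```python
import numpy

# Same data as the JaggedArray above, written out as plain Numpy arrays.
content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])
offsets = numpy.array([0, 3, 3, 5])

# Sublist i is the slice of content between offsets[i] and offsets[i + 1].
sublists = [content[start:stop].tolist()
            for start, stop in zip(offsets[:-1], offsets[1:])]

# The number of items in each sublist is the difference of neighboring offsets.
counts = numpy.diff(offsets)

print(sublists)
print(counts)
```

Note that the empty sublist costs nothing in `content`; it only shows up as a repeated value in `offsets`.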

A `Table` is an array of `Row` records, encoded as one contiguous array per column in its `contents` dict.

In [35]:
x = awkward.fromiter([{"x": 1, "y": 1.1}, {"x": 2, "y": 2.2}, {"x": 3, "y": 3.3}])
x

<Table [<Row 0> <Row 1> <Row 2>] at 0x70389820d898>

In [36]:
x.tolist()

[{'x': 1, 'y': 1.1}, {'x': 2, 'y': 2.2}, {'x': 3, 'y': 3.3}]

In [37]:
x.contents["x"]

array([1, 2, 3])

In [38]:
x.contents["y"]

array([1.1, 2.2, 3.3])
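The relationship between rows and columns can be sketched with an ordinary dict of Numpy arrays. The helper `get_row` is hypothetical (it is not part of awkward); it only shows that a row is the same position taken from every column.

```python
import numpy

# One contiguous array per column, as in the contents dict above.
contents = {"x": numpy.array([1, 2, 3]), "y": numpy.array([1.1, 2.2, 3.3])}

def get_row(contents, i):
    # Hypothetical helper: a row is the i-th element of every column.
    return {name: column[i].item() for name, column in contents.items()}

rows = [get_row(contents, i) for i in range(3)]
print(rows)

# Column-wise operations touch one contiguous array and never build rows.
x_times_ten = contents["x"] * 10
print(x_times_ten)
```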

An `ObjectArray` is a virtual array of objects, represented by some array `content` and a `generator` that creates each object on demand.

In [40]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return "Point({0}, {1})".format(self.x, self.y)

x = awkward.fromiter([Point(1, 1.1), Point(2, 2.2), Point(3, 3.3)])
x

<ObjectArray [Point(1, 1.1) Point(2, 2.2) Point(3, 3.3)] at 0x70389852c048>

In [41]:
x.content

<Table [<Row 0> <Row 1> <Row 2>] at 0x7038981efeb8>

In [42]:
x.content.contents["x"]

array([1, 2, 3])

In [43]:
x.content.contents["y"]

array([1.1, 2.2, 3.3])
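The "on demand" part can be pictured with a toy class. `LazyObjects` below is a hypothetical stand-in written for this example, not awkward's `ObjectArray`: it stores only columns and a generator, and builds a `Point` each time an item is requested.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return "Point({0}, {1})".format(self.x, self.y)

class LazyObjects:
    # Hypothetical stand-in for illustration; not awkward's ObjectArray.
    def __init__(self, generator, columns):
        self.generator = generator   # callable that builds one object
        self.columns = columns       # one sequence per constructor argument

    def __len__(self):
        return len(self.columns[0])

    def __getitem__(self, i):
        # The object is created here, only when someone asks for it.
        return self.generator(*(column[i] for column in self.columns))

points = LazyObjects(Point, ([1, 2, 3], [1.1, 2.2, 3.3]))
print(points[1])
```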

## Jagged operations

As much as possible, awkward arrays act like Numpy arrays. For `JaggedArray`, this means obeying Numpy's slicing rules for multidimensional arrays.

In [45]:
x = awkward.fromiter([[1.1, 2.2, 3.3, 4.4], [5.5, 6.6], [7.7, 8.8, 9.9]])
x

<JaggedArray [[1.1 2.2 3.3 4.4] [5.5 6.6] [7.7 8.8 9.9]] at 0x70389820db38>

In [46]:
# take the first two inner lists
x[:2]

<JaggedArray [[1.1 2.2 3.3 4.4] [5.5 6.6]] at 0x70389820d550>

In [49]:
# take the first two numbers in each inner list
x[:, :2]

<JaggedArray [[1.1 2.2] [5.5 6.6] [7.7 8.8]] at 0x70389820d4a8>

In [48]:
# mask outer lists
x[[True, False, True]]

<JaggedArray [[1.1 2.2 3.3 4.4] [7.7 8.8 9.9]] at 0x70389820d5f8>

In [50]:
# mask inner lists
x[awkward.fromiter([[True, False, True, False], [False, True], [True, True, False]])]

<JaggedArray [[1.1 3.3] [6.6] [7.7 8.8]] at 0x7038981ef940>
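One way to think about the inner mask is in terms of `content` and `offsets`: the mask filters the flat content, and the offsets are recomputed from how many items survive in each inner list. The sketch below uses plain Numpy and the same values as the cell above; awkward's actual implementation may differ.

```python
import numpy

# Flat content and offsets for [[1.1 2.2 3.3 4.4] [5.5 6.6] [7.7 8.8 9.9]].
content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
offsets = numpy.array([0, 4, 6, 9])

# The jagged mask from the cell above, flattened in the same order.
flat_mask = numpy.array([True, False, True, False,
                         False, True,
                         True, True, False])

# Filter the flat content.
new_content = content[flat_mask]

# running[k] = number of kept items among the first k items of content,
# so evaluating it at the old offsets gives the new offsets.
running = numpy.concatenate([[0], numpy.cumsum(flat_mask)])
new_offsets = running[offsets]

print(new_content)
print(new_offsets)
```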

Reductions (min, max, sum, ...) turn Numpy arrays into scalars. They turn jagged arrays into Numpy arrays by applying the operation to each inner list.

In [51]:
x

<JaggedArray [[1.1 2.2 3.3 4.4] [5.5 6.6] [7.7 8.8 9.9]] at 0x70389820db38>

In [52]:
x.min()

array([1.1, 5.5, 7.7])

In [53]:
x.max()

array([4.4, 6.6, 9.9])

For an empty sublist, a reduction returns its identity element: 0 for `sum` and `-inf` for `max`.

In [55]:
x = awkward.fromiter([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
x

<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x7038981ef390>

In [56]:
x.sum()

array([6.6, 0. , 9.9])

In [57]:
x.max()

array([ 3.3, -inf,  5.5])
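The same behavior can be reproduced with plain Numpy by reducing each slice of `content` and starting from the identity. This is a sketch only: `reduce_each` is a hypothetical helper, and it relies on the `initial` argument that Numpy's `sum`, `max`, and `min` accept.

```python
import numpy

content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])
offsets = numpy.array([0, 3, 3, 5])

def reduce_each(reducer, identity, content, offsets):
    # Hypothetical helper: reduce every inner list, starting from the identity,
    # so that an empty inner list simply returns the identity.
    return numpy.array([
        reducer(content[start:stop], initial=identity)
        for start, stop in zip(offsets[:-1], offsets[1:])
    ])

sums = reduce_each(numpy.sum, 0.0, content, offsets)
maxes = reduce_each(numpy.max, -numpy.inf, content, offsets)
mins = reduce_each(numpy.min, numpy.inf, content, offsets)

print(sums)
print(maxes)
print(mins)
```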

There's also an equivalent of `argmin`/`argmax` that returns jagged arrays of indexes.

In [58]:
x

<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x7038981ef390>

In [59]:
indexes = x.argmax()
indexes

<JaggedArray [[2] [] [1]] at 0x7038981efe10>

What's this useful for? Finding the maximum according to one attribute and then selecting the corresponding element of another.

In [60]:
y = awkward.fromiter([[300, 200, 100], [], [500, 400]])
y

<JaggedArray [[300 200 100] [] [500 400]] at 0x7038981ef080>

In [61]:
y[indexes]

<JaggedArray [[100] [] [400]] at 0x7038981ef3c8>

In Numpy, indexing with an array of integers like this is called "advanced indexing."
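Spelled out with flat arrays, the "maximize by `x`, select from `y`" pattern looks like the loop below. It is a plain Numpy sketch using the same values as above, not how awkward implements `argmax`; each result is kept as a list of zero or one items so that empty inner lists stay empty.

```python
import numpy

x_content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])
y_content = numpy.array([300, 200, 100, 500, 400])
offsets = numpy.array([0, 3, 3, 5])

selected = []
for start, stop in zip(offsets[:-1], offsets[1:]):
    if stop > start:
        # Position (in the flat content) of the largest x in this inner list.
        best = start + numpy.argmax(x_content[start:stop])
        selected.append([int(y_content[best])])
    else:
        # Nothing to maximize over: keep an empty inner list.
        selected.append([])

print(selected)
```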

Numpy's "universal functions" can be applied to awkward arrays. They apply element-by-element and maintain structure.

In [62]:
x

<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x7038981ef390>

In [64]:
import numpy
numpy.sqrt(x)

<JaggedArray [[1.0488088481701516 1.4832396974191326 1.816590212458495] [] [2.0976176963403033 2.345207879911715]] at 0x7038981ef0f0>

This allows us to compute with awkward arrays as we would with Numpy arrays, as long as their structures match.

In [66]:
x + y**2

<JaggedArray [[90001.1 40002.2 10003.3] [] [250004.4 160005.5]] at 0x7038981ef2e8>
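A rough way to see why the structure is preserved: when two jagged arrays share the same `offsets`, an element-by-element operation only needs to act on the flat `content`, and the result reuses the offsets unchanged. The sketch below shows this with plain Numpy arrays; it is a mental model, and awkward's internals may differ.

```python
import numpy

x_content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])
y_content = numpy.array([300, 200, 100, 500, 400])
offsets = numpy.array([0, 3, 3, 5])   # shared by x and y

# Element-by-element math on the flat content only.
z_content = x_content + y_content**2
root_content = numpy.sqrt(x_content)

# Reapply the unchanged offsets to view the result as inner lists.
z = [z_content[start:stop].tolist()
     for start, stop in zip(offsets[:-1], offsets[1:])]
print(z)
```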

Numpy has a concept of "broadcasting," in which an array and a scalar may be combined element-by-element by duplicating the scalar to match the array.

In [67]:
numpy.array([1.1, 2.2, 3.3, 4.4, 5.5]) + 100

array([101.1, 102.2, 103.3, 104.4, 105.5])

The jagged equivalent of this is broadcasting a Numpy array to match a jagged array: each value in the Numpy array is duplicated to match the length of the corresponding inner list.

In [68]:
x

<JaggedArray [[1.1 2.2 3.3] [] [4.4 5.5]] at 0x7038981ef390>

In [69]:
x + numpy.array([100, 200, 300])

<JaggedArray [[101.1 102.2 103.3] [] [304.4 305.5]] at 0x703899e9fef0>
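The duplication step can be sketched with `numpy.repeat`: each per-list value is repeated as many times as its inner list is long, which lines it up with the flat `content`. This uses plain Numpy with the same values as above and is meant as a mental model rather than a description of awkward's internals.

```python
import numpy

content = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5])
offsets = numpy.array([0, 3, 3, 5])
per_list = numpy.array([100, 200, 300])   # one value per inner list

# Length of each inner list; the empty one has count 0.
counts = numpy.diff(offsets)

# Repeat each per-list value to line up with the flat content.
# The value 200 belongs to the empty list, so it is repeated zero times.
expanded = numpy.repeat(per_list, counts)

result_content = content + expanded
print(expanded)
print(result_content)
```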

Physics case: consider a jagged array of timing measurements.

In [70]:
times = awkward.fromiter([[4.4, 2.6, 3.5, -0.6], [1.8, 7.4], [], [9.5, 5.2, 8.5]])
times

<JaggedArray [[4.4 2.6 3.5 -0.6] [1.8 7.4] [] [9.5 5.2 8.5]] at 0x703899e9fe80>

Corrections (`t0`) may be applied globally:

In [72]:
times - 0.6

<JaggedArray [[3.8000000000000003 2.0 2.9 -1.2] [1.2000000000000002 6.800000000000001] [] [8.9 4.6000000000000005 7.9]] at 0x703899e9fcf8>

Applied per event:

In [73]:
times - numpy.array([0.6, 1.2, -0.4, 3.3])

<JaggedArray [[3.8000000000000003 2.0 2.9 -1.2] [0.6000000000000001 6.2] [] [6.2 1.9000000000000004 5.2]] at 0x703899e9f9b0>

Or by detector id:

In [74]:
times

<JaggedArray [[4.4 2.6 3.5 -0.6] [1.8 7.4] [] [9.5 5.2 8.5]] at 0x703899e9fe80>

In [75]:
detid = awkward.fromiter([[101, 274, 333, 97], [522, 427], [], [931, 634, 555]])
detid

<JaggedArray [[101 274 333 97] [522 427] [] [931 634 555]] at 0x703899e9f7b8>

In [None]:
lookup = awkward.SparseArray()
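As a rough sketch of a correction keyed by detector id, the example below works on the flat `content` of `times` and `detid` with plain Numpy. Everything about the lookup is an assumption made for illustration: the `t0_by_detid` values are made up, ids missing from the dict get a correction of 0.0, and an ordinary Python dict stands in for the `SparseArray` lookup.

```python
import numpy

# Flat content and shared offsets for the times and detid arrays above.
times_content = numpy.array([4.4, 2.6, 3.5, -0.6, 1.8, 7.4, 9.5, 5.2, 8.5])
detid_content = numpy.array([101, 274, 333, 97, 522, 427, 931, 634, 555])
offsets = numpy.array([0, 4, 6, 6, 9])

# Hypothetical corrections: made-up t0 values for a few detector ids.
t0_by_detid = {101: 0.5, 522: 1.0, 931: -0.5}

# Look up a t0 for every measurement; unknown ids default to 0.0.
t0 = numpy.array([t0_by_detid.get(int(d), 0.0) for d in detid_content])

corrected_content = times_content - t0
corrected = [corrected_content[start:stop].tolist()
             for start, stop in zip(offsets[:-1], offsets[1:])]
print(corrected)
```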