# 3. Awkward Array

<br><br><br><br><br>

## What about an array of lists?

In [1]:
import awkward as ak
import numpy as np
import uproot

In [4]:
events = uproot.open("data/HZZ.root:events")
events.show()

name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
NJet                 | int32_t                  | AsDtype('>i4')
Jet_Px               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_Py               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_Pz               | float[]                  | AsJagged(AsDtype('>f4'))
Jet_E                | float[]                  | AsJagged(AsDtype('>f4'))
Jet_btag             | float[]                  | AsJagged(AsDtype('>f4'))
Jet_ID               | bool[]                   | AsJagged(AsDtype('bool'))
NMuon                | int32_t                  | AsDtype('>i4')
Muon_Px              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Py              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_Pz              | float[]                  | AsJagged(AsDtype('>f4'))
Muon_E               | float[]  

In [5]:
events["Muon_Px"].array()

<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>

In [7]:
events["Muon_Px"].array(entry_stop=20).tolist()

[[-52.89945602416992, 37.7377815246582],
 [-0.8164593577384949],
 [48.987831115722656, 0.8275666832923889],
 [22.08833122253418, 76.6919174194336],
 [45.171321868896484, 39.75095748901367],
 [9.228110313415527, -5.793715000152588],
 [12.538717269897461, 29.541839599609375],
 [34.883758544921875],
 [-53.16697311401367, 11.491869926452637],
 [-67.01485443115234, -18.118755340576172],
 [15.983028411865234, 34.68440628051758],
 [-70.51190948486328, -38.028743743896484],
 [58.94381332397461],
 [-15.587870597839355],
 [-122.33011627197266, -1.0597527027130127],
 [-46.70415496826172, 39.020023345947266],
 [51.29465866088867, 17.45092010498047],
 [43.28120040893555],
 [-45.92393493652344, 22.549766540527344],
 [43.29360580444336, -33.28158187866211, -4.376191139221191]]

This is what Awkward Array was made for. NumPy's equivalent, an object array whose items are separate NumPy arrays, is cumbersome and inefficient.

In [9]:
jagged_numpy = events["Muon_Px"].array(entry_stop=20, library="np")
jagged_numpy

array([array([-52.899456,  37.73778 ], dtype=float32),
       array([-0.81645936], dtype=float32),
       array([48.98783  ,  0.8275667], dtype=float32),
       array([22.088331, 76.69192 ], dtype=float32),
       array([45.17132 , 39.750957], dtype=float32),
       array([ 9.22811 , -5.793715], dtype=float32),
       array([12.538717, 29.54184 ], dtype=float32),
       array([34.88376], dtype=float32),
       array([-53.166973,  11.49187 ], dtype=float32),
       array([-67.014854, -18.118755], dtype=float32),
       array([15.983028, 34.684406], dtype=float32),
       array([-70.51191 , -38.028744], dtype=float32),
       array([58.943813], dtype=float32),
       array([-15.587871], dtype=float32),
       array([-122.33012  ,   -1.0597527], dtype=float32),
       array([-46.704155,  39.020023], dtype=float32),
       array([51.29466, 17.45092], dtype=float32),
       array([43.2812], dtype=float32),
       array([-45.923935,  22.549767], dtype=float32),
       array([ 43.293606, -33.

What if I want the first item in each list as an array?

In [11]:
np.array([x[0] for x in jagged_numpy])

array([ -52.899456  ,   -0.81645936,   48.98783   ,   22.088331  ,
         45.17132   ,    9.22811   ,   12.538717  ,   34.88376   ,
        -53.166973  ,  -67.014854  ,   15.983028  ,  -70.51191   ,
         58.943813  ,  -15.587871  , -122.33012   ,  -46.704155  ,
         51.29466   ,   43.2812    ,  -45.923935  ,   43.293606  ],
      dtype=float32)

This violates the rule from [1-python-performance.ipynb](1-python-performance.ipynb): don't iterate in Python.
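The loop is avoidable in principle if the jagged data is stored as one flat array of values plus the positions where each list starts. The sketch below uses made-up values (not taken from `HZZ.root`) and hypothetical names (`content`, `counts`, `offsets`). It is a simplified picture of the offsets-and-content idea, not Awkward Array's actual implementation.

```python
import numpy as np

# Hypothetical jagged data: [[1.5, 2.5], [3.5], [4.5, 5.5, 6.5]]
content = np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5], dtype=np.float32)
counts = np.array([2, 1, 3])

# offsets[i] is where list i starts in content; offsets[i + 1] is where it ends
offsets = np.concatenate([[0], np.cumsum(counts)])

# First item of every list in one vectorized step (assumes no list is empty)
first_items = content[offsets[:-1]]
print(first_items)
```

No Python-level loop runs over the lists: the indexing happens inside NumPy's compiled code.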

In [12]:
jagged_awkward = events["Muon_Px"].array(entry_stop=20, library="ak")
jagged_awkward

<Array [[-52.9, 37.7], ... 43.3, -33.3, -4.38]] type='20 * var * float32'>

In [13]:
jagged_awkward[:, 0]

<Array [-52.9, -0.816, 49, ... -45.9, 43.3] type='20 * float32'>

<br><br><br><br><br>

### Jaggedness in Pandas

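Pandas can represent jagged data by encoding the boundaries between events in the DataFrame index: a two-level index of `entry` (the event) and `subentry` (the item within that event).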

In [16]:
events.arrays(filter_name="Muon_*", library="pd")

entry | subentry | Muon_Px    | Muon_Py    | Muon_Pz    | Muon_E     | Muon_Charge | Muon_Iso
------+----------+------------+------------+------------+------------+-------------+---------
0     | 0        | -52.899456 | -11.654672 | -8.160793  | 54.779499  | 1           | 4.200153
0     | 1        | 37.737782  | 0.693474   | -11.307582 | 39.401695  | -1          | 2.151061
1     | 0        | -0.816459  | -24.404259 | 20.199968  | 31.690445  | 1           | 2.188047
2     | 0        | 48.987831  | -21.723139 | 11.168285  | 54.739788  | 1           | 1.412822
2     | 1        | 0.827567   | 29.800508  | 36.965191  | 47.488857  | -1          | 3.383504
...   | ...      | ...        | ...        | ...        | ...        | ...         | ...
2416  | 0        | -39.285824 | -14.607491 | 61.715790  | 74.602982  | -1          | 1.080880
2417  | 0        | 35.067146  | -14.150043 | 160.817917 | 165.203949 | -1          | 3.427752
2418  | 0        | -29.756786 | -15.303859 | -52.663750 | 62.395161  | -1          | 3.762945
2419  | 0        | 1.141870   | 63.609570  | 162.176315 | 174.208633 | -1          | 0.550811
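To make the (`entry`, `subentry`) index concrete, here is a minimal sketch that builds one by hand from made-up list lengths and values. The names `counts`, `px`, and `starts` are hypothetical; this is not how Uproot constructs its DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical jagged data: three events with 2, 1, and 3 muons
counts = np.array([2, 1, 3])
px = np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5])

# entry repeats the event number once per muon: [0, 0, 1, 2, 2, 2]
entry = np.repeat(np.arange(len(counts)), counts)

# subentry counts muons within each event: [0, 1, 0, 0, 1, 2]
starts = np.cumsum(counts) - counts
subentry = np.arange(len(px)) - np.repeat(starts, counts)

index = pd.MultiIndex.from_arrays([entry, subentry], names=["entry", "subentry"])
muons = pd.DataFrame({"Muon_Px": px}, index=index)

# All muons of event 2, selected through the first index level
event2 = muons.loc[2]

# First muon of every event: a cross-section at subentry == 0
first = muons.xs(0, level="subentry")
print(first)
```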


But if you want more than one kind of particle, they can't all be in the same DataFrame.

(A DataFrame has only one index; how would you relate jet subentry #1 with muon subentry #1?)
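The problem can be seen on a toy example. The sketch below uses two made-up events and forces muons and jets into one DataFrame with an outer join on the shared index. All values are hypothetical.

```python
import pandas as pd

# Hypothetical events: event 0 has 2 muons and no jets,
# event 1 has 1 muon and 2 jets
muon_index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0)], names=["entry", "subentry"]
)
jet_index = pd.MultiIndex.from_tuples(
    [(1, 0), (1, 1)], names=["entry", "subentry"]
)

muons = pd.DataFrame({"Muon_Px": [1.5, 2.5, 3.5]}, index=muon_index)
jets = pd.DataFrame({"Jet_Px": [10.5, 20.5]}, index=jet_index)

# An outer join on (entry, subentry) pairs muon k with jet k in each event,
# which has no physical meaning, and fills the gaps with NaN.
joined = muons.join(jets, how="outer")
print(joined)
```

Each row of `joined` pairs the k-th muon with the k-th jet of an event, or with NaN when one of them is missing, even though there is no reason for those two particles to belong together.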

In [18]:
dataframes = events.arrays(filter_name="/(Muon_|Jet_).*/", library="pd")

In [19]:
len(dataframes)

2

In [20]:
dataframes[0]

entry | subentry | Jet_Px     | Jet_Py     | Jet_Pz      | Jet_E      | Jet_btag | Jet_ID
------+----------+------------+------------+-------------+------------+----------+-------
1     | 0        | -38.874714 | 19.863453  | -0.894942   | 44.137363  | -1.0     | True
3     | 0        | -71.695213 | 93.571579  | 196.296432  | 230.346008 | -1.0     | True
3     | 1        | 36.606369  | 21.838793  | 91.666283   | 101.358841 | -1.0     | True
3     | 2        | -28.866419 | 9.320708   | 51.243221   | 60.084141  | -1.0     | True
4     | 0        | 3.880162   | -75.234055 | -359.601624 | 367.585480 | -1.0     | True
...   | ...      | ...        | ...        | ...         | ...        | ...      | ...
2417  | 0        | -33.196457 | -59.664749 | -29.040150  | 74.944725  | -1.0     | True
2417  | 1        | -26.086025 | -19.068407 | 26.774284   | 42.481457  | -1.0     | True
2418  | 0        | -3.714818  | -37.202377 | 41.012222   | 55.950581  | -1.0     | True
2419  | 0        | -36.361286 | 10.173571  | 226.429214  | 229.577988 | -1.0     | True


In [21]:
dataframes[1]

entry | subentry | Muon_Px    | Muon_Py    | Muon_Pz    | Muon_E     | Muon_Charge | Muon_Iso
------+----------+------------+------------+------------+------------+-------------+---------
0     | 0        | -52.899456 | -11.654672 | -8.160793  | 54.779499  | 1           | 4.200153
0     | 1        | 37.737782  | 0.693474   | -11.307582 | 39.401695  | -1          | 2.151061
1     | 0        | -0.816459  | -24.404259 | 20.199968  | 31.690445  | 1           | 2.188047
2     | 0        | 48.987831  | -21.723139 | 11.168285  | 54.739788  | 1           | 1.412822
2     | 1        | 0.827567   | 29.800508  | 36.965191  | 47.488857  | -1          | 3.383504
...   | ...      | ...        | ...        | ...        | ...        | ...         | ...
2416  | 0        | -39.285824 | -14.607491 | 61.715790  | 74.602982  | -1          | 1.080880
2417  | 0        | 35.067146  | -14.150043 | 160.817917 | 165.203949 | -1          | 3.427752
2418  | 0        | -29.756786 | -15.303859 | -52.663750 | 62.395161  | -1          | 3.762945
2419  | 0        | 1.141870   | 63.609570  | 162.176315 | 174.208633 | -1          | 0.550811


Again, that's why we have Awkward Array.

In [24]:
array = events.arrays(filter_name="/(Muon_|Jet_).*/", library="ak", how="zip")
array

<Array [{Jet: [], Muon: [, ... Iso: 0}]}] type='2421 * {"Jet": var * {"Px": floa...'>

In [25]:
array.Jet

<Array [[], [{Px: -38.9, ... ID: True}], []] type='2421 * var * {"Px": float32, ...'>

In [26]:
array.Jet.Px

<Array [[], [-38.9], ... [-36.4, -15.3], []] type='2421 * var * float32'>

In [27]:
array.Muon

<Array [[{Px: -52.9, Py: -11.7, ... Iso: 0}]] type='2421 * var * {"Px": float32,...'>

In [28]:
array.Muon.Px

<Array [[-52.9, 37.7], ... 1.14], [23.9]] type='2421 * var * float32'>

In [29]:
ak.num(array.Jet), ak.num(array.Muon)

(<Array [0, 1, 0, 3, 2, 2, ... 0, 1, 2, 1, 2, 0] type='2421 * int64'>,
 <Array [2, 1, 2, 2, 2, 2, ... 2, 1, 1, 1, 1, 1] type='2421 * int64'>)

<br><br><br><br><br>

## Awkward Array is a general-purpose library: NumPy-like idioms on JSON-like data

<img src="img/pivarski-one-slide-summary.svg" style="width: 70%">