In [1]:
# Install Awkward Array if it is not already available in this environment
!pip install awkward



# Awkward Array quickstart
From the [Awkward Array documentation](https://awkward-array.org/doc/main/): "Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms." Awkward Array functions make columnar analysis straightforward to implement. Columnar analysis applies an operation to a whole array at once instead of looping over rows one at a time, which can drastically reduce run time because memory is used more effectively. In this section, we will discuss building Awkward Arrays and some simple operations on them.

## Creating an Awkward Array
First import `awkward` and create a simple jagged array.

In [5]:
import awkward as ak
array = ak.Array([[5,4,4],[2],[3,5]])
array

Awkward Array entries are indexed much like those of a Python list or NumPy array. Let's read the third row's second entry as an example (indices start at 0).

In [3]:
print(array[2,1])

5


## Matrix Manipulations
Let's perform some simple manipulations with this array. Suppose we want to multiply the values in each row together. In Awkward Array, this is done by calling `ak.prod()` and passing `axis=1` to select the axis to multiply along.

In [8]:
row_product = ak.prod(array, axis=1)
print(row_product)

[80, 2, 15]


If you instead set `axis=0`, the values are multiplied down the columns of the array. In either case, the operation reduces the dimension of the array from 2 to 1.

We can similarly sum along a given axis by calling `ak.sum()`.

In [10]:
# Use a name other than `sum` to avoid shadowing Python's built-in function
row_sum = ak.sum(array, axis=1)
print(row_sum)

[13, 2, 8]


## Boolean Operations
Awkward Array can also perform boolean operations on arrays. This is often done when evaluating selection criteria. Let's build an array of booleans that records which entries are $> 3$.

In [14]:
selection = array > 3
print(selection)

[[True, True, True], [False], [False, True]]


With `selection`, we can obtain information about which rows contain at least one entry satisfying the criterion by calling `ak.any()`.

In [15]:
good_row = ak.any(selection, axis = 1)
print(good_row)

[True, False, True]


`ak.any()` is somewhat analogous to an OR operation over the entries along the chosen axis, and the function `ak.all()` is analogous to an AND operation.

In [16]:
print(ak.all(selection, axis = 1))

[True, False, False]


## Masking (Applying Cuts)
Now suppose we want to remove the entries that don't match our criterion. This is done by masking the array: we pass an array of booleans that defines which entries to keep and which to exclude. Masking can be done at any depth of the array. Let's first 'cut' out all the entries less than or equal to 3 and compare the original array with the new cut array.

In [17]:
# Recalculate the boolean mask: True where an entry is greater than 3
selection = array > 3
# Pass the cut as an array of booleans; only entries marked True are kept
masked_array = array[selection]
print("Original array", array)
print("Cut array", masked_array)

Original array [[5, 4, 4], [2], [3, 5]]
Cut array [[5, 4, 4], [], [5]]


The entries with a value less than or equal to 3 have been removed from the array. If we instead want to remove every row that does not contain at least one entry greater than 3, we can mask at the row level:

In [18]:
rows_pass = ak.any(selection, axis = 1)
masked_array = array[rows_pass]
print(masked_array)

[[5, 4, 4], [3, 5]]


## Conclusion
Awkward Array functions can vastly speed up an analysis. The gain is not noticeable for arrays of this size, but for TTrees with 100k+ events it can drastically reduce run time. In the next section, we will navigate a TFile with `uproot` and perform some basic analysis.