# PURSUE Python for HEP: Dictionaries and Arrays

## Review of dictionary interface
* In dictionaries, a key is mapped to a value of arbitrary type. The key can be any hashable object, most commonly a string or an integer.

In [None]:
some_dict = {
    "word": 1,
    "another word": 2,
    45: 29
}

some_dict["word"]
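
* Keys are not limited to strings and integers: any hashable object works, while mutable objects such as lists do not. The sketch below illustrates this with a made-up `detector_hits` dictionary; the names and numbers are hypothetical and only serve as an example.

```python
# Hypothetical example: a tuple is hashable, so it can be used as a key.
detector_hits = {
    ("barrel", 1): 12,
    ("endcap", 2): 7,
}
barrel_hits = detector_hits[("barrel", 1)]
print(barrel_hits)

# A list is mutable and therefore not hashable: using it as a key fails.
try:
    detector_hits[["barrel", 1]] = 3
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print(f"TypeError: {err}")
```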

* You can add new key-value pairs in the following way:

In [None]:
some_dict["cool key"] = "values can be of any type!"

some_dict

* You can obtain the values and keys using the following methods:

In [None]:
print(some_dict.values())
print(some_dict.keys())
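
* Note that `.keys()` and `.values()` return *view* objects rather than lists. A minimal sketch, using a hypothetical `particle_counts` dictionary, of how a view differs from a list copy:

```python
particle_counts = {"electron": 2, "muon": 1}  # hypothetical data

keys_view = particle_counts.keys()            # live view of the keys
keys_snapshot = list(particle_counts.keys())  # independent copy

particle_counts["tau"] = 0  # add a key after creating both objects

print(list(keys_view))  # the view reflects the new key
print(keys_snapshot)    # the list copy does not
```

* If you need a fixed list of keys or values, wrap the call in `list(...)`.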

* If you want to iterate over the dictionary and access both keys and values simultaneously, you can use `.items()`:

In [None]:
for key, value in some_dict.items():
    print(key, value)

## A Quick Review of Arrays
* Lists and arrays are similar, but they are not the same thing in Python. Here are the main differences between them:
  * Type and structure
    * Lists can store elements of different data types, are dynamically sized, and are implemented as resizable arrays of references to Python objects
    * NumPy arrays store elements of a single data type, have a fixed size once created, and are implemented as contiguous blocks of memory
  * Performance
    * NumPy arrays are generally much faster than lists for numerical operations on many elements
    * Numpy makes use of highly optimized compiled code
    * Allows for vectorized operations
  * Features
    * Numpy has a wider range of mathematical functions and operations, supports broadcasting and provides advanced indexing and slicing capabilities
* When running numerical computations involving a collection of numbers, the usual object of choice is the NumPy array. With these we can perform fast computations.
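* The type and size differences listed above can be checked directly. The following is a small sketch with made-up values, not a benchmark:

```python
import numpy as np

# A list keeps each element's own type.
mixed_list = [1, 2.5, "three"]
list_types = [type(x).__name__ for x in mixed_list]
print(list_types)

# An array has a single dtype: mixing ints and floats promotes everything to float.
int_arr = np.array([1, 2, 3])
mixed_arr = np.array([1, 2.5, 3])
print(int_arr.dtype, mixed_arr.dtype)

# The size is fixed: np.append returns a new array instead of growing in place.
longer_arr = np.append(int_arr, 4)
print(int_arr.shape, longer_arr.shape)
```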

In [None]:
import numpy as np

arr1 = np.array(
    [1, 2, 3, 4, 5]
)

arr2 = np.array(
    [5, 4, 3, 2, 1]
)

In [None]:
%%timeit
# Slow way
sumrslt = []
for i, j in zip(arr1, arr2):
    sumrslt.append(i + j)
sumarr = np.array(sumrslt)

In [None]:
%%timeit
# Faster way!
rsltarr = arr1 + arr2 # It also looks much nicer!

* Vectorized operations such as the ones allowed by numpy are not only faster, but, as you can see, have a more elegant syntax.
* Numpy allows us to perform a slew of vectorized operations. In the following cell, we illustrate these operations.

In [None]:
# Mathematical operations
print("{} - {} = {}".format(arr1, arr2, arr1 - arr2))
print("{} / {} = {}".format(arr1, arr2, arr1 / arr2))
print("{} * {} = {}".format(arr1, arr2, arr1 * arr2))
print("{} % {} = {}".format(arr1, arr2, arr1 % arr2))
print()

# Comparison operations
eq_comp = arr1 == arr2
print("{} == {} = {}".format(arr1, arr2, eq_comp))

ineq_comp = arr1 != arr2
print("{} != {} = {}".format(arr1, arr2, ineq_comp))

print("{} < {} = {}".format(arr1, arr2, arr1 < arr2))
print()

# Logical operations
print("NOT: ~{} = {}".format(eq_comp, ~eq_comp))
print("AND: {} & {} = {}".format(ineq_comp, eq_comp, ineq_comp & eq_comp))
print("OR: {} | {} = {}".format(eq_comp, ineq_comp, eq_comp | ineq_comp))
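
* Broadcasting, mentioned earlier among NumPy's features, lets these operations combine arrays of different but compatible shapes. A minimal sketch with illustrative values:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# A scalar is broadcast across every element of the array.
scaled = arr * 2

# A (2, 1) column combined with a (5,) row broadcasts to shape (2, 5).
column = np.array([[10], [20]])
grid = column + arr

print(scaled)
print(grid.shape)
print(grid)
```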


* You can also slice NumPy arrays similarly to how you would slice lists, with one slice per axis. Consider the following 3D array:

<div style="text-align: center;">
  <img src="./assets/array3d-highlight1.png" alt="array1" style="width: 300px"/>
</div>


In [None]:
num_elems = 2 * 3 * 5
arr1d = np.arange(num_elems)
arr1d

In [None]:
arr3d = arr1d.reshape(2, 3, 5)
arr3d

* If we wanted to get the highlighted elements, we could do something like this:

In [None]:
arr3d[:, 1:, 1:]
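
* Each comma-separated entry in the brackets acts on one axis. The sketch below rebuilds the same array so that it runs on its own and checks the shape of the slice; it also shows that basic slices are views onto the original data rather than copies.

```python
import numpy as np

arr3d = np.arange(2 * 3 * 5).reshape(2, 3, 5)

# Keep everything on axis 0, drop index 0 on axis 1 and on axis 2.
sub = arr3d[:, 1:, 1:]
print(arr3d.shape, "->", sub.shape)

# The first element of the slice is element [0, 1, 1] of the original.
print(sub[0, 0, 0], arr3d[0, 1, 1])

# Basic slicing returns a view that shares memory with the original array.
print(np.shares_memory(arr3d, sub))
```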

**Exercise:** How would you slice this array in order to get the elements highlighted in blue in the following image?

<div style="text-align: center;">
  <img src="./assets/array3d-highlight2.png" alt="array1" style="width: 300px"/>
</div>

In [None]:
# Answer here

* In addition to slicing, you can also filter Numpy arrays using arrays of booleans of the same shape as the array you wish to filter. 
* This can be useful in HEP when, for instance, you only want to keep events where a particle of interest has a $p_T$ above a certain amount. For Numpy arrays, we can do this type of filtering like this.

In [None]:
print(f"arr1 = {arr1}")
print(f"arr2 = {arr2}")
filter_arr = arr1 != arr2
print(f"filter = {filter_arr}")
arr1[filter_arr]

In [None]:
some_array = np.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
boolean_array = np.array(
    [True, True, True, True, True, False, True, False, True, False]
)

some_array[boolean_array]
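
* The $p_T$ selection described above can be sketched in the same way. The values and the 30 GeV threshold below are invented for illustration and do not come from real data:

```python
import numpy as np

# Hypothetical transverse momenta in GeV, one value per event.
pt = np.array([12.5, 48.0, 31.2, 7.9, 55.3])
pt_threshold = 30.0  # assumed cut value

passes_cut = pt > pt_threshold  # boolean mask with the same shape as pt
selected_pt = pt[passes_cut]    # keep only the entries where the mask is True

print(passes_cut)
print(selected_pt)
print(f"{passes_cut.sum()} of {len(pt)} events pass the cut")
```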

## Limitations of Numpy

* One of the limitations of NumPy is that it only allows us to construct homogeneous arrays. However, in HEP, data is often *not* homogeneous: each event can contain a different number of particles of the same type, each with its own $p_T$, $\eta$, etc. In order to see this, we will load some data to get us started.

In [None]:
# This will produce an error! Read the error message carefully and try to understand it.
# (NumPy versions older than 1.24 emit a deprecation warning instead of an error.)
np.array([
    [1, 2, 3],
    [1],
    [5, 7, 8, 100]
])
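
* NumPy can hold rows of different lengths if you explicitly request `dtype=object`, but each row then becomes an opaque Python object and the fast vectorized operations are lost. The sketch below shows this one workaround under that assumption; it is not necessarily the approach used in the rest of this tutorial.

```python
import numpy as np

ragged = [[1, 2, 3], [1], [5, 7, 8, 100]]

# With dtype=object, NumPy stores three Python lists instead of a 2D grid.
obj_arr = np.array(ragged, dtype=object)
print(obj_arr.shape, obj_arr.dtype)
print(type(obj_arr[0]).__name__)

# Per-row quantities now require an ordinary Python loop.
row_lengths = [len(row) for row in obj_arr]
row_sums = [sum(row) for row in obj_arr]
print(row_lengths, row_sums)
```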