<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<a href="https://piazza.com/e-learning_python_for_ocean_mapping/fall2019/om100/home"><img src="images/help.png" alt="Piazza.com" title="Ask questions on Piazza.com" align="right" width="10%"></a>
# Array Operations

The [Introduction to Numpy](COMP_000_Intro_to_Numpy.ipynb) notebook has provided an overview of some of the main [NumPy](https://www.numpy.org/) functionalities such as: 

* Creation of arrays with specific values or from existing lists.
* Access to specific values in an array (**indexing**).
* Extraction of sub-arrays from an array (**slicing**).

This notebook extends that basic knowledge of NumPy, highlighting its flexible and optimized computational capabilities.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Operations on NumPy arrays can be very fast, but also very slow. The key is to use [**vectorized operations**](https://en.wikipedia.org/wiki/Array_programming), which operate on entire arrays at once.

We will also apply notions from the [Introduction to Matplotlib](VIS_000_Intro_to_Matplotlib.ipynb) notebook to visualize, and thus clarify, the applied operations.

Before starting to use `numpy` and `matplotlib`, you have to execute the following cell:

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
import os
import matplotlib.pyplot as plt
import numpy as np
import matplotlib

sys.path.append(os.getcwd())

Similarly to what was done in previous notebooks, the cell below retrieves and prints the Numpy and Matplotlib versions:

In [None]:
print("Numpy version: %s" % (np.__version__, ))
print("Matplotlib version: %s" % (matplotlib.__version__, ))

***

## Creating and Reshaping an Array with Random Values

NumPy also provides the means to create arrays of random values. For instance, calling [`random.random()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.random.html) creates an array of [uniformly distributed random values](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)) between 0 (included) and 1 (excluded).

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Passing a fixed value to [`random.seed()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html) ensures that the same random values are generated each time that the code below is executed:

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

NumPy arrays can be visualized with Matplotlib in a few ways. A popular solution is to call [`imshow()`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html).

In [None]:
np.random.seed(8880)  # seed for reproducibility

arr = np.random.random((1, 12))
print(arr)

plt.imshow(arr)
plt.colorbar()
plt.show()
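
As a minimal sketch of what the seed does (the variable names `first`, `second`, and `third` are illustrative, not part of the notebook), re-seeding with the same value should restart the same sequence of random values, while drawing again without re-seeding should continue the sequence:

```python
import numpy as np

np.random.seed(8880)                  # same seed value as in the cell above
first = np.random.random((1, 12))

np.random.seed(8880)                  # re-seeding restarts the sequence
second = np.random.random((1, 12))

third = np.random.random((1, 12))     # no re-seeding: the sequence continues

print(np.array_equal(first, second))  # True: identical values
print(np.array_equal(second, third))  # False: new values
```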

The above cell created an array composed of a single row with 12 values. NumPy also provides the option of reshaping existing arrays using `reshape()`. For instance, the cell below reshapes the array created above into an array with 3 rows and 4 columns:

In [4]:
arr2 = arr.reshape((3, 4))
print(arr2)

plt.imshow(arr2)
plt.colorbar()
plt.show()

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

In order to use `reshape()`, the size of the initial array must match the size requested for the reshaped array.
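
The size constraint can be checked with a small sketch like the one below. The array of 12 consecutive integers is only an illustrative assumption; passing `-1` for one dimension asks NumPy to infer it from the total size:

```python
import numpy as np

arr = np.arange(12)              # 12 elements: 0, 1, ..., 11

ok = arr.reshape((3, 4))         # 3 * 4 == 12, so the sizes match
print(ok.shape)                  # (3, 4)

inferred = arr.reshape((-1, 6))  # -1 lets NumPy infer that dimension
print(inferred.shape)            # (2, 6)

try:
    arr.reshape((5, 3))          # 5 * 3 == 15, which differs from 12
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```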

The `reshape()` method can also be used to convert one-dimensional arrays into higher-dimensional arrays.

In [5]:
arr = np.array([12.0, 14.2, 23.1, 12.3, 9.1])
print("1D array shape: %s" % (arr.shape,))
print(arr)

arr2 = arr.reshape((1, 5))
print("2D array shape with 1 row: %s" % (arr2.shape,))
print(arr2)

arr3 = arr.reshape((5, 1))
print("2D array shape with 1 column: %s" % (arr3.shape,))
print(arr3)

---

## Splitting and Concatenating Arrays

The cell below creates two arrays with random values:

In [6]:
arr_a = np.random.random((3, 4))
print(arr_a)
plt.imshow(arr_a)
plt.colorbar()
plt.show()

arr_b = np.random.random((3, 4))
print(arr_b)
plt.imshow(arr_b)
plt.colorbar()
plt.show()

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The joining of two arrays (**concatenation**) can be obtained using [`concatenate()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html).

In [7]:
arr_c = np.concatenate([arr_a, arr_b])

print(arr_c)
plt.imshow(arr_c)
plt.colorbar()
plt.show()

As shown in the plot above, the default behavior is to concatenate along the first axis (vertically, for 2D arrays).

If you want to concatenate horizontally, set the `axis` parameter to `1`:

In [8]:
arr_d = np.concatenate([arr_a, arr_b], axis=1)

print(arr_d)
plt.imshow(arr_d)
plt.colorbar()
plt.show()
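
The effect of the `axis` parameter is perhaps easiest to see in the resulting shapes. The sketch below uses two small illustrative arrays (`zeros` and `ones` are assumed names) and also compares the results with `np.vstack()` and `np.hstack()`, which should give the same arrays for 2D inputs:

```python
import numpy as np

zeros = np.zeros((3, 4))
ones = np.ones((3, 4))

vertical = np.concatenate([zeros, ones])            # axis=0 by default
horizontal = np.concatenate([zeros, ones], axis=1)

print(vertical.shape)    # (6, 4)
print(horizontal.shape)  # (3, 8)

# np.vstack() and np.hstack() are shorthands for the two cases above
print(np.array_equal(vertical, np.vstack([zeros, ones])))    # True
print(np.array_equal(horizontal, np.hstack([zeros, ones])))  # True
```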

Splitting an array is supported by [`split()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html#numpy.split), which takes the array, the number of equal sub-arrays required, and the axis along which to split.

The cell below re-creates the original `arr_a` and `arr_b` arrays:

In [9]:
arr_e, arr_f = np.split(arr_d, 2, axis=1)

print(arr_e)
plt.imshow(arr_e)
plt.colorbar()
plt.show()

print(arr_f)
plt.imshow(arr_f)
plt.colorbar()
plt.show()
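
A minimal sketch of a few `split()` behaviors, using an illustrative 3x8 array of consecutive integers (an assumption made only for this example): splitting and concatenating should round-trip, a list of indices splits at those positions, and a request that cannot produce equal parts raises a `ValueError`:

```python
import numpy as np

arr = np.arange(24).reshape((3, 8))

left, right = np.split(arr, 2, axis=1)        # two (3, 4) halves
rejoined = np.concatenate([left, right], axis=1)
print(np.array_equal(arr, rejoined))          # True

# A list of indices splits at those column positions instead
first, middle, last = np.split(arr, [2, 5], axis=1)
print(first.shape, middle.shape, last.shape)  # (3, 2) (3, 3) (3, 3)

try:
    np.split(arr, 3, axis=1)                  # 8 columns cannot form 3 equal parts
except ValueError as err:
    print("ValueError:", err)
```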

---

## Array Vectorization

Mainly due to the dynamic and interpreted nature of the language, Python performs some operations very slowly. 

This performance issue manifests itself when many small operations are repeated. A common example is looping over an array to operate on each element.

In [10]:
arr = np.random.random((40, 50))

plt.imshow(arr)
plt.colorbar()
plt.show()

The code below defines a `divide_by()` function that loops through an array and divides each array element by the passed value:

In [11]:
def divide_by(array, value):
    output = np.empty_like(array)
    for r in range(array.shape[0]):  # row looping
        for c in range(array.shape[1]):  # column looping
            output[r, c] = array[r, c] / value
    return output

We will now use the `%timeit` magic command to benchmark the execution of `divide_by()`:

In [12]:
%timeit divide_by(arr, 2.0)

484 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


The resulting execution time is quite slow given the processing power of any modern machine. What happens if we call the corresponding vectorized operation?

In [13]:
%timeit arr / 2.0

1.62 µs ± 4.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


The above vectorized operation is hundreds of times faster! This is because the real bottleneck was not the division itself (modern machines are highly optimized for this kind of operation), but the several internal operations performed by Python at each loop cycle. Calling the NumPy vectorized operation simply avoided them!

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

At each loop cycle, Python first has to evaluate the object types (**type-checking**) and then dynamically look up the function to use for each type (**function dispatch**). With the vectorized operation, these details are resolved once for the whole array, and the loop over the elements runs in precompiled code, so the operation is computed much more efficiently.
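
Outside a notebook, where the `%timeit` magic is not available, a comparable check can be sketched with the standard `timeit` module. The function name `divide_by_loop` and the choice of 100 repetitions are assumptions made for this example; the measured times depend on the machine, so no specific values are claimed here:

```python
import timeit

import numpy as np


def divide_by_loop(array, value):
    """Element-by-element division with explicit Python loops."""
    output = np.empty_like(array)
    for r in range(array.shape[0]):
        for c in range(array.shape[1]):
            output[r, c] = array[r, c] / value
    return output


arr = np.random.random((40, 50))

# Both approaches should give the same result
looped = divide_by_loop(arr, 2.0)
vectorized = arr / 2.0
print(np.allclose(looped, vectorized))  # True

# Total time for 100 repetitions of each approach
loop_time = timeit.timeit(lambda: divide_by_loop(arr, 2.0), number=100)
vec_time = timeit.timeit(lambda: arr / 2.0, number=100)
print("loop: %.6f s, vectorized: %.6f s" % (loop_time, vec_time))
```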

### Arithmetic Operators

The standard [Python arithmetic operators](../python_basics/001_Variables_and_Types.ipynb#Numeric-Operators) are available as Numpy vectorized operations:

In [14]:
a = np.array([10.0, -8.0, 5.0, -7.0, 2.0])
print("a       =", a)
print("-a      =", -a)
print("a  +  8 =", a + 8)
print("a  -  3 =", a - 3)
print("a  *  3 =", a * 3)
print("a  /  2 =", a / 2)
print("a  // 2 =", a // 2)  # floor division
print("a  %  5 =", a % 5)  # modulo operator
print("a  ** 2 =", a ** 2)  # power operator

a       = [10. -8.  5. -7.  2.]
-a      = [-10.   8.  -5.   7.  -2.]
a  +  8 = [18.  0. 13.  1. 10.]
a  -  3 = [  7. -11.   2. -10.  -1.]
a  *  3 = [ 30. -24.  15. -21.   6.]
a  /  2 = [ 5.  -4.   2.5 -3.5  1. ]
a  // 2 = [ 5. -4.  2. -4.  1.]
a  %  5 = [0. 2. 0. 3. 2.]
a  ** 2 = [100.  64.  25.  49.   4.]


Each of the above operators has a corresponding NumPy function. For instance, the power operator (`**`) corresponds to `np.power()`.

In [15]:
print("a  ** 2 =", np.power(a, 2))

a  ** 2 = [100.  64.  25.  49.   4.]
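
As a sketch of the same correspondence for the other operators, the cell below compares each operator with the NumPy function that is expected to match it (the `pairs` list is only an illustrative structure):

```python
import numpy as np

a = np.array([10.0, -8.0, 5.0, -7.0, 2.0])

# (label, result using the operator, result using the NumPy function)
pairs = [
    ("-a", -a, np.negative(a)),
    ("a + 8", a + 8, np.add(a, 8)),
    ("a - 3", a - 3, np.subtract(a, 3)),
    ("a * 3", a * 3, np.multiply(a, 3)),
    ("a / 2", a / 2, np.divide(a, 2)),
    ("a // 2", a // 2, np.floor_divide(a, 2)),
    ("a % 5", a % 5, np.mod(a, 5)),
    ("a ** 2", a ** 2, np.power(a, 2)),
]

for label, by_operator, by_function in pairs:
    print("%-7s matches: %s" % (label, np.array_equal(by_operator, by_function)))
```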


### Mathematical and Trigonometric Functions

NumPy provides an extended set of mathematical and trigonometric vectorized functions. The cell below lists a few of them:

In [16]:
a = np.linspace(-np.pi, np.pi, 3)  # np.pi provides the pi mathematical constant
print("a       =", a)
print("abs(a)  =", np.abs(a))
print("sin(a)  =", np.sin(a))
print("cos(a)  =", np.cos(a))
print("tan(a)  =", np.tan(a))

a       = [-3.14159265  0.          3.14159265]
abs(a)  = [3.14159265 0.         3.14159265]
sin(a)  = [-1.2246468e-16  0.0000000e+00  1.2246468e-16]
cos(a)  = [-1.  1. -1.]
tan(a)  = [ 1.2246468e-16  0.0000000e+00 -1.2246468e-16]
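
The tiny values such as `1.2246468e-16` in the output above come from floating-point rounding: `np.pi` is only an approximation of π, so `sin(np.pi)` is very close to zero but not exactly zero. A minimal sketch of how such results can be compared with a tolerance, using `np.isclose()` and `np.allclose()`:

```python
import numpy as np

value = np.sin(np.pi)
print(value == 0.0)                 # False: exact comparison fails
print(np.isclose(value, 0.0))       # True: equal within a small tolerance

a = np.linspace(-np.pi, np.pi, 3)
print(np.allclose(np.sin(a), 0.0))  # True for the whole array
```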


### Aggregation Functions

NumPy provides vectorized aggregation functions for working on arrays.

In [17]:
b = np.array([[2.5, -2.0, 0.3, 1.0, 1.0],
              [1.5, -1.0, -0.5, -1.0, 1.2]])

The `sum()` function returns the sum of all the array values.

In [18]:
b.sum()

3.0

If you want to aggregate along rows or columns, set the `axis` parameter:

In [19]:
print("sum of each row (axis=1): %s" % b.sum(axis=1))
print("sum of each column (axis=0): %s" % b.sum(axis=0))

sum of each row (axis=1): [2.8 0.2]
sum of each column (axis=0): [ 4.  -3.  -0.2  0.   2.2]
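
Other aggregation functions follow the same pattern as `sum()`. The sketch below applies a few of them to the same `b` array and also shows `np.cumsum()`, which keeps every partial sum instead of collapsing the values into a single total:

```python
import numpy as np

b = np.array([[2.5, -2.0, 0.3, 1.0, 1.0],
              [1.5, -1.0, -0.5, -1.0, 1.2]])

print("min:", b.min())                # -2.0
print("max:", b.max())                # 2.5
print("max per row:", b.max(axis=1))  # [2.5 1.5]
print("mean per column:", b.mean(axis=0))

# cumsum() returns the running totals; the last one matches sum()
running = np.cumsum(b[0])
print("running sum of the first row:", running)
```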


---

## Broadcasting

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Broadcasting allows operations to be performed on arrays with different sizes.

NumPy broadcasting provides another means of optimizing operations using vectorization. 

The following rules are applied in broadcasting to determine the interaction between a pair of arrays:

1. If the arrays differ in the number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
2. After rule 1, if the shapes of the two arrays do not match in a given dimension, the array whose size is equal to 1 in that dimension is stretched to match the other shape.
3. An error is raised when the sizes disagree in any dimension, and neither size is equal to 1.
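
The three rules above can be sketched as a small function that works on shape tuples only. `broadcast_shape()` is a hypothetical helper written for this example, not a NumPy function, and it is cross-checked against the shape reported by `np.broadcast()`:

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three rules to two shape tuples."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)  # sizes match, or rule 2 stretches b
        elif size_a == 1:
            result.append(size_b)  # rule 2 stretches a
        else:
            # Rule 3: sizes disagree and neither is equal to 1
            raise ValueError("shapes %s and %s cannot be broadcast"
                             % (shape_a, shape_b))
    return tuple(result)


print(broadcast_shape((2, 3), (3,)))    # (2, 3)
print(broadcast_shape((3, 2), (3, 1)))  # (3, 2)

# Cross-check against NumPy itself
print(np.broadcast(np.ones((2, 3)), np.ones(3)).shape)  # (2, 3)
```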

In the first cell below, rules 1 and 2 are applied to `b`. In the second cell, only rule 2 is needed because `b` already has two dimensions:

In [20]:
a = np.ones((2, 3))
print("a shape: %s" % (a.shape,))
print(a)
b = np.array([0, 1, 2])  # 1D row array
print("b shape: %s" % (b.shape,))
print(b)

c = a + b
print("a + b shape: %s" % (c.shape,))
print(c)

a shape: (2, 3)
[[1. 1. 1.]
 [1. 1. 1.]]
b shape: (3,)
[0 1 2]
a + b shape: (2, 3)
[[1. 2. 3.]
 [1. 2. 3.]]


In [21]:
a = np.ones((3, 2))
print("a shape: %s" % (a.shape,))
print(a)
b = np.array([[0], [1], [2]])  # 2D array with a single column
print("b shape: %s" % (b.shape,))
print(b)

c = a + b
print("a + b shape: %s" % (c.shape,))
print(c)

a shape: (3, 2)
[[1. 1.]
 [1. 1.]
 [1. 1.]]
b shape: (3, 1)
[[0]
 [1]
 [2]]
a + b shape: (3, 2)
[[1. 1.]
 [2. 2.]
 [3. 3.]]


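The cell below raises an error because of rule 3: after rule 1 pads the shape of `b` from `(3,)` to `(1, 3)`, the sizes of the last dimension (2 and 3) disagree and neither is equal to 1.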

In [22]:
a = np.ones((3, 2))
print("a shape: %s" % (a.shape,))
print(a)
b = np.array([0, 1, 2])  # 1D row array
print("b shape: %s" % (b.shape,))
print(b)

c = a + b
print("a + b shape: %s" % (c.shape,))
print(c)

a shape: (3, 2)
[[1. 1.]
 [1. 1.]
 [1. 1.]]
b shape: (3,)
[0 1 2]


ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
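
One possible way to make this addition work, assuming the intent is to add one value of `b` to each row of `a`, is to turn `b` into a single-column array so that rule 2 can stretch it. The name `b_column` is illustrative:

```python
import numpy as np

a = np.ones((3, 2))
b = np.array([0, 1, 2])      # shape (3,)

b_column = b[:, np.newaxis]  # shape (3, 1); b.reshape((3, 1)) is equivalent
c = a + b_column

print("a + b_column shape: %s" % (c.shape,))  # (3, 2)
print(c)
```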

---

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.6 documentation](https://docs.python.org/3.6/index.html)
* [Programming Basics with Python](https://github.com/hydroffice/python_basics)
* The Numpy Package:
  * [Website](https://www.numpy.org/)
  * [`random.random()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.random.html)
  * [`random.seed()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html) 
* The Matplotlib Package:
  * [Website](https://matplotlib.org/)
  * [Documentation](https://matplotlib.org/users/index.html)
  * [`imshow()`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html)
* [Uniform Distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous))

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*

<!--NAVIGATION-->
[Adopting NumPy in Class Methods >](COMP_001_Adopting_NumPy_in_Class_Methods.ipynb) | [Contents](index.ipynb) | 