# `nb02`: Tables

![](figures/nb02/numpy.png)

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. 

To use Numpy, we first need to import the `numpy` package:

In [None]:
import numpy as np

# Arrays

A Numpy `array` is a table of values, all of the same type, and is indexed by a tuple of integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [None]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
a

In [None]:
a.dtype

In [None]:
a.itemsize

In [None]:
a.ndim

In [None]:
a.shape

## Array creation

In [None]:
# from a list
a = np.array([2, 3, 4])
a

In [None]:
b = np.array([1.2, 3.5, 5.1])
b

In [None]:
b.dtype

In [None]:
# from a list, with a specified dtype
c = np.array([[1, 2], [3, 4]], dtype=complex)
c

In [None]:
# with placeholders
a = np.zeros((3, 4), dtype=int)
a

In [None]:
b = np.ones((3, 4, 2), dtype=int)
b

In [None]:
c = np.empty((2, 3))
c

In [None]:
# with a range
d = np.arange(10, 30, 5)  # same as d = np.array(range(10, 30, 5))
d

In [None]:
d = np.linspace(10, 30, num=10)
d

In [None]:
# with random numbers
e = np.random.random((3, 3))
e

## Shape manipulation

The shape of an array can be changed with various functions.

In [None]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
a.shape

In [None]:
# flatten
a.ravel()

In [None]:
# modify the shape
a.reshape(6, 2)

In [None]:
# transpose
a.T

<div class="alert alert-success">
    
**Exercise**. Build the following 2d array (without typing it in explicitly):
```
[[1,  6, 11],
 [2,  7, 12],
 [3,  8, 13],
 [4,  9, 14],
 [5, 10, 15]]
```

</div>

## Internals

In [None]:
a = np.arange(1, 10).reshape(3, 3)
a

In [None]:
b = a
b = b.reshape((-1,))
b

In [None]:
b[0] = -1
b

In [None]:
a

What's going on? Here the `reshape` operation does not copy the data: it only returns a different _view_ of `a`. Therefore, `a` and `b` share the same contiguous data block in memory, and a write through `b` is visible through `a`.
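
As a quick check of this behaviour, the sketch below uses `np.shares_memory` to compare a reshaped view with an explicit copy made by `flatten`. The variable names `flat_view` and `flat_copy` are only illustrative.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# Reshaping a contiguous array returns a view on the same buffer.
flat_view = a.reshape(-1)
print(np.shares_memory(a, flat_view))   # True

# flatten() always returns a copy that owns its own buffer.
flat_copy = a.flatten()
print(np.shares_memory(a, flat_copy))   # False

# Writing through the view is visible in `a`; writing to the copy is not.
flat_view[0] = -1
flat_copy[1] = -2
print(a[0, 0], a[0, 1])                 # -1 2
```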

<img src="./figures/nb02/ndarray.png" width="75%" />

In [None]:
aT = a.T
aT[0, 0] = 10
aT

In [None]:
a

In [None]:
aT.base is a.base

In [None]:
a.strides, aT.strides

In [None]:
a.__array_interface__["data"][0]

In [None]:
b.__array_interface__["data"][0]

In [None]:
c = a.copy()
c.__array_interface__["data"][0]

# Basic operations

In Numpy, basic operations are called [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs) (`ufunc`). They all operate on arrays in an element-by-element fashion. More specifically, a ufunc is a vectorized wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs. 
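
To make the "fixed number of inputs and outputs" concrete, here is a minimal sketch that inspects `np.add` as a ufunc object. It only relies on the standard `nin`, `nout` and `reduce` attributes of ufuncs.

```python
import numpy as np

# np.add is a ufunc object with two inputs and one output.
print(type(np.add).__name__, np.add.nin, np.add.nout)   # ufunc 2 1

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Calling the ufunc applies the operation element by element.
print(np.add(x, y))        # [11 22 33]

# Binary ufuncs also expose methods such as reduce,
# which folds the operation along an axis.
print(np.add.reduce(x))    # 6
```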

In [None]:
# np.add?

As an example, basic mathematical operations are implemented as ufuncs. They operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [None]:
x = np.array([[1, 2], [3, 4]], dtype=float)
y = np.array([[5, 6], [7, 8]], dtype=float)

# Elementwise sum
print(x + y)
print(np.add(x, y))

In [None]:
# Elementwise difference
print(x - y)
print(np.subtract(x, y))

In [None]:
# Elementwise product
print(x * y)
print(np.multiply(x, y))

In [None]:
# Elementwise division
print(x / y)
print(np.divide(x, y))

Numpy also provides functions which are designed to operate on sequences of numbers, such as the `sum` function. Sequential functions can act on an array's entries as if they form a single sequence, or act on subsequences of the array's entries, according to the array's axes.

In [None]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
np.sum(a)

In [None]:
np.sum(a, axis=0)

In [None]:
np.sum(a, axis=1)

<div class="alert alert-success">
    
**Exercise.** Compute `x^y` element-wise, where `x` and `y` are two arrays.

</div>

<div class="alert alert-success">
    
**Exercise.** Compute the Euclidean distance between the arrays `x` and `y`.

</div>

<div class="alert alert-success">
    
**Exercise.** Let `a = np.arange(24).reshape(4, 6)`. Compute the mean value of each row and each column.

</div>

<div class="alert alert-danger">

Because of the homogeneity of the array's entries, Numpy is able to delegate the task of performing mathematical operations to optimized, compiled C code. For this reason, performing extensive iterations (e.g. via ‘for-loops’) to perform repeated mathematical computations should nearly always be replaced by the use of vectorized functions on arrays. **This informs the entire design and usage paradigm of Numpy.**

</div>

In [None]:
%%timeit
total = np.sum(np.arange(10000))

In [None]:
%%timeit
total = 0
for i in np.arange(10000):
    total += i

In [None]:
%%timeit
total = 0
a = np.arange(10000)
for i in range(10000):
    total += a[i]

In [None]:
%%timeit 
a = np.random.rand(1000)
b = np.random.rand(1000)
np.dot(a, b)

In [None]:
%%timeit 
a = np.random.rand(1000)
b = np.random.rand(1000)
total = 0
for i in range(1000):
    total += a[i] * b[i]

# Indexing, slicing, iterating

## 1d arrays

One-dimensional arrays can be indexed, sliced, and iterated over, much like lists and other Python sequences. Unlike list slices, array slices are views of the original array, not copies.

In [None]:
a = np.arange(10) ** 2
a

In [None]:
a[2:5]

In [None]:
a[:6:2] = 100
a

In [None]:
a[::-1]

In [None]:
for i in a:
    print(i)

## nd arrays

Multidimensional arrays take one index per axis. These indices are given in a tuple separated by commas:

In [None]:
b = np.arange(20).reshape(5, 4) ** 2
b

In [None]:
b[2, 1]

In [None]:
b[0:5, 1:3:2]

In [None]:
b[:, 1]

In [None]:
b[1:3, :]

In [None]:
b[:, 1:3]

In [None]:
b[-1]

In [None]:
c = np.array([[[0, 1, 2], 
               [10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])
c

In [None]:
c.shape

In [None]:
c[1, ...]

In [None]:
c[1, :, :]

In [None]:
c[..., 2]

In [None]:
c[:, :, 2]

Iterating over multidimensional arrays is done with respect to the first axis:

In [None]:
for row in b:
    print(row)

In [None]:
for element in b.flat:
    print(element)

## Fancy indexing

When you index into Numpy arrays using slicing, the result is always a view onto a subarray of the original array. In contrast, integer array indexing lets you construct arbitrary arrays from the data of another array, and the result is a new array rather than a view.
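
The difference between the two kinds of indexing can be checked directly. The sketch below is a minimal illustration with arbitrary names `s` and `f`: the slice shares memory with the original array, while the integer-array index produces an independent array.

```python
import numpy as np

x = np.arange(10)

s = x[2:5]                    # slice: a view of x
f = x[np.array([2, 3, 4])]    # integer array index: a new array

print(np.shares_memory(x, s))   # True
print(np.shares_memory(x, f))   # False

s[0] = 100    # also changes x[2]
f[1] = -1     # leaves x untouched
print(x[:5])  # [  0   1 100   3   4]
```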

In [None]:
x = np.arange(10, 1, -1)
x

In [None]:
x[np.array([3, 3, 1, 8])]

In [None]:
x[np.array([3, 3, -3, 8])]

In [None]:
x = np.arange(9).reshape(3, 3)
print(x)
x[np.array([0, 2]), np.array([1, 0])]

Arrays can also be indexed with Boolean arrays:

In [None]:
sin = np.sin(np.linspace(0, 2*np.pi, num=20))
sin

In [None]:
sin > 0

In [None]:
sin[sin > 0.0]

<div class="alert alert-success">
    
**Exercise**. Let `a = np.arange(9).reshape(3, 3)`. Swap the first and the second rows.

</div>

# Broadcasting

Broadcasting is a powerful mechanism that allows Numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [None]:
# without broadcasting
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([0, 1, 2])
y = np.empty_like(x)   

for i in range(4):
    y[i, :] = x[i, :] + v

y

In [None]:
# with broadcasting
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([0, 1, 2])
y = x + v 
y

The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if `v` actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.
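
One way to see this "as if" behaviour is with `np.broadcast_to`, which returns a read-only view of `v` with the target shape without copying any data. The sketch below also compares the result with an explicitly stacked copy built with `np.tile`; the names `vv` and `stacked` are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([0, 1, 2])

# Read-only view of v with shape (4, 3); no data is copied.
vv = np.broadcast_to(v, x.shape)
print(vv.shape)        # (4, 3)
print(vv.strides[0])   # 0: every row reuses the same memory

# Explicitly stacking 4 copies of v gives the same sum as broadcasting.
stacked = np.tile(v, (4, 1))
print(np.array_equal(x + v, x + stacked))   # True
```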

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
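
The rules above can be tried out on shapes alone. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the flag `compatible` is just an illustrative name.

```python
import numpy as np

# Assumes NumPy >= 1.20, where np.broadcast_shapes was added.
print(np.broadcast_shapes((4, 3), (3,)))        # (4, 3)
print(np.broadcast_shapes((3, 1), (2,)))        # (3, 2)
print(np.broadcast_shapes((5, 1, 4), (3, 1)))   # (5, 3, 4)

# (2, 3) against (2,): the trailing sizes 3 and 2 differ and neither is 1,
# so the shapes are not compatible and NumPy raises ValueError.
try:
    np.broadcast_shapes((2, 3), (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```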

In Numpy, all universal functions support broadcasting!

Here are some applications of broadcasting:

In [None]:
x = np.array([[1,2,3], [4,5,6]])  # x has shape (2, 3)
v = np.array([1, 2, 3])  # v has shape (3,)
w = np.array([4, 5])     # w has shape (2,)

x

In [None]:
# Add a vector to each row of a matrix

# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:

x + v

In [None]:
# Add a vector to each column of a matrix

# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:

(x.T + w).T
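
An alternative to the double transpose, sketched here, is to turn `w` into a column of shape `(2, 1)` with `np.newaxis` so that it broadcasts directly against `x`. The name `result` is illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([4, 5])                   # shape (2,)

# Reshape w into a column of shape (2, 1) so it broadcasts across columns.
result = x + w[:, np.newaxis]
print(result)
# [[ 5  6  7]
#  [ 9 10 11]]

# Same result as the transpose-based version.
print(np.array_equal(result, (x.T + w).T))   # True
```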

In [None]:
# Compute outer product of vectors

# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

np.reshape(v, (3, 1)) * w

In [None]:
np.outer(v, w)

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.

<div class="alert alert-success">
    
**Exercise**. Divide each column of the array `a = np.arange(25).reshape(5, 5)` elementwise by the array `b = np.array([1., 5, 10, 15, 20])`.

</div>

# Routines

Numpy comes with [a large suite](https://numpy.org/doc/stable/reference/routines.html) of routines, including:

- Array creation and manipulation
- String operations
- Datetime support functions
- Functional programming
- I/O
- Linear algebra
- Mathematical functions
- Random sampling
- Sorting, searching and counting
- Statistics

# Wrap-up exercises

## Data statistics (Scipy lectures, 1.4.5.3)

The data in `data/populations.txt` describes the populations of hares and lynxes (and carrots) in northern Canada over 20 years:

In [None]:
data = np.loadtxt("data/populations.txt")
year, hares, lynxes, carrots = data.T

import matplotlib.pyplot as plt
plt.plot(year, hares, label="hares")
plt.plot(year, lynxes, label="lynxes")
plt.plot(year, carrots, label="carrots")
plt.legend()
plt.show()

Compute and print, based on the data in the file:

1. The mean and std of the populations of each species for the years in the period.
2. The year in which each species had its largest population.
3. Which species had the largest population in each year.
4. The years in which any of the populations is above 50000.
5. For each species, the 2 years in which it had its lowest populations.
6. Compare (plot) the change in hare population (see `np.gradient`) and the number of lynxes. Check correlations (see `np.corrcoef`).

... all without for-loops.

In [None]:
hares.mean(), lynxes.mean(), carrots.mean()

In [None]:
np.mean(data[:, 1:], axis=0), np.std(data[:, 1:], axis=0)

In [None]:
year[np.argmax(data[:, 1:], axis=0)]

In [None]:
species = np.argmax(data[:, 1:], axis=1)
np.array(["hares", "lynxes", "carrots"])[species]

In [None]:
year[np.any(data[:, 1:] > 50000, axis=1)]

In [None]:
year[np.argsort(data[:, 1:], axis=0)[:2]]

In [None]:
plt.plot(year, np.gradient(hares), label="rate of change of hares")
plt.plot(year, lynxes, label="lynxes")
plt.legend()
plt.show()

In [None]:
np.corrcoef(np.gradient(hares), lynxes)

## Mandelbrot (Scipy lectures, 1.4.5.5)

Write a script that computes the Mandelbrot fractal. The Mandelbrot iteration:
```python
N_max = 50
threshold = 50
c = x + 1j*y
z = 0
for j in range(N_max):
    z = z**2 + c
```

A point `(x, y)` is considered to belong to the Mandelbrot set if `|z| < threshold` still holds after `N_max` iterations.

Compute the Mandelbrot set in the following way:
1. Build a grid of `c = x + 1j * y` values in the range `[-2, 1]x[-1.5, 1.5]`.
2. Compute the Mandelbrot iteration.
3. Form the 2d Boolean mask indicating which points are in the set.
4. Display the result with `plt.imshow()`.
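
The steps above can be sketched as follows. This is one possible approach, not a reference solution: the function name `mandelbrot_mask` and the grid sizes `nx` and `ny` are hypothetical, and the sketch stops updating points once `|z|` reaches the threshold, a choice made here to avoid floating-point overflow on diverging points.

```python
import numpy as np

def mandelbrot_mask(nx=301, ny=301, n_max=50, threshold=50.0):
    """Boolean mask of grid points whose iterates stay below `threshold`."""
    # Step 1: grid of c = x + 1j*y over [-2, 1] x [-1.5, 1.5], shape (ny, nx).
    x = np.linspace(-2.0, 1.0, nx)
    y = np.linspace(-1.5, 1.5, ny)
    c = x[np.newaxis, :] + 1j * y[:, np.newaxis]

    # Step 2: iterate z = z**2 + c, only on points that have not escaped yet.
    z = np.zeros_like(c)
    for _ in range(n_max):
        active = np.abs(z) < threshold
        z[active] = z[active] ** 2 + c[active]

    # Step 3: points still below the threshold are taken to be in the set.
    return np.abs(z) < threshold

mask = mandelbrot_mask()
print(mask.shape, mask.dtype)   # (301, 301) bool
```

For step 4, the mask can be displayed with `plt.imshow(mask, extent=[-2, 1, -1.5, 1.5])` followed by `plt.show()`, assuming `matplotlib.pyplot` is imported as `plt` as in the previous exercise.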