# Module 2.1: Numerical Python (`numpy`)

## Numpy
Numpy (pronounced *Num-pie*) is a Python package for numerical computing; the name is short for 'Numerical Python'. It provides a multidimensional array object (`ndarray`), various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

Numpy itself is not a high-level analysis package, but it is the fundamental building block on which many other packages (such as pandas, SciPy and scikit-learn) are built. It is arguably the foundation of the entire scientific Python ecosystem.

### Ndarray
The `ndarray` is the core object for storing **homogeneous** data. It is a table of elements (usually numbers), **all of the same type**, indexed by a tuple of non-negative integers. In Numpy, dimensions are called *axes*. The number of axes (sometimes called the *rank*) is given by the `ndim` attribute, and the `shape` of an array is a tuple of integers giving the size of the array along each axis.

We can initialize numpy arrays from Python lists (including nested lists), and access elements using square brackets:

In [None]:
import numpy as np  # import numpy library

data = [5, 3.0, 1, 2.75, 4.11, 6, 7, 8.2, 9, 10]

# Create a numpy array from the list `data`
arr = np.array(data)
arr

Nested lists can be converted to multidimensional arrays using the same `array` function. For example, the following code produces a two-dimensional array:


In [None]:
# Nested (list of lists) to 2D array
data = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr = np.array(data)
arr

The `shape` attribute returns the size of the array along each axis.

In [None]:
arr.shape

In this 2D example `arr.shape` returns a tuple with two elements, the first is the number of rows and the second is the number of columns.

We can use the `ndim` attribute to get the number of axes (dimensions) of the array.

In [None]:
arr.ndim
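The same attributes generalize to any number of axes. A minimal sketch with a hypothetical 3D array (`cube` is an illustrative name) built from a list of two 2x3 "pages"; the `size` attribute, which gives the total number of elements, is an addition not used elsewhere in this module:

```python
import numpy as np

# A 3D array built from a list of two 2x3 "pages" (hypothetical example data)
cube = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

print(cube.shape)  # (2, 2, 3): 2 pages, 2 rows per page, 3 columns per row
print(cube.ndim)   # 3 axes
print(cube.size)   # 12 elements in total (the product of the shape)
```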

When creating an `ndarray`, we can specify the type of the elements using the `dtype` parameter. If we don't, Numpy infers a suitable type from the data when the array is created.

The `dtype` attribute returns the type of the elements in the array.

In [None]:
arr.dtype

You can specify the `dtype` of the array when creating it.

In [None]:
arr = np.array([1, 2, 3], dtype=float)
print(arr)
print(arr.dtype)
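To see how the inference works, here is a small illustrative sketch (the variable names are hypothetical). Because all elements must share one type, Numpy picks a type that can hold every input, upcasting where necessary. The exact integer width (for example `int64`) can depend on the platform, so the sketch checks only the dtype *kind*:

```python
import numpy as np

# All integers: an integer dtype is inferred (exact width is platform dependent)
ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # i (signed integer)

# Mixing ints and floats: everything is upcast to a floating-point dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Mixing numbers and strings: everything becomes a string dtype
strs = np.array([1, "a"])
print(strs.dtype.kind)  # U (Unicode string)
```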

You can change the `dtype` of an existing array (cast to another `dtype`) using the `astype` method. The `astype` method creates a new array (a copy of the data), and does not change the original array itself.

In [None]:
int_arr = arr.astype(int)

print(arr.dtype)

print(int_arr.dtype)
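One detail worth knowing when casting floats to integers: `astype(int)` truncates toward zero rather than rounding. A short sketch (illustrative values only) contrasting it with rounding first via `np.round`, which rounds values exactly halfway between integers to the nearest even value:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])

# Casting float -> int truncates toward zero (it does not round)
truncated = floats.astype(int)
print(truncated)  # [ 1 -1  2]

# Round first for nearest-integer behaviour (halves go to the nearest even value)
rounded = np.round(floats).astype(int)
print(rounded)  # [ 2 -2  2]

# astype returned new arrays; the original is unchanged
print(floats)
```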

Similar to Python's built-in `range` function, numpy has an `arange` function that returns an array containing evenly spaced values within a given interval.

Like `range`, the values are generated within the half-open interval [`start`, `stop`). The `start` value is inclusive, while the `stop` value is exclusive.

In [None]:
arr = np.arange(0, 10)
print(arr)

arr = np.arange(1, 101)
print(arr)
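Like `range`, `arange` also accepts an optional step. A sketch of the step argument alongside `np.linspace`, which appears in later cells: it takes the *number* of points instead of a step and includes `stop` by default. The Numpy documentation recommends `linspace` over `arange` when the step is not an integer, because floating-point rounding can make the number of `arange` elements hard to predict.

```python
import numpy as np

# An optional third argument sets the step size
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# linspace(start, stop, num): `num` evenly spaced points, `stop` included
points = np.linspace(0, 1, 5)
print(points)  # five values: 0.0, 0.25, 0.5, 0.75, 1.0
```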

We can also change the shape of an array using the `reshape` method.

In [None]:
# Create a 10x10 array of integers from 1 to 100
arr = np.arange(1, 101).reshape(10, 10)
print(arr)
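`reshape` only rearranges the existing elements, so the new shape must contain the same total number of elements. A minimal sketch showing this constraint, plus the convenience value `-1`, which asks Numpy to infer one dimension from the others:

```python
import numpy as np

arr = np.arange(1, 13)  # 12 elements

# 3 * 4 == 12, so this shape is valid
grid = arr.reshape(3, 4)
print(grid.shape)  # (3, 4)

# -1 means "infer this size": 12 / 2 == 6
auto = arr.reshape(2, -1)
print(auto.shape)  # (2, 6)

# 5 * 5 != 12, so this raises a ValueError
try:
    arr.reshape(5, 5)
except ValueError as err:
    print("ValueError:", err)
```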

### Creating 'placeholder' arrays
It is sometimes useful to create arrays with pre-defined values, for example an array of zeros, an array of ones, an array filled with a constant, or an array of random values. Numpy provides a number of functions to create such arrays:

In [None]:
# Create a 1D array of zeros
arr = np.zeros(10)
print(arr)

# Create a 2D array, 4x10 of ones
arr = np.ones((4, 10))
print(arr)

# Create a 3D array, 4x4x4 filled with a specific value (999)
arr = np.full((4, 4, 4), 999)
print(arr)

# Create a 2D array, 3x3, of random integers between 0 and 9
# Set the seed to 0. This is very important for reproducibility.
rand = np.random.default_rng(0)  # Create a random number generator
arr = rand.integers(0, 10, (3, 3))  # the upper bound (10) is excluded
print(arr)
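Two details of these functions are easy to overlook: the default dtype of `np.zeros` and `np.ones` is floating point, and seeding is what makes random results reproducible. A small sketch of both (the names `first` and `second` are illustrative):

```python
import numpy as np

# np.zeros (and np.ones) default to float64; pass dtype to override
print(np.zeros(3).dtype)                  # float64
print(np.zeros(3, dtype=int).dtype.kind)  # i (signed integer)

# Two generators created with the same seed produce the same numbers
first = np.random.default_rng(0).integers(0, 10, (3, 3))
second = np.random.default_rng(0).integers(0, 10, (3, 3))
print(np.array_equal(first, second))  # True

# `integers` excludes the upper bound by default, so every value is in [0, 10)
print(first.min() >= 0 and first.max() < 10)  # True
```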

### Array indexing and slicing
Numpy arrays can be indexed and sliced in a manner similar to Python lists. 

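1D arrays are indexed and sliced with the same syntax as lists. One important difference is that slicing an array returns a *view* of the original data rather than a copy, so modifying the slice also modifies the original array.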

In [None]:
arr = np.linspace(1, 10, 10)
print(arr)

# Get the 3rd element
print(arr[2])

# Get elements in the range [0, 3) (the first three elements)
print(arr[:3])  # same as arr[0:3]
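The view-versus-copy difference between list slices and array slices is worth seeing directly. A short sketch with illustrative variable names; `.copy()` is the usual way to get an independent array from a slice:

```python
import numpy as np

# Slicing a Python list makes a copy
lst = [1, 2, 3, 4, 5]
lst_slice = lst[:3]
lst_slice[0] = 100
print(lst)  # [1, 2, 3, 4, 5] (unchanged)

# Slicing a numpy array makes a view that shares memory with the original
arr = np.arange(1, 6)
arr_slice = arr[:3]
arr_slice[0] = 100
print(arr)  # [100   2   3   4   5] (changed!)

# Use .copy() when you need an independent array
safe = arr[:3].copy()
safe[0] = -1
print(arr[0])  # 100 (unaffected by modifying `safe`)
```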

Multi-dimensional arrays are indexed using a *comma-separated tuple of indices*.

To get the first row of the array we can use the following code:


In [None]:
# Create a 2D array, 4x5 of evenly spaced values between 1-20
arr = np.linspace(1, 20, 20).reshape(4, 5)
print(arr)

# Get the first row of the array
arr[0, :]  # same as arr[0]. On a 2D array a single index selects a whole row; on a 1D array it selects a single element.

To get the first column of the array `arr` we use a colon `:` to indicate that we want all rows, and the index `0` to indicate that we want the first column:

In [None]:
# Get the first column of the array
arr[:, 0]
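Slices can be combined on both axes at once to pull out a sub-block, and negative indices and steps work per axis just as they do for lists. A sketch on the same 4x5 array (the names `block` and the chosen ranges are illustrative):

```python
import numpy as np

arr = np.linspace(1, 20, 20).reshape(4, 5)

# Rows 1-2 and columns 1-2 (half-open ranges, as with lists)
block = arr[1:3, 1:3]
print(block)  # 2x2 block containing 7, 8, 12 and 13

# Negative indices count from the end: last row, last column
print(arr[-1, -1])  # 20.0

# A step works per axis too: every other row and every other column
print(arr[::2, ::2].shape)  # (2, 3)
```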


#### Boolean indexing
We can use boolean indexing to select elements from an array based on a condition, for example to get all elements of `arr` that are greater than a given value.


In [None]:
arr = np.arange(1, 26).reshape(5, 5)

# Identify all elements in the array that are greater than 10
arr > 10

In [None]:
thresh_idx = arr > 10

arr[thresh_idx]

In [None]:
# Add 100 to every element greater than 10. This modifies `arr` in place, so be careful.
arr[thresh_idx] += 100

arr
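Conditions can be combined, but Python's `and`/`or` do not work element-wise on arrays (they raise a `ValueError` about an ambiguous truth value). A sketch using the element-wise operators `&`, `|` and `~` (each condition needs its own parentheses), plus `np.where`, which keeps the array's shape instead of returning a flat selection:

```python
import numpy as np

arr = np.arange(1, 26).reshape(5, 5)

# Elements that are greater than 10 AND even
even_and_big = arr[(arr > 10) & (arr % 2 == 0)]
print(even_and_big)  # [12 14 16 18 20 22 24]

# Keep values > 10, substitute 0 elsewhere
clipped = np.where(arr > 10, arr, 0)
print(clipped.shape)  # (5, 5): shape preserved, unlike boolean indexing
```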

### Array operations
Numpy arrays support arithmetic operations such as addition, subtraction, multiplication, division and exponentiation. The operations are applied element-wise.


In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)

# Element-wise exponentiation
print(a ** b)

# Element-wise multiplication
print(a * b)
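Because `*` is element-wise, it is not matrix multiplication. A sketch contrasting it with the `@` operator, which computes a dot product for 1D arrays and a matrix product for 2D arrays (the matrix `m` is illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a * b)  # [ 4 10 18]  element-wise product
print(a @ b)  # 32  dot product: 1*4 + 2*5 + 3*6

# For 2D arrays, @ is matrix multiplication
m = np.array([[1, 2], [3, 4]])
print(m * m)  # element-wise: values 1, 4, 9, 16
print(m @ m)  # matrix product: values 7, 10, 15, 22
```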

### Broadcasting
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For a more detailed explanation with examples, see: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html

For example, suppose that we want to add a different constant to each row of a matrix. We could do it like this:

In [None]:
a = np.arange(1, 10).reshape(3, 3)
print(a)

# Reshaping to a column vector of shape (3, 1) means b[i] is added to every element of row i.
# Without the reshape, b would have shape (3,), which broadcasts like shape (1, 3): b[j] would be added to every element of column j instead.
b = np.array([10, 50, 1000]).reshape(3, 1)
print(b)

print(a + b)
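Numpy decides whether shapes are compatible by comparing them from the last axis backwards: two sizes are compatible if they are equal or one of them is 1 (a missing leading axis counts as 1). A sketch of both orientations from the cell above, plus a case that fails (the names `col_offsets` and `row_offsets` are illustrative):

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)

# Shape (3,) is treated like (1, 3): one value per column
col_offsets = np.array([10, 50, 1000])
print(a + col_offsets)

# Shape (3, 1): one value per row
row_offsets = col_offsets.reshape(3, 1)
print(a + row_offsets)

# (3, 1) combined with (3,) broadcasts both ways into (3, 3)
print((row_offsets + col_offsets).shape)  # (3, 3)

# (3, 3) vs (2,): last axes are 3 and 2, neither is 1, so this fails
try:
    a + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```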

### Summary statistics
Numpy has a number of built-in functions that can be used to compute summary statistics of arrays.

In [None]:
# The `sum` function returns the sum of all elements in the array.

arr = np.arange(1, 11).reshape(2, 5)
print(arr)

print(arr.sum())

In [None]:
# The `mean` function returns the mean of all elements in the array.
print(arr.mean())

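We can use the `axis` parameter to specify the axis along which an operation is performed; that axis is collapsed in the result. For a 2D array, `axis=1` aggregates across the columns and gives one result per row, while `axis=0` gives one result per column. For example, to compute the mean of each row, we can specify `axis=1`: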

In [None]:
# row means
print(arr.mean(axis=1))

# column means
print(arr.mean(axis=0))
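Reductions like `sum`, `mean` and `max` all accept `axis`, and the reduced axis disappears from the result's shape. A sketch that also uses `keepdims=True`, which keeps the reduced axis with size 1 so the result broadcasts back against the original array (here, to subtract each row's mean from that row; `centered` is an illustrative name):

```python
import numpy as np

arr = np.arange(1, 11).reshape(2, 5)

# Reducing along an axis removes that axis from the shape
print(arr.sum(axis=0).shape)  # (5,): one sum per column
print(arr.sum(axis=1).shape)  # (2,): one sum per row
print(arr.max(axis=0))        # [ 6  7  8  9 10]

# keepdims=True gives shape (2, 1), which broadcasts across each row
row_means = arr.mean(axis=1, keepdims=True)
centered = arr - row_means
print(centered.mean(axis=1))  # [0. 0.]
```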

### Pseudorandom number generation
NumPy has a number of functions for generating random numbers from various probability distributions.

For example, we can use the `normal` function to generate an array of 100 samples from the standard normal distribution (mean 0 and variance 1):

In [None]:
rand = np.random.default_rng(0)  # seeded generator for reproducibility
samples = rand.normal(size=100)  # 1D array of 100 samples

print(samples)

print(samples.mean())

print(samples.std())

Notice that the mean and standard deviation of the samples are close to 0 and 1, respectively. This is because `normal` defaults to a mean (`loc`) of 0 and a standard deviation (`scale`) of 1.

We can also generate random numbers from other distributions. Try some of the following methods of a random number generator:

`rand = np.random.default_rng(0)`

- `rand.binomial`
- `rand.poisson`
- `rand.exponential`
- `rand.uniform`
- `rand.gamma`
- `rand.beta`
- `rand.chisquare`
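As a starting point, here is a sketch calling a few of these methods with their distribution parameters passed by keyword (`lam`, `low`, `high`, `loc` and `scale` are Numpy's parameter names; the specific values and the name `heights` are illustrative). Sample statistics only approximate the true parameters, and the approximation improves with larger `size`:

```python
import numpy as np

rand = np.random.default_rng(0)

# Poisson counts with an average rate of 3 events per interval
counts = rand.poisson(lam=3, size=1000)
print(counts.mean())  # should be close to 3

# Uniform floats in the half-open interval [low, high)
u = rand.uniform(low=-1, high=1, size=1000)
print(u.min(), u.max())  # both inside [-1, 1)

# Normal with a custom mean (loc) and standard deviation (scale)
heights = rand.normal(loc=170, scale=10, size=1000)
print(heights.mean(), heights.std())  # roughly 170 and 10
```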

In [None]:
# Try generating samples from the distributions above with a larger sample size (e.g. 1000).