# NumPy

Computers are designed so that it is more efficient to apply an operation to an array of numbers at once, in bulk, than to each number in turn.
Pure Python code, using only the Python standard library, does not let you do that.

## Gaining functionalities with libraries

This can be remedied by using an additional [software library](https://en.wikipedia.org/wiki/Library_(computing)), which is, roughly speaking, a collection of code that provides functionality for a given task or domain.
You may have heard, or will hear, of libraries dedicated to machine learning such as [scikit-learn](https://en.wikipedia.org/wiki/Scikit-learn), or Google's [TensorFlow](https://en.wikipedia.org/wiki/TensorFlow) for building neural networks.

The most widely used Python library for storing arrays of numbers and performing computations on them is [NumPy](https://en.wikipedia.org/wiki/NumPy).
NumPy is a library for scientific computing that provides optimized data structures and operations, mathematical functions, linear algebra routines, etc.

NumPy's optimized implementation in [the C programming language](https://en.wikipedia.org/wiki/C_(programming_language)) lets you benefit from such bulk operations, which means your operations on numbers can run faster than in pure Python.


To use NumPy in your notebook or program, you need to import it as follows

In [None]:
import numpy as np

Here we imported the NumPy library, and gave it a shorter name `np` that we can use whenever we need to use one of its functionalities.

## Array creation

NumPy arrays can be created from Python lists

In [None]:
my_array = np.array([1, 2, 3])
my_array

array([1, 2, 3])

The above is a 1-dimensional array, extremely similar to a list, but NumPy supports arrays of arbitrary dimension.

For example, we can create two-dimensional arrays (e.g. to store a [matrix](https://en.wikipedia.org/wiki/Matrix_(mathematics))) as follows

In [None]:
my_2d_array = np.array([[1, 2, 3], [4, 5, 6]])
my_2d_array

array([[1, 2, 3],
       [4, 5, 6]])

The array having *2 dimensions* means that we can access individual elements using *a pair of coordinates*.

For instance, we can retrieve the value `6` in the array by giving its location, on the second row (index `1`), third column (index `2`):

In [None]:
my_2d_array[1, 2]

6

> #### 🗒 Note
> Indices in a NumPy array are 0-based, just like indices in a Python list.

NumPy arrays have a `shape` attribute that provides information about the number of dimensions, and the length of each dimension.

In [None]:
print(my_array.shape)
print(my_2d_array.shape)

(3,)
(2, 3)


`my_array` has 1 dimension of length 3.

`my_2d_array` has 2 dimensions of length 2 and 3, respectively.
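Two related attributes are often handy alongside `shape`: `ndim`, the number of dimensions, and `size`, the total number of elements. A short sketch, recreating the two arrays above so it runs on its own:

```python
import numpy as np

my_array = np.array([1, 2, 3])
my_2d_array = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the number of dimensions, i.e. len(shape)
print(my_array.ndim, my_2d_array.ndim)   # 1 2
# size is the total number of elements, i.e. the product of the shape
print(my_array.size, my_2d_array.size)   # 3 6
```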

> #### 🗒 Note
> An *attribute* in Python is a variable that is attached to an object, the same way a method is attached to an object (cf. `list.append` earlier).

The *indexing* and *slicing* notations that we saw on Python lists also let us access entire rows or columns.

For instance, the second row consists of all elements whose:
* row number (first coordinate) is `1`,
* column number (second coordinate) is every possible value in the range of column numbers.

In [None]:
my_2d_array[1, :]

array([4, 5, 6])

Similarly, the third column consists of all elements whose:
* row number (first coordinate) is every possible value in the range of row numbers,
* column number (second coordinate) is `2`.

In [None]:
my_2d_array[:, 2]

array([3, 6])
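The selected column comes back as a 1-dimensional array: an integer index removes the corresponding dimension, whereas a slice keeps it, even when the slice has length 1. A small sketch comparing the two forms:

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3], [4, 5, 6]])

# An integer index removes that dimension: the result is 1-dimensional
col = my_2d_array[:, 2]
print(col.shape)        # (2,)

# A slice of length 1 keeps the dimension: the result is a 2x1 array
col_2d = my_2d_array[:, 2:3]
print(col_2d.shape)     # (2, 1)
```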

You will very frequently encounter a "shortcut" notation, where only the indices or slices for 1 or a few first dimensions are explicitly provided, and all subsequent dimensions are implicitly taken as whole ranges (`:`).

This means we can select the second row with the alternative notation:

In [None]:
my_2d_array[1]
# this is equivalent to: my_2d_array[1, :]

array([4, 5, 6])

> #### 🗒 Note
> There is no shortcut notation to select a column in a 2-dimensional array, because all preceding dimensions must be explicitly indexed or sliced.
>
> However, if we worked with a hypothetical 3-dimensional array called `my_3d_array`, the notation `my_3d_array[:, 2]` would be a shortcut to select all elements in the third "column". This implicitly includes elements at all indices of the third dimension, i.e. it is equivalent to `my_3d_array[:, 2, :]`.
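To make the note concrete, here is a sketch with an actual 3-dimensional array of shape `(2, 3, 4)`, built with `np.arange` and `reshape` (both covered later in this notebook). The two notations select the same elements:

```python
import numpy as np

# A concrete 3-dimensional array of shape (2, 3, 4), filled with 0..23
my_3d_array = np.arange(24).reshape(2, 3, 4)

shortcut = my_3d_array[:, 2]      # last dimension taken whole implicitly
explicit = my_3d_array[:, 2, :]   # same selection, written out in full

print(shortcut.shape)                      # (2, 4)
print(np.array_equal(shortcut, explicit))  # True
```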

Contrary to Python lists, NumPy arrays have a data type, and all elements of the array must have that same type.
It is stored in `dtype`, another attribute of the array.

In [None]:
my_array.dtype

dtype('int64')

The main types are `int32` (32-bit integers), `int64` (64-bit integers), `float32` (32-bit floating-point numbers) and `float64` (64-bit floating-point numbers).

The `dtype` can be specified when creating the array

In [None]:
my_array = np.array([1, 2, 3], dtype=np.float64)
my_array.dtype

dtype('float64')

Let us check what this array contains.

In [None]:
my_array

array([1., 2., 3.])

Because we explicitly specified the dtype of the array to be created, the given integer values `1`, `2`, `3` have been converted to float values `1.`, `2.`, `3.` (short for `1.0`, `2.0`, `3.0` respectively).
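Two related behaviours, sketched below: when no `dtype` is given and the list mixes integers and floats, NumPy picks a type that can hold all the values (here `float64`); and the `astype` method returns a converted copy of an existing array, dropping the fractional part when converting floats to integers.

```python
import numpy as np

# Mixing integers and floats in the input list: NumPy picks a dtype
# that can hold all values, here float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)         # float64

# astype returns a converted copy; float -> int drops the fractional part
as_int = mixed.astype(np.int64)
print(as_int)              # [1 2 3]
```

The original `mixed` array is left unchanged by `astype`.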

We can create arrays of all zeros using the NumPy function `zeros`.

Here we create a matrix (a 2-D array) with 2 rows and 3 columns.

In [None]:
zero_array = np.zeros((2, 3))
zero_array

array([[0., 0., 0.],
       [0., 0., 0.]])

**Exercise.** Check the data type of the newly created array.

In [None]:
zero_array.dtype

dtype('float64')
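So `zeros` produces `float64` values by default. Like `np.array`, it accepts a `dtype` argument if you need another type, for instance integers:

```python
import numpy as np

# zeros defaults to float64; a dtype argument overrides it
int_zeros = np.zeros((2, 3), dtype=np.int64)
print(int_zeros.dtype)    # int64
print(int_zeros.shape)    # (2, 3)
```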

Similarly, the function `ones` creates an array with 1 in each cell.

**Exercise.** Create an array of 5 rows and 4 columns, filled with ones.

In [None]:
one_array = np.ones((5, 4))
one_array

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Similar to the `range` function in the Python standard library, in NumPy we can create a range of values using the [arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) function.

In [None]:
np.arange(5)

array([0, 1, 2, 3, 4])

If a single argument is provided, the starting value is assumed to be 0, but you can also specify the starting (included) and ending (excluded) values:

In [None]:
np.arange(3, 5)

array([3, 4])
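As with `range`, a third argument sets the step between consecutive values. Unlike `range`, `arange` also accepts a float step, although the NumPy documentation recommends `linspace` (next) for non-integer steps, since rounding can make the number of generated values hard to predict. A short sketch:

```python
import numpy as np

# A third argument sets the step between consecutive values
steps_int = np.arange(0, 10, 3)
print(steps_int)        # [0 3 6 9]

# The step may be a float; the end value is still excluded
# (gives the values 0.0, 0.25, 0.5, 0.75)
steps_float = np.arange(0, 1, 0.25)
print(steps_float)
```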

Another useful routine is `linspace` for creating linearly spaced values in an interval. For instance, to create 10 values in `[0, 1]`, we can use

In [None]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

The array indeed contains ten values, including `0.` and `1.`, that are linearly spaced.

This might however not correspond to your expectation, or what you would have intended if you had written this instruction yourself.

**Exercise.** Write a modified version of this instruction that instead returns `[0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]`.

In [None]:
np.linspace(0, 1, 11)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
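Why 11 and not 10: by default `linspace` includes both end points, so `num` values are separated by `num - 1` gaps of size `(stop - start) / (num - 1)`. The sketch below checks this spacing, and shows the `endpoint=False` option, which excludes `stop` and yields 10 values spaced by `(stop - start) / num`:

```python
import numpy as np

start, stop, num = 0, 1, 11
# With endpoint=True (the default), the spacing is (stop - start) / (num - 1)
step = (stop - start) / (num - 1)
print(step)    # 0.1

# Excluding the end point gives 10 values: 0.0, 0.1, ..., 0.9
no_end = np.linspace(0, 1, 10, endpoint=False)
print(no_end)
```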

Another important operation on NumPy arrays is [reshape](https://numpy.org/doc/stable/user/quickstart.html?highlight=reshape#changing-the-shape-of-an-array), for changing the shape of an array

In [None]:
# 1d array
my_array = np.array([1, 2, 3, 4, 5, 6])
my_array

array([1, 2, 3, 4, 5, 6])

In [None]:
# reshape into a 3 rows, 2 cols array
# we call the reshape() method on the my_array object
my_array.reshape(3, 2)

array([[1, 2],
       [3, 4],
       [5, 6]])
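Two details worth knowing about `reshape`, sketched below: one dimension may be given as `-1`, in which case NumPy infers it from the total number of elements, and the new shape must contain exactly as many elements as the original array, otherwise a `ValueError` is raised. `reshape` returns a new array and leaves `my_array` itself unchanged.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer that dimension from the total number of elements
inferred = my_array.reshape(2, -1)
print(inferred.shape)    # (2, 3)

# The new shape must contain exactly as many elements as the original
raised = False
try:
    my_array.reshape(4, 2)   # 8 elements requested, only 6 available
except ValueError as err:
    raised = True
    print("ValueError:", err)
```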

You can play with these operations and make sure you understand them well.

## Basic operations

In NumPy, we express computations directly over arrays. This makes the code much more succinct.

Arithmetic operations can be performed directly over arrays. For instance, assuming two arrays have a compatible shape, we can add them as follows

In [None]:
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])
array_a + array_b

array([5, 7, 9])
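The simplest case of compatible shapes is an array combined with a single number (a scalar): the operation is applied between the scalar and every element. NumPy calls this mechanism *broadcasting*. A short sketch:

```python
import numpy as np

array_a = np.array([1, 2, 3])

# A scalar is combined with every element of the array
doubled = array_a * 2
shifted = array_a + 10
squared = array_a ** 2
print(doubled)    # [2 4 6]
print(shifted)    # [11 12 13]
print(squared)    # [1 4 9]
```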

Compare this with the equivalent computation using a for loop

In [None]:
# create an array with the same shape as array_a, but filled with 0s
array_out = np.zeros_like(array_a)
# sum element-wise
for i in range(len(array_a)):
  array_out[i] = array_a[i] + array_b[i]
array_out

array([5, 7, 9])

Not only is this code more verbose, it will also run much more slowly.
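To check the speed difference on your own machine, here is a rough sketch using the standard `timeit` module. The helper names `add_with_loop` and `add_vectorized`, the array size (100,000) and the repetition count are arbitrary choices for illustration, and the exact timings will vary from one machine to another.

```python
import timeit

import numpy as np

array_a = np.arange(100_000)
array_b = np.arange(100_000)

def add_with_loop(a, b):
    # element-wise sum with an explicit Python loop
    out = np.zeros_like(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

def add_vectorized(a, b):
    # the same sum, delegated to NumPy's compiled code
    return a + b

# both approaches must agree before we compare their speed
same_result = np.array_equal(add_with_loop(array_a, array_b),
                             add_vectorized(array_a, array_b))

loop_time = timeit.timeit(lambda: add_with_loop(array_a, array_b), number=3)
vec_time = timeit.timeit(lambda: add_vectorized(array_a, array_b), number=3)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```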

In NumPy, functions that operate on arrays in an element-wise fashion are called [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html). For instance, this is the case for `np.sin`:

In [None]:
np.sin(array_a)

array([0.84147098, 0.90929743, 0.14112001])

The [vector inner product](https://en.wikipedia.org/wiki/Dot_product) can be computed using `np.dot`:

In [None]:
np.dot(array_a, array_b)

32
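The inner product multiplies matching elements and sums the results, so the same value can be obtained from the element-wise product followed by `np.sum`. The arithmetic for the two arrays above is spelled out in the comments:

```python
import numpy as np

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

# 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
by_hand = np.sum(array_a * array_b)
with_dot = np.dot(array_a, array_b)
print(by_hand, with_dot)    # 32 32
```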

### Advanced notions (optional)
When both arguments to `np.dot` are 2-D arrays, `np.dot` performs matrix multiplication:

In [None]:
# A: random values from a uniform distribution over [0, 1)
array_A = np.random.rand(5, 3)
array_A

array([[0.67820903, 0.43824677, 0.1168565 ],
       [0.87493485, 0.74637326, 0.66845354],
       [0.84174041, 0.56758632, 0.69959319],
       [0.48425096, 0.12418178, 0.82069518],
       [0.35857016, 0.42358637, 0.76601952]])

In [None]:
# B: random values from the standard normal distribution
array_B = np.random.randn(3, 4)
array_B

array([[-0.0939625 , -0.10379045,  0.00377791,  0.89752424],
       [-1.70291436,  1.22072069, -1.41558762,  1.73743372],
       [-0.43694199, -1.88874156, -1.2437126 , -0.59502533]])

In [None]:
np.dot(array_A, array_B)

array([[-0.86108245,  0.24387354, -0.76315039,  1.30060118],
       [-1.64529622, -0.44223258, -1.88461541,  1.68430251],
       [-1.35132458, -0.71585099, -1.67038103,  1.32535037],
       [-0.61556856, -1.44875046, -1.19466969,  0.16205017],
       [-1.08972956, -0.96694841, -1.55097711,  0.60197765]])
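Python's `@` operator also performs matrix multiplication on 2-D NumPy arrays. The shapes must agree on the inner dimension: a `(5, 3)` matrix times a `(3, 4)` matrix gives a `(5, 4)` matrix. The sketch below uses a seeded generator from `np.random.default_rng` instead of `np.random.rand`/`randn`, so its values are reproducible from run to run but differ from the ones printed above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)    # seeded generator: reproducible values
array_A = rng.random((5, 3))           # uniform values in [0, 1)
array_B = rng.standard_normal((3, 4))  # standard normal values

# (5, 3) times (3, 4): inner dimensions match, result has shape (5, 4)
product = array_A @ array_B
print(product.shape)                                    # (5, 4)
print(np.allclose(product, np.dot(array_A, array_B)))   # True
```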

Matrix transpose can be done using `.transpose()` or `.T` for short

In [None]:
array_A.T

array([[0.67820903, 0.87493485, 0.84174041, 0.48425096, 0.35857016],
       [0.43824677, 0.74637326, 0.56758632, 0.12418178, 0.42358637],
       [0.1168565 , 0.66845354, 0.69959319, 0.82069518, 0.76601952]])
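A few properties of the transpose, checked in the sketch below on freshly generated (seeded) random matrices: it swaps the dimensions, `.T` and `.transpose()` agree, entry `[i, j]` of the transpose is entry `[j, i]` of the original, and the transpose of a product is the product of the transposes in reverse order.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
array_A = rng.random((5, 3))
array_B = rng.standard_normal((3, 4))

# Transposing swaps the dimensions: (5, 3) becomes (3, 5)
print(array_A.T.shape)                                   # (3, 5)
# .T and .transpose() give the same result
print(np.array_equal(array_A.T, array_A.transpose()))    # True
# Entry [i, j] of the transpose is entry [j, i] of the original
print(array_A.T[0, 4] == array_A[4, 0])                  # True
# Transpose of a product reverses the order: (AB)^T = B^T A^T
print(np.allclose((array_A @ array_B).T, array_B.T @ array_A.T))   # True
```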

## Slicing and masking

Like Python lists, NumPy arrays support slicing.

In [None]:
np.arange(10)[5:]

array([5, 6, 7, 8, 9])
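The usual slice features of Python lists (steps, negative indices) work the same way. One important difference: slicing a Python list creates a copy, whereas a NumPy slice is a *view* that shares memory with the original array, so modifying the slice modifies the original. Use `.copy()` when you need an independent array. A sketch:

```python
import numpy as np

x = np.arange(10)
print(x[::2])     # [0 2 4 6 8]: every second element
print(x[-3:])     # [7 8 9]: the last three elements
print(x[::-1])    # [9 8 7 6 5 4 3 2 1 0]: reversed

# NumPy slices are views that share memory with x
tail = x[5:]
tail[0] = 100
print(x[5])       # 100: modifying the slice modified x

# .copy() gives an independent array
tail_copy = x[5:].copy()
tail_copy[0] = -1
print(x[5])       # still 100
```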

We can also select only certain elements from the array using a boolean mask: an array of `True`/`False` values, one per element, that keeps the elements for which a condition is `True`.

In [None]:
# create values from 0 to 9 (10 excluded)
x = np.arange(10)
# mask all values in x that are greater than or equal to 5
mask = x >= 5
# print the mask: False for values 0 to 4 included, True for 5 and up
mask

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [None]:
# apply the mask to x itself
x[mask]

array([5, 6, 7, 8, 9])
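Conditions can be combined element-wise with `&` (and) and `|` (or). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons. A mask can also appear on the left-hand side of an assignment, to modify only the selected elements. A sketch:

```python
import numpy as np

x = np.arange(10)

# Combine conditions with & (and) / | (or); parentheses are required
between = x[(x >= 3) & (x < 7)]
outside = x[(x < 2) | (x > 7)]
print(between)    # [3 4 5 6]
print(outside)    # [0 1 8 9]

# Assign through a mask: set every even value to 0
x[x % 2 == 0] = 0
print(x)          # [0 1 0 3 0 5 0 7 0 9]
```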

## Exercises

**Exercise 1.** Create a 3d array of shape (2, 2, 2), containing 8 values. Access individual elements and slices.

In [None]:
a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
a

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [None]:
a.shape

(2, 2, 2)

Retrieve values `1` and `5`.

In [None]:
a[:, 0, 0]

array([1, 5])

**Exercise 2.** Rewrite the relu function (see Python section) using [np.maximum](https://numpy.org/doc/stable/reference/generated/numpy.maximum.html). Check that it works on both a single value and on an array of values.

> **HINT** `0` is greater than any negative number, but smaller than any positive number.



In [None]:
def relu_numpy(x):
  # write your function here
  return

In [None]:
# should output 0
relu_numpy(-3)

In [None]:
# should output 2
relu_numpy(2)

In [None]:
# should output [1, 0, 2.5]
relu_numpy(np.array([1, -3, 2.5]))

**Exercise 3.** Rewrite the average of a vector (1d array) using NumPy (without for loop), with [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html) and the shape of the array.

In [None]:
def average_numpy(vector):
  # write your function here
  return np.sum(vector) / vector.shape[0]

In [None]:
my_vector = np.array([12, 11, 9, 5, 7])
# the result should be 8.8
average_numpy(my_vector)

8.8

Compare with what you get with [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html).

## Going further

* [Scientific Computing in Python: Introduction to NumPy and Matplotlib](https://sebastianraschka.com/blog/2020/numpy-intro.html)
* [NumPy reference](https://numpy.org/doc/stable/reference/)
* [SciPy lectures](https://scipy-lectures.org/)
* One-hour [tutorial](https://www.youtube.com/watch?v=QUT1VHiLmmI) on YouTube