EPITA 2024 MLRF practice_01-02_numpy v2024-04-18_224843 by Joseph CHAZALON

<div style="overflow: auto; padding: 10px; margin: 10px 0px">
<img alt="Creative Commons License" src='img/CC-BY-4.0.png' style='float: left; margin-right: 20px'>
    
This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
</div>

# Practice session 1 part 2: NumPy-Fu

Make sure you read and understand everything, and complete all the required actions.
**Required actions** are preceded by the following sign:
![Back to work!](img/work.png)

## Preliminary checks

Perform a couple of checks…

In [None]:
# deactivate buggy jupyter completion
%config Completer.use_jedi = False

In [None]:
# Make sure we use Python 3
import sys
if sys.version_info.major != 3:
    print("ERROR: not using Python 3.x")
else:
    print("Great! We're using Python version %s" % sys.version)

## Import the required modules
Notice the **line magic** used to configure how matplotlib output is rendered.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


## NumPy crash course
NumPy allows you to manipulate n-dimensional arrays (representing matrices, tensors, images…) with a very simple syntax.

### Array creation (1/3)
Here are some examples of array creation:

In [None]:
# Initialize from a sequence
a1 = np.array([1, 2, 3])
a1

In [None]:
# Arrays can have arbitrary dimensions…
a2 = np.array([[[ 0,  1], [ 2,  3], [ 4,  5]],
               [[ 6,  7], [ 8,  9], [10, 11]]])
a2

In [None]:
#…but they need to be consistent
a3 = np.array([[[ 0,  1], [ 2,  3], [ 4,  5]],
               [[ 6,  7], [ 8,  9], [10, 11, 13]]])
a3

<div style="overflow: auto; border-style: solid; border-color: red; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="stop" src='img/stop.png' style='float: left; margin-right: 20px'>
    
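With older NumPy versions (before 1.24), the previous object is created but contains very strange content: an array of Python lists with `object` dtype. NumPy 1.24 and later refuse to build such a ragged array and raise a `ValueError` instead.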
</div>

### `shape` and `dtype`
Shape and content (data) type are two very important properties to check for arrays.

In [None]:
a1.shape, a1.dtype

In [None]:
a2.shape, a2.dtype

In [None]:
a3.shape, a3.dtype

### Array creation (2/3)

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>

**Now try to create some arrays of different types (integers, floating-point numbers, booleans, complex numbers) and shapes.**
</div>

Do not hesitate to check:
- the [online reference](https://docs.scipy.org/doc/numpy-1.16.1/reference/)
- and the [user guide](https://docs.scipy.org/doc/numpy-1.16.1/user/basics.html)

In [None]:
# TODO create a couple of arrays

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Now check the [documentation about array creation](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.array-creation.html) and try some other array creation routines.**
</div>

We recommend that you have a look at:
- `zeros`
- `zeros_like`
- `ones`
- `full`
- `empty`
- `eye`

In [None]:
test_shape = (2, 2)

In [None]:
np.zeros(test_shape)

In [None]:
# TODO try the other array creation routines

A very important thing to note with NumPy is that native routines make use of optimized C code which is orders of magnitude faster than Python loops.

**You should always try to avoid writing Python loops to access NumPy arrays, and you should rather try to find a native routine which does the task you are looking for.**
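
To give a rough idea of the gap, here is a minimal sketch comparing a Python loop with the equivalent native expression on a task unrelated to the exercise below (squaring every element). The function names and the array size are arbitrary choices for illustration, and the timings will vary from one machine to another.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)

def squares_loop(arr):
    """Square each element with an explicit Python loop (slow)."""
    out = np.empty_like(arr)
    for ii in range(arr.shape[0]):
        out[ii] = arr[ii] * arr[ii]
    return out

def squares_native(arr):
    """Square each element with a single vectorized expression (fast)."""
    return arr * arr

# Run each version 3 times and report the total elapsed time.
t_loop = timeit.timeit(lambda: squares_loop(values), number=3)
t_native = timeit.timeit(lambda: squares_native(values), number=3)
print(f"loop: {t_loop:.4f}s, native: {t_native:.4f}s")
```

Both functions return the same array; only the place where the looping happens (Python vs compiled code) differs.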

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Benchmark the initialization time of some big array using a native routine vs using a `for` loop.**
**Make sure you understand the differences between `%time`, `%%time`, `%timeit` and `%%timeit`.**
</div>

In [None]:
# TODO manual initialization: complete this code
size = 1024*1024
a = np.empty(size)
# for ii in range…
# 

In [None]:
# TODO numpy creation and optimized initialization
a = np.empty(size)
# a[?] = ?

In [None]:
# TODO numpy optimized creation and initialization
# a = ??

### Array creation (3/3)
There are other very useful array creation routines to be aware of.
Among my favorites are `arange` and `linspace`.

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Use IPython's magic `?` to display the documentation for each of those, and create two small arrays.**
</div>

In [None]:
# TODO
np.arange?

### Reshaping
It is easy to change the shape of an array, as long as the new shape is compatible with the original one.
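
As a minimal sketch of what "compatible" means here: the new shape must contain exactly the same number of elements as the original one. The variable names below are only for illustration.

```python
import numpy as np

v = np.arange(6)            # 6 elements, shape (6,)
m = v.reshape((2, 3))       # 2 * 3 == 6, so this works
auto = v.reshape((3, -1))   # -1 asks NumPy to infer the missing size (here 2)

try:
    v.reshape((4, 2))       # 4 * 2 == 8 != 6: incompatible shape
except ValueError as err:
    reshape_error = str(err)
    print("reshape failed:", reshape_error)
```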

In [None]:
a = np.arange(12)
a.shape

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Use `reshape(shape)` to give a new shape with 3 dimensions to this array `a`.**
</div>

In [None]:
# TODO reshape

### Apply operations on arrays
All the power of NumPy lies in how we apply operations on arrays.
We can apply operations in 3 different ways:

1. First as **array methods** like this:
```python
a = np.arange(3)
a.max()
```
This technique is useful for operations which consider only the current array.


2. Second by calling a NumPy operation on the array like this:
```python
a = np.linspace(0, 1, 10)
np.cos(a)
```
This second technique is more suitable for mathematical operations which are not directly available as methods, and return an array of the same shape.

3. Third simply by calling natural operations extended to arrays like this:
```python
a = np.arange(0, 3)
b = np.arange(3, 6)
a + b
```

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Experiment with a couple of operations on arrays.**
</div>

In [None]:
# TODO some operations on arrays

### Indexing: access elements
You can also access individual values of arrays using advanced slicing techniques:

In [None]:
aa = np.arange(3*2).reshape((3, 2))
aa

In [None]:
aa[0]

In [None]:
aa[0][1]

We can specify slices for each dimension.

In [None]:
aa[0,1]

In [None]:
aa[1:3]

In [None]:
aa[:,1]

In [None]:
aa[::2,::-1]

We can select multiple values using sequences of indexes, mixing [basic](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#basic-slicing-and-indexing) and [advanced](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing) slicing and indexing.

In [None]:
aa

In [None]:
aa[(0,0)]  # equivalent to aa[0,0]

In [None]:
aa[(1,1,2), 0]  # equivalent to aa[(1, 1, 2), (0, 0, 0)] because of broadcast
                # selects aa[1,0], aa[1,0], aa[2,0]

We can even add a new axis on the fly:

In [None]:
bb = aa[:, 0, np.newaxis]
bb.shape

Note that `np.newaxis` is actually `None`, so it is common to use `None` directly.

In [None]:
np.newaxis

In [None]:
bb = aa[:, 0, None]
bb.shape

And you can create **masks** and apply them. This is **very powerful!**

In [None]:
aa = np.arange(10)
mask = aa > 5
mask

In [None]:
aa[mask]
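
Masks can also be combined and used for assignment. The following is a small sketch with arbitrary values and variable names, not tied to the exercise below.

```python
import numpy as np

values = np.arange(10)

# Combine conditions with & (and), | (or) and ~ (not).
# The parentheses around each comparison are required.
middle = (values > 2) & (values < 7)
selected = values[middle]        # array([3, 4, 5, 6])

# A mask can also appear on the left-hand side to modify the selected values.
clipped = values.copy()
clipped[clipped > 5] = 5         # every value above 5 becomes 5
```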

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Try to extract even numbers in the following `a` array.**
</div>

In [None]:
a = np.array([[1, 0, 2], [3, 7, 9], [1, 0, 2], [3, 7, 9], [3, 7, 9]])
a

In [None]:
# TODO correct this line
a_extracted = a[:]

In [None]:
# Here is a test to check your result
if np.all(a_extracted % 2 == 0):
    print("Looks good!")
else:
    print("Error.")

Make sure to read at least once in your life (not during this session though) [the page about NumPy indexing](https://docs.scipy.org/doc/numpy-1.16.1/user/basics.indexing.html).

### Broadcasting
Broadcasting is a very powerful concept in NumPy, and maybe its greatest strength.
However, it takes time to master it, and even then you sometimes get surprised.


According to [the official documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html):
> The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.

It is easy to make use of broadcasting:
- if you add two arrays, then you broadcast their values;
- if you multiply an array by a scalar, you broadcast again;
- and if you want to apply a look up table to an array, then you can perform it using broadcasting.

Of course, those are only a few examples of what broadcasting makes possible.
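
The look-up table case can be sketched as follows. The table and the tiny "image" are made-up values for illustration; note that the mechanism used here is integer array indexing (the table is indexed with the whole array at once), so no Python loop is needed.

```python
import numpy as np

# Hypothetical 256-entry table which inverts 8-bit intensities.
lut = (255 - np.arange(256)).astype(np.uint8)

# A tiny 2x2 "image" with 8-bit values.
img = np.array([[0, 128],
                [255, 64]], dtype=np.uint8)

# Each pixel value is replaced by the table entry at that index.
inverted = lut[img]   # array([[255, 127], [0, 191]], dtype=uint8)
```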

Let's have a look at some examples now.

First, NumPy operations are usually done element by element, which requires two arrays to have exactly the same shape:

In [None]:
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
a * b

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [None]:
a = np.array([1,2,3])
b = 2
a * b

The broadcasting applied in the previous example virtually "stretches" `b` to match `a`'s shape.
This can be illustrated by the following figure:
![numpy broadcasting 1](img/practice_01/theory.broadcast_1.gif)

The rule governing whether two arrays have compatible shapes for broadcasting can be expressed in a single sentence.

> **The Broadcasting Rule:**
>
> **In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.**
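
The rule can be sketched as a small function working on shapes only. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; it compares the axes from the last one to the first one and treats missing axes as having size one.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape or raise ValueError."""
    result = []
    # Walk through the axes starting from the trailing one.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1  # missing axis counts as 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
        # The axis of size 1 (if any) is stretched to the other size.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))   # (4, 3)
print(broadcast_shape((4, 1), (3,)))   # (4, 3)

# NumPy agrees with the helper on a compatible pair of arrays.
numpy_shape = np.broadcast(np.empty((4, 1)), np.empty((3,))).shape

try:
    broadcast_shape((4, 3), (4,))      # trailing sizes 3 and 4 do not match
except ValueError as err:
    print("cannot broadcast:", err)
```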

Here are more examples (taken from the documentation, again):

In [None]:
a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
b = np.array([0, 1, 2])
a + b

A two-dimensional array added to a one-dimensional array results in broadcasting if the number of 1-d array elements matches the number of 2-d array columns.
![numpy broadcast2](img/practice_01/theory.broadcast_2.gif)

However, when the trailing dimensions of the arrays are unequal, broadcasting fails because it is impossible to align the values in the rows of the 1st array with the elements of the 2nd array for element-by-element addition.
![numpy broadcast fail](img/practice_01/theory.broadcast_3.gif)

In [None]:
a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
b = np.array([0, 1, 2, 3])
a + b

The following example shows an outer addition operation of two 1-d arrays that produces a 4x3 result, like the previous (working) example (the values differ slightly because `b` is `[1, 2, 3]` here instead of `[0, 1, 2]`).
Here the `newaxis` index operator inserts a new axis into `a`, making it a two-dimensional 4x1 array.

In [None]:
a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])
a[:, np.newaxis] + b

The following figure illustrates the stretching of both arrays to produce the desired 4x3 output array.
![numpy broadcast 4](img/practice_01/theory.broadcast_4.gif)

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>

**Display the shape of `a` after adding a new axis to it, as in the previous example.**
</div>

In [None]:
# TODO display the shape of a after adding a new axis to it, as in the previous example
a.shape

### Apply an operation along an axis
Most aggregation functions allow you to specify the axis along which the computation will be performed.
`axis=0` means the first axis, `axis=i` means the $(i+1)$-th axis, **`axis=-1` means the last axis.**
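
As a minimal sketch on a small made-up array: the axis passed to an aggregation is the one which disappears from the result's shape.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])     # shape (2, 3)

col_sums = x.sum(axis=0)      # collapses axis 0, shape (3,): [5, 7, 9]
row_sums = x.sum(axis=1)      # collapses axis 1, shape (2,): [6, 15]
last_sums = x.sum(axis=-1)    # last axis, same as axis=1 for a 2-d array
```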

This makes it possible, for example, to find the warmest month for each city (or the warmest city for each month).

In [None]:
# Some probably buggy stats
data = np.array([
    # January,February,March,April,May,June,July,August,September,October,November,December
    [14,14,16,18,22,25,28,29,26,23,18,15],  # Ajaccio
    [14,14,16,18,22,26,29,29,26,22,17,15],  # Bastia
    [5,7,12,15,20,24,26,26,22,17,10,5],  # Bourg-Saint-Maurice
    [10,11,14,17,21,25,29,28,25,19,14,10],  # Carcassonne
    [6,8,12,15,20,24,27,26,22,17,10,6],  # Grenoble
    [6,8,13,16,21,25,28,27,23,17,11,7],  # Lyon
    [11,13,16,19,23,27,30,30,26,21,15,12],  # Marseille
    [8,10,15,18,22,26,30,29,24,19,12,9],  # Montelimar
    [12,13,16,18,22,26,29,29,25,21,15,12],  # Montpellier
    [13,13,15,17,21,24,27,28,25,21,17,14],  # Nice
    [12,13,16,18,22,26,29,29,25,21,16,13],  # Perpignan
    [13,14,16,18,22,26,30,30,26,21,16,14],  # Toulon
])
months = np.array(["January","February","March","April","May","June","July",
          "August","September","October","November","December"])
cities = np.array(["Ajaccio", "Bastia", "Bourg-Saint-Maurice", "Carcassonne", 
          "Grenoble", "Lyon", "Marseille", "Montelimar", "Montpellier", 
          "Nice", "Perpignan", "Toulon"])

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Display the warmest month for each city. Use the `argmax` operation on `data` with appropriate `axis` parameter.**
</div>

In [None]:
# TODO use the `argmax` operation on `data`
warmest_months = np.zeros(12, dtype=int)  # FIXME replace this line
warmest_months

In [None]:
list(zip(cities, months[warmest_months]))

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Display the warmest city for each month. Use the `argmax` operation on `data` with appropriate `axis` parameter.**
</div>

In [None]:
# TODO

###  Gluing arrays together
You can "glue" arrays together as long as their shapes are compatible.

In [None]:
a = np.arange(3*2).reshape((3,2))
b = np.arange(3*2).reshape((3,2))

In [None]:
np.hstack((a,b))

In [None]:
np.vstack((a,b))

In [None]:
np.stack((a,b), axis=-1)
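
The three calls above differ mainly in the shape of their result. Here is a short sketch which reuses the same `a` and `b`, and compares `hstack` and `vstack` with `np.concatenate` for 2-d inputs.

```python
import numpy as np

a = np.arange(3 * 2).reshape((3, 2))
b = np.arange(3 * 2).reshape((3, 2))

h = np.hstack((a, b))            # side by side: shape (3, 4)
v = np.vstack((a, b))            # on top of each other: shape (6, 2)
s = np.stack((a, b), axis=-1)    # along a NEW last axis: shape (3, 2, 2)

# For 2-d arrays, hstack and vstack behave like concatenate on an existing axis.
same_h = np.array_equal(h, np.concatenate((a, b), axis=1))
same_v = np.array_equal(v, np.concatenate((a, b), axis=0))
```

Unlike `hstack` and `vstack`, `stack` requires all inputs to have exactly the same shape because it creates a new axis.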

### Copies vs views
Array indexing does not always copy the memory: basic slicing returns a view instead.
**In this case, changing the view changes the original array.**
Make sure to make a copy of the original array, or of the view's underlying data, if you do not want your changes to affect the original!

The simplest case is when a reference is copied (either during assignment or during a function call).

In [None]:
a = np.array([10, 20, 30])
b = a
b += 1
a

You can use the `copy()` method to perform a deep copy of some array.

<div style="overflow: auto; border-style: dotted; border-width: 1px; padding: 10px; margin: 10px 0px">
<img alt="work" src='img/work.png' style='float: left; margin-right: 20px'>
    
**Use `copy()` to copy `a` values into `b`, then update `b` without changing `a`.**
</div>

In [None]:
# TODO copy a into b instead of creating an extra reference to the same object
a = np.array([10, 20, 30])
b = a
b += 1
a

Slicing an array returns a view of it!

In [None]:
a = np.array([10, 20, 30])
s = a[1:]
s += 1
a
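
When in doubt, you can check whether two arrays share the same memory. The sketch below uses `np.shares_memory` on made-up values and contrasts basic slicing (a view) with indexing by a list of integers (a copy).

```python
import numpy as np

a = np.array([10, 20, 30, 40])

view = a[1:3]        # basic slicing returns a view
fancy = a[[1, 2]]    # indexing with a list of integers returns a copy

view_shares = np.shares_memory(a, view)     # True
fancy_shares = np.shares_memory(a, fancy)   # False

fancy += 1   # a is unchanged
view += 1    # a becomes [10, 21, 31, 40]
```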

###  Linear algebra and other NumPy tools

Just for the record, NumPy also contains [many linear algebra and other useful routines](https://docs.scipy.org/doc/numpy/reference/routines.html) for statistics, mathematics, random sampling, etc.

You'll discover them progressively.

## Matplotlib survival guide
You can plot data using the simple **stateful** `plt` interface.
You start by creating a figure with
```python
plt.figure()
```
then you plot some data (plots are added to the current figure):
```python
plt.plot([0, 1, 2, 3], [1, 3, 5, 7])
plt.plot([0, 1, 2, 3], [2, 4, 6, 8])
```
and finally you call the rendering function:
```python
plt.show()
```

Here is a more complete example you will be able to reuse:

In [None]:
plt.figure()
plt.plot([0, 1, 2, 3], [1, 3, 5, 7], label='first')
plt.plot([0, 1, 2, 3], [2, 4, 6, 8], label='second')
plt.legend()
plt.title("First figure")
plt.ylabel('some numbers')
plt.xlim(0, 5)
plt.show()

And another one showing two images in two different subfigures.

In [None]:
img1 = plt.imread('img/warning.png')
img2 = plt.imread('img/stop.png')

In [None]:
plt.figure()
plt.subplot(1, 2, 1)  # values: total number of rows, total number of columns, index (starting at 1)
plt.imshow(img1)
plt.axis('off')
plt.title("subfig1 title")
plt.subplot(1, 2, 2)
plt.imshow(img2)
plt.axis('on')
plt.show()

Another example with a histogram.

In [None]:
sample_img = plt.imread("img/practice_01/sample_img.png") # matplotlib's imread only supports PNG files
# This is just a numpy array!
plt.figure()
plt.subplot(1, 2, 1)
plt.imshow(sample_img)
plt.axis('off')
plt.title("Image")
plt.subplot(1,2,2)
# numpy ravel() returns a flattened array.
plt.hist(sample_img[..., 0].ravel(), bins=256, fc='r', ec='r', alpha=0.5)
plt.hist(sample_img[..., 1].ravel(), bins=256, fc='g', ec='g', alpha=0.5)
plt.hist(sample_img[..., 2].ravel(), bins=256, fc='b', ec='b', alpha=0.5)
plt.title("Basic color histogram")
plt.show()

There are many possible graph types, and many options to configure colors, legends, markers, to add annotations, etc. You will discover them by practicing and by looking at examples.

Let's just finish this very quick introduction to Matplotlib by pointing out useful resources:
- [Tutorials](https://matplotlib.org/tutorials/index.html) to get the basic concepts;
- [PyPlot examples](https://matplotlib.org/gallery/index.html#pyplots-examples) to copy code samples from;
- [PyPlot API reference](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) for full information.

# Job done!
Great! Now you're ready to move on to the next stage: [Image manipulations](practice_01-03_image-manipulations.ipynb).