# Introduction to the use of notebooks

## Python 3

First things first: for all assignments in this course we will use the programming language *Python*, version 3. Which specific version of Python 3 you use is not that important; at the time of writing, the most recent version is 3.11.1. *Python* should already be installed on your *Ubuntu* machine, and you can run it by typing

    python3

If it is not installed, just follow the download instructions at [python.org](http://www.python.org). You will also need *pip*, which is a package manager for *Python* with the recursive acronym *Pip Installs Packages*. Install it by typing the following commands in your terminal:

    sudo apt-get install python3-pip
    sudo pip3 install --upgrade pip

## Jupyter notebook

All assignments will be written in [Jupyter notebooks](http://jupyter.org/). These notebooks allow a mixture of nicely formatted text (even supporting $\LaTeX$ for equations), *Python* code, experiment results and graphs, all on the same webpage. In fact, this assignment is also written in a notebook. In order to open it, first install Jupyter:

    sudo pip3 install jupyter

Now in your terminal, navigate to the directory where you placed the files for the assignment and run the command
    
    jupyter notebook

This will print some information about the *Notebook* server in your console, and open a web browser to the URL of the web application. By default this is [http://127.0.0.1:8888](http://127.0.0.1:8888). Note that *127.0.0.1* is the loopback IP address (also known as *localhost*), so this is a webpage served from your own computer!

This first page shows the dashboard, which lists the notebooks available in the current directory. You can create new notebooks from the dashboard with the `New` button (select *Python 3* notebook), or open existing ones. Creating a new notebook will create a new file, for example `Untitled1.ipynb`. The extension `.ipynb` indicates that it is a notebook file; you can rename the file to something more descriptive at the top of the page.

### Notebook cells

A notebook consists of a sequence of cells. A cell is a multi-line text input field, and its contents can be executed by pressing `Shift-Enter`, or by clicking the `Run` button at the top of your screen. What exactly this does depends on the type of cell. The main types are *code cells*, *markdown cells* and *raw cells* (older versions of Jupyter also had *heading cells*). We will only focus on the first two: code and markdown. Every cell starts off being a code cell, but its type can be changed by using a dropdown on the toolbar (which will be `Code`, initially).

In a code cell you can write *Python* code. When you run that cell (click on it and press `Shift-Enter`) the code in the cell will run, and the output of the cell will be displayed beneath the cell. Let's try out a very simple code cell below:

In [4]:
x = 5
x = x + 2
print(x)

7


This produces the output you might expect: exactly the same result as executing that bit of *Python* code in a terminal. You can modify the contents of the code cell and run it again with `Shift-Enter` to see how the output changes.

Global variables are shared between cells. This means we can still use variables or functions from the first cell in a second cell, like so

In [5]:
y = 2 * x
print(y)

14


Notebooks are expected to be run top to bottom, starting with the first cell and ending with the last. **Failing to run some cells or running cells out of order is likely to result in errors.** For example, if we were to run the second cell before the first one has been run, we would get an error saying that `x` is not defined.
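
As a small sketch of this failure mode, the snippet below uses a variable that no cell has defined yet and catches the resulting error instead of letting the cell crash. The name `not_yet_defined` is made up for this illustration.

```python
# Simulate running a cell that relies on a variable from a cell that has not run yet.
# `not_yet_defined` is a hypothetical name that has never been assigned.
try:
    doubled = 2 * not_yet_defined
except NameError as err:
    error_message = str(err)

# The message names the missing variable, which tells you which cell to run first.
print(error_message)
```

If you see such an error in a notebook, running the cells again from the top (in order) usually resolves it.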

### Markdown

*Markdown* is a simple way to format text using some extra symbols like asterisks (`*`) and underscores (`_`). You can do a simple [10 minute tutorial](http://www.markdowntutorial.com) or reference the [CheatSheet](http://commonmark.org/help/) for the available commands.

If you set a notebook cell as a *Markdown* cell, you can write *Markdown* directly in the cell. When you run this cell, the markdown will be rendered as *rich text*. If you **double-click** the *rich text*, you can go back to editing the markdown code. All these assignment texts are *Markdown* cells, and they are a convenient place to write longer answers, instead of using code comments.

# Introduction to Numpy 

First, to install numpy, enter the following command in your terminal:

    sudo pip3 install numpy

Numpy is a Python library providing a wide variety of operations that can be performed on multi-dimensional arrays. Before you start, we ask you to read the first four sections of this [tutorial](https://www.w3schools.com/python/numpy/numpy_intro.asp) (up to and including the section "Numpy creating arrays"). Note that we have already installed numpy above, so you can skip the installation part of the tutorial.

All the information you need for this notebook and the programming assignment can be found in sections of the [W3schools tutorial](https://www.w3schools.com/python/numpy/default.asp) and the [official numpy tutorial](https://numpy.org/doc/stable/user/absolute_beginners.html). This notebook is not an extensive introduction to numpy, but it includes everything that you will need for the assignments in this course.

### TODO 1

Define and print a numpy array `matrix_0` representing the matrix
$$
\begin{bmatrix} 2 & 4 & 6 \\ 7& 3 & 5  \end{bmatrix}
$$
After reading the section "Numpy Array Shape" from the W3schools tutorial (or the [section](https://numpy.org/doc/stable/user/absolute_beginners.html#how-do-you-know-the-shape-and-size-of-an-array) from the official numpy tutorial about shapes), print also the shape of `matrix_0` and check that it indeed corresponds to a matrix with $2$ rows and $3$ columns.

In [6]:
import numpy as np

# TODO

assert np.shape(matrix_0) == (2,3), 'Incorrect shape for the matrix'

NameError: name 'matrix_0' is not defined

### TODO 2

Print the first row of `matrix_0` and print the second column of `matrix_0` after reading ["Numpy Array Indexing"](https://www.w3schools.com/python/numpy/numpy_array_indexing.asp) and ["Numpy Array Slicing"](https://www.w3schools.com/python/numpy/numpy_array_slicing.asp) (from the W3schools tutorial).

### TODO 3

Before doing this exercise, you should first read the [section](https://numpy.org/doc/stable/user/absolute_beginners.html#how-to-create-an-array-from-existing-data) about `vstack` and `hstack` and the [section](https://numpy.org/doc/stable/user/absolute_beginners.html#how-to-create-a-basic-array) about `np.zeros`. Now you should use `np.vstack` and `np.zeros` to make a new numpy array `matrix_1` obtained by adding a new row to `matrix_0`. This new row should be filled with $0$'s. So `matrix_1` should represent the matrix $$\begin{bmatrix} 2 & 4 & 6 \\ 7 & 3 & 5 \\ 0 & 0 & 0\end{bmatrix}$$.

Make sure to print `matrix_1` to check if your answer is correct.

As you can see in the section ["Numpy Creating Arrays"](https://www.w3schools.com/python/numpy/numpy_creating_arrays.asp) from the W3schools tutorial, numpy arrays can have different dimensions: there are 1D arrays, 2D arrays, 3D arrays, etc. The arrays that we have built so far, `matrix_0` and `matrix_1`, are 2D arrays. In general, matrices correspond to 2D arrays. If you print the shape of a 2D array, it will always be of the form `(M,N)`, where `M` is the number of rows and `N` is the number of columns.
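
To see what such a shape looks like in practice, here is a minimal sketch with a made-up matrix (the array `example` is not part of the exercises):

```python
import numpy as np

# A made-up matrix with 3 rows and 2 columns, unrelated to the exercises.
example = np.array([[1, 2],
                    [3, 4],
                    [5, 6]])

print(example.ndim)   # number of dimensions: 2
print(example.shape)  # (rows, columns): (3, 2)

# The shape is a tuple, so it can be unpacked into M and N.
n_rows, n_cols = example.shape
```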

But if we start representing row vectors or column vectors, then things get a bit more complicated. A row vector, for example $v=[4,1,9]$, can be represented in two ways:
* either as a 1D array: `v = np.array([4,1,9])`, and in this case, the shape of the array is of the form `(N,)`
* or as a 2D array: `v = np.array([[4,1,9]])`, and in this case, the shape of the array is of the form `(1,N)`

Column vectors, however, would correspond to 2D arrays. So for example, $u = [3,4]^T$ is represented by the 2D array `np.array([[3],[4]])`. Note that the shape of an array representing a column vector is always of the form `(N,1)`.
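
The three representations can be compared side by side. The sketch below only reuses the arrays mentioned in the text; the variable names are chosen for this illustration.

```python
import numpy as np

row_1d = np.array([4, 1, 9])     # row vector as a 1D array
row_2d = np.array([[4, 1, 9]])   # row vector as a 2D array with a single row
col_2d = np.array([[3], [4]])    # column vector as a 2D array with a single column

print(row_1d.shape)  # (3,)
print(row_2d.shape)  # (1, 3)
print(col_2d.shape)  # (2, 1)

# Transposing a 1D array leaves its shape unchanged,
# while transposing a 2D column gives a 2D row.
print(row_1d.T.shape)  # (3,)
print(col_2d.T.shape)  # (1, 2)
```

The last two lines illustrate why the distinction matters: a 1D array has no separate notion of "row" or "column", so the transpose has no visible effect on it.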

### TODO 4

* Define and print a 1D numpy array `vec_0` representing the row vector $\vec{v} = [4,1,9]$. Print the shape of `vec_0` to make sure that `vec_0` is indeed a 1D array. 

* By reshaping `vec_0` (you first need to read the [section](https://numpy.org/doc/stable/user/absolute_beginners.html#can-you-reshape-an-array) about reshaping arrays), define a 2D array `vec_1` representing $\vec{v}$.

* Finally, define a 2D array `vec_2` representing the column vector $[4,1,9]^T$.

In [None]:

assert np.shape(vec_0) == (3,), 'vec_0 is not of the right shape'
assert np.shape(vec_1) == (1,3), 'vec_1 is not of the right shape'
assert np.shape(vec_2) == (3,1), 'vec_2 is not of the right shape'

### TODO 5

Now that you are familiar with `np.hstack` and `np.vstack`, try to think about the following commands and determine (without running them) which of them will work and which will give rise to an error.

* `np.hstack((matrix_1, vec_0))`
* `np.hstack((matrix_1, vec_1))`
* `np.hstack((matrix_1, vec_2))`

You can check your answer by running the commands.

And what about these commands?

* `np.vstack((matrix_1, vec_0))`
* `np.vstack((matrix_1, vec_1))`
* `np.vstack((matrix_1, vec_2))`

You can again check your answer by running the commands.

### Matrices

A wide range of operations from linear algebra are already implemented for us in numpy. For example,

* `np.matmul(A,B)` returns the matrix product of `A` and `B`. You can also directly use the shorthand `A@B`

* `np.linalg.det(A)` returns the determinant of `A`

* `A.T` is the transpose of `A`

* `np.identity(n)` returns the numpy array representing the identity matrix of size $n \times n$

* `np.linalg.inv(A)` returns the inverse of `A`


* etc.
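
As a quick sanity check of these functions, here is a minimal sketch on two small made-up matrices (`A` and `B` below are chosen only for illustration):

```python
import numpy as np

# Two made-up 2x2 matrices.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

product = A @ B                  # same result as np.matmul(A, B)
determinant = np.linalg.det(A)   # 2*1 - 1*1 = 1, up to floating-point rounding
A_inv = np.linalg.inv(A)         # exists because the determinant is non-zero

print(product)
print(B.T)

# A matrix multiplied by its inverse should give the identity matrix.
# np.allclose compares with a small tolerance, which is safer than == for floats.
print(np.allclose(A @ A_inv, np.identity(2)))  # True
```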

# Introduction to Matplotlib

Matplotlib is a plotting library for *Python*. Install it using

    sudo pip3 install matplotlib

Another library we will need is pandas, a common library for data analysis. You can install it using

    sudo pip3 install pandas
    
One of the most used components of matplotlib is the `matplotlib.pyplot` module. You can read this quick [tutorial](https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py), but we will cover here the most basic features that you will need for the coming labs.

The most basic command is the command to plot a line

    plt.plot([1,2,3,4])

The `plot` function is the core function of the library. The above function call
plots the 4 y-values with implicit x-values of `range(4)`. Usually you would
use the function as `plt.plot(xvalues, yvalues, label="some_text")`. In this
case you must make sure that `xvalues` and `yvalues` are the same size (as the
first x-value will be plotted with the first y-value, and the second with the
second, etc.). The full documentation for the `plot` function can be found here:
[Plot Documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html?highlight=pyplot#module-matplotlib.pyplot)
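
To illustrate the size requirement, the sketch below deliberately passes one y-value too few and catches the error that matplotlib raises. The variable `sizes_matched` is only introduced for this illustration.

```python
import matplotlib.pyplot as plt

xvalues = [0, 1, 2]
yvalues = [0, 1]  # one value too few on purpose

try:
    plt.plot(xvalues, yvalues)
    sizes_matched = True
except ValueError as err:
    # matplotlib refuses to pair up x- and y-values of different sizes
    sizes_matched = False
    print(err)

plt.close()  # discard the empty figure left behind by the failed call
```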

And finally, you should not forget to show your plot

    plt.show()

This tells matplotlib that you are done adding lines, labels and other stuff to your figure. A simple example to test with can be found below.

In [None]:
import matplotlib.pyplot as plt

xvalues = range(4)
yvalues = [x**2 for x in xvalues]
plt.plot(xvalues, yvalues)
plt.show()

### Making things look a little nicer

The line above doesn't look very smooth, because we are using `range` for the *x-values*: the samples are taken only at whole integer values, so there are "large" gaps between the points. The easiest way to solve this is simply to use more samples; however, `range` only accepts integer arguments, so it cannot produce floating-point steps. The *Numpy* library has a lot of useful functions, and one of those is *linspace*, which solves exactly this problem. For example, `numpy.linspace(0, 3, num=100)` creates an array with `100` values that are evenly spaced in a specified interval (in this case, the interval `[0,3]`, with both endpoints included). You can read the documentation of the function here: [linspace documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)
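
With a small number of samples it is easy to inspect what *linspace* returns. The sketch below uses `num=7` (a value picked only for illustration) and also shows that `range` rejects a non-integer step.

```python
import numpy as np

# 7 evenly spaced samples on [0, 3]: the step between neighbours is 0.5.
samples = np.linspace(0, 3, num=7)
print(samples)
print(samples[0], samples[-1])  # both endpoints are included

# range only works with integers, so a step of 0.5 raises a TypeError.
try:
    range(0, 3, 0.5)
    range_accepts_floats = True
except TypeError:
    range_accepts_floats = False
```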

We can use *linspace* to take any number of samples evenly spaced over a specified range. Here we will take 100 samples over the same *[0, 3]* interval, and then apply the same square function. 

In [None]:
xvalues = np.linspace(0, 3, num=100)
yvalues = [x**2 for x in xvalues]
plt.plot(xvalues, yvalues)
plt.show()

In order to define `yvalues`, we constructed a list by applying the function $x^2$, element by element, to the samples in `xvalues`. A more succinct way to do this is to apply the operation directly to the entire array. So we could have written `yvalues = xvalues**2` and obtained the same result. Note that the operation `xvalues**2` only works because `xvalues` is a numpy array (it would not have worked if `xvalues` were a list), which is one of the nice things about numpy arrays.

We will now plot the function $\sin$ over the interval $[-\pi, \pi]$. So we would like to write `yvalues = math.sin(xvalues)`. The problem is that `xvalues` is a numpy array and `math.sin` only accepts a single number (an integer or a float) as input. Fortunately, numpy has its own function `np.sin` that applies the sine function to every element of a numpy array.
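
The difference between lists and numpy arrays can be checked on a tiny made-up example. The names `values_list` and `values_array` below are chosen only for this sketch.

```python
import math
import numpy as np

values_list = [0.0, 1.0, 2.0]
values_array = np.array(values_list)

# Element-wise operations work on the array ...
squares = values_array**2
sines = np.sin(values_array)

# ... but the same power operation on a plain list raises a TypeError,
try:
    values_list**2
    list_power_works = True
except TypeError:
    list_power_works = False

# and math.sin refuses an array with more than one element.
try:
    math.sin(values_array)
    math_sin_works = True
except TypeError:
    math_sin_works = False

# np.sin gives the same numbers as calling math.sin on each element in turn.
print(np.allclose(sines, [math.sin(v) for v in values_list]))  # True
```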

In [None]:
import math

length_interval = 2*math.pi

xvalues = np.linspace(-length_interval/2, length_interval/2, 256)
yvalues = np.sin(xvalues)
plt.plot(xvalues, yvalues, 
         label = f'Sinus over an interval of length {length_interval:.2f}')
plt.legend(loc='upper left')
plt.show()

Note that we also used the optional argument `label`. In that label, we wanted to insert the length of the interval, so we used Python's formatted string literals (f-strings). Here we specifically used the format specification `:.2f` to round the interval length $2\pi$ to two decimal places.
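
The formatting can be tried on its own, without any plotting. This short sketch rebuilds the label string from the cell above and shows a second precision for comparison:

```python
import math

length_interval = 2 * math.pi

# :.2f means: fixed-point notation with 2 digits after the decimal point.
label = f'Sinus over an interval of length {length_interval:.2f}'
print(label)  # Sinus over an interval of length 6.28

# Changing the number changes the precision.
print(f'{math.pi:.4f}')  # 3.1416
```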

*(note: this introduction to matplotlib was originally designed for the course Leren from the bachelor KI, but is being reused here)*

### TODO 6: Plotting a line

Plot a line connecting the points $(2,1)$ and $(3,4)$.

In [None]:
# YOUR CODE HERE

### TODO 7

Plot the function $x^2 - 3 x^3 + x^4$ over the interval $[0,3]$ using `linspace`.

In [None]:
# YOUR CODE HERE

### Plotting a dataset

Sometimes instead of plotting a curve, you may just want to plot individual points (for example, if you are plotting a dataset). Here is an example where we plot two points, one at position $(2, 1)$ and one at position $(3, 4)$. Note that the only difference with plotting a line passing through these two points is the optional argument `'o'`. The marker `o` indicates that our points are represented by a circle. Try replacing `'o'` by `'s'` (the marker will become a square) or `'r.'` (the marker will be a small red dot).

In [None]:
plt.plot([2,3], [1,4], 'o')
plt.show()

plt.plot([2,3], [1,4], 's')
plt.show()

plt.plot([2,3], [1,4], 'r.')
plt.show()

We will now plot datasets for two different populations. Those datasets come from the famous [iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set). In the file `iris_setosa.csv`, you can find the measurements for $50$ different irises from the species *setosa*. Each iris is characterized by four features: `sepal_length`, `sepal_width`, `petal_length` and `petal_width`. Similarly, in the file `iris_versicolour.csv`, you can find the measurements of the same features for $50$ irises from the species *versicolour*.

Using those csv files, we built two numpy arrays:
* `setosa_np` is a numpy array where each row contains the petal length and the petal width of an iris from the species *setosa*

* `versicolour_np` is a numpy array where each row contains the petal length and the petal width of an iris  from the species *versicolour*

We suggest that you print those two arrays to make sure you understand them.

### TODO 8

We ask you to plot, with the length on the $x$-axis and the width on the $y$-axis, the data from `setosa_np` in red (since `setosa_np` has $50$ rows, you should plot $50$ red dots, where each dot indicates the petal length and the petal width of a setosa iris) and similarly, the data from `versicolour_np` in blue.

Note: if you would like to understand the code used to build `setosa_np` and `versicolour_np`, you can read the section "Importing and exporting a csv" in the numpy tutorial.
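
For a rough idea of what that loading code does, here is a minimal sketch that reads a tiny in-memory csv instead of the real files. The two data rows and the names `csv_text`, `table` and `petals_np` are made up for this illustration; only the column names match the real files.

```python
import io
import pandas as pd

# Hypothetical stand-in for a csv file; the numbers are made up.
csv_text = """sepal_length,sepal_width,petal_length,petal_width
5.0,3.0,1.5,0.2
6.0,2.5,4.5,1.5
"""

# usecols keeps only the two petal columns; .values turns the table into a numpy array.
table = pd.read_csv(io.StringIO(csv_text), usecols=['petal_length', 'petal_width'])
petals_np = table.values

print(petals_np.shape)  # (2, 2): one row per flower, one column per kept feature
print(petals_np[0])     # petal length and petal width of the first flower
```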

In [None]:
import pandas as pd

setosa_np = pd.read_csv('iris_setosa.csv', usecols=['petal_length', 'petal_width']).values
versicolour_np = pd.read_csv('iris_versicolour.csv', usecols=['petal_length', 'petal_width']).values

# YOUR CODE HERE

If your code is correct, your plot should look like this:

![Screenshot%20from%202023-02-08%2022-11-58.png](attachment:Screenshot%20from%202023-02-08%2022-11-58.png)