## Trying out Jupyter Notebooks with basic relevant Python

Let's get acquainted with Python (3) and interactive Jupyter Notebooks (ipynb), our main working tools throughout the course.

First of all, in an `ipynb` file we distinguish:

- text chunks: where we write instructions, comments, explanations (like this one!)
- code chunks: where we write Python instructions to be executed

As for `Python`, we will mainly use three fundamental libraries:

1. `numpy`: to work with arrays (vectors, matrices)
2. `pandas`: to work with dataframes (reading and manipulating datasets)
3. `matplotlib`: to visualize data and results (making plots)

In addition, we'll use `Keras` for deep learning: we'll see more on this later.

This is not a course on Python, not even an introduction: just a few notes to get some basic acquaintance with these libraries.

# How to proceed?
Tell us your preference!

### Numpy

We start by importing the library and creating a vector (one-dimensional array)

In [None]:
## import libraries
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5, 0, 7, 1, 9])
print(arr_1d)

We can use the array attribute `shape` to get the dimensions of the array

In [None]:
arr_1d.shape

The shape is `(9,)`: by default `numpy` gives a one-dimensional array (vector) a single dimension, with no second one. To make the second dimension explicit (which helps with some array operations and avoids ambiguous results), you can use the array **method** `reshape()`:

In [None]:
arr_1d = arr_1d.reshape(len(arr_1d),1)
print(arr_1d.shape)
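
As a minimal sketch of why the explicit second dimension can matter (the names `flat`, `col`, `row` and `table` are illustrative, not part of the course material): `reshape()` accepts `-1` for one dimension, which `numpy` then infers from the array size, and a column vector combines with a row vector differently than two flat vectors do.

```python
import numpy as np

flat = np.array([1, 2, 3])   # shape (3,)
col = flat.reshape(-1, 1)    # shape (3, 1): -1 lets numpy infer the number of rows
row = flat.reshape(1, -1)    # shape (1, 3): -1 lets numpy infer the number of columns

print(flat.shape, col.shape, row.shape)

# Adding two flat vectors is element-wise: the result is still a flat vector
print((flat + flat).shape)

# Adding a column to a row broadcasts to a full 3x3 table
table = col + row
print(table.shape)
print(table)
```

Keeping track of which of these shapes an array has is usually the first thing to check when an operation returns an unexpected result.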

We can also build arrays with more dimensions (matrices, "tensors"): in deep learning, mastering the dimensions of tensors (multidimensional arrays) is very important!

In [None]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
print(arr_2d.shape) ## 2 rows, 3 columns

In [None]:
arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr_3d)
print(arr_3d.shape) ## 2 layers, 2 rows, 3 columns

In [None]:
value = arr_3d[0,1,2] ## get first layer, second row, third element (indexing in Python starts at 0!)
print(value)
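
To get used to reading tensor dimensions, here is a small sketch on a made-up array (`tensor` and the other variable names are illustrative). It shows the attributes `ndim`, `shape` and `size`, and how each extra index removes one dimension.

```python
import numpy as np

tensor = np.arange(24).reshape(2, 3, 4)   # 2 layers, 3 rows, 4 columns

print(tensor.ndim)    # number of dimensions (axes)
print(tensor.shape)   # size along each axis
print(tensor.size)    # total number of elements

first_layer = tensor[0]       # one index: a 2D array of shape (3, 4)
one_row = tensor[0, 1]        # two indices: a 1D array of shape (4,)
one_value = tensor[1, 2, 3]   # three indices: a single element (the last one)
print(first_layer.shape, one_row.shape, one_value)
```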

- use array indices for slicing: `[start:end]` (`start` is included, `end` is excluded)
- a step argument can be added: `[start:end:step]`

In [None]:
print(arr_1d[:]) ## entire array
print(arr_1d[3:6]) ## 4th, 5th and 6th elements of the array (indices 3, 4, 5)
print(arr_1d[:3]) ## first three elements of the array --> equivalent to arr_1d[0:3]
print(arr_1d[7:]) ## last two elements of the array --> equivalent to arr_1d[7:len(arr_1d)]
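
The `step` argument and negative indices can be sketched on a fresh flat vector (`v` is an illustrative name; the values are made up so that each result is easy to read):

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

print(v[::2])     # every second element, starting from the first
print(v[1:8:3])   # from index 1 up to (not including) index 8, in steps of 3
print(v[::-1])    # negative step: the array in reverse order
print(v[-2:])     # negative indices count from the end: last two elements
```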

More on `numpy` can be found for instance <a href='https://numpy.org/devdocs/user/quickstart.html'>here</a>

### Pandas

`pandas` is a Python library for working with tabular data, mainly in the form of **dataframes**.

We load the library and create a first dataframe:

In [None]:
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=list('ABCD'))
df

With the dataframe attribute `dtypes` we can get the data type of each column in a Pandas dataframe:

In [None]:
df.dtypes

The dataframe method `head()` shows the first few rows of the dataframe (five by default); the attribute `columns` returns the column names.

In [None]:
df.head()

In [None]:
df.columns

With the method `to_numpy()` we convert a dataframe to an array, which we can then slice as we already saw for numpy arrays:

In [None]:
arr = df.to_numpy()
arr

In [None]:
arr[:,1] ## get second column
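
Note that the converted array no longer carries the column names, so columns are addressed by position only. A minimal sketch on a small made-up dataframe (`frame`, `values` and `b_pos` are illustrative names) shows how the two views correspond:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
values = frame.to_numpy()

print(type(values), values.shape)

# Column 'B' sits at position 1, so both expressions select the same values
print(values[:, 1])
print(frame['B'].to_numpy())

# The position of a named column can be looked up instead of hard-coded
b_pos = frame.columns.get_loc('B')
print(b_pos)
```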

You can also slice the Pandas dataframe directly:

1. by column name

In [None]:
df['A']

2. by row slices

In [None]:
df[0:2]

3. by row and/or column names using the `.loc` syntax

In [None]:
print(df.loc[df.index[0]]) ## row with the first index label
df.loc[:, ['B','C']]

4. by position using the `.iloc` syntax

In [None]:
df.iloc[0:2,2:4]
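
One difference between the two syntaxes is easy to miss: slices with `.loc` include both endpoints, while slices with `.iloc` exclude the end, as in numpy. A small sketch, using a made-up dataframe with string row labels (`demo` and the labels `w` to `z` are illustrative) so that labels and positions cannot be confused:

```python
import numpy as np
import pandas as pd

# Illustrative frame with string row labels, so labels and positions differ
demo = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=['w', 'x', 'y', 'z'],
    columns=['A', 'B', 'C'],
)

by_label = demo.loc['w':'y', 'A':'B']   # label slices include both endpoints
by_position = demo.iloc[0:2, 0:2]       # position slices exclude the end, as in numpy

print(by_label.shape)
print(by_position.shape)
print(demo.loc['y', 'B'], demo.iloc[2, 1])   # the same cell, addressed two ways
```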

More on `pandas` <a href='https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html'>here</a>

### Matplotlib

`matplotlib` is a versatile Python library for data visualization which allows you to produce a very large variety of high-quality plots.
We import the module `pyplot` from the library and produce a first basic plot:

In [None]:
from matplotlib import pyplot as plt

x = np.array([1, 2, 3, 4])
y = np.array([1, 4, 9, 16])
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.plot(x,y)
plt.show() ## not strictly necessary in interactive mode (e.g. in IPython or Jupyter notebooks)

Instead of a line plot, we can also make a **scatter plot**:

In [None]:
x = np.array([1, 2, 3, 4, 0, 2, 7, 4, 10])
y = np.array([1, 4, 9, 16, 1, 3, 9, 8, 15])
plt.scatter(x,y)

For categorical data, a **barplot** can be used:

In [None]:
names = ['A', 'B', 'C']
values = [1, 10, 100]

plt.figure(figsize=(9, 3))

plt.subplot(131) ## select the first panel of a grid with 1 row and 3 columns
plt.bar(names, values)
plt.show()
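
The wide figure size leaves room for several panels side by side. As a sketch along the lines of the official pyplot tutorial, the same categorical data can be drawn with three plot types, selecting each panel with `plt.subplot()`; the three-digit argument reads as rows, columns, panel number. The variable `fig` is only kept here so that the figure can be inspected afterwards.

```python
import matplotlib.pyplot as plt

names = ['A', 'B', 'C']
values = [1, 10, 100]

fig = plt.figure(figsize=(9, 3))

plt.subplot(131)             # 1 row, 3 columns, panel 1
plt.bar(names, values)
plt.subplot(132)             # panel 2
plt.scatter(names, values)
plt.subplot(133)             # panel 3
plt.plot(names, values)

plt.suptitle('Categorical plotting')
plt.show()
```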

We can plot a distribution from synthetic data using a **histogram**:

In [None]:
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
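
The `density` option explains the small values on the y-axis: the bar heights are scaled so that the total area of the histogram is 1. A minimal sketch with `np.histogram`, which computes the same bins without drawing them, makes this checkable (the seeded generator and the names `sample`, `heights`, `edges` and `widths` are illustrative choices, not part of the original example):

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, so the run is repeatable
sample = 100 + 15 * rng.standard_normal(10000)

# Same binning idea as plt.hist(x, 50, density=True), but returning numbers instead of a plot
heights, edges = np.histogram(sample, bins=50, density=True)
widths = np.diff(edges)

print(len(heights), len(edges))   # 50 bins are delimited by 51 edges
print(np.sum(heights * widths))   # total area under the histogram
```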

You can also plot directly from a Pandas dataframe:

In [None]:
df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
df

In [None]:
plt.plot(df)
plt.show()

In [None]:
df = df.cumsum() ## cumulative sum down each column
df
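
On a random dataframe the effect of `cumsum()` is hard to verify by eye, so here is a small sketch with made-up numbers (`steps` and `running` are illustrative names): each row of the result is the sum of all rows up to and including that one, computed separately for each column.

```python
import pandas as pd

steps = pd.DataFrame({'A': [1, 2, 3], 'B': [10, -10, 5]})
running = steps.cumsum()   # default axis=0: accumulate down each column

print(running)
```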

In [None]:
df.plot() ## pandas creates the figure itself, so no plt.figure() call is needed

In Pandas, plotting methods can be called directly on dataframes, e.g. `plot.bar()` (we saw an example earlier with `df.plot()`):

In [None]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.bar();

More on `matplotlib` <a href='https://matplotlib.org/tutorials/introductory/pyplot.html'>here</a>

## Exercise 0.1: do-it-yourself!

Now it is your turn to put together what you just learnt:

1. create a numpy array or a pandas dataframe (or a combination of both)
2. plot the data

In [None]:
## your code here