# Numpy Basics

Get familiar with the Numpy library, manipulate arrays, and apply some basic operations.

You can use the [Numpy Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)

## Introduction to Numpy 

Import numpy library

In [3]:
import numpy as np

---

Use `np.array` to create 3 different arrays (A, B, C) with different numbers of dimensions: 1D, 2D and 3D. They should look like this:

```
A = array([1, 2, 3])

B = array([[1, 2, 3],
           [3, 4, 5]])
           
C = array([[[ 1.,  2.,  3.],
            [ 3.,  4.,  5.]],

           [[ 6.,  7.,  8.],
            [ 9., 10., 11.]]])
```

Use the attributes `ndim` and `shape` to verify the dimensions, then the attributes `size` and `dtype`.

For A, you should obtain: dimension 1, shape (3,), size 3, dtype int64 (the default integer type can differ by platform).

For C, you should obtain: dimension 3, shape (2,2,3), size 12, dtype float64.

In [None]:
A = np.array([1, 2, 3])
print(A)
print(A.ndim)  # number of dimensions
print(A.shape)  # size of the array along each dimension
print(A.dtype)  # type of the elements
print(A.size)  # total number of elements

print("------")

C = np.array([[[ 1.,  2.,  3.],
            [ 3.,  4.,  5.]],

           [[ 6.,  7.,  8.],
            [ 9., 10., 11.]]])
print(C.ndim)
print(C.shape)
print(C.dtype)
print(C.size)


In your own words, can you explain why A has a shape of (3,) and not (3,1)?

In [3]:
# Because A is a 1D array: it has a single axis, so its shape tuple has a single element, (3,)

How do you change the shape of A to (3,1)? What is the difference?

In [None]:
A = np.array([1, 2, 3])
Abis = A.reshape(3, 1)  # same 3 values, now as a 2D column: 3 rows, 1 column
print(A)
print(Abis)
print(A.shape, A.ndim)  # (3,) 1
print(Abis.shape, Abis.ndim)  # (3, 1) 2
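
To make the difference between the two shapes concrete, here is a minimal sketch. The names `column`, `same_column` and `grid` are illustrative and not part of the exercise. It shows that a (3,) array has a single axis, while a (3,1) array is a 2D column, and that the two behave differently under transposition and addition.

```python
import numpy as np

A = np.array([1, 2, 3])          # shape (3,): one axis of length 3
column = A.reshape(3, 1)         # shape (3, 1): 3 rows, 1 column
same_column = A[:, np.newaxis]   # equivalent way to add a trailing axis

print(A.shape, column.shape, same_column.shape)

# Transposing a 1D array changes nothing; transposing the column gives a row.
print(A.T.shape)        # (3,)
print(column.T.shape)   # (1, 3)

# Adding the two broadcasts (3,) against (3, 1) into a (3, 3) grid.
grid = A + column
print(grid.shape)       # (3, 3)
```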

---

Let's test other methods to create numpy arrays. 

1- Create a null vector of size 8

```
array([0., 0., 0., 0., 0., 0., 0., 0.])
```

In [None]:
null_vector = np.zeros(8)  # array filled with zeros

null_vector

2- Create an array containing only ones, of shape (4,2)

```
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
```

In [None]:
one_vector = np.ones((4, 2))  # array filled with ones
one_vector

3- Create a vector with values from 2 to 17, i.e. 18 excluded (without using np.array)

```
array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])
```

In [None]:
vector_2_to_18 = np.arange(2, 18)  # integers from 2 to 17 in order (the stop value 18 is excluded)

vector_2_to_18

4- Create a vector with random numbers between 0 and 1 of shape (2,3,4)

This should look like this (values can change, we are in random :) )

```
array([[[0.32917728, 0.07280774, 0.48285449, 0.82583863],
        [0.94646022, 0.05935708, 0.38376227, 0.93222678],
        [0.51835433, 0.19230094, 0.85057795, 0.11723047]],

       [[0.94304701, 0.9026631 , 0.1288193 , 0.07662968],
        [0.72708038, 0.21875513, 0.21803179, 0.27423456],
        [0.48781086, 0.91235569, 0.85295078, 0.96024868]]])
```

In [None]:
random_vector = np.random.rand(2, 3, 4)  # 2 blocks of 3 rows and 4 columns
random_vector

5- Create a vector with random integer numbers between -100 and 100, shape (2,10)

This should look like this (values can change, we are in random :) )

```
array([[  82,   -3,  -60,   77,   57,   16,   40,   21,  -57,   53],
       [ -43,  -50, -100,   18,   67,    4,   69,   34,   54,  -54]])
```

In [None]:
random_int_vector = np.random.randint(-100, 101, size=(2, 10))  # integers from -100 to 100 (upper bound 101 is excluded), 2 rows and 10 columns
random_int_vector
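
As a side note, the same two arrays can be produced with NumPy's `Generator` interface, which also makes the draw reproducible. This is a sketch and not what the exercise requires: the seed value 42 and the names `rng`, `random_floats` and `random_ints` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

random_floats = rng.random((2, 3, 4))                # uniform floats in [0, 1)
random_ints = rng.integers(-100, 101, size=(2, 10))  # integers in [-100, 100]

print(random_floats.shape, random_ints.shape)
print(random_ints.min() >= -100, random_ints.max() <= 100)
```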

6- Create an identity matrix of shape (3,3)

```
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
```

In [None]:
matrix = np.eye(3, dtype=int)  # 3x3 identity matrix; dtype=int gives integers (np.eye returns floats by default)
matrix

Bonus: create the following checkerboard array of shape (6,6)

```
array([[1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1]])
```
Hint: have a look at [`numpy.tile`](https://numpy.org/devdocs/reference/generated/numpy.tile.html)

In [None]:
checkboard = np.zeros((6, 6), dtype=int)

checkboard[::2, ::2] = 1  # even rows (0, 2, 4): set every other cell to 1, starting at column 0
checkboard[1::2, 1::2] = 1  # odd rows (1, 3, 5): set every other cell to 1, starting at column 1

checkboard

In [None]:
checkboard_tile = np.tile(np.array([[1, 0], [0, 1]]), (3, 3))  # repeat the 2x2 pattern 3 times along each axis
checkboard_tile
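
Another way to think about the pattern is index parity: a cell holds 1 when its row index plus its column index is even. The sketch below (the names `rows`, `cols`, `checkerboard_parity` and `tiled` are illustrative) builds the array that way and checks it against a tiled 2x2 block.

```python
import numpy as np

rows, cols = np.indices((6, 6))  # row index and column index of every cell
checkerboard_parity = ((rows + cols) % 2 == 0).astype(int)  # 1 where i + j is even

# Same pattern obtained by repeating a 2x2 block 3 times along each axis.
tiled = np.tile(np.array([[1, 0], [0, 1]]), (3, 3))

print(checkerboard_parity)
print(np.array_equal(checkerboard_parity, tiled))
```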

## Array manipulation

Compute the sum of two matrices (A and B). Then, store the result into a variable ab_sum.

```
A = [[ 1 -3]
     [ 0 -6]
     [ 4  2]]
     
B = [[ 0  2]
     [ 2 -2]
     [10 20]]
```
A+B = ??

In [None]:
A = np.array([[1, -3],
              [0, -6],
              [4,  2]])

B = np.array([[0,  2],
              [2, -2],
              [10, 20]])

ab_sum = A + B

ab_sum

Repeat the sum of A and B using np.add() ... then try np.subtract()

In [None]:
A = np.array([[1, -3],
              [0, -6],
              [4,  2]])

B = np.array([[0,  2],
              [2, -2],
              [10, 20]])

ab_sum = np.add(A, B)
ab_sum

In [None]:
A = np.array([[1, -3],
              [0, -6],
              [4,  2]])

B = np.array([[0,  2],
              [2, -2],
              [10, 20]])

ab_diff = np.subtract(A, B)  # element-wise difference A - B
ab_diff

Given the following matrix, apply the method sum to obtain the sum of each row, then the sum of each column, and finally the sum of all elements in the array.

```
S = array([[1, 2],
           [3, 4],
           [5, 6]])
```

sum of rows 
```
[3
 7
 11]
```

sum of columns
```
[ 9, 12]
```

sum of all : 21

In [16]:
S = np.array([[1, 2],
           [3, 4],
           [5, 6]])

In [None]:
sum_rows = S.sum(axis=1)
sum_rows

In [None]:
sum_columns = S.sum(axis=0)
sum_columns

In [None]:
sum_all = S.sum()
sum_all

In [None]:
sum_rows, sum_columns, sum_all

How would you perform the sum of rows and columns using index slicing?

Reminder: slicing is the process of selecting rows and columns based on their indices

general formula: 
```
A[rows,columns] 
A[start:stop:step, start:stop:step]
```

A[:,:] will slice all the rows and columns. A[:,1] will slice all rows in column 1

Remember, indices start at 0 and stop at n-1, with n being the size of the row or column.

Ex: the indices of the 1D array [1,2,3,4] are 0,1,2,3
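
The general formula can be tried on a small array. The following sketch uses an illustrative 3x4 matrix `M` that is not part of the exercise. Each line applies one form of `A[start:stop:step, start:stop:step]`.

```python
import numpy as np

M = np.arange(1, 13).reshape(3, 4)  # illustrative 3x4 matrix, values 1..12
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

print(M[:, 1])      # all rows, column 1         -> [ 2  6 10]
print(M[0, :])      # row 0, all columns         -> [1 2 3 4]
print(M[0:2, 1:3])  # rows 0-1, columns 1-2      -> [[2 3] [6 7]]
print(M[::2, ::2])  # every other row and column -> [[ 1  3] [ 9 11]]
```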


In [None]:
# Sum of each row: add the two column slices element-wise
sum_rows = S[:, 0] + S[:, 1]  # [3, 7, 11]
# Sum of each column: add the three row slices element-wise
sum_columns = S[0, :] + S[1, :] + S[2, :]  # [9, 12]

sum_rows, sum_columns


Given the following matrix R, use index slicing to get the following data:

```
R = array([[ 1,  2,  3,  4],
           [ 5,  6,  7,  8],
           [ 9, 10, 11, 12],
           [13, 14, 15, 16],
           [17, 18, 19, 20]])

[[ 7],
 [11],
 [15]]

[14, 15, 16]

[[ 1,  2,  3,  4],
 [ 9, 10, 11, 12],
 [17, 18, 19, 20]]
```

In [None]:
R = np.arange(1, 21).reshape(5,4)
R

In [None]:
result1 = R[1:4, 2:3]  # rows 1 to 3 (stop index 4 excluded), column 2 kept as a 2D slice
result1

In [None]:
result2 = R[3, 1:4]  # row 3 (the 4th row), columns 1 to 3
result2

In [None]:
result3 = R[::2]  # every other row, starting from row 0
result3

Given these 2 vectors, perform element-wise multiplication, the scalar (dot) product and the vector (cross) product between X and Y. Can you explain the differences between these three operations?

```
X = [1 2 3]
Y = [4 5 6]
```

Hint: read differences between np.multiply, np.dot and np.cross

In [26]:
X = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

In [None]:
multiplication = np.multiply(X, Y)
multiplication

In [None]:
scalar = np.dot(X, Y)
scalar

In [None]:
vectorial = np.cross(X, Y)
vectorial
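
One way to summarize the differences, sketched with the same X and Y (the names `elementwise`, `dot` and `cross` are illustrative): `np.multiply` keeps the shape of the inputs, `np.dot` collapses the two vectors into one number, and `np.cross` returns a new 3D vector perpendicular to both inputs.

```python
import numpy as np

X = np.array([1, 2, 3])
Y = np.array([4, 5, 6])

elementwise = np.multiply(X, Y)  # [1*4, 2*5, 3*6] -> [ 4 10 18]
dot = np.dot(X, Y)               # 1*4 + 2*5 + 3*6 -> 32
cross = np.cross(X, Y)           # [2*6-3*5, 3*4-1*6, 1*5-2*4] -> [-3  6 -3]

print(elementwise, dot, cross)

# The cross product is orthogonal to both inputs, so these dot products are 0.
print(np.dot(cross, X), np.dot(cross, Y))
```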

Given the following vector, calculate the exponential of each element as $f(x) = e^x$
```
X = [1, 2, 3]
```


In [None]:
X = np.array([1, 2, 3])
X

In [None]:
exponential_X = np.exp(X)
exponential_X

Create a python function where you implement the sigmoid function : 

$f(x) = \frac{1}{1 + e^{-x}}$

Then, use it to calculate the sigmoid value for the previous vector X

In [32]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [None]:
sigmoid_X = sigmoid(X)

sigmoid_X

Plot the exponential function and the sigmoid function in different figures. Use np.linspace to define a suitable domain so that both functions are displayed correctly. You should obtain something like this:

![image-4.png](attachment:image-4.png)
![image-5.png](attachment:image-5.png)

In [None]:
import matplotlib.pyplot as plt

x_values = np.linspace(-10, 10, 100)

y_exponential = np.exp(x_values)
y_sigmoid = sigmoid(x_values)

# Plot the exponential function
plt.figure(figsize=(8, 6))
plt.plot(x_values, y_exponential, label="Exponential Function $e^x$")
plt.title("Exp Function")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.grid(True)
plt.show()

# Plot the sigmoid function
plt.figure(figsize=(8, 6))
plt.plot(x_values, y_sigmoid, label="Sigmoid Function $\\frac{1}{1 + e^{-x}}$")
plt.title("Sigmoid Function")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.grid(True)
plt.show()


## Difference between lists and arrays

Create the following list and transform it to an array d_array
```
d_list = [[1, 2, 3],
          [4, 5, 6]]
```

In [None]:
d_list = [[1, 2, 3],
          [4, 5, 6]]
d_array = np.array(d_list)
d_array

How do you obtain the sum of each row with the list? How do you do it with the array? This should be the result:
```
[[ 6],
 [15]]
```

In [None]:
result = [[sum(row)] for row in d_list]  # one single-element list per row, to match [[6], [15]]

result

In [None]:
result = d_array.sum(axis=1, keepdims=True)  # sum along each row; keepdims=True keeps the (2, 1) column shape
result

Let's measure the computation time. Use the magic command %%time at the beginning of the cell to measure time. Compare the time needed to sum the elements of the following list and array. What is your conclusion?

In [38]:
A = np.random.rand(1000, 1000)
a = A.tolist()

In [None]:
%%time
total = 0
for row in a:
    for n in row:
        total += n
round(total, 2)

In [None]:
%%time
round(np.sum(A), 2)

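NumPy is faster: `np.sum` runs the loop in compiled code over a contiguous block of floats, whereas the Python loop handles each element as a separate Python object.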
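
Outside a notebook, where `%%time` is not available, the same comparison can be sketched with `time.perf_counter` from the standard library. The names `loop_seconds`, `numpy_seconds` and `numpy_total` are illustrative, and the measured times depend on the machine, so no specific figures are claimed here.

```python
import time
import numpy as np

A = np.random.rand(1000, 1000)
a = A.tolist()

# Time the pure-Python double loop over the nested list.
start = time.perf_counter()
total = 0.0
for row in a:
    for n in row:
        total += n
loop_seconds = time.perf_counter() - start

# Time the vectorized NumPy sum over the array.
start = time.perf_counter()
numpy_total = np.sum(A)
numpy_seconds = time.perf_counter() - start

print(f"python loop: {loop_seconds:.4f} s, np.sum: {numpy_seconds:.4f} s")
print(np.isclose(total, numpy_total))  # both approaches give the same sum
```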

## Boolean indexing

Given the G matrix, use boolean indexing to change the values of numbers > 25 to 100. 

In [None]:
G = np.arange(1,51).reshape(5,10)
G

In [None]:
G[G > 25] = 100  # replace every value greater than 25 with 100
G
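
The expression `G > 25` is itself an array of booleans, and it can be inspected or reused. As a sketch (the names `mask` and `capped` are illustrative), `np.where` applies the same rule but returns a new array instead of modifying G in place:

```python
import numpy as np

G = np.arange(1, 51).reshape(5, 10)

mask = G > 25                    # boolean array with the same shape as G
capped = np.where(mask, 100, G)  # new array; G itself is left untouched

print(mask.sum())                # number of values greater than 25 -> 25
print(G.max(), capped.max())     # 50 100
```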

## Numpy broadcasting

Execute the following cells and describe, in your own words, what is happening.

In [None]:
P = np.arange(1,5)
Q = np.arange(6,10).reshape(4,1)
print(P)
print(Q)

In [None]:
P+Q  # P (shape (4,)) and Q (shape (4, 1)) are broadcast to (4, 4): every value of P is added to every value of Q

In [None]:
O = np.arange(1,5).reshape(2,2)
O

In [None]:
O+P

Why do we obtain the last error? Why didn't O+P work?

In [None]:
# Because their shapes are not compatible for broadcasting: O is (2, 2) while P is (4,), and the trailing dimensions (2 and 4) are neither equal nor 1
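
The rule can be checked directly: shapes are compared from the last dimension backwards, and each pair of dimensions must be equal or contain a 1. The sketch below rebuilds P, Q and O and catches the error instead of letting the cell fail (the names `pq` and `failed` are illustrative).

```python
import numpy as np

P = np.arange(1, 5)                 # shape (4,)
Q = np.arange(6, 10).reshape(4, 1)  # shape (4, 1)
O = np.arange(1, 5).reshape(2, 2)   # shape (2, 2)

# (4,) vs (4, 1): trailing dims are 4 and 1, so both stretch to (4, 4).
pq = P + Q
print(pq.shape)

# (2, 2) vs (4,): trailing dims are 2 and 4, neither equal nor 1.
failed = False
try:
    O + P
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```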

## Differences between pandas and numpy (series and arrays)

Create three arrays (D1, D2, D3) of shape (100,1) with random integer numbers. The first array will have random numbers between (20 to 150), the second between (1 to 10) and the last one from (100000 to 800000) 

Create one single array D of shape (100,3) by concatenating the three arrays above.

Then, store this information into a dataframe sales_df where the three columns are renamed as surface, nb_rooms, SalePrice. You should obtain something like this

![image-2.png](attachment:image-2.png)

In [47]:
import pandas as pd

In [48]:
D1 = np.random.randint(20, 151, size=(100, 1))
D2 = np.random.randint(1, 11, size=(100, 1))
D3 = np.random.randint(100000, 800001, size=(100, 1))

In [49]:
D = np.concatenate((D1, D2, D3), axis=1)  # put the three columns side by side: shape (100, 3)

In [None]:
sales_df = pd.DataFrame(D, columns=['surface', 'nb_rooms', 'SalePrice'])
sales_df.head()

Use a scatter plot to observe SalePrice as a function of surface

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(sales_df['surface'], sales_df['SalePrice'], color='blue')
plt.title('SalePrice vs. surface')
plt.xlabel('Surface')
plt.ylabel('SalePrice')
plt.grid(True)
plt.show()

In Machine Learning, we would like to create a linear model to predict SalePrice. In this problem, you have 2 features and 1 target. The linear model has parameters (b0, b1, b2) and estimates SalePrice with the formula $y' = b_0 + b_1 x_1 + b_2 x_2$, which in matrix form is $Y' = X B$, where $B = [b_0 \; b_1 \; b_2]^T$ and X is the matrix of feature values from the dataframe, with a leading column of ones for $b_0$.
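
As a small worked example of the matrix form, consider two hypothetical houses. The values in `X_demo` are made up for illustration, and the parameters are the ones given in step 4 below. Each row of the matrix is [1, surface, nb_rooms], so the product reproduces $b_0 + b_1 x_1 + b_2 x_2$ row by row.

```python
import numpy as np

# Two hypothetical houses: [1 (intercept), surface, nb_rooms]
X_demo = np.array([[1, 50, 2],
                   [1, 100, 4]])
B_demo = np.array([100000, 5000, 1000])  # b0, b1, b2

y_pred = X_demo @ B_demo
# Row 0: 100000 + 5000*50  + 1000*2 = 352000
# Row 1: 100000 + 5000*100 + 1000*4 = 604000
print(y_pred)  # [352000 604000]
```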

We are going to calculate the prediction manually : 

1- take the data from column sales_df['SalePrice'] and store it into an array called y

2- take data from columns 'surface' and 'nb_rooms' and store it into an array called X

3- create an array of ones of shape (100,1) and concatenate it with X. Replace X with the new array, which should have the shape (100,3) and look like this

![image.png](attachment:image.png)

4- create the array of parameters B = [100000, 5000, 1000]

5- calculate the predictions using X and B, store in variable y_hat

6 - Plot the linear model with the dataset. You should obtain this graph: 

![image-3.png](attachment:image-3.png)

7 - calculate the L1 and L2 errors between y and y_hat. To do so, create two functions that take yhat and y as parameters: L1(yhat,y) and L2(yhat,y)

$l_{1_{loss}} = \sum_{i=1}^{n} |y_i - y'_i|$ ... in matrix form this is $L1 = \lVert Y - Y' \rVert_1$

$l_{2_{loss}} = \sum_{i=1}^{n} (y_i - y'_i)^2$ ... in matrix form this is $L2 = (Y-Y')^T (Y-Y')$, with $Y$ and $Y'$ as column vectors

In [None]:
# array y
y = sales_df['SalePrice'].values
y

In [None]:
# array X
X = sales_df[['surface', 'nb_rooms']].values
X

In [54]:
# array ones
ones_column = np.ones((100, 1))

In [None]:
# array ones + array X
X = np.concatenate((ones_column, X), axis=1)
X

In [56]:
# create the array of parameters
B = np.array([[100000], [5000], [1000]])

In [None]:
# calculate predictions
y_hat = np.dot(X, B).ravel()  # X times B, flattened from (100, 1) to (100,) so that it has the same shape as y
y_hat

In [None]:
# Plot!
plt.figure(figsize=(10, 6))
plt.scatter(sales_df['surface'], y, color='blue', label='Actual SalePrice')
plt.plot(sales_df['surface'], y_hat, color='red', label='Predicted SalePrice')
plt.title('Scatter plot + linear model')
plt.xlabel('Surface')
plt.ylabel('SalePrice')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
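# L1 loss - sum of absolute errors (divide by n to get the MAE)
def L1(y_hat, y):
    # flatten both inputs so a (100, 1) y_hat and a (100,) y do not broadcast to (100, 100)
    return np.sum(np.abs(np.ravel(y) - np.ravel(y_hat)))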

L1(y_hat, y)

In [None]:
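# L2 loss - sum of squared errors (divide by n to get the MSE)
def L2(y_hat, y):
    difference = np.ravel(y) - np.ravel(y_hat)  # flatten to avoid (100,) vs (100, 1) broadcasting
    return np.dot(difference, difference)  # sum of squared differences, returned as a scalar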

L2(y_hat, y)
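
A tiny hand-checkable example helps verify both losses and shows a shape pitfall worth knowing. The vectors `y_true` and `y_pred` below are illustrative values, not taken from the dataset.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])  # illustrative targets
y_pred = np.array([2.5, 5.0, 4.0])  # illustrative predictions

diff = y_true - y_pred              # [0.5, 0.0, -2.0]
l1 = np.sum(np.abs(diff))           # 0.5 + 0.0 + 2.0 = 2.5
l2 = np.dot(diff, diff)             # 0.25 + 0.0 + 4.0 = 4.25
print(l1, l2)

# Pitfall: a (3,) array minus a (3, 1) array broadcasts to (3, 3),
# so the losses would silently be computed over 9 values instead of 3.
wrong = y_true - y_pred.reshape(3, 1)
print(wrong.shape)                  # (3, 3)
```

This is why it matters that y and y_hat have the same shape before the losses are computed.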

In [None]:
# RMSE - root mean squared error: square root of the mean of the squared errors
residuals = np.ravel(y) - np.ravel(y_hat)
rmse = np.sqrt(np.mean(residuals ** 2))
print(rmse)

In [None]:
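# Coefficient of determination
residuals = np.ravel(y) - np.ravel(y_hat)  # flatten to avoid (100,) vs (100, 1) broadcasting
SSR = np.dot(residuals, residuals)  # sum of squared residuals
SST = np.dot(y - np.mean(y), y - np.mean(y))  # total sum of squares
R2 = 1 - (SSR/SST)
R2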
