In [None]:
#################
###  Imports  ###
#################

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras 
from tensorflow.keras.datasets import boston_housing
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import models
from tensorflow.keras import layers

from sklearn.preprocessing import StandardScaler

# Deep Learning week - Day 1 - Regression Exercise 


The overall objective of this exercise is to predict house prices in the Boston area (USA) from a set of input features, using a Neural Network.

The goals of this exercise are to:
- prepare the data for a NN (Neural Network)
- train a _regression_ NN
- check the NN loss during the training and adapt accordingly
- select the hyperparameters of the NN

# Data

We will predict the price of houses in Boston and its suburbs from input variables such as the pupil-teacher ratio of the town, the nitric oxides concentration, the per capita crime rate, or the weighted distances to five Boston employment centers.

Additional information about the dataset is available here: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d

This classic dataset is provided in the Keras library. It can be loaded as follows : 

In [None]:
(X_train, y_train), (X_test, y_test) = boston_housing.load_data()

`shape` is an interesting attribute of the data object. It gives the (row, column) shape of the data.

In [None]:
print("Size of training data: {}".format(X_train.shape))
print("Size of test data: {}".format(X_test.shape))

### Question #1 : What kind of Machine Learning is this problem related to? Supervised, regression, unsupervised, clustering, classification, ... ?

<hr>

Due to the form of some non-linear activation functions, it is important to center and normalize the data (i.e. subtract the mean of each feature and divide by its standard deviation) so that each feature is centered around 0 with a variance of 1.

### Question #2 : Use the `StandardScaler` from scikit-learn [(see documentation)](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) to standardize the data

Warning : Use it wisely on the train and test set. _Hint_ : you can check what was done on the multiclass classification tutorial.
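
To illustrate the warning above, here is a minimal NumPy sketch on made-up numbers (not the Boston data). The mean and standard deviation are computed on the training set only and then reused for the test set; `StandardScaler` follows the same pattern when you call `fit` on the training data and `transform` on both sets. The arrays `toy_train` and `toy_test` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 training rows and 2 test rows, 2 features each
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
toy_test = np.array([[2.5, 25.0], [5.0, 50.0]])

# The statistics come from the training set only
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)

# The same transformation is applied to both sets
toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std

print(toy_train_scaled.mean(axis=0))  # close to 0 for each feature
print(toy_train_scaled.std(axis=0))   # close to 1 for each feature
print(toy_test_scaled.mean(axis=0))   # generally not 0, and that is expected
```

The test set is not expected to end up exactly centered: it is only expressed in the units learned from the training set.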

In [None]:
### TODO 

### Question #3 : Plot each variable of the train set to check that it is roughly centered around 0 with a small variance

In [None]:
### TODO 

# Model


Now that we have the data, we will define a first architecture and run the model 

In [None]:
def initialize_model():

    ### Model architecture
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
    model.add(layers.Dense(32, activation='linear'))
    model.add(layers.Dense(1))
    
    
    # Model optimization: optimizer, loss and metric to monitor
    model.compile(optimizer='rmsprop',
                  loss='mse',
                  metrics=['mae'])
    
    return model


### Always reinitialize the model! Otherwise, training would resume from the weights of an already trained model

In [None]:
model = initialize_model()

history = model.fit(X_train, 
                    y_train,
                    validation_split=0.3,  # fraction of the training data held out for validation
                    epochs=300,
                    batch_size=10,
                    verbose=0)



In [None]:
def plot_loss_mae(history):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='best')
    plt.show()
    
    plt.plot(history.history['mae'])
    plt.plot(history.history['val_mae'])
    plt.title('Model MAE')
    plt.ylabel('MAE')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='best')
    plt.show()

In [None]:
plot_loss_mae(history)

In [None]:
results = model.evaluate(X_test, y_test, verbose=0)
print('Test loss (MSE): {} - Test MAE: {}'.format(results[0], results[1]))

# Now let's look at the effect of different parameters on the result: the loss, the optimizer and the architecture

In [None]:
def run_model(X_train, y_train, X_test, y_test, model):    
    ### Early stopping criterion
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=30)
    
    ### Fitting the model 
    history = model.fit(X_train, y_train, 
                    validation_split=0.3,  # fraction of the training data held out for validation
                    epochs=3000, 
                    batch_size=50,
                    verbose=0, 
                    callbacks=[es])
    
    ### Evaluation on the test set
    results = model.evaluate(X_test, y_test, verbose=0)
    
    ### Return the results
    return history, results

### Question : Write a function `init_model_1` that initializes a model and takes the loss and the optimizer as parameters. Launch it with the loss set to `mse` and the optimizer set to `adam`.

Take the same architecture as in the `initialize_model` above.

Notes : 
- MSE stands for Mean Squared Error. It corresponds to the squared L2 norm of the error between $f_{\theta}(x)$ and $y$, i.e. $||f_{\theta}(x) - y ||^2$, averaged over the samples.
- MAE (another type of regression error) stands for Mean Absolute Error. It corresponds to the L1 norm of the error between $f_{\theta}(x)$ and $y$, i.e. $|f_{\theta}(x) - y |$, averaged over the samples.
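
As a quick numerical illustration of the two definitions above, the sketch below computes both errors with NumPy on three made-up price predictions. The arrays `y_true` and `y_pred` are hypothetical and not taken from the dataset.

```python
import numpy as np

# Hypothetical targets and predictions
y_true = np.array([20.0, 30.0, 40.0])
y_pred = np.array([22.0, 29.0, 46.0])

errors = y_pred - y_true        # [ 2., -1.,  6.]

mse = np.mean(errors ** 2)      # (4 + 1 + 36) / 3
mae = np.mean(np.abs(errors))   # (2 + 1 + 6) / 3

print("MSE:", mse)
print("MAE:", mae)  # 3.0
```

The single large error (6) accounts for most of the MSE, while it only counts proportionally in the MAE.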

In [None]:
################
### Answer :

def init_model_1(loss, optimizer):
    
    ### Model architecture
    ### TODO 
    
    
    # Model optimization: optimizer, loss and metric to monitor
    ### TODO 
    
    return model


In [None]:
### Launch the function and the `plot_loss_mae` function
### TODO 

### Print the output MAE on the test set

In [None]:
### TODO 

### Question: Now, compare the result (on the MAE) while training with `loss='mae'` (`optimizer` still being `adam`), especially by printing the final `mae` on the test set.

In [None]:
### TODO 

## Important : Even though the final evaluation metric is the `MAE`, training with the `MAE` as the loss does not necessarily optimize the model better than training with the `MSE`.
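
One way to see why the training loss and the reported metric can reasonably differ is to compare the per-sample gradients of the two errors with respect to the prediction. The sketch below is only an illustration on made-up error values (the `errors` array is hypothetical); which loss works best on this dataset remains an empirical question.

```python
import numpy as np

# Hypothetical errors (prediction minus target)
errors = np.array([-10.0, -1.0, 0.5, 10.0])

# Derivative of error**2 with respect to the prediction
grad_squared = 2 * errors

# Derivative of |error| with respect to the prediction (for error != 0)
grad_absolute = np.sign(errors)

print(grad_squared)   # magnitude grows with the size of the error
print(grad_absolute)  # magnitude is always 1, whatever the size of the error
```

With the squared error, the update shrinks as predictions get closer to the target; with the absolute error, the gradient keeps the same magnitude, so the two losses can lead the optimizer to behave quite differently.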

In [None]:
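def init_model_2(latent_dims):
    model = models.Sequential()
    # First hidden layer: takes the 13 input features
    model.add(layers.Dense(latent_dims[0], activation='relu', input_shape=(13,)))
    # Remaining hidden layers: Keras infers their input size
    for ld in latent_dims[1:]:
        model.add(layers.Dense(ld, activation='relu'))
    # Output layer: a single unit for the predicted price
    model.add(layers.Dense(1))

    model.compile(optimizer='sgd', loss='mae', metrics=['mae'])
    
    return model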


The previous function makes it possible to build models with different architectures. For instance, `init_model_2([2])` returns a NN with

- a layer with 13 input dimensions and 2 output dimensions
- a layer with 2 input dimensions and 1 output dimension.

Similarly, `init_model_2([10, 5])` returns a NN with

- a layer with 13 input dimensions and 10 output dimensions
- a layer with 10 input dimensions and 5 output dimensions
- a layer with 5 input dimensions and 1 output dimension.
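
The layer sizes listed above can be derived mechanically from `latent_dims`. The sketch below does so in plain Python and also counts the trainable parameters, assuming fully connected layers with a bias term. The helper names `dense_layer_shapes` and `count_parameters` are hypothetical and are not part of Keras.

```python
def dense_layer_shapes(latent_dims, input_dim=13, output_dim=1):
    """Return the (input, output) dimensions of each Dense layer."""
    dims = [input_dim] + list(latent_dims) + [output_dim]
    return list(zip(dims[:-1], dims[1:]))


def count_parameters(latent_dims, input_dim=13, output_dim=1):
    """Count weights and biases over all Dense layers."""
    shapes = dense_layer_shapes(latent_dims, input_dim, output_dim)
    # Each layer has n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


print(dense_layer_shapes([2]))      # [(13, 2), (2, 1)]
print(dense_layer_shapes([10, 5]))  # [(13, 10), (10, 5), (5, 1)]
print(count_parameters([2]))        # 31
print(count_parameters([10, 5]))    # 201
```

These counts can be compared with the output of `model.summary()` to check that a model has the intended architecture.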

### Question: Look at the final `mae` on the test set for the following architectures :
- latent_dims = [1]
- latent_dims = [5]
- latent_dims = [10]
- latent_dims = [20]
- latent_dims = [60]
- latent_dims = [100]

## Important Remark : If early stopping has not been triggered, what does it mean and what should you do?
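
To reason about this remark, it helps to see how a patience-based stopping rule decides. The sketch below is a simplified pure-Python version of that logic, not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`. The function name `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop early,
    or None if the patience is never exhausted."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: one more epoch of waiting
            wait += 1
            if wait >= patience:
                return epoch
    return None


still_improving = [5.0, 4.0, 3.0, 2.5, 2.0]
plateau = [5.0, 4.0, 4.1, 4.2, 4.3, 4.4]

print(stopping_epoch(still_improving, patience=3))  # None
print(stopping_epoch(plateau, patience=3))          # 4
```

When the function returns `None`, the run ended only because it ran out of epochs, not because the validation loss stopped improving.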


In [None]:
### TODO 

In [None]:
########## PLOT
### TODO

### Question: Do the same reasoning for the following architectures:

- latent_dims = [10, 1]
- latent_dims = [10, 5]
- latent_dims = [10, 10]
- latent_dims = [10, 15]
- latent_dims = [10, 20]


## Important Remark : If early stopping has not been triggered, what does it mean and what should you do?

In [None]:
### TODO 

In [None]:
### TODO 

### Question : Optional - Get the best score of the class. You are advised to look at both the architecture and the optimizer, with a careful selection of the hyperparameters