# Deep Learning
## Formative assessment
### Week 2: Multilayer perceptrons

#### Instructions

In this notebook, you will write code to implement and train a multilayer perceptron in TensorFlow using the high-level Keras API.

Some code cells are provided for you in the notebook. You should avoid editing provided code, and make sure to execute the cells in order to avoid unexpected errors. Some cells begin with the line:

`#### GRADED CELL ####`

These cells require you to write your own code to complete them.

#### Let's get started!

We'll start by running some imports, and loading the dataset. 

In [14]:
#### PACKAGE IMPORTS ####

# Run this cell first to import all required packages. Do not make any imports elsewhere in the notebook

import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from pathlib import Path

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# If you would like to make further imports from TensorFlow, add them here




<img src="figures/bike_sharing.jpg" title="Bike sharing" style="width: 500px;"/>
<center><font style="font-size:12px">source: <a href=https://www.visitlondon.com/traveller-information/getting-around-london/london-cycle-hire-scheme>visitlondon.com</a></font></center>

#### The Bike Sharing dataset
In this formative assessment, you will use the [Bike Sharing dataset](https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset) from the UCI Machine Learning Repository. This dataset contains the hourly and daily count of rental bikes between the years 2011 and 2012 in the Capital Bikeshare system, along with the corresponding weather and seasonal information.

* Fanaee-T, H., & Gama, J. (2013), "Event labeling combining ensemble detectors and background knowledge", _Progress in Artificial Intelligence_, 1-15, Springer Berlin Heidelberg.

Your goal is to use TensorFlow to model the dataset using linear regression and MLP networks.

#### Load and preprocess the data

In [15]:
# Run this cell to load and sample the data

df = pd.read_csv(Path("./data/bike_sharing.csv"))
df.sample(10)

Unnamed: 0,year,season,month,day_name,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered
4953,2011,3,7,Saturday,7,0,6,0,1,0.76,0.7273,0.66,0.194,3,39
10025,2012,1,2,Monday,18,0,1,1,1,0.52,0.5,0.27,0.4179,10,517
8205,2011,4,12,Tuesday,13,0,2,1,1,0.4,0.4091,0.37,0.3284,7,120
8547,2011,1,12,Tuesday,21,0,2,1,1,0.32,0.3333,0.87,0.0896,11,52
12574,2012,2,6,Wednesday,3,0,3,1,1,0.62,0.5758,0.83,0.194,1,7
7075,2011,4,10,Thursday,10,0,4,1,3,0.5,0.4848,0.88,0.194,10,52
6345,2011,4,9,Monday,23,0,1,1,1,0.62,0.5455,0.94,0.1343,11,64
9660,2012,1,2,Sunday,11,0,0,0,1,0.12,0.1061,0.42,0.2985,3,121
11380,2012,2,4,Tuesday,9,0,2,1,1,0.4,0.4091,0.47,0.3582,6,302
958,2011,1,2,Saturday,13,0,6,0,1,0.3,0.2727,0.39,0.4179,32,103


See [the dataset description](https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset) for more information on the attributes. There are two target variables: `casual` (number of casual users) and `registered` (number of registered users).

You should first complete the following `get_inputs_and_targets` function, according to the following spec:

* The function takes inputs `dataframe` and `target_variables`
  * The `target_variables` is a list of column names that we will use for the targets
* The function should return a tuple of DataFrames `(inputs_df, targets_df)`, where `targets_df` contains only the columns in `target_variables`, and `inputs_df` contains all the remaining columns.

In [16]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_inputs_and_targets(dataframe, target_variables):
    """
    This function takes in the loaded DataFrame and target_variables list as above, 
    and returns inputs and targets DataFrames.
    """
    targets_df = dataframe[target_variables]
    inputs_df = dataframe.drop(target_variables, axis=1)
    return inputs_df, targets_df

In [17]:
# Run your function to get the input and target DataFrames

inputs_df, targets_df = get_inputs_and_targets(df, target_variables=['casual', 'registered'])

The data will need some preprocessing before it is ready to be used to train a deep learning model. 

Firstly, several of the attributes are categorical: `year`, `season`, `month`, `day_name`, `hr`, `holiday`, `weekday`, `workingday` and `weathersit`. We will represent each of these attributes with a one-hot encoding. For example, `year` takes one of the values `2011` or `2012` in the dataset. We will represent the year of a data example as either the one-hot vector `[1, 0]` or `[0, 1]`, corresponding to the year `2011` or `2012` respectively. In general, the length of the one-hot vector will equal the number of categories, and it will be all zeros except for a single one in the position of the corresponding category for a particular data example.

The final representation of our inputs will be the concatenation of all features, including one-hot vectors.
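As a small illustration of the encoding described above, the following sketch one-hot encodes a toy DataFrame. The DataFrame `toy` and its values are made up for this example; only the two `year` categories mirror the dataset.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numerical column
toy = pd.DataFrame({"year": [2011, 2012, 2011], "temp": [0.3, 0.5, 0.7]})

# Mark `year` as categorical so that get_dummies expands it into one column per category
toy["year"] = toy["year"].astype("category")
encoded = pd.get_dummies(toy, dtype=int)

# Each row has a single 1 in the column of its category
print(encoded[["year_2011", "year_2012"]].values.tolist())  # [[1, 0], [0, 1], [1, 0]]
```

The numerical column `temp` is left unchanged, so the encoded DataFrame is the concatenation of the numerical features and the one-hot columns.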

You should now complete the following `convert_to_one_hot` function, according to the following specifications:

* The function takes the inputs `inputs_dataframe` and `categorical_attributes`
    * `categorical_attributes` will be a list of column names that are present in `inputs_dataframe`
* The function should convert each categorical feature to a one-hot vector by replacing the column with a number of columns equal to the number of categories
* The function should then return the updated DataFrame

_Hint: see [`pd.get_dummies`](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html), and note that numerical columns can be converted to `category` type in order for this function to correctly treat the column as categorical._

In [18]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def convert_to_one_hot(inputs_dataframe, categorical_attributes):
    """
    This function takes in the loaded DataFrame and categorical_attributes list as above, 
    and converts the categorical features to one-hot encodings.
    Your function should return the DataFrame.
    """
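    # Work on a copy so that the caller's DataFrame is not modified in place
    inputs_dataframe = inputs_dataframe.copy()
    inputs_dataframe[categorical_attributes] = inputs_dataframe[categorical_attributes].astype('category')
    return pd.get_dummies(inputs_dataframe)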

In [19]:
# Run your function to convert the categorical features to one-hot encodings

cols = ['year', 'season', 'month', 'day_name', 'hr', 'holiday', 'weekday', 'workingday', 'weathersit']
inputs_df = convert_to_one_hot(inputs_df, cols)

In the second stage of preprocessing, we will scale the values in both the inputs and the targets.

You should now complete the following `scale_values` function, according to the following spec:

* The function takes the `inputs_df` and `targets_df` DataFrames as arguments
* The feature values in `inputs_df` should be linearly scaled to the range $[0, 1]$. Note that this will not affect the features that have been one-hot encoded. _Hint: use the [`MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) from `sklearn`._
* The values in the `targets_df` are counts, and these should be converted as follows:
$$\text{count} \mapsto \log (1 + \text{count})$$
where the log is the natural logarithm.
* The function should then return a tuple of Tensors `(inputs, targets)` of type `tf.float32` with the transformed values
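The two transformations in the spec above can be written out by hand. The following is a minimal NumPy sketch of the formulas, not the graded solution: the arrays `X` and `counts` are hypothetical, and the min-max step is written out explicitly to show the per-column formula that a $[0, 1]$ scaling applies.

```python
import numpy as np

# Hypothetical feature matrix: two numerical features and one one-hot (0/1) column
X = np.array([[10.0, 0.2, 0.0],
              [20.0, 0.4, 1.0],
              [40.0, 1.0, 0.0]])

# Min-max scaling per column: (x - min) / (max - min), which maps each column to [0, 1]
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Hypothetical counts and their log-transform; log1p(c) computes log(1 + c)
counts = np.array([0.0, 3.0, 517.0])
log_counts = np.log1p(counts)

# expm1 inverts log1p, which is useful for mapping predictions back to counts
recovered = np.expm1(log_counts)

print(X_scaled[:, 2])                   # the 0/1 column is unchanged
print(np.allclose(recovered, counts))   # True
```

Adding 1 before taking the logarithm keeps the transform well defined for hours with zero rentals, since $\log(1 + 0) = 0$.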

In [20]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def scale_values(inputs_df, targets_df):
    """
    This function takes in the inputs and targets DataFrames and scales the values
    in each DataFrame according to the above description.
    Your function should return a tuple of Tensors.
    """
    scaler = MinMaxScaler()
    inputs = scaler.fit_transform(inputs_df)
    inputs = tf.constant(inputs, dtype=tf.float32)
    
    targets = np.log(1. + targets_df).values
    targets = tf.constant(targets, dtype=tf.float32)
    return inputs, targets

In [21]:
# Run your function to get the scaled inputs and outputs Tensors

inputs, targets = scale_values(inputs_df, targets_df)

In [22]:
# Split the data into training and validation sets

X_train, X_val, y_train, y_val = train_test_split(inputs.numpy(), targets.numpy(), test_size=0.3)

X_train, y_train = tf.constant(X_train), tf.constant(y_train)
X_val, y_val = tf.constant(X_val), tf.constant(y_val)

#### Linear regression model

We will first fit a simple linear regression model to the training data. Recall that this is a model of the form

$$
y = f_\Theta(\hat{\mathbf{x}}) + \epsilon,
$$

where $y\in\mathbb{R}^C$ is the target variable, $\mathbf{x}\in\mathbb{R}^{D}$ are the input features, $\Theta\in\mathbb{R}^{C\times (D+1)}$ are the model parameters, $\epsilon\in\mathbb{R}^C$ with $(\epsilon)_c\sim\mathcal{N}(0, 1)$ $(c=1,\ldots,C)$ is the observation noise random variable, and $f_\Theta:\mathbb{R}^{D+1}\to\mathbb{R}^C$ is given by

$$
f_\Theta(\hat{\mathbf{x}}) = \Theta\hat{\mathbf{x}},
$$

where $\hat{\mathbf{x}}\in\mathbb{R}^{D+1}$ is constructed by adding a constant 1 feature to $\mathbf{x}$; that is, $\hat{\mathbf{x}}_0=1$.

We will use the Keras API to implement a linear regression model, using the `Sequential` class.

In the following function, you should build a `Sequential` model with just one `Dense` layer, which has two output units, one for each target variable, and no activation function. This is the same as the linear regression model above.

* The function takes the `input_shape` as an argument, which should be used in the `Dense` layer initializer to specify the input shape
* The function should build and return the `Sequential` object with one Dense layer with two output neurons, and no activation function
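The equivalence between a single `Dense` layer without an activation and the linear regression model above can be checked numerically. The following is a minimal NumPy sketch rather than Keras code. The dimensions `D = 4`, `C = 2` and the random values are made up for illustration, and the layout of `Theta` (bias in column 0) is one convention chosen to match $\hat{\mathbf{x}}_0 = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 4, 2  # hypothetical input and output dimensions

W = rng.normal(size=(D, C))   # kernel of a dense layer: shape (D, C)
b = rng.normal(size=(C,))     # bias: shape (C,)
x = rng.normal(size=(D,))     # a single input example

# Dense layer with no activation: an affine map of the input
dense_output = x @ W + b

# Linear regression form: Theta has shape (C, D + 1), with the bias in column 0
Theta = np.concatenate([b[:, None], W.T], axis=1)
x_hat = np.concatenate([[1.0], x])  # prepend the constant 1 feature
regression_output = Theta @ x_hat

print(np.allclose(dense_output, regression_output))  # True
print(Theta.size)                                    # C * (D + 1) = 10 parameters
```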

In [23]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def sequential_linear_regression(input_shape):
    """
    This function takes the input_shape as argument to build a Sequential model as 
    specified above. 
    The function should then return the Sequential model.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(2, activation=None, input_shape=input_shape)
    ])
    return model

In [24]:
# Run your function to build the model and print the model summary

lr_model = sequential_linear_regression(input_shape=X_train.shape[1:])
lr_model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 2)                 138       
                                                                 
Total params: 138 (552.00 Byte)
Trainable params: 138 (552.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


You should now compile and fit the model to the training data. 

* The function takes the following arguments:
  * `sequential_model`: a Sequential model to fit to the training data
  * `num_epochs`: a positive integer that defines the number of epochs to train the model
  * `training_data`: a 2-tuple of Tensors (inputs, targets) for the training data
  * `val_data`: a 2-tuple of Tensors (inputs, targets) for the validation data
  * `batch_size`: a positive integer that defines the number of examples in each minibatch
* The function should compile the model with the mean squared error loss and the SGD optimizer
* The function should then fit the model to the training data for `num_epochs` epochs and save the returned history object
* Your function should then return the history object

_Hint: for the validation data, use the `validation_data` keyword argument (see [the docs](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit))._
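To make the training procedure concrete, the following NumPy sketch runs minibatch stochastic gradient descent on the mean squared error for a linear model. It is a hand-written illustration, not how Keras implements `fit` internally, and everything in it is an assumption for the example: the synthetic data, the learning rate of `0.01`, the batch size and the number of epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic regression data: 256 examples, 3 features, 2 targets
X = rng.normal(size=(256, 3))
true_W = rng.normal(size=(3, 2))
Y = X @ true_W + 0.1 * rng.normal(size=(256, 2))

W = np.zeros((3, 2))
b = np.zeros(2)
learning_rate, batch_size, num_epochs = 0.01, 32, 30

def mse(X, Y, W, b):
    """Mean squared error averaged over all examples and targets."""
    return np.mean((X @ W + b - Y) ** 2)

initial_loss = mse(X, Y, W, b)
for epoch in range(num_epochs):
    order = rng.permutation(len(X))           # reshuffle the examples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = X[idx] @ W + b - Y[idx]       # shape (batch, 2)
        # Gradients of the minibatch mean squared error
        grad_W = 2.0 * X[idx].T @ error / error.size
        grad_b = 2.0 * error.sum(axis=0) / error.size
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
final_loss = mse(X, Y, W, b)

print(initial_loss, final_loss)
```

Each epoch visits every example once, so with 256 examples and a batch size of 32 the parameters are updated 8 times per epoch.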

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def compile_and_fit(sequential_model, num_epochs, training_data, val_data, batch_size):
    """
    This function should compile and fit the sequential_model as described above. 
    The function should then return the history object that is returned from the fit method.
    """
    X_train, y_train = training_data
    X_val, y_val = val_data
    sequential_model.compile(loss='mse', optimizer='sgd')
    history = sequential_model.fit(X_train, y_train, validation_data=(X_val, y_val), 
                                   epochs=num_epochs, batch_size=batch_size)
    return history

In [None]:
# Run your function to compile and fit the model

history = compile_and_fit(lr_model, num_epochs=30, training_data=(X_train, y_train), 
                          val_data=(X_val, y_val), batch_size=128)

In [None]:
# Plot the losses

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.title("Loss vs epochs")
plt.xlabel("Epochs")
plt.ylabel("MSE loss")
plt.legend()
plt.show()

We can see that our linear regression model is underfitting. 

In [None]:
# Compute the train and validation loss of the linear regression Sequential model

print("Model train loss: {}".format(lr_model.evaluate(X_train, y_train, verbose=0)))
print("Model validation loss: {}".format(lr_model.evaluate(X_val, y_val, verbose=0)))

Note that the model above is equivalent to fitting separate linear regression models for each scalar target output (`casual` and `registered`). However, these two models are clearly very closely related, and there are likely to be shared features that would be helpful for both models.

In addition, we would like to train a higher capacity model to attempt to alleviate the underfitting we see in the linear regression model.

Both of these reasons are motivation for training a deeper multilayer perceptron (MLP) model. This is a higher capacity model than simple linear regression, and we expect that the hidden layers will learn intermediate features of the data that are useful for predicting both of the target variables.
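The equivalence noted above, between one two-output linear model and two separate single-output models, can be verified on toy data. The sketch below uses closed-form least squares via `np.linalg.lstsq` instead of SGD, purely to keep the check exact; the data are random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 examples, 5 features (plus a constant 1 feature), 2 targets
X = rng.normal(size=(100, 5))
X_hat = np.concatenate([np.ones((100, 1)), X], axis=1)
Y = rng.normal(size=(100, 2))

# Joint fit: one least-squares problem with a two-column target matrix
Theta_joint, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)

# Separate fits: one least-squares problem per target column
Theta_0, *_ = np.linalg.lstsq(X_hat, Y[:, 0], rcond=None)
Theta_1, *_ = np.linalg.lstsq(X_hat, Y[:, 1], rcond=None)
Theta_separate = np.stack([Theta_0, Theta_1], axis=1)

print(np.allclose(Theta_joint, Theta_separate))  # True
```

The two fits agree because the linear model has no parameters shared between its outputs. An MLP differs precisely in that its hidden layers are shared by both targets.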

You should now complete the following `get_mlp` function to build an MLP model according to the following spec:

* The function takes the arguments `input_shape` and `hidden_layers`
* `hidden_layers` is a list of integers, corresponding to the number of neurons in the hidden layers
* The function should build the MLP using the `Sequential` API
  * It should use the `input_shape` argument in the first layer of the model
  * The hidden layers should each use a ReLU activation function
  * The output layer should have 2 neurons, and not use an activation function
* The function should then return the model object
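The architecture in the spec above can be sketched in plain NumPy to show what the forward pass computes and how many parameters it has. This is an illustration only, not the graded Keras solution. The helper names `init_mlp`, `mlp_forward` and `count_parameters` are hypothetical, the random initialisation is arbitrary, and the input dimension of 68 is the value implied by the 138 parameters reported in the linear model summary, since $2 \times (68 + 1) = 138$.

```python
import numpy as np

def init_mlp(input_dim, hidden_layers, output_units=2, seed=0):
    """Create randomly initialised weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim] + list(hidden_layers) + [output_units]
    weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases

def mlp_forward(x, weights, biases):
    """Apply ReLU hidden layers followed by a linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)     # hidden layer with ReLU
    return h @ weights[-1] + biases[-1]    # output layer, no activation

def count_parameters(weights, biases):
    """Total number of kernel entries and biases."""
    return sum(W.size for W in weights) + sum(b.size for b in biases)

weights, biases = init_mlp(68, [64, 32])
outputs = mlp_forward(np.ones((5, 68)), weights, biases)

print(outputs.shape)                         # (5, 2)
print(count_parameters(weights, biases))     # 6562
print(count_parameters(*init_mlp(68, [])))   # 138, the linear regression case
```

With an empty `hidden_layers` list the loop over hidden layers does nothing, and the model reduces to the linear regression model.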

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_mlp(input_shape, hidden_layers):
    """
    This function takes the input_shape and hidden_layers as arguments 
    to build a Sequential model as specified above. 
    The function should then return the Sequential model.
    """
    model = Sequential()
    for i, units in enumerate(hidden_layers):
        if i == 0:
            model.add(Dense(units, activation='relu', input_shape=input_shape))
        else:
            model.add(Dense(units, activation='relu'))
    if len(hidden_layers) > 0:
        model.add(Dense(2))
    else:
        model.add(Dense(2, input_shape=input_shape))
    return model

In [None]:
# Run your function to build the model and print the model summary

mlp_model = get_mlp(input_shape=X_train.shape[1:], hidden_layers=[64, 32])
mlp_model.summary()

In [None]:
# Run your compile_and_fit function on the MLP

history = compile_and_fit(mlp_model, num_epochs=30, training_data=(X_train, y_train), 
                          val_data=(X_val, y_val), batch_size=128)

In [None]:
# Plot the losses

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.title("Loss vs epochs")
plt.xlabel("Epochs")
plt.ylabel("MSE loss")
plt.legend()
plt.show()

In [None]:
# Compute the train and validation loss of the MLP Sequential model

print("Model train loss: {}".format(mlp_model.evaluate(X_train, y_train, verbose=0)))
print("Model validation loss: {}".format(mlp_model.evaluate(X_val, y_val, verbose=0)))

The model performance has improved significantly using the MLP instead of linear regression. However, there is still room for improvement: the model is still underfitting, so we should try further increasing the capacity. You should try re-building and training MLP models for different hyperparameter settings to see how much you are able to improve the performance.

Congratulations on completing this week's assignment! You have now implemented linear regression using the Keras API, as well as an MLP model, and compared the performance.