# Deep Learning
## Formative assessment
### Week 3: Loss functions and backpropagation

#### Instructions

In this notebook, you will write code to train an MLP model with both the high-level Keras API and a custom training loop, using the automatic differentiation tools from TensorFlow and a custom loss function.

Some code cells are provided for you in the notebook. You should avoid editing the provided code, and make sure to execute the cells in order to avoid unexpected errors. Some cells begin with the line: 

`#### GRADED CELL ####`

These cells require you to write your own code to complete them.

#### Let's get started!

We'll start by running some imports, and loading the dataset.

In [1]:
#### PACKAGE IMPORTS ####

# Run this cell first to import all required packages. Do not make any imports elsewhere in the notebook

import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from pathlib import Path
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization

# If you would like to make further imports from Tensorflow, add them here







<table><tr>
<td> <img src="figures/adelie.jpg" title="Adélie" style="width: 275px;"/> </td>
<td> <img src="figures/chinstrap.jpg" title="Chinstrap" style="width: 275px;"/> </td>
    <td> <img src="figures/gentoo.jpg" title="Gentoo" style="width: 275px;"/> </td>
</tr></table>

<center><font style="font-size:12px">source: <a href=https://en.wikipedia.org/wiki/Penguin>wikipedia</a></font></center>

#### The Palmer Penguins dataset
In this formative assessment, you will use the [Palmer Penguins dataset](https://allisonhorst.github.io/palmerpenguins/articles/intro.html). These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the [Palmer Station Long Term Ecological Research Program](https://pal.lternet.edu/), part of the [US Long Term Ecological Research Network](https://lternet.edu/). The dataset consists of measurements for three penguin species observed in the Palmer Archipelago, Antarctica.

* Gorman, K.B., Williams, T.D. & Fraser, W.R. (2014), "Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis)", PLoS ONE **9** (3):e90081, https://doi.org/10.1371/journal.pone.0090081

Your goal is to model the dataset using an MLP network, trained using the automatic differentiation tools in TensorFlow.

#### Load and preprocess the dataset

In [2]:
# Run this cell to load and sample the data

df = pd.read_csv(Path("./data/penguins.csv"))
df.sample(5)

| Unnamed: 0 | Body Mass (g) | Clutch Completion | Comments | Culmen Depth (mm) | Culmen Length (mm) | Date Egg | Delta 13 C (o/oo) | Delta 15 N (o/oo) | Flipper Length (mm) | Individual ID | Island | Region | Sample Number | Sex | Species | Stage | studyName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 285 | 3350.0 | Yes |  | 17.0 | 35.7 | 11/22/09 | -23.90309 | 8.96436 | 189.0 | N64A1 | Torgersen | Anvers | 119 | FEMALE | Adelie | Adult, 1 Egg Stage | PAL0910 |
| 85 | 5850.0 | Yes |  | 14.6 | 48.4 | 11/29/07 | -25.48025 | 7.8208 | 213.0 | N37A2 | Biscoe | Anvers | 14 | MALE | Gentoo | Adult, 1 Egg Stage | PAL0708 |
| 186 | 5500.0 | Yes |  | 15.1 | 48.1 | 11/9/09 | -26.22664 | 8.45738 | 209.0 | N29A2 | Biscoe | Anvers | 110 | MALE | Gentoo | Adult, 1 Egg Stage | PAL0910 |
| 73 | 3600.0 | Yes |  | 17.3 | 42.4 | 12/3/07 | -24.6879 | 9.35138 | 181.0 | N73A1 | Dream | Anvers | 21 | FEMALE | Chinstrap | Adult, 1 Egg Stage | PAL0708 |
| 275 | 4875.0 | Yes |  | 14.0 | 47.5 | 11/25/09 | -26.23613 | 8.12691 | 212.0 | N14A1 | Biscoe | Anvers | 89 | FEMALE | Gentoo | Adult, 1 Egg Stage | PAL0910 |


We will work with the following columns from the DataFrame:

In [26]:
# This is the list of columns to use as input features from the DataFrame

input_cols = ['Body Mass (g)', 'Culmen Depth (mm)', 'Culmen Length (mm)', 'Flipper Length (mm)']

In [27]:
# This is the column to use for the target variable

target_col = ['Species']

We will also use the `MinMaxScaler` from `sklearn` to scale the input features.

In [5]:
# Create a MinMax Scaler

scaler = MinMaxScaler()

You should now complete the following `get_inputs_and_targets` function, according to the following specifications:

* The function takes `dataframe`, `input_columns`, `target_column`, `minmaxscaler` as arguments
* Extract the inputs and target columns from the loaded DataFrame using `input_columns` and `target_column` lists
* Remove any rows with `NaN` values
* Scale the input features to the range $[0, 1]$ using the `minmaxscaler`
* The function should then return a tuple of constant `tf.Tensor` objects `(input_variables, target_variable)`
  * `input_variables` should be of type `tf.float32`, with shape `(num_examples, num_features)` 
  * `target_variable` should be of type `tf.string`, with shape `(num_examples,)`
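The pandas and scikit-learn steps in this specification (select the columns, drop rows containing `NaN`, rescale each feature to $[0, 1]$) can be sketched on a small hypothetical DataFrame. Three of its rows copy values from the sample shown above, the row with missing values is invented, and the final conversion to `tf.Tensor` is left out, so this is only an illustration of the preprocessing logic.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: the third row has missing values and should be dropped
df = pd.DataFrame({
    "Species": ["Adelie", "Gentoo", None, "Chinstrap"],
    "Body Mass (g)": [3350.0, 5850.0, 4000.0, 3600.0],
    "Flipper Length (mm)": [189.0, 213.0, np.nan, 181.0],
})
input_columns = ["Body Mass (g)", "Flipper Length (mm)"]
target_column = ["Species"]

# Select the needed columns first, so NaNs in unused columns cannot remove rows
clean = df[target_column + input_columns].dropna()

# MinMaxScaler maps each feature column to (x - min) / (max - min)
scaled = MinMaxScaler().fit_transform(clean[input_columns].to_numpy())
# Expected columns: body mass [0, 1, 0.1], flipper length [0.25, 1, 0]

# Flatten the single target column to shape (num_examples,)
targets = clean[target_column].to_numpy().reshape(-1)

print(scaled)
print(targets)
```

Selecting the columns before calling `dropna` matters: the full DataFrame has other columns (such as `Comments`) that are often empty, and dropping on those would discard usable rows.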

In [45]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_inputs_and_targets(dataframe, input_columns, target_column, minmaxscaler):
    """
    This function takes in the loaded DataFrame and column lists as above, and a
    MinMaxScaler object. The function should extract the input and target features as 
    above, and return a tuple (input_variables, target_variable) of Tensors.
    """
    
    # Keep only the columns we need, then drop any rows with missing values
    dataframe = dataframe[list(target_column) + list(input_columns)].dropna()

    # Fit the scaler passed in as an argument and rescale each feature to [0, 1]
    inputs = minmaxscaler.fit_transform(dataframe[input_columns].to_numpy())
    input_variables = tf.constant(inputs, dtype=tf.float32)

    # Flatten the single target column to shape (num_examples,)
    targets = dataframe[target_column].to_numpy().reshape(-1)
    target_variable = tf.constant(targets, dtype=tf.string)

    return (input_variables, target_variable)


In [46]:
# Run your function to get the input and target Tensors

X, y = get_inputs_and_targets(df, input_cols, target_col, scaler)

(342,)


In [40]:
# Split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X.numpy(), y.numpy(), test_size=0.2)

In [41]:
# Load the data into tf.data.Dataset objects

train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test))

Before we can train our model, the string labels in the target variable must be converted to integer labels.

You should now complete the following `get_dataset` function according to the following specifications:

* The function takes `data` as an argument, which is a tuple of numpy arrays `(inputs, targets)`
* The training and test data should be loaded into a `tf.data.Dataset` object
* The `get_dataset` function should contain a nested function that is used in the `map` method of the Dataset to process the targets
  * The string targets should be converted to integer labels according to the following mapping:</br>
  `{"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}`
* The resulting Dataset should return a tuple of `(inputs, targets)` Tensors, of types `tf.float32` and `tf.int32` respectively
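One alternative to Python `if`/`elif` branching on string Tensors is a static lookup table that is built once and captured by the mapped function. The sketch below uses made-up inputs; `to_int_label` is a hypothetical name, and `default_value=-1` is an assumption chosen so that any unexpected species string is easy to spot.

```python
import numpy as np
import tensorflow as tf

# String -> integer mapping from the specification above
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(["Adelie", "Chinstrap", "Gentoo"]),
        values=tf.constant([0, 1, 2], dtype=tf.int32),
    ),
    default_value=-1,  # any unexpected species string is mapped to -1
)

# Hypothetical toy inputs: three examples with four features each
inputs = np.random.default_rng(0).random((3, 4)).astype("float32")
targets = np.array(["Gentoo", "Adelie", "Chinstrap"])

def to_int_label(x, y):
    # The table is captured by the mapped function and queried per element
    return x, table.lookup(y)

ds = tf.data.Dataset.from_tensor_slices((inputs, targets)).map(to_int_label)
labels = [int(label) for _, label in ds]
print(labels)  # [2, 0, 1]
print(ds.element_spec)
```

Because the table returns `tf.int32` values, the second element of `element_spec` has the required integer dtype without an explicit cast.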

In [52]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_dataset(data):
    """
    This function takes a tuple of numpy arrays, and creates a tf.data.Dataset
    object according to the above description.
    The function should then return the Dataset.
    """
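    dataset = tf.data.Dataset.from_tensor_slices(data)

    def convert_label(inputs, target):
        # Map the species string to its integer label:
        # {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}.
        # Dataset.map traces this function into a graph; AutoGraph converts the
        # Python conditionals on the string Tensor into graph control flow.
        if target == "Adelie":
            target = 0
        elif target == "Chinstrap":
            target = 1
        else:  # "Gentoo"
            target = 2
        return inputs, target

    return dataset.map(convert_label)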
    

In [54]:
# Create the training and test Datasets and print the element_spec

train_ds = get_dataset(data=(X_train, y_train))
test_ds = get_dataset(data=(X_test, y_test))

print(train_ds)
train_ds.element_spec

<_MapDataset element_spec=(TensorSpec(shape=(4,), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.int32, name=None))>


(TensorSpec(shape=(4,), dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int32, name=None))

In [55]:
# Shuffle, batch and prefetch the Datasets

train_ds = train_ds.shuffle(X_train.shape[0]).batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.shuffle(X_test.shape[0]).batch(32).prefetch(tf.data.AUTOTUNE)

#### MLP model

You should now complete the following `get_model` function to build the MLP model we will use to train on the Palmer Penguins dataset.

* The function takes `hidden_units`, `output_units`, `input_shape`, `rate` as arguments
* You should build the model using the `Sequential` API
* `hidden_units` is a list of integers, specifying the width of the hidden layers within the model
  * Each hidden layer should use a sigmoid activation function
  * Each hidden `Dense` layer should be followed by a batch normalization layer, and then a dropout layer with dropout rate equal to `rate`
* The first layer in the model should set the input shape using the `input_shape` argument
* `output_units` is an integer specifying the number of neurons in the final output layer
  * The final output layer should not use an activation function

In [59]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_model(hidden_units, output_units, input_shape, rate):
    """
    This function should create an MLP model according to the above description.
    The function should then return the model.
    """
    penguin_model = Sequential()
    for i, neurons in enumerate(hidden_units):
        if i == 0:
            penguin_model.add(Dense(neurons, activation = "sigmoid", input_shape = input_shape))
            
        else:
            penguin_model.add(Dense(neurons, activation='sigmoid'))
        penguin_model.add(BatchNormalization())
        penguin_model.add(Dropout(rate=rate))
    #finally, deal with output layer :
    if len(hidden_units) == 0:
        penguin_model.add(Dense(output_units, input_shape=input_shape))
    else:
        penguin_model.add(Dense(output_units))
    return penguin_model

In [60]:
# Use your function to create a model and print the summary

model = get_model(hidden_units=[10, 10], output_units=3, input_shape=(4,), rate=0.8)
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 10)                50        
                                                                 
 batch_normalization_1 (Bat  (None, 10)                40        
 chNormalization)                                                
                                                                 
 dropout_1 (Dropout)         (None, 10)                0         
                                                                 
 dense_2 (Dense)             (None, 10)                110       
                                                                 
 batch_normalization_2 (Bat  (None, 10)                40        
 chNormalization)                                                
                                                                 
 dropout_2 (Dropout)         (None, 10)               
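The Param # column can be checked by hand. A dense layer with $n_{in}$ inputs and $n_{out}$ units has $n_{in} n_{out} + n_{out}$ parameters; a batch normalization layer stores four values per unit (trainable scale $\gamma$ and offset $\beta$, plus a non-trainable moving mean and moving variance); a dropout layer has none. The helper names below are hypothetical, and the output-layer count and total are computed from the architecture rather than read from the summary.

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair plus one bias per output unit
    return n_in * n_out + n_out

def batchnorm_params(units):
    # gamma and beta are trainable; the moving mean and variance are not
    return 4 * units

# Architecture from get_model(hidden_units=[10, 10], output_units=3, input_shape=(4,), rate=0.8)
counts = [
    dense_params(4, 10), batchnorm_params(10), 0,    # dense, batch norm, dropout
    dense_params(10, 10), batchnorm_params(10), 0,   # dense, batch norm, dropout
    dense_params(10, 3),                             # output layer (logits)
]
print(counts)       # [50, 40, 0, 110, 40, 0, 33]
print(sum(counts))  # 273
```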

#### Loss function

We will train the model using a categorical cross entropy loss function. Since the final layer in the model does not use an activation function, the model returns logits, which are then used to compute the loss.

The categorical cross entropy for a single data example $(x, y)$ is given by:

$$
l(y, \hat{y}) = -\sum_{j=1}^C y_{j} \log \hat{y}_{j},\tag{1}
$$

where $C$ is the number of classes (in our case $C=3$), $y, \hat{y}\in\mathbb{R}^C$, and $\hat{y}_{j}$ is the probability of label $j$ predicted by our neural network $f_\theta$ with parameters $\theta$, given the input $x$. In the above formulation the target label $y$ is represented as a one-hot vector. In our case, $y$ is a vector of length three, with a single 1 at the index of the correct label and zeros elsewhere.

Note also that our model defined above outputs logits $z_j$, not probabilities. The probabilities are computed from the logits using the softmax function:

$$
\hat{y}_j = \frac{\exp(z_j)}{\sum_{k=1}^C \exp(z_k)}.
$$

The loss function we want to minimise is the categorical cross entropy (1) averaged over all examples in the training data. In practice, we estimate this loss by sampling minibatches of data and computing the average categorical cross entropy over each minibatch.
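Because $y$ is one-hot, only the term for the correct class $c$ (the index with $y_c = 1$) survives in (1). Substituting the softmax expresses the loss directly in terms of the logits:

$$
l(y, \hat{y}) = -\log \hat{y}_c = -\log \frac{\exp(z_c)}{\sum_{k=1}^C \exp(z_k)} = -z_c + \log \sum_{k=1}^C \exp(z_k).
$$

For a minibatch of $N$ examples with logits $z^{(i)}$ and correct classes $c_i$, the quantity to minimise is therefore

$$
\frac{1}{N} \sum_{i=1}^N \left( \log \sum_{k=1}^C \exp\big(z^{(i)}_k\big) - z^{(i)}_{c_i} \right).
$$

The log-sum-exp term can be evaluated stably (by subtracting the largest logit before exponentiating) without ever forming the probabilities $\hat{y}_j$ explicitly.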

You should now complete the following `loss_function` function, to compute the categorical cross entropy loss as above.

In TensorFlow, loss functions have the signature `loss(y_true, y_pred)`, where `y_true` is the ground truth Tensor and `y_pred` is the model prediction given the inputs. The `loss_function` function follows this signature, so it can be passed directly to the `loss` argument when calling `model.compile`.

* The function takes `y_true` and `y_pred` as arguments
  * `y_true` is a batch of ground truth labels, of shape `(num_examples,)` and type `tf.int32`
  * `y_pred` is a batch of model predictions, of shape `(num_examples, 3)` and type `tf.float32`
* The function should compute the categorical cross entropy as above
  * Bear in mind that `y_pred` will be a batch of logits, not probabilities
  * `y_true` contains the integer-encoded labels (either 0, 1 or 2)
* The function should average the categorical cross entropy over the minibatch, and return the result as a scalar Tensor

_Hint: you might find the functions [`tf.math.reduce_logsumexp`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_logsumexp) and [`tf.gather`](https://www.tensorflow.org/api_docs/python/tf/gather) useful._
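Following the hint, the logit form of the loss can be computed with `tf.math.reduce_logsumexp` and `tf.gather`. This is a minimal sketch with a hypothetical function name, `cce_from_logits`, checked against the built-in Keras loss on made-up logits. The reshape and cast of `y_true` is a defensive assumption: depending on the Keras version, labels that reach a custom loss through `model.compile` may arrive as floats or with shape `(batch, 1)`.

```python
import tensorflow as tf

def cce_from_logits(y_true, y_pred):
    # Normalise labels to int32 with shape (num_examples,) (defensive assumption)
    y_true = tf.reshape(tf.cast(y_true, tf.int32), [-1])
    # log sum_k exp(z_k) for each example, computed stably: shape (num_examples,)
    log_norm = tf.math.reduce_logsumexp(y_pred, axis=1)
    # z_{c_i}: the logit of the correct class for each example, shape (num_examples,)
    correct_logit = tf.gather(y_pred, y_true, axis=1, batch_dims=1)
    # Average the per-example losses over the minibatch -> scalar Tensor
    return tf.reduce_mean(log_norm - correct_logit)

# Made-up logits for two examples and three classes
logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]])
labels = tf.constant([0, 1], dtype=tf.int32)

ours = cce_from_logits(labels, logits)
builtin = tf.reduce_mean(
    tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True))
print(float(ours), float(builtin))
```

With `batch_dims=1`, `tf.gather` selects one column per row, so `correct_logit[i]` is `y_pred[i, y_true[i]]`.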

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def loss_function(y_true, y_pred):
    """
    This function should compute the categorical cross entropy loss as described above.
    The function should return a scalar Tensor with the computed loss value.
    """
    
    

In [None]:
# Get a ground truth and predictions Tensor to test your function

inputs, y_true = next(iter(train_ds))
y_pred = model(inputs)

In [None]:
# Compute the loss on the batch of data using your function

loss_function(y_true, y_pred)

In [None]:
# Check to see that your computed loss agrees with the built-in TensorFlow function

tf.reduce_mean(tf.keras.metrics.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True))

#### Train your model with the high-level Keras API

You should now complete the following `train_model_keras` function to train the MLP model using the high-level Keras API.

* The function takes `mlp_model`, `loss_fn`, `opt`, `training_dataset` and `epochs` as arguments
* The function should use the high-level Keras API to compile and train the model
  * Use the `compile` method to compile `mlp_model` using the loss function `loss_fn`, `opt` optimizer and accuracy metric
  * Train with the `fit` method, using `training_dataset` for `epochs` epochs
* The function should then return the training history
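A minimal, self-contained sketch of the compile-and-fit pattern, run on random stand-in data rather than the penguin data. `mean_cce` is a hypothetical custom loss that wraps the built-in sparse cross entropy; the layer sizes, batch size and epoch count are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

def mean_cce(y_true, y_pred):
    # Defensive: Keras may pass labels as floats or with shape (batch, 1)
    y_true = tf.reshape(tf.cast(y_true, tf.int32), [-1])
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True))

# Hypothetical stand-in data: 64 examples, 4 features in [0, 1], 3 classes
rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")
y = rng.integers(0, 3, size=64).astype("int32")
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dense(3),  # no activation: the model outputs logits
])

# A plain Python function with signature (y_true, y_pred) can be passed as `loss`
model.compile(optimizer=tf.keras.optimizers.SGD(), loss=mean_cce, metrics=["accuracy"])
history = model.fit(ds, epochs=3, verbose=0)
print(history.history.keys())
```

`fit` returns a `History` object whose `history` dict holds one value per epoch for the loss and each metric, which is what the plotting cell below reads.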

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def train_model_keras(mlp_model, loss_fn, opt, training_dataset, epochs):
    """
    This function should use the compile and fit methods to train the MLP model.
    The function should return the history from the training.
    """
    
    

In [None]:
# Create an SGD optimizer

optimizer = tf.keras.optimizers.SGD()

In [None]:
# Compile and fit the MLP model

model = get_model(hidden_units=[40, 20], output_units=3, input_shape=(4,), rate=0.5)
history = train_model_keras(model, loss_fn=loss_function, opt=optimizer, 
                            training_dataset=train_ds, epochs=200)

In [None]:
# Plot the learning curves

plt.figure(figsize=(10, 3))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'])
plt.xlabel("Epoch")
plt.ylabel("Loss")

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'])
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.show()

In [None]:
# Evaluate the model

model.evaluate(test_ds)

#### Train your model with a custom training loop

You will now implement a custom training loop to train an MLP model on the Palmer Penguins dataset, making use of the automatic differentiation tools in TensorFlow.

First you should complete the following `train_step` function, which will implement the core operations of computing the loss and gradients, and updating the model parameters.

* The function takes the arguments `mlp_model`, `loss_fn`, `opt` and `train_batch`
* `train_batch` is a tuple of `(inputs, targets)` Tensors yielded from the Dataset
* The function should compute the batch loss using `train_batch`, `mlp_model` and `loss_fn`
  * The model should be run in training mode (see [the docs](https://www.tensorflow.org/api_docs/python/tf/keras/Model#call))
* It should then compute the gradients and update the model parameters using the optimizer `opt`
* It should return a tuple of three Tensors: `(loss, y_true, y_pred)`
  * `loss` is the scalar batch loss as computed by `loss_fn`
  * `y_true` is the ground truth Tensor for the batch
  * `y_pred` is the model predictions Tensor for the batch
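The core of a custom update step is a `tf.GradientTape` context around the forward pass, followed by `tape.gradient` and `apply_gradients`. A minimal sketch on one random batch: `sketch_train_step` is a hypothetical name, the model only loosely mirrors `get_model`, and the function runs eagerly (decorating it with `@tf.function`, as in the graded cell, traces it into a graph).

```python
import numpy as np
import tensorflow as tf

def sketch_train_step(mlp_model, loss_fn, opt, train_batch):
    inputs, y_true = train_batch
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        # training=True: dropout is active and batch norm uses batch statistics
        y_pred = mlp_model(inputs, training=True)
        loss = loss_fn(y_true, y_pred)
    # Gradient of the loss with respect to every trainable parameter
    grads = tape.gradient(loss, mlp_model.trainable_variables)
    # One optimizer update (for plain SGD: theta <- theta - learning_rate * grad)
    opt.apply_gradients(zip(grads, mlp_model.trainable_variables))
    return loss, y_true, y_pred

# Hypothetical batch of 8 examples with 4 features and labels in {0, 1, 2}
rng = np.random.default_rng(0)
batch = (tf.constant(rng.random((8, 4)).astype("float32")),
         tf.constant(rng.integers(0, 3, size=8).astype("int32")))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

before = [v.numpy().copy() for v in model.trainable_variables]
loss, y_true, y_pred = sketch_train_step(model, loss_fn, opt, batch)
changed = [not np.allclose(b, v.numpy()) for b, v in zip(before, model.trainable_variables)]
print(float(loss), changed)
```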

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

@tf.function
def train_step(mlp_model, loss_fn, opt, train_batch):
    """
    This function should perform the update step as described above.
    The function should return a tuple of Tensors (loss, y_true, y_pred).
    """
    
    

You should now complete the following `train_model_custom` function to perform the custom training loop. We will use two metric objects (defined below) to record the loss and accuracy values over the course of training. See the docs for the base [`Metric`](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric) class for the generic methods that are available.

* The function takes `mlp_model`, `loss_fn`, `opt`, `training_dataset`, `train_step_fn`, `epochs`, `loss_metric` and `accuracy_metric` as arguments
* The custom training loop should consist of an outer loop for the epochs, that runs for `epochs` number of times
* At the start of each epoch, the metric states should be reset using the `reset_state` method
* Within each epoch, the function should loop over `training_dataset` to pull batches of data
* For each batch, it should use `train_step_fn` to update the model parameters
  * This function returns a tuple of Tensors `(loss, y_true, y_pred)`
  * For each batch, the metrics should also be updated, using the `update_state` method
* The average loss and accuracy over each epoch should each be stored in a list of floats
  * The average loss and accuracy can be retrieved from the metrics at the end of the epoch using the `result` method
* The function should return a tuple of the two lists `(epoch_losses, epoch_acc)` for average loss and accuracy scores per epoch
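The epoch and batch structure, together with the metric methods `reset_state`, `update_state` and `result`, can be sketched as follows. This assumes random stand-in data and a one-layer model, and the inner `step` function is a hypothetical, simplified train step rather than the graded one.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 48 examples split into 3 equal batches of 16
rng = np.random.default_rng(1)
x = rng.random((48, 4)).astype("float32")
y = rng.integers(0, 3, size=48).astype("int32")
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(3)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_metric = tf.keras.metrics.Mean()
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()

def step(inputs, y_true):
    with tf.GradientTape() as tape:
        y_pred = model(inputs, training=True)
        loss = loss_fn(y_true, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss, y_true, y_pred

epoch_losses, epoch_acc = [], []
for epoch in range(3):
    # Start each epoch from empty running averages
    loss_metric.reset_state()
    accuracy_metric.reset_state()
    for inputs, targets in train_ds:
        loss, y_true, y_pred = step(inputs, targets)
        loss_metric.update_state(loss)                # running mean of batch losses
        accuracy_metric.update_state(y_true, y_pred)  # argmax of logits vs integer labels
    # result() returns the average accumulated since the last reset
    epoch_losses.append(float(loss_metric.result()))
    epoch_acc.append(float(accuracy_metric.result()))

print(epoch_losses)
print(epoch_acc)
```

Because `SparseCategoricalAccuracy` compares the argmax of `y_pred` with the labels, it can be fed logits directly: the softmax does not change which entry is largest.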

In [None]:
# Define the loss and accuracy metrics and optimizer

loss_metric = tf.keras.metrics.Mean()
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()
optimizer = tf.keras.optimizers.SGD()

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def train_model_custom(mlp_model, loss_fn, opt, training_dataset, train_step_fn, epochs, 
                       loss_metric=loss_metric, accuracy_metric=accuracy_metric):
    """
    This function should run the custom training loop as described above.
    The function should return a tuple of two lists with the loss and accuracy scores.
    """
    
    

In [None]:
# Use your function to run the custom training loop

model = get_model(hidden_units=[40, 20], output_units=3, input_shape=(4,), rate=0.5)
epoch_losses, epoch_acc = train_model_custom(model, loss_fn=loss_function, opt=optimizer,
                                             training_dataset=train_ds, 
                                             train_step_fn=train_step, epochs=200)

In [None]:
# Plot the learning curves

plt.figure(figsize=(10, 3))
plt.subplot(1, 2, 1)
plt.plot(epoch_losses)
plt.xlabel("Epoch")
plt.ylabel("Loss")

plt.subplot(1, 2, 2)
plt.plot(epoch_acc)
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.show()

#### Evaluate your model

Finally, you will also implement custom code to evaluate your model. First you should complete the following `test_step` function, which is similar to the `train_step` function, except that it does not compute gradients or update the model parameters.

* The function takes the arguments `mlp_model`, `loss_fn` and `test_batch`
* `test_batch` is a tuple of `(inputs, targets)` Tensors yielded from the Dataset
* The function should compute the batch loss using `test_batch`, `mlp_model` and `loss_fn`
  * The model should be run in inference mode (see [the docs](https://www.tensorflow.org/api_docs/python/tf/keras/Model#call))
* The function should return a tuple of three Tensors: `(loss, y_true, y_pred)`
  * `loss` is the scalar batch loss as computed by `loss_fn`
  * `y_true` is the ground truth Tensor for the batch
  * `y_pred` is the model predictions Tensor for the batch
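The difference between training and inference mode is easiest to see on a single `Dropout` layer (the `rate` and input size below are arbitrary). In training mode Keras applies inverted dropout: each unit is zeroed with probability `rate` and the surviving units are scaled by $1/(1 - \text{rate})$. With `training=False` the layer passes its input through unchanged. Batch normalization likewise switches from batch statistics to its moving averages in inference mode.

```python
import numpy as np
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 1000))

# Training mode: each entry becomes either 0 or 1 / (1 - 0.5) = 2
y_train = dropout(x, training=True).numpy()
# Inference mode: the layer is the identity
y_infer = dropout(x, training=False).numpy()

print(np.unique(y_train))
print((y_train == 0).mean())  # roughly 0.5
print(np.allclose(y_infer, 1.0))
```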

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

@tf.function
def test_step(mlp_model, loss_fn, test_batch):
    """
    This function should perform the evaluation step as described above.
    The function should return a tuple of Tensors (loss, y_true, y_pred).
    """
    
    

Now you should complete the following `test_model_custom` function that will evaluate the model on a test dataset. This will be similar to the `train_model_custom` function, except that no optimizer is needed and no parameter updates are made.

* The function takes `mlp_model`, `loss_fn`, `test_dataset`, `test_step_fn`, `loss_metric` and `accuracy_metric` as arguments
* The evaluation should make one complete iteration loop through `test_dataset`
* At the start of the loop, the metric states should be reset using the `reset_state` method
* For each batch, you should use `test_step_fn` to compute the loss and model prediction
  * This function returns a tuple of Tensors `(loss, y_true, y_pred)`
  * For each batch, the metrics should also be updated, using the `update_state` method
* The average loss and accuracy should be retrieved from the metrics at the end of the loop using the `result` method
* The function should return a tuple of two floats `(avg_loss, avg_acc)` for average loss and accuracy scores over the `test_dataset`
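An evaluation loop has the same shape as one epoch of the training loop, minus the gradient tape and the optimizer. The sketch below assumes random stand-in data and a small untrained model; `eval_step` is a hypothetical name. It uses 48 examples in equal batches of 16 so that the mean of the batch losses can be cross-checked against a single pass over all the data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical test data: 48 examples in 3 equal batches of 16
rng = np.random.default_rng(2)
x = rng.random((48, 4)).astype("float32")
y = rng.integers(0, 3, size=48).astype("int32")
test_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="sigmoid"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_metric = tf.keras.metrics.Mean()
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()

def eval_step(inputs, y_true):
    # Inference mode: dropout is off; no gradient tape and no parameter updates
    y_pred = model(inputs, training=False)
    return loss_fn(y_true, y_pred), y_true, y_pred

loss_metric.reset_state()
accuracy_metric.reset_state()
for inputs, targets in test_ds:
    loss, y_true, y_pred = eval_step(inputs, targets)
    loss_metric.update_state(loss)
    accuracy_metric.update_state(y_true, y_pred)
avg_loss = float(loss_metric.result())
avg_acc = float(accuracy_metric.result())
print(f"Test loss: {avg_loss}, test accuracy: {avg_acc}")

# Cross-check: with equal batch sizes, the mean of batch means is the full-data mean
full_loss = float(loss_fn(y, model(x, training=False)))
```

The notebook's test split is not a multiple of the batch size of 32, so its last batch is smaller. A plain `Mean` over batch losses then gives that batch the same weight as a full one; passing the batch size as the `sample_weight` argument of `update_state` weights each batch loss by its number of examples instead.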

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def test_model_custom(mlp_model, loss_fn, test_dataset, test_step_fn, 
                      loss_metric=loss_metric, accuracy_metric=accuracy_metric):
    """
    This function should run the custom evaluation loop as described above.
    The function should return a tuple of two floats for the loss and accuracy scores.
    """
    
    

In [None]:
# Use your function to evaluate the model

avg_loss, avg_acc = test_model_custom(model, loss_function, test_ds, test_step)
print(f"Test loss: {avg_loss}")
print(f"Test accuracy: {avg_acc}")

Congratulations on completing this week's assignment! You have now written custom code to implement a loss function, training loop and evaluation loop for an MLP model.