# Deep Learning
## Formative assessment
### Week 1: Introduction to Deep Learning

#### Instructions

In this notebook, you will write code to implement a linear regression model in Keras. You will implement the analytic solution, as well as a low-level training loop to update parameters using gradient descent.

Some code cells are provided for you in the notebook. You should avoid editing provided code, and make sure to execute the cells in order to avoid unexpected errors. Some cells begin with the line:

`#### GRADED CELL ####`

These cells require you to write your own code to complete them.

#### Let's get started!

We'll start by running some imports, and loading the dataset.

In [None]:
#### PACKAGE IMPORTS ####

# Run this cell first to import all required packages. Do not make any imports elsewhere in the notebook

import keras
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from pathlib import Path

<center><img src="figures/life_expectancy_wikipedia.png" title="Life expectancy" style="width: 450px;"/></center>
<center><font style="font-size:12px">source: <a href=https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy>wikipedia</a></font></center>

#### The WHO Life Expectancy dataset
In this formative assessment, you will use the [WHO Life Expectancy dataset](https://www.kaggle.com/kumarajarshi/life-expectancy-who) from Kaggle. This dataset was collected from the Global Health Observatory (GHO) data repository under the World Health Organization (WHO), for the purpose of health data analysis. The dataset includes multiple factors affecting life expectancy across 133 countries, divided into the broad categories of immunization-related factors, mortality factors, economic factors and social factors.

Your goal is to use Keras to model the dataset using linear regression.

#### Load and subset the data

In [None]:
# Run this cell to load and describe the data

df = pd.read_csv(Path("./data/Life Expectancy Data.csv"))
df.describe()

We will work with the following columns from the DataFrame:

In [None]:
# This is the list of columns to use from the DataFrame

cols = ['Life expectancy ', 'Adult Mortality', 'Alcohol', ' BMI ',
        'Polio', 'Total expenditure', 'Diphtheria ', ' HIV/AIDS', 
        'GDP', 'Income composition of resources', 'Schooling']

You should now complete the following function, according to the following specifications:

* Extract the columns above from the loaded DataFrame
* Remove any rows with `NaN` values
* Define a 1-D numpy array using the values in the `Life expectancy ` column. This will be the target variable
* Define a 2-D numpy array using the values from all remaining columns. This array should have shape `(num_examples, num_features)`. These will be the input variables
* The function should then return the tuple of Tensor objects `(input_variables, target_variable)` of type `float32`
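
As a rough illustration of the steps listed above, here is a minimal sketch on a made-up DataFrame. The names `toy_df`, `toy_cols`, `toy_X` and `toy_y` and all the values are hypothetical, and the sketch stops at NumPy arrays; it is not the graded solution.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'target' plays the role of the 'Life expectancy ' column
toy_df = pd.DataFrame({
    "target": [70.0, 65.0, np.nan, 80.0],
    "feat_a": [1.0, 2.0, 3.0, np.nan],
    "feat_b": [0.5, 0.1, 0.9, 0.3],
    "unused": ["w", "x", "y", "z"],
})
toy_cols = ["target", "feat_a", "feat_b"]

# Keep only the chosen columns, then drop every row that contains a NaN
subset = toy_df[toy_cols].dropna()

# 1-D target array and 2-D input array, both float32
toy_y = subset["target"].to_numpy(dtype="float32")
toy_X = subset.drop(columns=["target"]).to_numpy(dtype="float32")

print(toy_X.shape, toy_y.shape)  # (2, 2) (2,)
```

Only the first two rows survive, because the other two each contain a `NaN`. In the graded function, arrays like these would then be converted to Tensors, for example with `keras.ops.convert_to_tensor`, which the notebook already uses further down.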

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_inputs_and_targets(dataframe, columns):
    """
    This function takes in the loaded DataFrame and column list as above, and extracts the
    numpy arrays as described above.
    Your function should return a tuple (input_variables, target_variable) of Tensors.
    """
    
    

In [None]:
# Run your function to get the input and target Tensors

X, y = get_inputs_and_targets(df, cols)

In [None]:
# Split the data into training and test sets and standardise the input scales

X_train, X_test, y_train, y_test = train_test_split(keras.ops.convert_to_numpy(X), 
                                                    keras.ops.convert_to_numpy(y), 
                                                    test_size=0.2)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train, y_train = keras.ops.convert_to_tensor(X_train), keras.ops.convert_to_tensor(y_train)
X_test, y_test = keras.ops.convert_to_tensor(X_test), keras.ops.convert_to_tensor(y_test)

#### Linear regression model

We will fit a simple model of the form

$$
y = f_\theta(\mathbf{x}) + \epsilon,
$$

where $y\in\mathbb{R}$ is the target variable, $\mathbf{x}\in\mathbb{R}^{10}$ are the input features, $\theta\in\mathbb{R}^{11}$ are the model parameters, $\epsilon\sim\mathcal{N}(0, 1)$ is the observation noise random variable, and $f_\theta:\mathbb{R}^{10}\to\mathbb{R}$ is given by

$$
\begin{align}
f_\theta(\mathbf{x}) &= \theta_0 + \sum_{m=1}^{10} \theta_m x_m\\
&= \sum_{m=0}^{10} \theta_m x_m.
\end{align}
$$

In the second line above we have defined $x_0=1$ to be the constant feature. The maximum likelihood solution is given by the normal equation

$$
\theta_{ML} = \left(\mathbf{X}^T \mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y},
$$

where $\mathbf{X}\in\mathbb{R}^{N\times M}$ is the data matrix, $\mathbf{y}\in\mathbb{R}^N$ are the targets, $N$ is the number of data examples, and $M$ is the number of features (including the constant feature).
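
To see the normal equation in action, the following NumPy sketch recovers known parameters from synthetic data. Everything here (`true_theta`, `X_toy`, `y_toy`, the noise level) is invented for illustration. It uses NumPy rather than Keras, and solves the linear system $\mathbf{X}^T\mathbf{X}\,\theta = \mathbf{X}^T\mathbf{y}$ instead of forming the inverse explicitly, which gives the same solution.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

# Hypothetical ground-truth parameters [bias, w1, w2], chosen arbitrarily
true_theta = np.array([2.0, -1.0, 0.5])

# Two random features, with the constant feature x_0 = 1 prepended as the first column
features = rng.normal(size=(N, 2))
X_toy = np.concatenate([np.ones((N, 1)), features], axis=1)

# Targets from the linear model plus a small amount of Gaussian noise
y_toy = X_toy @ true_theta + 0.01 * rng.normal(size=N)

# Solve (X^T X) theta = X^T y for theta
theta_hat = np.linalg.solve(X_toy.T @ X_toy, X_toy.T @ y_toy)
print(theta_hat)
```

With this little noise, the printed estimate should be very close to `true_theta`.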

You should now complete the following function to implement the normal equation to compute the maximum likelihood solution. Your code should only use Keras functions.

* The arguments to the function are an `inputs` Tensor of shape `(num_examples, num_features)`, and a `targets` Tensor of shape `(num_examples,)`
* The function should add a column of ones as the first column to the `inputs` Tensor for the constant feature
* The function should output a 1-D Tensor of parameters of shape `(num_features + 1,)` (the first entry will be the bias)

_Hint: check [the docs](https://keras.io/api/) for relevant Keras functions._

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def normal_equation(inputs, targets):
    """
    This function takes in inputs and targets Tensors, and implements the normal equation
    as above, only using Keras functions.
    Your function should return a Tensor for the maximum likelihood solution for the parameters.
    """
    
    

In [None]:
# Run your function to compute the ML estimate

theta_ml = normal_equation(X_train, y_train)
bias_ml, weights_ml = theta_ml[0], theta_ml[1:]
print("MLE weights:")
print(weights_ml)
print("MLE bias:")
print(bias_ml)

#### Gradient descent

You will now implement the (batch) gradient descent algorithm to find the MLE using optimization. 

First, you should complete the following `get_variables` function to create Keras Variable objects for the weights and bias of the linear regression model.

* The function takes `num_features` as an argument
* The bias should be a Keras Variable with scalar shape, float32 type, and an initial value of zero
* The weights Variable should be a 1-D Tensor of length `num_features`, float32 type, and with initial values sampled from a standard normal distribution
* The function should return the tuple of Variables `(weights, bias)`

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_variables(num_features):
    """
    This function takes in the number of features as an argument, and creates Variables
    for the linear regression model weights and bias.
    Your function should return a tuple of two Variable objects (weights, bias).
    """
    
    

In [None]:
# Run your function to create the Variables

weights, bias = get_variables(num_features=10)

Now define the model itself by completing the following function. This function implements $f_\theta(\mathbf{x}) = \theta_0 + \sum_{m=1}^{10} \theta_m x_m$ as above.

* The function takes an `inputs` Tensor, `weights` and `bias` Variables as input
* The `inputs` Tensor could be a batch of inputs of shape `(batch_size, num_features)`, or a single set of inputs of shape `(num_features,)`
* The function should return the output Tensor $f_\theta(\mathbf{x})$
* The output Tensor should have shape `(batch_size,)` (if passed a batch of inputs), or else should be a scalar
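
The shape requirements above can be met with a single matrix-vector product, as this NumPy sketch suggests. `linear_model`, `w_toy` and `b_toy` are hypothetical names, and NumPy stands in for the Keras operations you would use in the graded cell.

```python
import numpy as np

rng = np.random.default_rng(0)
w_toy = rng.normal(size=(10,)).astype("float32")
b_toy = np.float32(0.0)

def linear_model(inputs, w, b):
    # matmul contracts the last axis of `inputs` with `w`, so the same line
    # handles a batch of shape (batch_size, 10) and a single input of shape (10,)
    return np.matmul(inputs, w) + b

batch_out = linear_model(rng.normal(size=(3, 10)).astype("float32"), w_toy, b_toy)
single_out = linear_model(rng.normal(size=(10,)).astype("float32"), w_toy, b_toy)

print(batch_out.shape, single_out.shape)  # (3,) ()
```

The empty shape `()` for the single input indicates a scalar output.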

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def f(inputs, weights, bias):
    """
    This function takes in an inputs Tensor, weights and bias Variables. It should compute and 
    return the output Tensor prediction. 
    """
    
    

In [None]:
# Test your function on some dummy inputs

inputs = keras.random.normal((3, 10), dtype='float32')
print(f(inputs, weights, bias))

inputs = keras.random.normal((10,), dtype='float32')
print(f(inputs, weights, bias))

We will need to define the loss function to optimise. As we have assumed Gaussian noise $\epsilon\sim\mathcal{N}(0, 1)$ and we are looking to find the maximum likelihood solution, this will be the mean squared error loss. The loss function that you should implement is therefore:

$$
{L}_{MSE}(\theta) = \frac{1}{N} \sum_{\mathbf{x}_i, y_i\in\mathcal{D}} (y_i - \hat{y}_i)^2
$$

where $\hat{y}_i = f_\theta(\mathbf{x}_i)$, and $(\mathbf{x}_i, y_i)$ is an example input and output from the training dataset $\mathcal{D}$. The function specifications are as follows:

* The `mse` function takes two Tensors as arguments: `y_true` and `y_pred`
* These two Tensors will have shape `(num_examples,)`
* The loss function should compute and return the mean squared error loss (MSE) as a scalar Tensor
* Use only Keras functions inside your function
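
The link between the Gaussian noise assumption and the MSE loss can be made explicit. As a sketch, assume additionally that the $N$ training examples are independent (an assumption not stated above). Then the log-likelihood of the training targets is

$$
\begin{align}
\log p(\mathbf{y} \mid \mathbf{X}, \theta) &= \sum_{i=1}^{N} \log \mathcal{N}\left(y_i;\, f_\theta(\mathbf{x}_i),\, 1\right)\\
&= -\frac{N}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{N} \left(y_i - f_\theta(\mathbf{x}_i)\right)^2\\
&= -\frac{N}{2}\log(2\pi) - \frac{N}{2}\, {L}_{MSE}(\theta).
\end{align}
$$

The first term does not depend on $\theta$, so maximising the likelihood is equivalent to minimising the mean squared error.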

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def mse(y_true, y_pred):
    """
    This function takes a batch of 'ground truth' values y_true and a corresponding batch
    of model predictions y_pred, and computes the mean squared error.
    Your function should return the MSE as a scalar Tensor.
    """
    
    

In [None]:
# Compute the initial loss on the training data

mse(y_train, f(X_train, weights, bias))

In [None]:
# Compute the train and test loss of the MLE

print(f"MLE train loss: {mse(y_train, f(X_train, weights_ml, bias_ml)):.4f}")
print(f"MLE test loss: {mse(y_test, f(X_test, weights_ml, bias_ml)):.4f}")

The following function should implement the update step of gradient descent, which we will use inside the training loop. Recall that this update uses the gradient of the loss with respect to the model parameters:

$$
\theta_{t+1} = \theta_{t} - \eta \nabla_\theta {L}_{MSE}(\theta_t),\qquad t\in\mathbb{N}_0,
$$

where $\eta>0$ is the learning rate.

* The `gradient_descent_update` function takes the following arguments:
  * `model_fn` is the function that defines the predictive function (the function `f` above)
  * `inputs` and `targets` are the inputs and targets Tensors, of shape `(num_examples, 10)` and `(num_examples,)` respectively
  * `w` and `b` are the Variables that represent the model parameters
  * The `learning_rate` is the gradient descent hyperparameter
* The function should compute the gradient descent update step (assuming the mean squared error loss as above), updating the `w` and `b` Variables accordingly, using the `learning_rate` passed in
* The function should not return anything. The weights and bias are updated in-place.
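
For intuition, here is a NumPy sketch of one such update on synthetic data, using the gradient of the MSE loss written out by hand: $\nabla_{\mathbf{w}} L = -\frac{2}{N}\mathbf{X}^T(\mathbf{y} - \hat{\mathbf{y}})$ and $\partial L/\partial b = -\frac{2}{N}\sum_i (y_i - \hat{y}_i)$. The names `mse_np`, `gd_step_np` and the toy data are hypothetical. A Keras implementation would typically obtain the gradients through the backend's automatic differentiation, which this sketch does not show.

```python
import numpy as np

def mse_np(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def gd_step_np(inputs, targets, w, b, learning_rate=0.01):
    """One batch gradient descent step; w and b are NumPy arrays updated in place."""
    residual = targets - (inputs @ w + b)                # y - y_hat, shape (N,)
    grad_w = -2.0 * inputs.T @ residual / len(targets)   # dL/dw, shape (num_features,)
    grad_b = -2.0 * residual.mean()                      # dL/db, scalar
    w -= learning_rate * grad_w                          # in-place updates, nothing returned
    b -= learning_rate * grad_b

# Synthetic, noise-free data with arbitrarily chosen true weights and bias
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + 4.0

# Random initial weights and a zero bias stored as a 0-d array so it can be updated in place
w_toy, b_toy = rng.normal(size=3), np.zeros(())

loss_before = mse_np(y_toy, X_toy @ w_toy + b_toy)
gd_step_np(X_toy, y_toy, w_toy, b_toy, learning_rate=0.05)
loss_after = mse_np(y_toy, X_toy @ w_toy + b_toy)
print(loss_before, loss_after)
```

For a sufficiently small learning rate such as the one used here, the loss after the step should be lower than before it.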

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def gradient_descent_update(model_fn, inputs, targets, w, b, learning_rate=0.01):
    """
    This function takes the model function, inputs batch, targets batch, weights Variable,
    bias Variable and learning rate as arguments. It should update the Variables w and b
    using the gradient descent update rule above for the MSE loss.
    """
    
    

In [None]:
# Test your gradient descent update function

print("Before the update:")
print(weights.value)
print(bias.value)
gradient_descent_update(f, X_train, y_train, weights, bias, learning_rate=0.05)
print("\nAfter the update:")
print(weights.value)
print(bias.value)

You are now ready to write the training loop in the following function. The training loop consists of a pre-defined number of iterations, where one iteration consists of the gradient descent update step, which updates the weights and bias according to the gradient descent update rule.

You should complete the following `training_loop` function according to the specifications:

* The function takes the following arguments:
  * `num_iterations`: a positive integer that defines the number of iterations to run the training loop
  * `model_fn`: as before, the function that defines the predictive function
  * `training_data`: a 2-tuple of Tensors `(inputs, targets)` for the complete training data
  * `w`: the Variable that represents the model weights
  * `b`: the Variable that represents the model bias
  * `mse`: the loss function to evaluate the model (this will be your `mse` function above)
  * `gradient_descent_update`: the function that implements the gradient descent update (this will be your `gradient_descent_update` function above)
  * `learning_rate`: the learning rate for the gradient descent update
* The function should perform the gradient descent update `num_iterations` times
  * For each iteration, the parameters `w` and `b` should be updated according to `gradient_descent_update`, using the `learning_rate` provided
* After every update, the model loss should be evaluated on the training data using the `mse` function and appended to a list as a scalar float
* The list of losses `losses` should then be returned by the function
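
The overall structure described above can be sketched in NumPy as follows. All names (`training_loop_np`, `gd_step_np`, `mse_np`, the toy data) are hypothetical and the gradients are computed by hand, so treat this as an illustration of the loop rather than the Keras solution. The closed-form least squares fit at the end is there only to check that gradient descent converges to the same parameters.

```python
import numpy as np

def mse_np(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))   # scalar Python float

def gd_step_np(inputs, targets, w, b, learning_rate):
    residual = targets - (inputs @ w + b)
    w -= learning_rate * (-2.0 * inputs.T @ residual / len(targets))
    b -= learning_rate * (-2.0 * residual.mean())

def training_loop_np(num_iterations, inputs, targets, w, b, learning_rate=0.01):
    losses = []
    for _ in range(num_iterations):
        gd_step_np(inputs, targets, w, b, learning_rate)   # update parameters in place
        losses.append(mse_np(targets, inputs @ w + b))     # record loss after every update
    return losses

# Synthetic data with arbitrarily chosen true parameters and a little noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + 4.0 + 0.1 * rng.normal(size=100)

w_toy, b_toy = rng.normal(size=3), np.zeros(())
losses_toy = training_loop_np(1000, X_toy, y_toy, w_toy, b_toy, learning_rate=0.01)

# Closed-form least squares solution on the same data, for comparison
A = np.concatenate([np.ones((100, 1)), X_toy], axis=1)
theta_ls = np.linalg.lstsq(A, y_toy, rcond=None)[0]
theta_gd = np.concatenate([b_toy.reshape(1), w_toy])

print(losses_toy[0], losses_toy[-1])
print(np.abs(theta_ls - theta_gd).max())
```

On this toy problem the final loss should be far below the first one, and the largest parameter difference should be small.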

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def training_loop(num_iterations, model_fn, training_data, w, b, 
                  mse=mse, gradient_descent_update=gradient_descent_update, learning_rate=0.01):
    """
    This function executes the training loop according to the specifications above. 
    It should run for num_iterations, updating the model parameters using the 
    gradient_descent_update function at every iteration.
    The function should return the list of losses computed at each iteration, 
    using the mse function.
    """
    
    

In [None]:
# Re-initialise the model parameters and run the training loop

weights, bias = get_variables(num_features=10)
losses = training_loop(1000, f, (X_train, y_train), weights, bias, 
                       mse=mse, gradient_descent_update=gradient_descent_update, learning_rate=0.01)

In [None]:
# Plot the losses

plt.plot(losses)
plt.title("Loss vs iterations")
plt.xlabel("Iterations")
plt.ylabel("MSE loss")
plt.show()

In [None]:
# Compute the train and test loss of the learned weights

print(f"Model train loss: {mse(y_train, f(X_train, weights, bias)):.4f}")
print(f"Model test loss: {mse(y_test, f(X_test, weights, bias)):.4f}")

Compare your learned weights and bias to the exact solution computed earlier. They should be fairly close:

In [None]:
# Print the learned weights and bias

print("Learned weights:")
print(keras.ops.convert_to_numpy(weights))
print("Learned bias:")
print(keras.ops.convert_to_numpy(bias))

In [None]:
# Print the exact weights and bias

print("Exact ML weights:")
print(keras.ops.convert_to_numpy(weights_ml))
print("Exact ML bias:")
print(keras.ops.convert_to_numpy(bias_ml))

Congratulations on completing this week's assignment! You have now implemented the exact linear regression solution in Keras as well as the gradient descent algorithm for approximating the maximum likelihood parameters. 