# Neural network training exercises

**1. Load the [diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) from `sklearn`. Extract the inputs and targets from the dataset into NumPy arrays, and convert both arrays to `float32`.**

In [None]:
from sklearn.datasets import load_diabetes
diabetes_dataset = load_diabetes()

In [None]:
print(diabetes_dataset["DESCR"])

In [None]:
print(diabetes_dataset.keys())

In [None]:
import numpy as np

data = diabetes_dataset["data"].astype(np.float32)
targets = diabetes_dataset["target"].astype(np.float32)

In [None]:
data.shape, targets.shape

**2. Create training and validation splits with an 80/20 ratio. Compute the mean $\mu_{train}$ and standard deviation $\sigma_{train}$ of the targets from the training split, and normalise the training and validation targets $y$ by computing $(y - \mu_{train})/\sigma_{train}$.**

In [None]:
from sklearn.model_selection import train_test_split

train_data, val_data, train_targets, val_targets = train_test_split(data, targets, test_size=0.2) 

print(train_data.shape)
print(val_data.shape)
print(train_targets.shape)
print(val_targets.shape)

In [None]:
mu_train = train_targets.mean()
std_train = train_targets.std()
print(mu_train, std_train)

In [None]:
train_targets = (train_targets - mu_train) / std_train
val_targets = (val_targets - mu_train) / std_train
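
The normalisation above can be sanity-checked on a small example. The sketch below uses made-up toy values (not the diabetes targets) to show that the training split ends up with mean 0 and standard deviation 1, while the validation split, which reuses the training statistics, generally does not. The helper `denormalise` is a hypothetical name introduced here for mapping normalised values back to the original scale.

```python
import numpy as np

# Toy targets standing in for the real splits (made-up values).
toy_train = np.array([100.0, 150.0, 200.0, 250.0], dtype=np.float32)
toy_val = np.array([120.0, 300.0], dtype=np.float32)

# Statistics come from the training split only.
mu = toy_train.mean()
sigma = toy_train.std()  # NumPy default is the population std (ddof=0)

toy_train_norm = (toy_train - mu) / sigma
toy_val_norm = (toy_val - mu) / sigma


def denormalise(y_norm, mu, sigma):
    """Hypothetical helper: map normalised targets back to the original units."""
    return y_norm * sigma + mu


print(toy_train_norm.mean(), toy_train_norm.std())
print(toy_val_norm.mean(), toy_val_norm.std())
```

The same inverse transform would be needed to report model predictions in the original target units.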

**3. Define an MLP model to train on the diabetes dataset. Your model should have three hidden layers of 256 neurons each, using the ReLU activation function. The final layer should have a single neuron with no activation function, to predict the target value.**

In [None]:
import keras
from keras import ops

In [None]:
from keras.models import Sequential
from keras.layers import Input, Dense

model = Sequential([
    Input(shape=(train_data.shape[-1],)),
    Dense(256, activation="relu"),
    Dense(256, activation="relu"),
    Dense(256, activation="relu"),
    Dense(1)
])

In [None]:
model.summary()
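
As a cross-check on the summary, the parameter count of each `Dense` layer can be worked out by hand: a layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes the 10 input features of the diabetes dataset; `dense_params` is a hypothetical helper name.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Input features, three hidden layers, one output neuron.
layer_sizes = [10, 256, 256, 256, 1]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts)  # [2816, 65792, 65792, 257]
print(total)   # 134657
```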

**4. Compile your model with the MSE loss and Adam optimizer. Train the model for 100 epochs, passing the validation data to the `validation_data` argument. Plot the training and validation curves. Are there signs of overfitting/underfitting?**

In [None]:
model.compile(optimizer='adam', loss="mse")
history = model.fit(train_data, train_targets, epochs=100, validation_data=(val_data, val_targets), verbose=False)

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss vs. epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
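
One way to back up a visual reading of the curves is to locate the epoch with the lowest validation loss and compare the final training and validation losses. The sketch below runs on made-up toy curves rather than the real `history.history` values; `summarise_curves` is a hypothetical helper. A validation minimum well before the last epoch, together with a growing gap, would be consistent with overfitting.

```python
def summarise_curves(train_loss, val_loss):
    """Return the epoch of the lowest validation loss and the final train/val gap."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_val_epoch": best_epoch,
        "best_val_loss": val_loss[best_epoch],
        "final_gap": val_loss[-1] - train_loss[-1],
    }


# Toy curves (made-up): training loss keeps falling, validation loss turns upward.
toy_train_loss = [1.0, 0.6, 0.4, 0.2, 0.1]
toy_val_loss = [1.1, 0.7, 0.6, 0.8, 1.0]

summary = summarise_curves(toy_train_loss, toy_val_loss)
print(summary)
```

With a trained model, the same function could be called as `summarise_curves(history.history['loss'], history.history['val_loss'])`.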

**5. Re-define your model by adding dropout after each hidden layer with a dropout rate of 0.5, and adding L2 regularisation to each hidden layer with a regularisation coefficient of 1e-5. Re-compile and re-train your new model, again for 100 epochs. Plot the training and validation curves. Has the regularisation made a difference?**

In [None]:
from keras.layers import Dropout
from keras import regularizers

l2_coeff = 1e-5
rate = 0.5

def get_regularised_model():
    model = Sequential([
        Input(shape=(train_data.shape[-1],)),
        Dense(256, kernel_regularizer=regularizers.l2(l2_coeff), activation="relu"),
        Dropout(rate),
        Dense(256, kernel_regularizer=regularizers.l2(l2_coeff), activation="relu"),
        Dropout(rate),
        Dense(256, kernel_regularizer=regularizers.l2(l2_coeff), activation="relu"),
        Dropout(rate),
        Dense(1)
    ])
    return model
model = get_regularised_model()

In [None]:
model.compile(optimizer='adam', loss="mse")
history = model.fit(train_data, train_targets, epochs=100, validation_data=(val_data, val_targets), verbose=False)

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Loss vs. epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
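
To make the two regularisers concrete, here is a minimal NumPy sketch of what each one computes. It is an illustration under stated assumptions, not the Keras implementation: the L2 penalty added to the loss is the coefficient times the sum of squared kernel weights, and dropout at training time zeroes each activation with probability `rate` and scales the survivors by `1 / (1 - rate)`. The function names and toy arrays are hypothetical.

```python
import numpy as np


def l2_penalty(weights, coeff):
    """L2 term added to the loss: coeff * sum of squared weights."""
    return coeff * np.sum(np.square(weights))


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
toy_weights = np.array([[1.0, -2.0], [3.0, 0.5]])  # made-up kernel
toy_activations = np.ones((4, 5))                  # made-up layer output

penalty = l2_penalty(toy_weights, 1e-5)
dropped = inverted_dropout(toy_activations, 0.5, rng)

print(penalty)
print(dropped)
```

With a coefficient as small as 1e-5 the penalty term is tiny compared with a typical MSE on normalised targets, so most of the regularising effect in this exercise is likely to come from dropout.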

**6. Re-initialise the same regularised model from Q5. This time, compile the model with a mean absolute error (MAE) metric, and train for up to 100 epochs with early stopping that monitors the validation MAE and has a patience of 10 epochs. Plot the training and validation curves.**

In [None]:
model = get_regularised_model()
model.compile(optimizer='adam', loss="mse", metrics=['mae']) 
earlystopping = keras.callbacks.EarlyStopping(monitor='val_mae', patience=10)
history = model.fit(train_data, train_targets, epochs=100, validation_data=(val_data, val_targets), verbose=False,
                    callbacks=[earlystopping])

In [None]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.plot(history.history['val_mae'])
ymin, ymax = plt.gca().get_ylim()  # get_ylim() returns (bottom, top)
plt.vlines(earlystopping.best_epoch, ymin=ymin, ymax=ymax, linestyle='--', color='r')
plt.title('Loss and MAE vs. epochs')
plt.ylabel('Loss / MAE')
plt.xlabel('Epoch')
plt.xticks(np.arange(len(history.history['loss'])))
plt.legend(['Training', 'Val loss', 'Val MAE', 'Best epoch'], loc='upper right')
plt.show()
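
The patience rule used above can be illustrated in plain Python. This is a simplified sketch of the idea, not the Keras callback itself: it assumes a metric where lower is better and no minimum improvement threshold, and `early_stopping_epochs` is a hypothetical helper operating on a made-up validation MAE sequence.

```python
def early_stopping_epochs(val_metric_per_epoch, patience):
    """Return (best_epoch, last_epoch_run) for a lower-is-better metric."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, value in enumerate(val_metric_per_epoch):
        if value < best:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` consecutive epochs: stop here.
                return best_epoch, epoch
    return best_epoch, len(val_metric_per_epoch) - 1


# Made-up validation MAE values; the best value occurs at epoch 2.
toy_val_mae = [0.9, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74]
best_epoch, last_epoch = early_stopping_epochs(toy_val_mae, patience=3)
print(best_epoch, last_epoch)  # 2 5
```

Training stops `patience` epochs after the best one, so the final weights are not those of the best epoch. In Keras, `EarlyStopping` accepts `restore_best_weights=True` to roll back to the best epoch; the cell above leaves it at its default of `False`.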