# Deep learning with fully connected artificial neural networks
#### Part of the course on "Foundations of machine learning", Department of Mathematics and Statistics, University of Turku, Finland
#### Lectures available on YouTube: https://youtube.com/playlist?list=PLbkSohdmxoVAZ9DEHEWHjeGK7Ei-DjKHI&si=Msu74_I0qhLrRWcu
#### Code available on GitHub: https://github.com/ionpetre/FoundML_course_assignments

#### This notebook is partially based on the following sources: 

> https://www.tensorflow.org/tutorials/keras/classification

In this notebook we demonstrate the use of fully connected neural networks for classification and regression. We use TensorFlow and Keras as the deep learning Python libraries. 

Datasets used in this notebook: Fashion MNIST, California housing, CIFAR-10. 

In [None]:
# From https://www.tensorflow.org/tutorials/keras/classification:

# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

#### Load the libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
# Reset the seed of the random number generator, for reproducibility purposes

import tensorflow as tf
from keras.utils import set_random_seed

def reset_seed(SEED = 0):

    # Set the seed using keras.utils.set_random_seed. This will set:
    # 1) `numpy` seed
    # 2) `tensorflow` random seed
    # 3) `python` random seed
    set_random_seed(SEED)

    # This will make TensorFlow ops as deterministic as possible, but it will
    # affect the overall performance, so it's not enabled by default.
    # `enable_op_determinism()` is introduced in TensorFlow 2.9.
    tf.config.experimental.enable_op_determinism()


reset_seed(2023)

## I. Demo of a fully connected neural network classifier on the Fashion MNIST dataset

#### The fashion MNIST dataset: 

This is a dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST.

The classes are:

| Label | Description   |
|-------|---------------|
|    0  | T-shirt/top   |
|    1  | Trouser       |
|    2  | Pullover      |
|    3  | Dress         |
|    4  | Coat          |
|    5  | Sandal        |
|    6  | Shirt         |
|    7  | Sneaker       |
|    8  | Bag           |
|    9  | Ankle boot    |

License: The copyright for Fashion-MNIST is held by Zalando SE. Fashion-MNIST is licensed under the MIT license.

The data is available from the Keras datasets. 

In [None]:
from keras.datasets import fashion_mnist
from keras.utils import to_categorical

(X_train_valid, y_train_valid), (X_test, y_test) = fashion_mnist.load_data()

print('We have %2d training pictures and %2d test pictures.' % (X_train_valid.shape[0],X_test.shape[0]))
print('Each picture is of size (%2d,%2d)' % (X_train_valid.shape[1], X_train_valid.shape[2]))

#### Data preprocessing
The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255:

In [None]:
plt.figure()
plt.imshow(X_train_valid[0])
plt.colorbar()
plt.grid(False)
plt.show()

In [None]:
# Scale the data into [0,1] by dividing by 255

X_train_valid_std = X_train_valid/255
X_test_std  = X_test/255

In [None]:
# Display some images

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(20,12))
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train_valid_std[i])
    plt.xlabel(class_names[y_train_valid[i]])
plt.show()

In [None]:
# Is the dataset balanced?

y_train_valid_count = np.unique(y_train_valid, return_counts=True)
df_y_train_valid = pd.DataFrame({'Label':y_train_valid_count[0], 'Count':y_train_valid_count[1]})
df_y_train_valid

# A: YES!

In [None]:
# Train - validation split

X_train_std, X_valid_std, y_train, y_valid = train_test_split(
    X_train_valid_std, 
    y_train_valid, 
    test_size=0.2, 
    random_state=150, 
    stratify=y_train_valid,
    shuffle=True
)

# Check the result of the data split

print('# of training images:', X_train_std.shape[0])
print('# of validation images:', X_valid_std.shape[0])

#### Train a fully connected neural network classifier on the fashion MNIST dataset

We will use Keras, one of the most popular libraries for deep learning.
Our network consists of a sequence of three fully connected (`Dense`) hidden layers, with 128, 64, and 32 neurons. We chose "relu" as their activation function. The output layer has 10 neurons, one per class, with a "softmax" activation.

Three more ingredients are to be chosen in the "compilation" phase of the model: 
* A loss function to quantify the current error of the model; 
* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.
* Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the images that were correctly classified).
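
To make the loss function and the accuracy metric concrete, here is a minimal sketch in plain Python of the categorical cross-entropy and of the accuracy on a toy batch of three samples and three classes. The function names and the toy numbers are assumptions made for this illustration; Keras computes the same quantities internally, with its own numerical safeguards.

```python
import math

def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-7):
    # Mean over the samples of -sum_k y_k * log(p_k); eps avoids log(0)
    per_sample = [
        -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
        for t_row, p_row in zip(y_true_onehot, y_prob)
    ]
    return sum(per_sample) / len(per_sample)

def accuracy(y_true_onehot, y_prob):
    # Fraction of samples whose most likely class is the true class
    hits = [
        p_row.index(max(p_row)) == t_row.index(max(t_row))
        for t_row, p_row in zip(y_true_onehot, y_prob)
    ]
    return sum(hits) / len(hits)

# Toy batch (made-up values): one-hot labels and predicted probabilities
y_true_toy = [[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
y_prob_toy = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.4, 0.3]]

toy_loss = categorical_cross_entropy(y_true_toy, y_prob_toy)
toy_acc = accuracy(y_true_toy, y_prob_toy)
print("Categorical cross-entropy:", round(toy_loss, 4))
print("Accuracy:", round(toy_acc, 4))
```

Only the probability assigned to the true class enters the loss, so the third sample, which is misclassified with a probability of 0.3 for its true class, contributes the largest term.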

In [None]:
# The model can be setup by specifying each layer: 
#          its type, its size, its activation function.

from keras import models
from keras import layers

    


ANNmodel = models.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

In [None]:
# The model must be compiled by specifying the numerical optimizer algorithm, 
#     the loss function, and metrics to be followed up epoch by epoch

ANNmodel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy(), 
             tf.keras.metrics.TruePositives(),
            ],
)

ANNmodel.summary()

#### Our model has 111 146 parameters, possibly quite many for the size of our dataset (48 000 images 28 x 28). Let's see how it works. 
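
The parameter count can be checked by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases. The helper below is a small sketch of this arithmetic (its name is our own choice, not a Keras function); the `Flatten` layer contributes no parameters and only turns each 28 x 28 image into a vector of 784 values.

```python
def dense_param_count(layer_sizes):
    # layer_sizes = [n_inputs, n_hidden_1, ..., n_outputs]
    # Each dense layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 784 inputs -> 128 -> 64 -> 32 -> 10 outputs
n_params = dense_param_count([28 * 28, 128, 64, 32, 10])
print(n_params)  # 100480 + 8256 + 2080 + 330 = 111146
```

The first layer alone accounts for 100 480 of the parameters, which is why reducing the size of the first hidden layer has the largest effect on the model size.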

In [None]:
# Encode the labels from numerical to categorical

from keras.utils import to_categorical

y_train_cat = to_categorical(y_train, num_classes=10)
y_valid_cat = to_categorical(y_valid, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)


# This callback will stop the training when there is no improvement in the 
#      training loss for ten consecutive epochs (patience=10).
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)



# Fit the model by specifying the number of epochs and the batch size
# We also indicate the validation data so we can collect the evolution 
#      of the metrics through the epochs, both on train, as well as on validation.

ANN_fit_history = ANNmodel.fit(X_train_std,
                               y_train_cat, 
                               epochs=300, 
                               batch_size=128,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid_cat)
                              )


Note that the call to `model.fit()` returns a `History` object. This object has a member `history`, which is a dictionary containing data 
about everything that happened during training. Let's take a look at it:
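
As an illustration of the layout of this dictionary, the sketch below uses a small hand-made dictionary with the same structure: one list per metric, one value per epoch, and a `val_` prefix for the values computed on the validation data. The numbers are made up for the example and are not results from this notebook.

```python
# Hypothetical history with the same structure as `History.history`
toy_history = {
    'loss':     [0.60, 0.45, 0.38, 0.33, 0.29],
    'val_loss': [0.55, 0.44, 0.41, 0.42, 0.45],
}

# Number of epochs that were run
n_epochs = len(toy_history['loss'])

# Epoch with the lowest validation loss (epochs are numbered from 1)
val_loss_toy = toy_history['val_loss']
best_epoch = val_loss_toy.index(min(val_loss_toy)) + 1

print('Epochs run:', n_epochs)
print('Lowest validation loss at epoch', best_epoch)
```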

In [None]:
history_dict = ANN_fit_history.history
print(history_dict.keys())

In [None]:
# Plot the evolution of the loss and the accuracy throughout the epochs
# This is useful to find over-fitting and decide on early stopping of the training. 

import matplotlib.pyplot as plt

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
train_tp = np.array(history_dict['true_positives']) / X_train_std.shape[0]       # normalized true positives
val_tp = np.array(history_dict['val_true_positives']) / X_valid_std.shape[0]     # normalized true positives 
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(20, 5))

plt.subplot(1,3,1)
plt.plot(epochs, train_loss, 'b', label='Training cat. cross-entropy')
plt.plot(epochs, val_loss, 'r', label='Validation cat. cross-entropy')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,3,2)
plt.plot(epochs, train_acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Categorical accuracy')
plt.legend()

plt.subplot(1,3,3)
plt.plot(epochs, train_tp, 'b', label='Training TP')
plt.plot(epochs, val_tp, 'r', label='Validation TP')
plt.title('Training and validation true positives')
plt.xlabel('Epochs')
plt.ylabel('True positives')
plt.legend()

plt.show()

#### The loss curves show that the model overfits: it learns the training data well and initially also improves on the validation data, but after a while the validation loss starts to increase. The accuracy and the normalized true positives show the same pattern: continued improvement on the training data and stagnation on the validation data. 
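
The callback used above monitors the training loss, which keeps decreasing while the model overfits. The sketch below shows, in plain Python, how the patience rule would behave if it were applied to a validation loss curve instead (in Keras this corresponds to `monitor='val_loss'`). It is a simplified illustration, not the Keras implementation: the function name and the toy values are our own, and options such as `min_delta` are ignored.

```python
def early_stopping_epoch(losses, patience):
    # Returns (stop_epoch, best_epoch), both numbered from 1.
    # Training stops once the monitored loss has failed to improve on its
    # best value for `patience` consecutive epochs. If that never happens,
    # stop_epoch is the last epoch.
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch

# Made-up validation losses: improvement until epoch 3, then an increase
toy_val_loss = [0.55, 0.44, 0.41, 0.42, 0.45, 0.47, 0.50]
stop_epoch, best_epoch = early_stopping_epoch(toy_val_loss, patience=3)
print('Stopped at epoch', stop_epoch, '- best epoch was', best_epoch)
```

With a patience of 3, training on this toy curve stops three epochs after the best one, so the final weights are no longer the best ones unless they are restored explicitly.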

In [None]:
# Use the model to predict in the form of a 10-class probability distribution
y_train_prob = ANNmodel.predict(X_train_std)

# Select the most likely class
y_train_pred=np.argmax(y_train_prob, axis=1)

print("\n The classification results on the train data:")
print(classification_report(y_train,y_train_pred))
print("Confusion matrix (train data):\n", confusion_matrix(y_train,y_train_pred))




# The classification results for the validation data

y_valid_prob = ANNmodel.predict(X_valid_std)
y_valid_pred=np.argmax(y_valid_prob, axis=1)
print("\n The classification results on the validation data:")
print(classification_report(y_valid,y_valid_pred))
print("Confusion matrix (validation data):\n", confusion_matrix(y_valid,y_valid_pred))

Let's visualise some of the predictions to see where the model is wrong.
We display the correct prediction labels in blue and the incorrect prediction labels in red. The number gives the percentage (out of 100) for the predicted label.

In [None]:
# Plot the first `num_images` validation images, their predicted labels, and the true labels in parentheses.
# Color correct predictions in blue and incorrect predictions in red.


def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')




num_rows = 10
num_cols = 2
num_images = num_rows*num_cols
plt.figure(figsize=(3*2*num_cols, 2*num_rows))

for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, y_valid_prob[i], y_valid, X_valid_std)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, y_valid_prob[i], y_valid)
    plt.xticks(range(10), class_names, rotation=90)
    
plt.tight_layout()
plt.show()

#### Let's train a smaller model, hoping to reduce the overfitting. 

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow 
#      (especially the internal names for layers and for the fit history)
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)



# Only 2 smaller layers this time, plus the output layer
# A drop from 111 146 parameters to 25 818 parameters.

ANNmodel = models.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(10, activation='softmax'),
])

# The model must be compiled by specifying the numerical optimizer algorithm, 
#     the loss function, and metrics to be followed up epoch by epoch

ANNmodel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy(), 
             tf.keras.metrics.TruePositives(),
            ],
)

# summary() prints the table itself and returns None, so no print() is needed
ANNmodel.summary()


# This callback will stop the training when there is no improvement in the 
#      training loss for ten consecutive epochs (patience=10).
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)


# Fit the model by specifying the number of epochs and the batch size
ANN_fit_history = ANNmodel.fit(X_train_std, 
                               y_train_cat, 
                               epochs=300, 
                               batch_size=128,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid_cat)
                              )

history_dict = ANN_fit_history.history

In [None]:
# Plot the evolution of the loss and the accuracy throughout the epochs
# This is useful to find over-fitting and decide on early stopping of the training. 

print(history_dict.keys())

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
train_tp = np.array(history_dict['true_positives']) / X_train_std.shape[0]
val_tp = np.array(history_dict['val_true_positives']) / X_valid_std.shape[0]
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(20,5))

plt.subplot(1,3,1)
plt.plot(epochs, train_loss, 'b', label='Training cat. cross-entropy')
plt.plot(epochs, val_loss, 'r', label='Validation cat. cross-entropy')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,3,2)
plt.plot(epochs, train_acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Categorical accuracy')
plt.legend()

plt.subplot(1,3,3)
plt.plot(epochs, train_tp, 'b', label='Training TP')
plt.plot(epochs, val_tp, 'r', label='Validation TP')
plt.title('Training and validation true positives')
plt.xlabel('Epochs')
plt.ylabel('True positives')
plt.legend()

plt.show()

#### The model remains overfit!

In [None]:
# Use the model to predict in the form of a 10-class probability distribution
y_train_prob = ANNmodel.predict(X_train_std)

# Select the most likely class
y_train_pred=np.argmax(y_train_prob, axis=1)

print("\n The classification results on the train data:")
print(classification_report(y_train,y_train_pred))
print("Confusion matrix (train data):\n", confusion_matrix(y_train,y_train_pred))




# The classification results for the validation data

y_valid_prob = ANNmodel.predict(X_valid_std)
y_valid_pred=np.argmax(y_valid_prob, axis=1)
print("\n The classification results on the validation data:")
print(classification_report(y_valid,y_valid_pred))
print("Confusion matrix (validation data):\n", confusion_matrix(y_valid,y_valid_pred))

Note: the average accuracy is about the same for the smaller model as for the larger one. The model still seems overfit!

In [None]:
# The classification results for the test data

y_test_prob = ANNmodel.predict(X_test_std)
y_test_pred=np.argmax(y_test_prob, axis=1)
print("\n The classification results on the test data:")
print(classification_report(y_test,y_test_pred))
print("Confusion matrix (test data):\n", confusion_matrix(y_test,y_test_pred))

In [None]:
del X_train_valid
del X_train_valid_std
del X_train_std
del X_valid_std
del X_test
del X_test_std
del y_train
del y_train_prob
del y_train_pred
del y_valid
del y_valid_prob
del y_valid_pred
del y_test
del y_test_prob
del y_test_pred
del ANNmodel

## II. A deep learning regression model
#### Data: the California housing dataset

In [None]:
# Load the dataset from sklearn, add the target to the main dataset

from sklearn.datasets import fetch_california_housing

calif_X, calif_y = fetch_california_housing(return_X_y=True, as_frame=True)
display(calif_X)

In [None]:
# Split the data into train/validation/test

X_train_valid, X_test, y_train_valid, y_test = train_test_split(
    calif_X, 
    calif_y, 
    test_size=0.2, 
    random_state=120, 
    shuffle=True
)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_valid, 
    y_train_valid, 
    test_size=0.25, 
    random_state=120, 
    shuffle=True
)

del X_train_valid
del y_train_valid

# convert to pandas dataframe
X_train = pd.DataFrame(X_train, columns=calif_X.columns)
X_valid = pd.DataFrame(X_valid, columns=calif_X.columns)
X_test = pd.DataFrame(X_test, columns=calif_X.columns)
y_train = pd.DataFrame(y_train)
y_valid = pd.DataFrame(y_valid)
y_test = pd.DataFrame(y_test)

del calif_X
del calif_y

In [None]:
X_train.info()

In [None]:
y_train.describe()

In [None]:
# Standardise the data

from sklearn.preprocessing import StandardScaler

std_scaler = StandardScaler()
std_scaler.fit(X_train)

X_train_std = std_scaler.transform(X_train)
X_valid_std = std_scaler.transform(X_valid)
X_test_std  = std_scaler.transform(X_test)

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow 
#      (especially the internal names for layers and for the fit history)
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)

In [None]:
# Design the model
# For a regression model, the output layer will have a single neuron.
# In the input layer we need to specify the input size. 
# The model has 3 dense layers of size 128/64/32 and a single neuron output layer. 
# This gives 11 521 parameters. 

from keras import models
from keras import layers

ANNmodel = models.Sequential([
    layers.Input(shape=(8,)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])


# The model must be compiled by specifying the numerical optimizer algorithm, 
#     the loss function, and metrics to be followed up epoch by epoch

ANNmodel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='mean_squared_error',
    metrics=[
       # tf.keras.metrics.MeanAbsoluteError(),
       # tf.keras.metrics.MeanAbsolutePercentageError(),
        tf.keras.metrics.R2Score(),
        ],
)

# summary() prints the table itself and returns None, so no print() is needed
ANNmodel.summary()

In [None]:
# This callback will stop the training when there is no improvement in the 
#      training loss for ten consecutive epochs (patience=10).
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)


# Fit the model by specifying the number of epochs and the batch size
ANN_fit_history = ANNmodel.fit(X_train_std, 
                               y_train, 
                               epochs=100, 
                               batch_size=32,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid)
                              )

history_dict = ANN_fit_history.history

In [None]:
# Plot the evolution of the loss and the R2 score throughout the epochs
# This is useful to find over-fitting and decide on early stopping of the training. 

print(history_dict.keys())

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
#train_mape = history_dict['mean_absolute_percentage_error']
#val_mape = history_dict['val_mean_absolute_percentage_error']
train_R2 = history_dict['r2_score']
val_R2 = history_dict['val_r2_score']
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(12,4))

plt.subplot(1,2,1)
plt.plot(epochs, train_loss, 'b', label='Training MSE')
plt.plot(epochs, val_loss, 'r', label='Validation MSE')
plt.title('Train and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,2,2)
plt.plot(epochs, train_R2, 'b', label='Training R2')
plt.plot(epochs, val_R2, 'r', label='Validation R2')
plt.title('R2 scores')
plt.xlabel('Epochs')
plt.ylabel('Train and validation R2 scores')
plt.legend()

plt.show()


#### The plots suggest that the model is overfit, so we try a smaller model. 

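> Looking at the mean absolute percentage error (this metric is commented out in the `compile` call above and must be enabled to reproduce the observation), the model does not do well even on the training data. This may reflect the relatively limited dataset: too few features, possibly too few data points. 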
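
For reference, the two regression metrics mentioned here can be written out in a few lines. The sketch below uses plain Python and made-up values; the function names are our own. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction, as in this sketch, whereas the Keras metric `MeanAbsolutePercentageError` reports a percentage.

```python
def r2_score_manual(y_true, y_pred):
    # R2 = 1 - SS_res / SS_tot
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mape_manual(y_true, y_pred):
    # Mean of |t - p| / |t|, as a fraction (multiply by 100 for percent)
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.1, 1.9, 3.2, 3.8]

toy_r2 = r2_score_manual(toy_true, toy_pred)
toy_mape = mape_manual(toy_true, toy_pred)
print('R2:  ', round(toy_r2, 4))
print('MAPE:', round(toy_mape, 4))
```

An R2 score of 1 means a perfect fit, while a score of 0 corresponds to always predicting the mean of the targets. The MAPE weights each error by the size of the true value, so small targets can dominate it.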

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow 
#      (especially the internal names for layers and for the fit history)
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)

In [None]:
# Design the model
# The model has 2 dense layers of size 16/8 and a single neuron output layer. 
# This gives 289 parameters. 

from keras import models
from keras import layers

ANNmodel = models.Sequential([
    layers.Input(shape=(8,)),
    layers.Flatten(),
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(1),
])



# The model must be compiled by specifying the numerical optimizer algorithm, 
#     the loss function, and metrics to be followed up epoch by epoch

ANNmodel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='mean_squared_error',
    metrics=[
       # tf.keras.metrics.MeanAbsoluteError(),
       # tf.keras.metrics.MeanAbsolutePercentageError(),
        tf.keras.metrics.R2Score(),
        ],
)

# summary() prints the table itself and returns None, so no print() is needed
ANNmodel.summary()

In [None]:
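# This callback will stop the training when there is no improvement in the 
#      training R2 score for ten consecutive epochs (patience=10).
# mode='max' states explicitly that a higher R2 score is better.
callback = tf.keras.callbacks.EarlyStopping(monitor='r2_score', mode='max', patience=10)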


# Fit the model by specifying the number of epochs and the batch size
ANN_fit_history = ANNmodel.fit(X_train_std, 
                               y_train, 
                               epochs=100, 
                               batch_size=32,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid)
                              )

history_dict = ANN_fit_history.history

In [None]:
# Plot the evolution of the loss and the R2 score throughout the epochs
# This is useful to find over-fitting and decide on early stopping of the training. 

print(history_dict.keys())

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
#train_mape = history_dict['mean_absolute_percentage_error']
#val_mape = history_dict['val_mean_absolute_percentage_error']
train_R2 = history_dict['r2_score']
val_R2 = history_dict['val_r2_score']
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(12,4))

plt.subplot(1,2,1)
plt.plot(epochs, train_loss, 'b', label='Training MSE')
plt.plot(epochs, val_loss, 'r', label='Validation MSE')
plt.title('Train and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,2,2)
plt.plot(epochs, train_R2, 'b', label='Training R2')
plt.plot(epochs, val_R2, 'r', label='Validation R2')
plt.title('R2 scores')
plt.xlabel('Epochs')
plt.ylabel('Train and validation R2 scores')
plt.legend()

plt.show()


#### Much better: the model is no longer overfit. It has learned well, although the R2 score could be higher.

In [None]:
import sklearn.metrics as metrics
def regression_results(y_true, y_pred):

    # Regression metrics
    mse=metrics.mean_squared_error(y_true, y_pred) 
    mae=metrics.mean_absolute_error(y_true, y_pred) 
    mape=metrics.mean_absolute_percentage_error(y_true, y_pred)
    r2=metrics.r2_score(y_true, y_pred)
    
    print('MSE: ', round(mse,4))
    print('MAE: ', round(mae,4))
    print('MAPE: ', round(mape,4))
    print('R2: ', round(r2,4))
    

In [None]:
# The regression results for the test data

y_test_pred = ANNmodel.predict(X_test_std)
print("Regression results on the test dataset:")
regression_results(y_test,y_test_pred)

## Challenge: The CIFAR-10 dataset
#### Note: This is a notoriously difficult dataset to learn using fully connected neural networks. Let's see how well we can learn it!

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The 10 classes are:

| Label | Description   |
|-------|---------------|
|    0  | Airplane      |
|    1  | Automobile    |
|    2  | Bird          |
|    3  | Cat           |
|    4  | Deer          |
|    5  | Dog           |
|    6  | Frog          |
|    7  | Horse         |
|    8  | Ship          |
|    9  | Truck         |

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. 

Webpage, including download: https://www.cs.toronto.edu/~kriz/cifar.html
Dataset on Keras: https://www.tensorflow.org/api_docs/python/tf/keras/datasets/cifar10

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow 
#      (especially the internal names for layers and for the fit history)
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)

In [None]:
from keras.datasets import cifar10
from keras.utils import to_categorical

(X_train_valid, y_train_valid), (X_test, y_test) = cifar10.load_data()
y_train_valid = y_train_valid.ravel()
y_test = y_test.ravel()

print('We have %2d training pictures and %2d test pictures.' % (X_train_valid.shape[0],X_test.shape[0]))
print('Each picture is of size (%2d,%2d)' % (X_train_valid.shape[1], X_train_valid.shape[2]))

In [None]:
# Scale the data into [0,1] by dividing by 255

X_train_valid_std = X_train_valid/255
X_test_std  = X_test/255

In [None]:
# Display some images

class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']


plt.figure(figsize=(20,12))
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train_valid_std[i])
    plt.xlabel(class_names[int(y_train_valid[i])])
plt.show()


In [None]:
# Train - validation split 

X_train_std, X_valid_std, y_train, y_valid = train_test_split(
    X_train_valid_std, 
    y_train_valid, 
    test_size=0.2, 
    random_state=150, 
    stratify=y_train_valid,
    shuffle=True
)

# Check the result of the data split

print('# of training images:', X_train_std.shape[0])
print('# of validation images:', X_valid_std.shape[0])
print("Note the shape of the data (3 color channels):", X_train_std.shape)

In [None]:
# Encode the labels from numerical to categorical

from keras.utils import to_categorical

y_train_cat = to_categorical(y_train, num_classes=10)
y_valid_cat = to_categorical(y_valid, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)

In [None]:
# Train an ANN model with an input "Flatten" layer of shape (32, 32, 3), accounting for the 3 color channels,
#       followed by 3 layers of size 128/64/32, followed by an output layer of a suitable size.
# Choose 'relu' for the activation function of the hidden layers, and a suitable activation for the output layer. 
# Your code here



#### Q1. How many (trainable) parameters does your 128/64/32 model have? 

In [None]:
# We reset all variables implicitly instantiated by Keras/tensorflow
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)


# This callback will stop the training when there is no improvement in the 
#      training loss for ten consecutive epochs (patience=10).
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)

# Fit the model by specifying the number of epochs and the batch size
# We also indicate the validation data so we can collect the evolution 
#      of the metrics through the epochs, both on train, as well as on validation.

ANN_fit_history = ANNmodel.fit(X_train_std,
                               y_train_cat, 
                               epochs=300, 
                               batch_size=128,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid_cat)
                              )


In [None]:
history_dict = ANN_fit_history.history
print(history_dict.keys())

# Plot the evolution of the loss and the accuracy throughout the epochs
# This is useful to find over-fitting and decide on early stopping of the training. 

import matplotlib.pyplot as plt

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
train_tp = np.array(history_dict['true_positives']) / X_train_std.shape[0]
val_tp = np.array(history_dict['val_true_positives']) / X_valid_std.shape[0]
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(20, 5))

plt.subplot(1,3,1)
plt.plot(epochs, train_loss, 'b', label='Training cat. cross-entropy')
plt.plot(epochs, val_loss, 'r', label='Validation cat. cross-entropy')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,3,2)
plt.plot(epochs, train_acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Categorical accuracy')
plt.legend()

plt.subplot(1,3,3)
plt.plot(epochs, train_tp, 'b', label='Training TP')
plt.plot(epochs, val_tp, 'r', label='Validation TP')
plt.title('Training and validation true positives')
plt.xlabel('Epochs')
plt.ylabel('True positives')
plt.legend()

plt.show()

#### To think about: Based on the evolution of the loss throughout the epochs, do you consider the 128/64/32 model overfit (is the loss on the validation clearly increasing)?

#### Check the accuracy of the 128/64/32 model on the training and on the validation data: run the code below. 

In [None]:
# Use the model to predict in the form of a 10-class probability distribution
y_train_prob = ANNmodel.predict(X_train_std)

# Select the most likely class
y_train_pred=np.argmax(y_train_prob, axis=1)

print("\n The classification results on the train data:")
print(classification_report(y_train,y_train_pred))
print("Confusion matrix (train data):\n", confusion_matrix(y_train,y_train_pred))




# The classification results for the validation data

y_valid_prob = ANNmodel.predict(X_valid_std)
y_valid_pred=np.argmax(y_valid_prob, axis=1)
print("\n The classification results on the validation data:")
print(classification_report(y_valid,y_valid_pred))
print("Confusion matrix (validation data):\n", confusion_matrix(y_valid,y_valid_pred))


# The classification results for the test data

y_test_prob = ANNmodel.predict(X_test_std)
y_test_pred=np.argmax(y_test_prob, axis=1)
print("\n The classification results on the test data:")
print(classification_report(y_test,y_test_pred))
print("Confusion matrix (test data):\n", confusion_matrix(y_test,y_test_pred))

#### Q2. What is the accuracy of the 128/64/32 model on the training data? 
#### Q3. What is the accuracy of the 128/64/32 model on the validation data? 
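The accuracy can be read from the `accuracy` row of the classification report, and it can also be recovered from the confusion matrix: the correctly classified samples lie on the diagonal. The sketch below illustrates this on a made-up 3-class confusion matrix; `accuracy_from_confusion` and `toy_cm` are hypothetical names and values, not results of the models in this notebook.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()

# Made-up confusion matrix: rows are true classes, columns are predicted classes
toy_cm = np.array([[8, 1, 1],
                   [2, 7, 1],
                   [0, 3, 7]])

toy_acc = accuracy_from_confusion(toy_cm)
print(f"Accuracy: {toy_acc:.3f}")   # 22 correct out of 30 samples
```

The same value is obtained by comparing the label vectors directly, for instance with `np.mean(y_true == y_pred)`.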

In [None]:
# Plot the first num_images validation images, their predicted labels, and the true labels in parentheses.
# Color correct predictions in blue and incorrect predictions in red.


def plot_image(i, predictions_array, true_label, img):
    true_label, img = int(true_label[i]), img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
    true_label = int(true_label[i])
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')




num_rows = 10
num_cols = 2
num_images = num_rows*num_cols
plt.figure(figsize=(3*2*num_cols, 2*num_rows))

for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, y_valid_prob[i], y_valid, X_valid_std)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, y_valid_prob[i], y_valid)
    plt.xticks(range(10), class_names, rotation=90)
    
plt.tight_layout()
plt.show()

In [None]:
# Train a smaller model. 
# Use an input "Flatten" layer of shape (32, 32, 3), accounting for the 3 color channels,
#       followed by 2 layers of size 64/32, followed by an output layer of a suitable size.
# Choose 'relu' for the activation function of the hidden layers, and a suitable activation for the output layer. 
# Your code here



#### Q4. How many (trainable) parameters does the 64/32 model have? 
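In a fully connected layer, each of the `n_out` neurons has one weight per input plus one bias, so the layer contributes `(n_in + 1) * n_out` trainable parameters. The sketch below applies this rule to a deliberately tiny, made-up architecture (4 inputs, 3 hidden neurons, 2 outputs); the helper `dense_param_count` is a hypothetical name, and the sketch assumes that every layer is dense and uses a bias. In Keras, `ANNmodel.summary()` reports the count for the actual model.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the number of units per layer, starting with the
    (flattened) input size and ending with the output size.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Toy architecture, unrelated to the models in this notebook
toy_sizes = [4, 3, 2]
toy_count = dense_param_count(toy_sizes)
print(toy_count)   # (4+1)*3 + (3+1)*2 = 23
```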

In [None]:
# We reset all variables implicitly instantiated by Keras/TensorFlow
tf.keras.backend.clear_session()

# We reset the random number generators, for reproducibility purposes 
reset_seed(2023)


# This callback will stop the training when there is no improvement in the training loss 
#      for ten consecutive epochs (patience=10).
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)

# Fit the model by specifying the number of epochs and the batch size
# We also indicate the validation data so we can collect the evolution 
#      of the metrics through the epochs, on both the training and the validation data.

ANN_fit_history = ANNmodel.fit(X_train_std,
                               y_train_cat, 
                               epochs=300, 
                               batch_size=128,
                               callbacks=[callback],
                               validation_data=(X_valid_std, y_valid_cat)
                              )
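
The effect of the `patience` argument of the early-stopping callback can be illustrated with a simplified sketch in plain Python. This is not the Keras implementation: the function `early_stopping_epoch` and the loss values are hypothetical, and the sketch ignores options such as `min_delta` or restoring the best weights. It only shows the basic rule: training stops once the monitored loss has failed to improve for `patience` consecutive epochs.

```python
def early_stopping_epoch(losses, patience):
    """Return the 1-based epoch at which training would stop,
    or None if the stopping condition is never met."""
    best = float("inf")
    wait = 0                       # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss            # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up training losses (NOT results from the notebook's model)
toy_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]

stop_epoch = early_stopping_epoch(toy_losses, patience=3)
print(stop_epoch)   # epochs 4, 5 and 6 do not improve on 0.7, so training stops at epoch 6
```

With a larger `patience`, the same toy sequence would run through all epochs without stopping.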


In [None]:
history_dict = ANN_fit_history.history
print(history_dict.keys())

# Plot the evolution of the loss and the accuracy throughout the epochs
# This is useful to detect overfitting and decide on early stopping of the training. 

import matplotlib.pyplot as plt

train_loss = history_dict['loss']
val_loss = history_dict['val_loss']
train_acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
train_tp = np.array(history_dict['true_positives']) / X_train_std.shape[0]
val_tp = np.array(history_dict['val_true_positives']) / X_valid_std.shape[0]
epochs = range(1, len(train_loss) + 1)


plt.figure(figsize=(20, 5))

plt.subplot(1,3,1)
plt.plot(epochs, train_loss, 'b', label='Training cat. cross-entropy')
plt.plot(epochs, val_loss, 'r', label='Validation cat. cross-entropy')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()


plt.subplot(1,3,2)
plt.plot(epochs, train_acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Categorical accuracy')
plt.legend()

plt.subplot(1,3,3)
plt.plot(epochs, train_tp, 'b', label='Training TP')
plt.plot(epochs, val_tp, 'r', label='Validation TP')
plt.title('Training and validation true positives')
plt.xlabel('Epochs')
plt.ylabel('True positives')
plt.legend()

plt.show()

#### To think about: Based on the evolution of the loss throughout the epochs, do you consider that the 64/32 model is overfit (is the loss on the validation data clearly increasing)? 

#### Check the accuracy of the 64/32 model on the training and on the validation data. 

In [None]:
# Your code here

#### Q5. What is the accuracy of the 64/32 model on the training data? 
#### Q6. What is the accuracy of the 64/32 model on the validation data? 