# 08 - Neural Networks

*HFT Stuttgart, 2024 Summer Term, Michael Mommert (michael.mommert@hft-stuttgart.de)*

This Jupyter Notebook provides a simple introduction to neural networks in Python and is based on Notebooks prepared by the amazing Dr. Marco Schreyer.

In this lab, we will learn how to implement, train, and apply our first **Artificial Neural Network (ANN)** using a Python library named `PyTorch`. `PyTorch` is an open-source machine learning library for Python that is used for a variety of applications such as image classification and natural language processing. We will use the implemented neural network to once again classify images of fashion articles from the **Fashion-MNIST** dataset.

The figure below illustrates a high-level view of the machine learning process we aim to establish in this lab:

<img align="center" style="max-width: 700px" src="https://github.com/mommermi/hft-kid-2024summer/blob/main/lab_03/classification.png?raw=1">

## 1. Deep Learning Workflow

As a reminder, here is the Deep Learning workflow for training a neural network:

<img align="center" style="max-width: 700px" src="https://github.com/mommermi/hft-kid-2024summer/blob/main/08_neuralnetworks/dl_pipeline.pdf?raw=1">

## 2. Setup of the Jupyter Notebook Environment

Similar to the previous labs, we need to import a couple of Python libraries for data analysis and data visualization. We will mainly use `PyTorch`, `NumPy`, `Scikit-Learn`, `Matplotlib`, `Seaborn` and a few utility libraries throughout this lab:

In [None]:
# import standard python libraries
from datetime import datetime
import numpy as np
import os

Import the Python machine / deep learning libraries:

In [None]:
# import the PyTorch deep learning library
import torch, torchvision
import torch.nn.functional as F
from torch import nn, optim

Import the sklearn classification metrics and some other useful tools:

In [None]:
# import sklearn classification evaluation library
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

Import plotting capabilities:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

**Run this cell only if you are running this notebook on Google Colab**: Import `Google's GDrive` connector and mount your `GDrive` directories:

In [None]:
# import the Google Colab GDrive connector
from google.colab import drive

# mount GDrive inside the Colab notebook
drive.mount('/content/drive')

# create data sub-directory inside the Colab Notebooks directory
data_directory = '/content/drive/MyDrive/Colab Notebooks/data_fmnist'
if not os.path.exists(data_directory): os.makedirs(data_directory)

**Run this cell only if you are running this notebook on Binder**: Create a directory to store the dataset:

In [None]:
data_directory = 'data_fmnist/'
if not os.path.exists(data_directory): os.makedirs(data_directory)

Set a random `seed` value to obtain reproducible results:

In [None]:
# init deterministic seed
seed_value = 42
np.random.seed(seed_value) # set numpy seed
torch.manual_seed(seed_value) # set pytorch seed CPU

Google Colab provides free GPUs for running notebooks. However, if you simply execute this notebook as is, it will use your device's CPU (just as when running the notebook on Binder or another cloud computing service). To run the notebook on a GPU in Colab, go to `Runtime` > `Change runtime type` and select `GPU` in the drop-down. Running this lab on a CPU is fine, but you will find that GPU computing is faster. If the log message of the next cell reports `cuda` computation, the notebook is using a GPU.

Enable GPU computing by setting the device flag and initializing a CUDA seed:

In [None]:
# set cpu or gpu enabled device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu').type

# init deterministic GPU seed
torch.cuda.manual_seed(seed_value)

# log type of device enabled
print('[LOG] notebook with {} computation enabled'.format(str(device)))

Let's check whether we have access to a GPU, e.g., one provided by the `Google Colab` environment (this command will result in an error message if you do not have access to a GPU):

In [None]:
!nvidia-smi

## 3. Dataset Download and Data Assessment

The **Fashion-MNIST database** is a database of Zalando article images that is widely used for training and testing image classification systems in the field of machine learning. Let's have a brief look at a couple of sample images contained in the dataset:

<img align="center" style="max-width: 700px; height: 300px" src="https://github.com/mommermi/hft-kid-2024summer/blob/main/lab_03/FashionMNIST.png?raw=1">

Source: https://www.kaggle.com/c/insar-fashion-mnist-challenge

Further details on the dataset can be obtained via Zalando research's [github page](https://github.com/zalandoresearch/fashion-mnist).

The **Fashion-MNIST database** is an image dataset of Zalando's article images, consisting of 70,000 images in total.

The dataset is divided into a set of **60,000 training examples** and a set of **10,000 evaluation examples**. Each example is a **28x28 grayscale image**, associated with a **label from 10 classes**. Zalando created this dataset with the intention of providing a replacement for the popular **MNIST** handwritten digits dataset. It is a useful addition as it is a bit more complex than MNIST, but still very easy to use. It shares the same image size and train/test split structure as MNIST and can therefore be used as a drop-in replacement. It requires minimal effort for preprocessing and formatting the images.

Let's download and inspect the training images of the dataset. To do so, let's first define the directory in which we want to store the training data:

In [None]:
train_path = os.path.join(data_directory, 'train_fmnist')

Now, let's download the training data accordingly:

In [None]:
# download training images
fashion_mnist_train = torchvision.datasets.FashionMNIST(root=train_path, train=True, download=True)

# split data (X) from labels (y)
X_train = fashion_mnist_train.data
y_train = fashion_mnist_train.targets

Verify the number of training images downloaded:

In [None]:
# determine the number of training data images
len(X_train)

Furthermore, let's inspect a couple of the downloaded training images:

In [None]:
# select some random image ids
image_ids = np.random.randint(0, len(X_train), size=9)

# retrieve images and labels
images = X_train[image_ids]
labels = y_train[image_ids]
images

Ok, that doesn't seem right :). What we see are the raw pixel values (0 to 255) of the image tensors. Let's plot the images instead:

In [None]:
f, ax = plt.subplots(3, 3, figsize=(9, 9))
ax = np.ravel(ax)

for i in range(len(images)):
    ax[i].imshow(images[i], cmap='gray')

Nice. Let's check what the corresponding ground truth labels are:

In [None]:
labels

Ok, so each image comes with a numerical label. Each label is an integer from 0 to 9 that represents one of the fashion items. But what does, e.g., a label of 6 mean? Is 6 a bag? A pullover? The order of the classes can be found on Zalando research's [github page](https://github.com/zalandoresearch/fashion-mnist). We need to map each numerical label to its fashion item, which will be useful throughout the lab:

In [None]:
fashion_classes = {0: 'T-shirt/top',
                   1: 'Trouser',
                   2: 'Pullover',
                   3: 'Dress',
                   4: 'Coat',
                   5: 'Sandal',
                   6: 'Shirt',
                   7: 'Sneaker',
                   8: 'Bag',
                   9: 'Ankle boot'}

So, now we can do the translation (we need `.item()` to turn each single-element tensor into a Python integer) and use the labels as titles in our figure:

In [None]:
f, ax = plt.subplots(3, 3, figsize=(9, 9))
ax = np.ravel(ax)

for i in range(len(images)):
    ax[i].imshow(images[i], cmap='gray')
    ax[i].set_title(fashion_classes[labels[i].item()])

Fantastic, right? Let's now define the directory in which we aim to store the evaluation data:

In [None]:
eval_path = os.path.join(data_directory, 'eval_fmnist')

And download the evaluation data accordingly:

In [None]:
# download evaluation images
fashion_mnist_eval = torchvision.datasets.FashionMNIST(root=eval_path, train=False, download=True)

Let's also verify the number of evaluation images downloaded:

In [None]:
# determine the number of evaluation data images
len(fashion_mnist_eval)

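We will now split the evaluation dataset into two equally sized chunks that serve as our validation and test datasets (stratified by class label, so that both chunks contain the same class proportions):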

In [None]:
X_val, X_test, y_val, y_test = train_test_split(fashion_mnist_eval.data, fashion_mnist_eval.targets, test_size=0.5, stratify=fashion_mnist_eval.targets, random_state=seed_value)
X_val.shape, X_test.shape, y_val.shape, y_test.shape
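The `stratify` argument makes both halves keep the class proportions of the data being split. A minimal sketch on a small synthetic label array (a hypothetical stand-in for `fashion_mnist_eval.targets`, with 20 examples per class) illustrates this:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical toy labels: 10 classes with 20 examples each (stand-in for the real targets)
toy_labels = np.repeat(np.arange(10), 20)
toy_features = np.arange(len(toy_labels))  # dummy features, one per label

# 50/50 split with stratification, as done for the evaluation data above
f_val, f_test, l_val, l_test = train_test_split(
    toy_features, toy_labels, test_size=0.5, stratify=toy_labels, random_state=42)

# count the examples per class in each half
print(np.bincount(l_val, minlength=10))
print(np.bincount(l_test, minlength=10))
```

Each half ends up with exactly 10 examples of every class; without `stratify`, the per-class counts would vary randomly.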

## 4. Neural Network Implementation

In this section, we will implement the architecture of the **neural network** that we will use to learn a model capable of classifying the 28x28 pixel Fashion-MNIST images of fashion items. However, before we start the implementation, let's briefly revisit the process to be established. The following cartoon provides a bird's-eye view:

<img align="center" style="max-width: 1000px" src="https://github.com/mommermi/hft-kid-2024summer/blob/main/lab_03/process.png?raw=1">

### 4.1 Implementation of the Neural Network Architecture

The neural network, which we name **'FashionMNISTNet'**, consists of three **fully-connected layers**: two hidden layers followed by an output layer. The layers contain the following numbers of neurons: 100 (layer 1), 50 (layer 2) and 10 (layer 3). The 10 neurons of the third layer correspond to the number of fashion classes we aim to classify.

We will now start implementing the network architecture as a separate Python class. Implementing the network architecture as a **separate class** in Python is good practice in deep learning projects. It will allow us to create and train several instances of the same neural network architecture. This gives us, for example, the opportunity to evaluate different initializations of the network parameters or to train models using distinct datasets.

In [None]:
# implement the FashionMNISTNet network architecture
class FashionMNISTNet(nn.Module):
    
    # define the class constructor
    def __init__(self):
        
        # call super class constructor
        super(FashionMNISTNet, self).__init__()
        
        # specify fully-connected (fc) layer 1 - in 28*28, out 100
        self.linear1 = nn.Linear(28*28, 100, bias=True) # the linearity W*x+b
        
        # specify fc layer 2 - in 100, out 50
        self.linear2 = nn.Linear(100, 50, bias=True) # the linearity W*x+b
        
        # specify fc layer 3 - in 50, out 10
        self.linear3 = nn.Linear(50, 10) # the linearity W*x+b
        
        # add a log-softmax to the last layer (outputs log-probabilities per class)
        self.logsoftmax = nn.LogSoftmax(dim=1) # the log-softmax

        # relu non-linear activation function
        self.relu = nn.ReLU(inplace=True)

    # define network forward pass
    def forward(self, images):
        
        # reshape image pixels
        x = images.float().view(-1, 28*28)
        
        # define fc layer 1 forward pass
        x = self.relu(self.linear1(x))
        
        # define fc layer 2 forward pass
        x = self.relu(self.linear2(x))
        
        # define layer 3 forward pass
        x = self.logsoftmax(self.linear3(x))
        
        # return forward pass result
        return x
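For comparison, the same layer stack can also be written more compactly with `nn.Sequential`. This is a hedged sketch (the name `sequential_net` is hypothetical), not a replacement for the class above; unlike `FashionMNISTNet.forward()`, it does not convert its input with `.float()`, so it expects float images:

```python
import torch
from torch import nn

# hypothetical nn.Sequential equivalent of FashionMNISTNet (same layer sizes)
sequential_net = nn.Sequential(
    nn.Flatten(),             # 28x28 image -> 784-dim vector
    nn.Linear(28 * 28, 100),  # fc layer 1
    nn.ReLU(),
    nn.Linear(100, 50),       # fc layer 2
    nn.ReLU(),
    nn.Linear(50, 10),        # fc layer 3
    nn.LogSoftmax(dim=1),     # log-probabilities over the 10 classes
)

# a dummy mini-batch of 4 grayscale 28x28 float images
dummy_images = torch.rand(4, 28, 28)
log_probs = sequential_net(dummy_images)

print(log_probs.shape)             # torch.Size([4, 10])
print(log_probs.exp().sum(dim=1))  # each row sums to (approximately) 1
```

Both variants describe the same architecture; the explicit class makes it easier to add custom logic to `forward()` later.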

Now that we have implemented our first neural network, we are ready to instantiate a network model to be trained:

In [None]:
model = FashionMNISTNet()

Let's push the initialized `FashionMNISTNet` model to the computing `device` that is enabled:

In [None]:
model = model.to(device)

Once the model is initialized, we can visualize the model structure and review the implemented network architecture by executing the following cell:

In [None]:
# print the initialized architectures
print('[LOG] FashionMNISTNet architecture:\n\n{}\n'.format(model))

Looks like intended? Brilliant! Finally, let's have a look into the number of model parameters that we aim to train in the next steps of the notebook:

In [None]:
# init the number of model parameters
num_params = 0

# iterate over the distinct parameters
for param in model.parameters():

    # collect number of parameters
    num_params += param.numel()
    
# print the number of model parameters
print('[LOG] Number of to be trained FashionMNISTNet model parameters: {}.'.format(num_params))

Ok, our "simple" FashionMNISTNet model already encompasses an impressive number of 84,060 model parameters to be trained.
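Where does this number come from? Each fully-connected layer has one weight per input-output pair plus one bias per output neuron. A quick arithmetic check in plain Python, using the layer sizes defined above:

```python
# (inputs, outputs) of the three fully-connected layers of FashionMNISTNet
layer_sizes = [(28 * 28, 100), (100, 50), (50, 10)]

# a fully-connected layer has inputs * outputs weights plus one bias per output
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_sizes]

print(params_per_layer)       # [78500, 5050, 510]
print(sum(params_per_layer))  # 84060
```

The first layer alone accounts for more than 90% of all parameters, because every one of the 784 input pixels is connected to each of the 100 neurons.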

### 4.2 Specification of the Neural Network Loss Function

Now that we have implemented the **FashionMNISTNet**, we are ready to train the network. However, prior to starting the training, we need to define an appropriate loss function. Remember, we aim to train our model to learn a set of model parameters $\theta$ such that the predicted class $\hat{c}^{i} = f_\theta(x^{i})$ of a given fashion item image $x^{i}$ matches its true class $c^{i}$ as faithfully as possible.

Thus, the training objective is to learn a set of optimal model parameters $\theta^* = \arg\min_{\theta} \|C - f_\theta(X)\|$ over all training images $X$ and their true classes $C$ in the Fashion-MNIST dataset. To achieve this objective, one typically minimizes a loss function $\mathcal{L}_{\theta}$ as part of the network training. In this lab, we use the **Negative Log Likelihood (NLL)** loss, defined by:

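$$\mathcal{L}^{NLL}_{\theta}(c^{i}; \hat{c}^{i}) = - \frac{1}{n} \sum_{i=1}^{n} \log \hat{p}_{\theta}(c^{i} \mid x^{i})$$

for a set of $n$ Fashion-MNIST images $x^{i}$, $i=1,...,n$, with true class labels $c^{i}$ and predicted class labels $\hat{c}^{i}$. Here, $\hat{p}_{\theta}(c^{i} \mid x^{i})$ denotes the probability that the network assigns to the true class $c^{i}$ of image $x^{i}$: only the predicted probability of the correct class enters the sum, and the lower this probability, the larger the loss.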
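To make the definition concrete, here is a small NumPy sketch with made-up network outputs (logits) for $n = 3$ images and 4 classes; all values are hypothetical. A log-softmax turns the outputs into log-probabilities, and the loss is the negative mean of the log-probabilities of the true classes:

```python
import numpy as np

# hypothetical raw network outputs (logits): 3 images, 4 classes
logits = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.2, 0.1, 3.0,  0.0],
                   [1.0, 1.0, 1.0,  1.0]])
true_classes = np.array([0, 2, 3])  # true class c^i of each image

# numerically stable log-softmax: log p = z - log(sum(exp(z)))
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# pick the log-probability of the true class of each image and average
nll = -np.mean(log_probs[np.arange(len(true_classes)), true_classes])
print(nll)
```

The third image has identical outputs for all classes, so its true class receives probability 1/4 and contributes $-\log(1/4) \approx 1.39$ to the sum; the confident, correct predictions for the first two images contribute much less.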

During training, the **NLL** loss will penalize models that assign a low probability to the true class label $c^{i}$, i.e., models whose predicted class labels $\hat{c}^{i}$ frequently deviate from the true class labels. Luckily, an implementation of the NLL loss is already available in PyTorch! It can be instantiated "off-the-shelf" via the following PyTorch command:

In [None]:
# define the optimization criterion / loss function
nll_loss = nn.NLLLoss()
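Note that `nn.NLLLoss` expects log-probabilities as input, which is why `FashionMNISTNet` ends with a `LogSoftmax` layer. A short sketch with hypothetical logits and labels shows that this combination yields the same value as PyTorch's `nn.CrossEntropyLoss` applied directly to raw logits:

```python
import torch
from torch import nn
import torch.nn.functional as F

# hypothetical raw outputs for 5 images and 10 classes, plus their true labels
logits = torch.linspace(-2.0, 2.0, 50).reshape(5, 10)
targets = torch.tensor([3, 0, 9, 1, 4])

# NLLLoss on log-softmax outputs (the setup used in this notebook)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# CrossEntropyLoss combines log-softmax and NLL in a single step
ce = nn.CrossEntropyLoss()(logits, targets)

print(nll.item(), ce.item())  # both values agree
```

Keeping the `LogSoftmax` inside the network has the advantage that the model output can be read directly as log-probabilities.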

Let's also push the initialized `nll_loss` computation to the computing `device` that is enabled:

In [None]:
nll_loss = nll_loss.to(device)

## 5. Neural Network Model Training

In this section, we will train our neural network model (as implemented in the section above) using the Fashion-MNIST training images. More specifically, we will have a detailed look into the distinct training steps as well as how to monitor the training progress.

### 5.1. Preparing the Network Training

So far, we have prepared the dataset, implemented the ANN and defined the classification error. Let's now start to train a corresponding model for **10 epochs** with a **mini-batch size of 128** Fashion-MNIST images per batch. This implies that the whole training dataset will be fed to the ANN 10 times in chunks of 128 images, yielding **469 mini-batches** per epoch (60,000 images / 128 images per mini-batch = 468.75, rounded up because the last mini-batch contains the remaining 96 images).
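The number of mini-batches per epoch can be checked with a few lines of plain Python (this assumes, as with PyTorch's `DataLoader` default `drop_last=False`, that the last incomplete mini-batch is kept):

```python
import math

num_train_images = 60000
batch_size = 128

# number of mini-batches per epoch (the last, incomplete batch is kept)
num_batches = math.ceil(num_train_images / batch_size)

# number of images in the last mini-batch
last_batch_size = num_train_images - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)  # 469 96
```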

In [None]:
# specify the training parameters
num_epochs = 10 # number of training epochs
mini_batch_size = 128 # size of the mini-batches

Based on the loss of a given mini-batch, PyTorch automatically computes the gradients with respect to the network parameters. Even better, based on these gradients, the library also helps us with the optimization, i.e., the update of the network parameters $\theta$.

We will use **Stochastic Gradient Descent (SGD)** as optimization algorithm and set the learning rate to $l = 0.0001$. After each mini-batch, the optimizer will update the values of the model parameters $\theta$.
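Plain SGD updates each parameter as $\theta \leftarrow \theta - l \cdot \nabla_\theta \mathcal{L}$. A minimal sketch with a single hypothetical parameter, a toy loss $\mathcal{L}(\theta) = \theta^2$ and a larger, purely illustrative learning rate of 0.1 makes one such step visible (the names `theta` and `toy_optimizer` are hypothetical, chosen so that the notebook's `optimizer` is not overwritten):

```python
import torch
from torch import optim

# a single hypothetical parameter theta = 1.0
theta = torch.nn.Parameter(torch.tensor([1.0]))
toy_optimizer = optim.SGD(params=[theta], lr=0.1)

# toy loss L(theta) = theta^2, whose gradient is 2 * theta = 2.0
loss = (theta ** 2).sum()
toy_optimizer.zero_grad()
loss.backward()

# SGD update: theta <- theta - lr * grad = 1.0 - 0.1 * 2.0 = 0.8
toy_optimizer.step()
print(theta.item())  # approximately 0.8 (float32 rounding)
```

With the much smaller learning rate of 0.0001 used for `FashionMNISTNet`, each step changes the parameters only slightly, which is why many mini-batch updates are needed.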

In [None]:
# define learning rate and optimization strategy
learning_rate = 0.0001
optimizer = optim.SGD(params=model.parameters(), lr=learning_rate)

Now that we have successfully implemented and defined the three ANN building blocks (the network, the loss function and the optimizer), let's take some time to review the `FashionMNISTNet` model definition as well as the loss function. Please read the above code and comments carefully and don't hesitate to let us know if you have any questions.

Furthermore, let's specify and instantiate PyTorch data loaders for the different splits (for quicker processing, the mini-batch sizes for the validation and test splits equal the sizes of these datasets, i.e., each of them is processed as a single mini-batch):
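`list(zip(X_train, y_train))` works, but it builds a Python list of 60,000 tuples. A common alternative is `torch.utils.data.TensorDataset`, which wraps the tensors directly. A minimal sketch with a small random stand-in dataset (hypothetical shapes and values, not the real Fashion-MNIST data):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical stand-in for X_train / y_train: 1,000 uint8 images and labels
toy_images = torch.randint(0, 256, (1000, 28, 28), dtype=torch.uint8)
toy_labels = torch.randint(0, 10, (1000,))

# TensorDataset indexes both tensors along their first dimension
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=128, shuffle=True)

batch_images, batch_labels = next(iter(toy_loader))
print(len(toy_loader))     # 8 mini-batches (7 full ones plus 1 with 104 images)
print(batch_images.shape)  # torch.Size([128, 28, 28])
```

In the notebook, the same pattern would read `DataLoader(TensorDataset(X_train, y_train), batch_size=mini_batch_size, shuffle=True)`.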

In [None]:
fashion_mnist_train_dataloader = torch.utils.data.DataLoader(list(zip(X_train, y_train)), batch_size=mini_batch_size, shuffle=True)
fashion_mnist_val_dataloader = torch.utils.data.DataLoader(list(zip(X_val, y_val)), batch_size=len(X_val), shuffle=True)
fashion_mnist_test_dataloader = torch.utils.data.DataLoader(list(zip(X_test, y_test)), batch_size=len(X_test), shuffle=True)

### 5.2. Running the Network Training

Finally, we start training the model. The detailed training procedure for each mini-batch is performed as follows: 

>1. do a forward pass through the FashionMNISTNet network, 
>2. compute the negative log likelihood classification error $\mathcal{L}^{NLL}_{\theta}(c^{i};\hat{c}^{i})$, 
>3. do a backward pass through the FashionMNISTNet network, and 
>4. update the parameters of the network $f_\theta(\cdot)$.

To ensure that our ANN model actually learns during training, we will monitor whether the loss decreases as training progresses. To this end, we record the mean mini-batch loss on the training data for each training epoch. Based on these values, we can assess the training progress and whether the loss is converging (indicating that the model might not improve any further).

Pay particular attention to the following elements of the network training code below:
 
>- `loss.backward()` computes the gradients of the classification loss with respect to the network parameters,
>- `optimizer.step()` updates the network parameters based on the gradient.
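The training loop also resets the gradients with `model.zero_grad()` before each backward pass. This is necessary because PyTorch accumulates (adds up) gradients across `backward()` calls. A small sketch with a hypothetical parameter `w` and the toy loss $w^2$ shows the effect:

```python
import torch

# a single hypothetical parameter w = 1.0
w = torch.nn.Parameter(torch.tensor([1.0]))

# two backward passes of the toy loss L(w) = w^2 (gradient 2 * w = 2.0) without resetting
for _ in range(2):
    (w ** 2).sum().backward()
accumulated_grad = w.grad.clone()
print(accumulated_grad)  # tensor([4.]): the gradients of both passes were added up

# resetting the gradient to zero before the next backward pass restores the per-step gradient
w.grad.zero_()
(w ** 2).sum().backward()
print(w.grad)  # tensor([2.])
```

Without the reset, every mini-batch update would use the sum of all previous gradients instead of the gradient of the current mini-batch.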

In [None]:
# init collection of training epoch losses
train_epoch_losses = []

# set the model in training mode
model.train()

# train the FashionMNISTNet model
for epoch in range(num_epochs):
    
    # init collection of mini-batch losses
    train_mini_batch_losses = []
    
    # iterate over all-mini batches
    for i, (images, labels) in enumerate(fashion_mnist_train_dataloader):
    
        # push mini-batch data to computation device
        images = images.to(device)
        labels = labels.to(device)

        # run forward pass through the network
        output = model(images)

        # reset graph gradients
        model.zero_grad()
        
        # determine classification loss
        loss = nll_loss(output, labels)
        
        # run backward pass
        loss.backward()
        
        # update network parameters
        optimizer.step()
        
        # collect mini-batch classification loss
        train_mini_batch_losses.append(loss.item())
    
    # determine mean mini-batch loss of epoch
    train_epoch_loss = np.mean(train_mini_batch_losses)
    
    # print epoch loss
    now = datetime.now().strftime("%Y%m%d-%H:%M:%S")
    print('[LOG {}] epoch: {} train-loss: {}'.format(str(now), str(epoch), str(train_epoch_loss)))
    
    # collect mean mini-batch loss of epoch
    train_epoch_losses.append(train_epoch_loss)

Upon successful training, let's visualize and inspect the training loss per epoch:

In [None]:
# prepare plot
fig = plt.figure()
ax = fig.add_subplot(111)

# add grid
ax.grid(linestyle='dotted')

# plot the training epochs vs. the epochs' classification error
ax.plot(np.array(range(1, len(train_epoch_losses)+1)), train_epoch_losses, label='epoch loss (blue)')

# add axis legends
ax.set_xlabel("Epoch")
ax.set_ylabel("Training Loss")

# add plot title
plt.title('Training Epochs $e_i$ vs. Classification Error $L^{NLL}$', fontsize=10);

Ok, fantastic. The training error is nicely going down. We could train the network for a couple more epochs until the error converges. But let's stay with the 10 training epochs for now and continue with evaluating our trained model.

## 6. Neural Network Model Evaluation

We will now evaluate the trained model on the test data using the same mini-batch approach as during the network training (here, the test data form a single mini-batch) and derive the mean negative log-likelihood loss of the mini-batches:

In [None]:
# set the model in eval mode
model.eval()

# move the model to the cpu (for evaluation, we will not need the gpu)
model.to('cpu')

# init collection of mini-batch losses
eval_mini_batch_losses = []

# iterate over all mini-batches (no gradients are needed for evaluation)
with torch.no_grad():
    for i, (images, labels) in enumerate(fashion_mnist_test_dataloader):

        # run forward pass through the network
        output = model(images)

        # determine classification loss
        loss = nll_loss(output, labels)

        # collect mini-batch classification loss
        eval_mini_batch_losses.append(loss.item())

# determine mean mini-batch loss on the test data
eval_loss = np.mean(eval_mini_batch_losses)

# print evaluation loss
now = datetime.now().strftime("%Y%m%d-%H:%M:%S")
print('[LOG {}] eval-loss: {}'.format(str(now), str(eval_loss)))

Ok, great. The evaluation loss looks in line with our training loss. Let's now inspect a few sample predictions to get an impression of the model quality. To do so, we will pick an image of our test dataset (here, the image with id 1000) and retrieve its PyTorch tensor as well as the corresponding label:

In [None]:
# set (random) image id
image_id = 1000

# retrieve image exhibiting the image id
image, label = X_test[image_id], y_test[image_id]

Let's show the image and the corresponding ground truth label:

In [None]:
# set image plot title 
plt.title('Example: {}, Label: {}'.format(str(image_id), fashion_classes[label.item()]))

# plot fashion item sample
plt.imshow(image, cmap='gray')

Let's compare the true label with the prediction of our model. The model returns one log-probability per class:

In [None]:
model(image)

We can also determine the most probable class, i.e., the class with the highest predicted log-probability:

In [None]:
most_probable = torch.argmax(model(image), dim=1).item()
print('Most probable class: {}'.format(most_probable))
print('This class represents the following fashion article: {}'.format(fashion_classes[most_probable]))
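Since the network ends with a `LogSoftmax` layer, its outputs are log-probabilities; applying `torch.exp` recovers class probabilities that sum to 1, which tells us how likely the model considers its top prediction. A minimal sketch with hypothetical logits (in the notebook, you would use `model(image)` instead of `log_probs`):

```python
import torch
import torch.nn.functional as F

# hypothetical raw outputs for one image and 10 classes
logits = torch.tensor([[0.1, 0.2, 0.0, 0.3, 0.1, 2.5, 0.0, 1.5, 0.2, 1.0]])

log_probs = F.log_softmax(logits, dim=1)  # the kind of output FashionMNISTNet returns
probs = torch.exp(log_probs)              # class probabilities

toy_most_probable = torch.argmax(probs, dim=1).item()
print(toy_most_probable)                   # 5
print(probs[0, toy_most_probable].item())  # probability of the most probable class
```

Because `exp` is monotonic, taking the `argmax` over log-probabilities or over probabilities gives the same class.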

Let's now obtain the predictions for all the fashion item images of the test data:

In [None]:
predictions = torch.argmax(model(X_test), dim=1)

Furthermore, let's obtain the overall classification accuracy:

In [None]:
accuracy_score(y_test, predictions.detach())
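The accuracy is simply the fraction of images whose predicted class matches the true class. A small sketch with hypothetical labels and predictions confirms that a manual computation agrees with scikit-learn's `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical true labels and predictions for 8 images
y_true = np.array([0, 1, 2, 2, 5, 7, 9, 9])
y_pred = np.array([0, 1, 2, 4, 5, 7, 9, 6])

# accuracy = fraction of predictions that match the true label (6 of 8 here)
manual_accuracy = np.mean(y_true == y_pred)
print(manual_accuracy, accuracy_score(y_true, y_pred))  # 0.75 0.75
```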

Let's also inspect the confusion matrix to determine major sources of misclassification:

In [None]:
# determine confusion matrix of the target and predicted classes
mat = confusion_matrix(y_test, predictions.detach())

# initialize the plot and define size
plt.figure(figsize=(8, 8))

# plot corresponding confusion matrix
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False, cmap='YlOrRd_r', xticklabels=fashion_classes.values(), yticklabels=fashion_classes.values())
plt.tick_params(axis='both', which='major', labelsize=8, labelbottom = False, bottom=False, top = False, left = False, labeltop=True)

# set plot title
plt.title('Fashion MNIST confusion matrix')

# set axis labels
plt.xlabel('[true label]')
plt.ylabel('[predicted label]');
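In scikit-learn's `confusion_matrix`, rows correspond to true labels and columns to predicted labels; the plot above transposes the matrix (`mat.T`), so there the columns show the true labels instead. A small sketch with hypothetical labels for 3 classes illustrates this convention and derives the per-class recall (the fraction of each true class that was classified correctly) from the matrix diagonal:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical true labels and predictions for 3 classes
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred)  # rows: true labels, columns: predicted labels
print(cm)

# per-class recall: correct predictions divided by the number of true examples per class
recall_per_class = np.diag(cm) / cm.sum(axis=1)
print(recall_per_class)
```

Applied to `mat` from the cell above, the same recall computation shows which fashion classes the model struggles with most.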

Ok, we can easily see that our current model confuses sandals with either sneakers or ankle boots. However, the inverse does not really hold: the model sometimes confuses sneakers with ankle boots, but only very rarely with sandals. The same holds for ankle boots. Our model also has issues distinguishing shirts from coats (and, to a lesser degree, from T-shirts and pullovers).

These mistakes are not very surprising, as these items exhibit a high similarity.

## 7. Final Notes

In this lab, we did not tune the hyperparameters of our neural network (e.g., the learning rate); therefore, we did not make use of the validation dataset. As a home assignment, you can implement hyperparameter tuning (e.g., of the learning rate) and check for overfitting.
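As a possible starting point for that assignment, here is a hedged sketch of a helper (the name `mean_loss` is hypothetical) that computes the mean loss of a model over a data loader without tracking gradients. Evaluating it on the validation loader after each epoch would let you compare learning rates and spot overfitting (validation loss rising while the training loss keeps falling). It is demonstrated on a tiny random model and dataset rather than on `FashionMNISTNet`:

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

def mean_loss(net, data_loader, loss_fn, device='cpu'):
    """Hypothetical helper: mean mini-batch loss of `net` over `data_loader`."""
    was_training = net.training  # remember the current mode
    net.eval()                   # switch to evaluation mode
    batch_losses = []
    with torch.no_grad():        # no gradients are needed for evaluation
        for images, labels in data_loader:
            output = net(images.to(device))
            batch_losses.append(loss_fn(output, labels.to(device)).item())
    net.train(was_training)      # restore the previous mode
    return float(np.mean(batch_losses))

# tiny stand-in model and dataset (hypothetical), only used to exercise the helper
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
toy_val_loader = DataLoader(TensorDataset(torch.rand(64, 28, 28), torch.randint(0, 10, (64,))), batch_size=32)

val_loss = mean_loss(toy_model, toy_val_loader, nn.NLLLoss())
print(val_loss)
```

In the notebook, a call could look like `mean_loss(model, fashion_mnist_val_dataloader, nll_loss, device)`, provided the model resides on `device`.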