# Analysing the data and the results

## Contents

1. Analyse data
   1. Load dataset
   2. Inspect type of dataset
   3. Analyse dataset
   4. Preprocess dataset
2. Machine learning pipeline
   1. Define different model architectures
   2. Utils: training loop and computing performance
   3. Define different model instances and train them all
   4. Analysis of the evolution of the training and validation losses for each model
   5. Analysis of the confusion matrix of each model on the validation dataset
   6. Model Selection
   7. Model evaluation
3. Analyse Results
   1. Plot confusion matrix
   2. Plot examples of misclassified inputs
   3. Final comments

**Objectives:**

1. Get more practice with what you have learned in the tutorials and the previous assignments.
2. Encourage critical thinking when analysing data and results.

Of course, in reality, more sanity checks could be implemented and the analysis could go much deeper. Don't hesitate to also check the `project_checklist` document. What you should always keep in mind, though, is that **you should never just trust the numbers, nor blindly use tools that you don't understand!**

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
from torchvision import datasets, transforms

from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from collections import Counter

SEED = 265
torch.manual_seed(SEED)

## 1. Analyse data

### 1.1 Load dataset

The dataset is an image classification dataset.

#### Tasks

1. Download the datasets from the link given in the MittUiB assignment and load them using `torch.load()`.
2. Split the `train_val` dataset into a training and a validation dataset.
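
The split in task 2 could be sketched as follows. This is only an illustration under stated assumptions: the synthetic `fake_train_val` dataset stands in for `data_train_val`, and the 80/20 ratio is a hypothetical choice, not one prescribed by the assignment.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for data_train_val (assumption): 100 fake 1x8x8 images, 4 classes.
fake_inputs = torch.randn(100, 1, 8, 8)
fake_targets = torch.randint(0, 4, (100,))
fake_train_val = TensorDataset(fake_inputs, fake_targets)

# Hypothetical 80/20 split; a seeded generator makes the split reproducible.
n_val = int(0.2 * len(fake_train_val))
n_train = len(fake_train_val) - n_val
generator = torch.Generator().manual_seed(265)
data_train, data_val = random_split(
    fake_train_val, [n_train, n_val], generator=generator
)

print(len(data_train), len(data_val))
```

Passing an explicit generator keeps the split identical between runs, which makes later comparisons between models fairer.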

In [None]:
data_train_val = torch.load("data_train_val.pt")
data_test = torch.load("data_test.pt")

# TODO...

### 1.2 Inspect type of dataset

#### Tasks

Inspect the dataset. Do everything that seems relevant to you to have a better idea of the type of data you will play with. You can for example try to answer the following questions first:

- What is the size of each dataset?
- What is the shape of the input?
- What is the type of the input tensors?
- What is the shape of the targets?
- What is the type of the target tensors?
- How many classes are there?
- What do instances of each class look like?

Remember that you can access any element of a dataset directly with `data[i]`, without using a dataloader.

See `02 - Machine learning pipeline and MNIST` for more information about plotting instances for each class
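
A minimal sketch of such an inspection is given below. The `data` object is a synthetic stand-in (an assumption), and the real dataset may return targets as plain integers rather than tensors, in which case `label.shape` and `label.dtype` would need to be replaced by `type(label)`.

```python
import torch
from torch.utils.data import TensorDataset

# Stand-in dataset (assumption): 60 fake 3x16x16 images, 5 classes.
data = TensorDataset(torch.rand(60, 3, 16, 16), torch.arange(60) % 5)

img, label = data[0]  # direct indexing, no DataLoader needed
print("size:", len(data))
print("input shape / dtype:", img.shape, img.dtype)
print("target shape / dtype:", label.shape, label.dtype)

# Collect the distinct labels to count the classes.
classes = sorted({int(data[i][1]) for i in range(len(data))})
print("classes:", classes)
```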

In [None]:
# TODO...

### 1.3 Analyse dataset

#### Tasks

Analyse the dataset. Do everything that seems relevant to you to have a better idea of the values and the class distribution you will work with. You can for example try to answer the following questions first:

- What is the range of values of the input? Its mean? And standard deviation? 
- How many instances are there in each class?

See `02 - Machine learning pipeline and MNIST` for more information about counting instances of each class.
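
The questions above could be answered along these lines. The dataset here is synthetic (an assumption), with deliberately imbalanced labels, and stacking all inputs into one tensor is only reasonable when the dataset fits in memory.

```python
from collections import Counter

import torch
from torch.utils.data import TensorDataset

# Stand-in dataset (assumption): 90 fake 1x8x8 images with imbalanced labels.
inputs = torch.rand(90, 1, 8, 8) * 255.0
targets = torch.tensor([0] * 60 + [1] * 20 + [2] * 10)
data = TensorDataset(inputs, targets)

# Stack all inputs into a single (N, C, H, W) tensor.
all_imgs = torch.stack([data[i][0] for i in range(len(data))])
print("min / max:", all_imgs.min().item(), all_imgs.max().item())

# Per-channel statistics: reduce over the batch, height and width dimensions.
mean = all_imgs.mean(dim=(0, 2, 3))
std = all_imgs.std(dim=(0, 2, 3))
print("mean:", mean, "std:", std)

# Number of instances per class.
class_counts = Counter(int(data[i][1]) for i in range(len(data)))
print(sorted(class_counts.items()))
```

A strongly imbalanced class distribution, as in this toy example, is worth remembering when choosing a performance measure later on.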


In [None]:
# TODO...

### 1.4 Preprocess dataset

See `02 - Machine learning pipeline and MNIST` for more information about the preprocessing in general

#### Tasks

1. Instantiate a PyTorch transform to preprocess the dataset according to the analysis you just made.

In [None]:
# TODO...

## 2. Machine learning pipeline

See practical exercise 2 `Machine learning pipeline and MNIST` for more information about the machine learning pipeline in general.

### 2.1 Define different model architectures

#### Tasks

1. Define 3 different model architectures that are suitable to classify the images of the dataset.
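
As one possible starting point, a small convolutional network might look like the sketch below. `SmallCNN`, its layer sizes, and the `in_channels` and `n_classes` values are hypothetical; they must be adapted to the input shape and number of classes found in section 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Hypothetical small CNN; in_channels and n_classes depend on the dataset."""

    def __init__(self, in_channels=1, n_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        # Global average pooling makes the head independent of the image size.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.relu(self.conv2(x))
        x = self.pool(x).flatten(1)
        return self.fc(x)  # raw logits, suitable for nn.CrossEntropyLoss


model = SmallCNN(in_channels=3, n_classes=5)
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)
```

The other architectures could differ in depth, width, or type (for example a fully connected network), so that the comparison in section 2.4 is informative.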

In [None]:
# TODO...

### 2.2 Utils: training loop and computing performance

See `02 - Machine learning pipeline and MNIST` for more information about the training loop in general

#### Tasks

1. Write a function ``train`` that 
   1. Trains the model for ``n`` epochs (complete passes through the training dataset)
   2. Computes and stores the training loss and the validation loss for each epoch
   3. Returns the list of training and validation losses
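
A minimal sketch of such a function is shown below, under several assumptions: the argument names and their order are illustrative, device handling is omitted (everything stays on the CPU), and the demo at the end uses synthetic data and a linear model purely to show the call.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(n_epochs, optimizer, model, loss_fn, train_loader, val_loader):
    """Train for n_epochs; return (train_losses, val_losses), one value per epoch."""
    train_losses, val_losses = [], []
    for _ in range(n_epochs):
        model.train()
        total, n_seen = 0.0, 0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            # Weight by batch size so the epoch loss is a per-sample average.
            total += loss.item() * len(y)
            n_seen += len(y)
        train_losses.append(total / n_seen)

        model.eval()
        total, n_seen = 0.0, 0
        with torch.no_grad():  # no gradients needed for validation
            for x, y in val_loader:
                total += loss_fn(model(x), y).item() * len(y)
                n_seen += len(y)
        val_losses.append(total / n_seen)
    return train_losses, val_losses


# Tiny synthetic demo (assumption): 2 linearly separable classes in 4 dimensions.
torch.manual_seed(265)
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()
train_loader = DataLoader(TensorDataset(x[:48], y[:48]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(x[48:], y[48:]), batch_size=16)
model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
tr, va = train(20, opt, model, nn.CrossEntropyLoss(), train_loader, val_loader)
print(tr[0], tr[-1])
```

Note that the training loss recorded here is averaged while the weights are still changing within the epoch, whereas the validation loss is computed afterwards; this alone can make the two curves differ slightly.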

In [None]:
# TODO...

### 2.3 Define different model instances and train them all

#### Tasks

1. For each model architecture that you defined:
   1. create multiple instances and train them with different hyperparameters.
   2. store the training and validation loss for each model trained

In [None]:
# TODO...

### 2.4 Analysis of the evolution of the training and validation losses for each model

#### Tasks

1. For each of your trained models, plot the training loss and the validation loss. See `02 - Machine learning pipeline and MNIST` for more information about how to plot the training loss and the validation loss. 
2. Analyse your plots. For example, you can try to answer the following questions:
   1. Are your results so unexpected that you think you should look for a bug in your code? For example, if the validation loss is much better than the training loss, if the losses are constant, or if the training loss is increasing.
   2. Are results so disappointing that you think you should go back to `1. Analyse data` because of a misunderstanding concerning the content of the dataset or an inappropriate preprocessing?
   3. Are some models overfitting/underfitting? 
   4. Do some architectures perform systematically better than others? Do you think you should go back to `2.1. Define different model architectures` and define new models?
   5. Do some sets of hyperparameters systematically yield a better/worse performance than others? Do you think you should go back to `2.3. Define different model instances and train them all`?
   

In [1]:
#TODO...

### 2.5 Analysis of the confusion matrix of each model on the validation dataset

#### Tasks

1. For each trained model, plot the confusion matrix of the validation dataset. To do so:
   1. Compute the confusion matrix. You can use `sklearn.metrics.confusion_matrix` (remember to use `.cpu()` on PyTorch tensors right before using a library outside PyTorch).
   2. Plot the obtained confusion matrix. You can use the `plot_confusion_matrix` function.
2. Analyse the confusion matrix. For example, you can try to answer the following questions:
   1. Are the different classes equally well (mis-)classified? 
   2. For each class, does the model tend to misclassify the input with some other specific classes? Do you know why?
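
Computing the matrix could be sketched as follows. The `logits` and `targets` tensors are made-up stand-ins (an assumption) for the model outputs and labels collected over the validation set.

```python
import torch
from sklearn.metrics import confusion_matrix

# Stand-in model outputs (assumption): logits for 6 samples and 3 classes.
logits = torch.tensor([
    [2.0, 0.1, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.2, 3.0],
    [1.0, 2.0, 0.0],  # true class 0, predicted class 1
    [0.0, 0.5, 2.5],
    [0.3, 2.2, 0.1],
])
targets = torch.tensor([0, 1, 2, 0, 2, 1])

preds = logits.argmax(dim=1)

# Move tensors to the CPU and convert to NumPy before calling scikit-learn.
y_true = targets.cpu().numpy()
y_pred = preds.cpu().numpy()

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)

# Row-normalised version: each row sums to 1, so the diagonal is per-class recall.
matrix_norm = confusion_matrix(y_true, y_pred, normalize="true")
print(matrix)
print(matrix_norm)
```

The row-normalised matrix is often easier to read when classes are imbalanced, and it is the kind of array that benefits from the rounding applied in `plot_confusion_matrix`.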

In [2]:
def plot_confusion_matrix(matrix, ax=None):
    """
    Plot the given confusion matrix
    """
    ax = sns.heatmap(
        data=matrix.round(2),
        cmap=sns.color_palette("RdBu_r", 1000), ax=ax
    )
    return ax

#TODO...

### 2.6 Model Selection

See `02 - Machine learning pipeline and MNIST` for more information about model selection.

#### Tasks

1. Choose a performance measure. Justify your choice.
2. Select the best model based on the validation performance
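
The toy comparison below illustrates why the choice of measure matters. The labels, the two sets of predictions, and the model names are invented for illustration; macro-F1 is used as one possible measure, not as the required one.

```python
from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy labels (assumption): 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2

# Hypothetical validation predictions of two candidate models.
val_preds = {
    "always_majority": [0] * 10,
    "balanced": [0] * 6 + [1] * 2 + [1] * 2,
}

scores = {}
for name, y_pred in val_preds.items():
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
    scores[name] = (acc, f1)
    print(f"{name}: accuracy={acc:.2f}, macro-F1={f1:.2f}")

# Select the model with the highest macro-F1 on the validation data.
best_name = max(scores, key=lambda name: scores[name][1])
print("selected:", best_name)
```

Both candidates reach the same accuracy here, yet only one of them ever predicts the minority class, which macro-F1 makes visible.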

In [None]:
#TODO...

### 2.7 Model evaluation

See `02 - Machine learning pipeline and MNIST` for more information about model evaluation.

#### Tasks

1. Evaluate the performance of the selected model on the test dataset.
2. Analyse your result. For example, you can try to answer the following questions:

   1. Are your results surprising? In other words, relative to your general knowledge in machine learning, your understanding of the data and the problem, and relative to the confidence you put in your design choices and implementation, did you expect such performance?
   2. Are your results satisfying? In other words, is your model better than random outputs? Does your model outperform any baseline model? Do you think you could use your model in real life? 
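
Two simple reference points for "better than random" can be computed as in this sketch: uniform random guessing and always predicting the majority class. The label lists are placeholders (an assumption) to be replaced by the real training and test targets.

```python
from collections import Counter

# Stand-in labels (assumption): replace with the real training and test targets.
train_labels = [0, 0, 0, 1, 1, 2, 0, 0, 1, 0]
test_labels = [0, 1, 0, 2, 0]

n_classes = len(set(train_labels))
random_acc = 1 / n_classes  # expected accuracy of uniform random guessing

# The majority class is determined on the training data only.
majority_class, _ = Counter(train_labels).most_common(1)[0]
majority_acc = sum(label == majority_class for label in test_labels) / len(test_labels)

print(f"random: {random_acc:.2f}, majority class: {majority_acc:.2f}")
```

On an imbalanced dataset the majority-class baseline can be far above random guessing, so it is usually the more demanding of the two comparisons.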


In [None]:
#TODO...

## 3. Analyse Results

We would like to analyse the performance of our model a bit further, in order to be able to communicate on its qualities and limits.

### 3.1 Plot confusion matrix

#### Tasks

1. Plot the confusion matrix of the best model on the test dataset.
2. Analyse the confusion matrix. For example, you can try to answer the following questions:
   1. Are the different classes equally well (mis-)classified? 
   2. For each class, does the model tend to misclassify the input with some other specific classes? Do you know why?

In [None]:
#TODO...

### 3.2 Plot examples of misclassified inputs

#### Tasks

1. For each class, plot examples of misclassified inputs together with their predicted class
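
Before plotting, the misclassified samples have to be located. A possible way to do this is sketched below with invented `targets` and `preds` tensors (an assumption); the resulting indices can then be used to fetch the images from the test dataset and display them with their predicted class as the title.

```python
import torch

# Stand-in labels and predictions (assumption) for 8 test samples.
targets = torch.tensor([0, 0, 1, 1, 2, 2, 2, 0])
preds = torch.tensor([0, 1, 1, 1, 0, 2, 1, 0])

wrong = preds != targets
misclassified = {}
for c in targets.unique().tolist():
    # Indices of samples whose true class is c but whose prediction differs.
    idx = torch.nonzero(wrong & (targets == c)).flatten().tolist()
    # Store (sample index, predicted class) pairs for later plotting.
    misclassified[c] = [(i, int(preds[i])) for i in idx]

print(misclassified)
```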

In [None]:
#TODO...

### 3.3 Final comments

#### Tasks

1. In which cases does your selected model seem to struggle? In which cases does your model seem to yield good results? How would you explain that?
2. If you were given unlimited time, what would you try to improve your performance?