# Report

Summarize your findings and motivate your choice of approach. A better motivation shows your understanding of the lab. Don't forget to include the result from part 1!

**Name:** Johannes Hägglund \
**Date:** 2021-12-19

## Introduction
This lab was an introduction to image segmentation. Instead of predicting a single label for the whole image, segmentation identifies and highlights specific regions within the image. An example application is object detection, where the model is given an image and maps each label to a specific region.


## Reminder: I deleted some logs, so the curve colors in this report may differ from those shown in TensorBoard (the logs are the same, only the colors may differ).

## Result

### Metrics
#### Todo1 & Todo2
Accuracy for segmentation is usually measured as pixel accuracy. In his blog post on Towards Data Science, <a href="https://towardsdatascience.com/evaluating-image-segmentation-models-1e9bb89a001b">Frank Liang</a> defines pixel accuracy as follows: <br>

<center>$\text{Pixel Accuracy} = \frac{\#TP + \#TN}{\#TP + \#TN + \#FP + \#FN}$</center> 
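For multi-class masks, (TP + TN) / total generalizes to the fraction of pixels whose predicted label matches the ground truth. The minimal NumPy sketch below uses an illustrative helper name `pixel_accuracy` and a toy 10×10 mask. It shows how a model that predicts only background still scores highly when the object covers just 4 of the 100 pixels:

```python
import numpy as np


def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    return float((pred == target).mean())


# Toy example: a 2x2 object (class 1) in a 10x10 background (class 0)
target = np.zeros((10, 10), dtype=int)
target[4:6, 4:6] = 1

# A "lazy" model that predicts background everywhere
pred = np.zeros_like(target)

print(pixel_accuracy(pred, target))  # 0.96, although the object is missed entirely
```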

So the answer to the question ***Does high accuracy imply good performance?*** is no, it does not. Consider an image where most pixels belong to the background and only small segments contain the objects of interest. A model that labels the background correctly still obtains a high pixel accuracy even if it misses the objects, which gives high accuracy but poor performance. According to 
<a href='https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2'>Towards Data Science</a>, there are three common metrics for semantic segmentation, and pixel accuracy is one of them. The other two are defined as follows:
<ol>
    <li>Intersection over Union (IoU): this metric tackles the problem of imbalanced data by measuring, for each class, how well the predicted region covers the true region. In simple terms, it divides the area of overlap by the area of union. It then takes the mean over all classes to obtain a single score. <br><br>
        <center>$IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}$</center></li>
    <li>Dice coefficient: this metric is similar to IoU and is calculated as follows: <br><br> <center>$DC = \frac{2 \times \text{Area of Overlap}}{\text{Total Pixels in Both Masks}}$</center></li>
</ol>
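The sketch below applies both metrics to the same toy case: a 2×2 object in a 10×10 image, with a prediction that is all background. The per-class NumPy version shows the contrast with pixel accuracy. The background class scores 0.96 IoU, but the object class scores 0, so the mean IoU drops to 0.48.

Some details are my assumptions:
- "Total pixels" in the Dice formula is taken as the number of pixels in the predicted mask plus the number in the ground-truth mask.
- Classes absent from both masks are returned as NaN and skipped by `np.nanmean`.
- The helper name `per_class_iou_dice` is hypothetical.

```python
import numpy as np


def per_class_iou_dice(pred: np.ndarray, target: np.ndarray, num_classes: int):
    """Return per-class IoU and Dice arrays (NaN for classes absent from both masks)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        overlap = np.logical_and(p, t).sum()   # area of overlap
        union = np.logical_or(p, t).sum()      # area of union
        total = p.sum() + t.sum()              # pixels in both masks
        ious.append(overlap / union if union else np.nan)
        dices.append(2 * overlap / total if total else np.nan)
    return np.array(ious), np.array(dices)


target = np.zeros((10, 10), dtype=int)
target[4:6, 4:6] = 1
pred = np.zeros_like(target)  # background everywhere

iou, dice = per_class_iou_dice(pred, target, num_classes=2)
# per-class IoU 0.96 (background) and 0.0 (object), mean IoU 0.48
print("per-class IoU:", iou, "mean IoU:", np.nanmean(iou))
print("per-class Dice:", dice)
```

For a single class, Dice and IoU are related by $DC = 2 \cdot IoU / (1 + IoU)$, so per class they rank predictions the same way.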

### Loss
According to <a href='https://www.jeremyjordan.me/semantic-segmentation/#loss'>Jeremy Jordan</a>, pixel-wise cross-entropy loss and Dice loss are the two most popular loss functions for image segmentation. Pixel-wise cross-entropy loss evaluates each pixel individually and then averages over all pixels, which makes it sensitive to class imbalance. <br>
Dice loss is based on the previously mentioned Dice coefficient; as I understand it, it is calculated as $\text{Dice Loss} = 1 - DC$. <br><br>
PyTorch's built-in loss functions do not include a Dice loss, so it had to be added separately. <br>
The code is publicly available, and all credit goes to 
<a href='https://kornia.readthedocs.io/en/v0.1.2/_modules/torchgeometry/losses/dice.html'>
torchgeometry (now Kornia)</a>, which provided the loss function. 
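Below is a minimal per-class soft Dice loss in PyTorch. It is written in the spirit of the torchgeometry implementation linked above, not copied from it.

It assumes `logits` of shape (N, C, H, W) and integer labels of shape (N, H, W). The name `soft_dice_loss` and the `eps` smoothing term are my own choices.

Because the Dice score is averaged over classes, a small foreground class weighs as much as the background. This is the property that plain pixel-wise cross-entropy lacks.

```python
import torch
import torch.nn.functional as F


def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss: 1 - Dice coefficient, averaged over batch and classes.

    logits: (N, C, H, W) raw class scores
    target: (N, H, W) integer class labels (int64)
    """
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)  # differentiable "soft" masks
    # (N, H, W) -> (N, H, W, C) -> (N, C, H, W)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).to(probs.dtype)

    dims = (2, 3)  # sum over the spatial dimensions, per image and per class
    intersection = (probs * one_hot).sum(dims)         # soft area of overlap
    cardinality = probs.sum(dims) + one_hot.sum(dims)  # pixels in both masks
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return (1.0 - dice).mean()
```

In a training loop it would replace `nn.CrossEntropyLoss()` directly, e.g. `loss = soft_dice_loss(model(images), masks)`.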

### Architecture
#### Todo1
Unfortunately, I did not manage to overfit the model. I tested various architectures with different kernel sizes and different conv-layer sizes. After a while I exceeded my disk quota and deleted unnecessary logs. According to the network disk, I should have enough space to keep running my model, but each time I try to save it I get the same error. I will therefore have to contact the helpdesk about the problem. Since I left this task until last, I was not able to complete it. <br><br>
As mentioned, I tested various architectures, but none of them resulted in overfitting. I have also discussed this with other students, and they experienced the same issue. 

#### Todo2
For the comparison I chose to train for 15 epochs, since running the models for only 5 epochs does not reveal what value each model will converge to. Running the initial model (the model given in the lab) and the modified model for 15 epochs gives the following results: <br>

The changed architecture looks as follows: <br>

        self.conv1 = self.Encoder(num_classes, 32, 7, 3)
        self.conv2 = self.Encoder(32, 64, 3, 1)
        self.conv3 = self.Encoder(64, 128, 3, 1)
        self.conv4 = self.Encoder(128, 256, 3, 1)
        
        self.upconv4 = self.Decoder(256, 128, 3, 1)
        self.upconv3 = self.Decoder(128*2, 64, 3, 1)
        self.upconv2 = self.Decoder(64*2, 32, 3, 1)
        self.upconv1 = self.Decoder(32*2, num_classes, 3, 1)

In addition, the cross-entropy loss is replaced with the Dice loss. <br><br>
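The report does not show the `Encoder`/`Decoder` helpers, so the sketch below is a hypothetical reconstruction. It rests on these assumptions:
- Each `Encoder` is Conv, then BatchNorm, then ReLU, then 2×2 max-pooling.
- Each `Decoder` is 2× bilinear upsampling, then Conv, then BatchNorm, then ReLU.
- The extra `last=True` flag is not in the report. It drops BatchNorm/ReLU so the final layer outputs raw class scores.

The main point is where the `*2` input channels presumably come from. Each decoder output is concatenated with the matching encoder output (a U-Net-style skip connection) before the next decoder.

```python
import torch
import torch.nn as nn


class SegmentationNet(nn.Module):
    """Hypothetical reconstruction of the modified model; helper bodies are assumptions."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Layer sizes as in the report (conv1 takes num_classes input channels there too)
        self.conv1 = self.Encoder(num_classes, 32, 7, 3)
        self.conv2 = self.Encoder(32, 64, 3, 1)
        self.conv3 = self.Encoder(64, 128, 3, 1)
        self.conv4 = self.Encoder(128, 256, 3, 1)

        self.upconv4 = self.Decoder(256, 128, 3, 1)
        self.upconv3 = self.Decoder(128 * 2, 64, 3, 1)  # *2: concatenated with conv3 output
        self.upconv2 = self.Decoder(64 * 2, 32, 3, 1)   # *2: concatenated with conv2 output
        self.upconv1 = self.Decoder(32 * 2, num_classes, 3, 1, last=True)

    def Encoder(self, in_ch, out_ch, kernel_size, padding):
        # Conv -> BatchNorm -> ReLU, then halve the spatial resolution
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def Decoder(self, in_ch, out_ch, kernel_size, padding, last=False):
        # Double the spatial resolution, then Conv; the last block keeps raw scores
        layers = [
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding),
        ]
        if not last:
            layers += [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
        return nn.Sequential(*layers)

    def forward(self, x):
        e1 = self.conv1(x)   # (N, 32, H/2, W/2)
        e2 = self.conv2(e1)  # (N, 64, H/4, W/4)
        e3 = self.conv3(e2)  # (N, 128, H/8, W/8)
        e4 = self.conv4(e3)  # (N, 256, H/16, W/16)
        d4 = self.upconv4(e4)                          # (N, 128, H/8, W/8)
        d3 = self.upconv3(torch.cat([d4, e3], dim=1))  # 256 in -> (N, 64, H/4, W/4)
        d2 = self.upconv2(torch.cat([d3, e2], dim=1))  # 128 in -> (N, 32, H/2, W/2)
        return self.upconv1(torch.cat([d2, e1], dim=1))  # 64 in -> (N, num_classes, H, W)
```

Input height and width must be divisible by 16 because of the four pooling steps. The helpers pass `padding` straight to `nn.Conv2d`, so the `'same'` padding used in the tuned model below would also work (PyTorch 1.9 or later, stride 1).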

The results are shown in the images below, where the green curve is the modified model and the light-blue curve is the initial model.
<center><strong>IoU for the training and validation sets</strong></center>
<table><tr>
    <td><h2 style="margin-right:20%;">Train</h2><img src="Images/IoU_Train.svg" alt="Drawing" style="width: 100%;"/></td>
    <td><h2 style="margin-right:10%;">Validation</h2><img src="Images/IoU_Validation.svg" alt="Drawing" style="width: 100%;"/> </td>
</tr></table>

<br>

<center><strong>Loss for the validation set</strong></center>
<table><tr>
    <td><h2 style="margin-right:20%;">Validation</h2><img src="Images/loss_Validation.svg" alt="Drawing" style="width: 100%;"/></td>
</tr></table>
<br><br>

The predictions of each model are shown below: the first image shows the initial model and the second shows the modified model. <br>
<center><strong>The initial model</strong></center>
<table><tr>
    <td><h2 style="margin-right:10%;"></h2><img src="Images/initmodel.PNG" alt="Drawing" style="width: 100%;"/> </td>
</tr></table>

<br>

<center><strong>The modified model</strong></center>
<table><tr>
    <td><h2 style="margin-right:10%;"></h2><img src="Images/modified.PNG" alt="Drawing" style="width: 100%;"/> </td>
</tr></table>
<br>
The corresponding test IoU for each model is as follows: <br>

| | Initial model | Modified model |
| --- | --- | --- |
| IoU, test data | 0.9824 | 0.9778 |

<br>
Looking at the test IoU table and the corresponding prediction images, we can clearly see that the initial model is already quite good. The modified model still misses parts of some segments and misclassifies some pixels. The training and validation curves of both models show that the IoU could be improved. Given the IoU obtained on the test set, the training and validation values are lower than I expected. Increasing the number of epochs could help, since neither model has converged yet in terms of training and validation IoU.

***TensorBoard versions: 152 (modified model) and 150 (initial model)***

### Hyperparameter tuning
For hyperparameter tuning, I tried tuning the learning rate, introducing dropout, and varying the number of output channels and the kernel size. Since the initial model was already good, it was harder to improve on in this part. The resulting model is shown below: <br>  

        self.conv1 = self.Encoder(num_classes, 16, 3, 'same')
        self.conv2 = self.Encoder(16, 32, 3, 'same')
        self.conv3 = self.Encoder(32, 64, 3, 'same')
        self.conv4 = self.Encoder(64, 128, 3, 'same')
        
        self.upconv4 = self.Decoder(128, 64, 3, 'same')
        self.upconv3 = self.Decoder(64*2, 32, 3, 'same')
        self.upconv2 = self.Decoder(32*2, 16, 3, 'same')
        self.upconv1 = self.Decoder(16*2, num_classes, 3, 'same')
        
The final model was inspired by https://www.youtube.com/watch?v=azM57JuQpQI. <br><br>
Images from the run and a comparison with previous runs are shown below; the orange curve is the tuned model. <br>

***Reminder: ignore the grey curve; it should not be included in the comparison.***
<br>

<center><strong>IoU for the training and validation sets</strong></center>
<table><tr>
    <td><h2 style="margin-right:20%;">Train</h2><img src="Images/IoU_Train_tune.svg" alt="Drawing" style="width: 100%;"/></td>
    <td><h2 style="margin-right:10%;">Validation</h2><img src="Images/IoU_Validation_tune.svg" alt="Drawing" style="width: 100%;"/> </td>
</tr></table>

<table><tr>
    <td><h2 style="margin-right:20%;">Validation loss</h2><img src="Images/loss_Validation_tune.svg" alt="Drawing" style="width: 100%;"/></td>
</tr></table> <br>

<table><tr>
    <td><h2 style="margin-right:20%;"></h2><img src="Images/newmodel_tune.PNG" alt="Drawing" style="width: 100%;"/></td>
</tr></table> <br>

Unfortunately, I cannot explain why the orange curve reaches such a high IoU right from initialization. I also observed inconsistent results between different runs of the same model: some runs gave good results while others did not. This happened even though, as far as I know, the batches should be the same for all runs as long as I do not re-run the train dataloader. <br><br> The comparison between all models, from the initial model to the tuned one, is shown in the table below: <br>

| | Initial model | Modified model | Tuned model |
| --- | --- | --- | --- |
| IoU, test data | 0.9824 | 0.9778 | 0.9878 |
 
<br>

***TensorBoard versions: 150 (initial model), 152 (modified model, increased complexity), 175 (tuned model)***
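The run-to-run inconsistency described in the hyperparameter section has one likely contributor worth ruling out. With `shuffle=True`, a PyTorch `DataLoader` without an explicit generator draws a new permutation from the global RNG every time it is iterated. Weight initialization draws from the same RNG, so two runs differ unless the seeds are fixed.

The sketch below fixes the Python, NumPy and PyTorch seeds and passes an explicit generator to the `DataLoader`. The helper names `seed_everything` and `first_batch` are hypothetical. Some GPU kernels may still add a little non-determinism.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_everything(seed: int) -> torch.Generator:
    """Seed Python, NumPy and PyTorch RNGs; return a generator for the DataLoader."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also affects weight initialization
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False
    g = torch.Generator()
    g.manual_seed(seed)
    return g


def first_batch(seed: int) -> torch.Tensor:
    """Return the first shuffled batch of a toy dataset under a given seed."""
    gen = seed_everything(seed)
    data = TensorDataset(torch.arange(100, dtype=torch.float32))
    loader = DataLoader(data, batch_size=8, shuffle=True, generator=gen)
    (batch,) = next(iter(loader))
    return batch
```

With the same seed, two "runs" see the same batch order and start from the same initial weights. Successive epochs are still shuffled differently, but reproducibly.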

### Augmentation

#### Todo1 & Todo2
For augmentation, I implemented vertical flip. I also tried ColorJitter, but got an error saying that the input images contained four channels (R, G, B, alpha), although I expected them to contain only three. Since time was short, I did not try to solve the problem.  
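The report does not show how the flip was wired into the dataset. One point matters for segmentation in particular: a geometric augmentation must be applied to the image and its mask together. If torchvision's `RandomVerticalFlip` is applied to the image alone, the labels no longer line up with the pixels.

The four-channel error suggests RGBA files, and dropping the alpha channel is one way around it. With PIL, `Image.open(path).convert("RGB")` does the same.

The minimal tensor-based sketch below shows both ideas. The helper names `drop_alpha` and `joint_vertical_flip` are hypothetical, and (C, H, W) images with (H, W) masks are assumed.

```python
import torch


def drop_alpha(image: torch.Tensor) -> torch.Tensor:
    """Keep only the RGB channels of a (C, H, W) image; no-op for 3-channel input."""
    return image[:3] if image.shape[0] == 4 else image


def joint_vertical_flip(image: torch.Tensor, mask: torch.Tensor, p: float = 0.5,
                        generator: torch.Generator = None):
    """Flip image and mask top-to-bottom together with probability p."""
    if torch.rand(1, generator=generator).item() < p:
        # dim -2 is the height axis for both (C, H, W) images and (H, W) masks
        image = torch.flip(image, dims=[-2])
        mask = torch.flip(mask, dims=[-2])
    return image, mask
```

Colour-only augmentations such as ColorJitter or Gaussian blur, by contrast, should be applied to the image only, since they do not move pixels.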
##### Vertical flip

| Flip probability | Test IoU | TensorBoard version |
| --- | --- | --- |
| 0.3 | 0.8008 | 184 |
| 0.5 | 0.9914 | 186 |
| 0.8 | 0.9880 | 183 |

<br>
According to the table above, vertical flip with a probability of 0.5 gave the best test IoU. However, the graphs in TensorBoard show that neither flip=0.5 nor flip=0.3 reached good training and validation IoU. Flip=0.8 gave good results on all dataloaders. <br><br>

Since the model with the best test IoU had still not converged in terms of training and validation IoU, I increased the number of epochs from 15 to 30. The results (TensorBoard version 187) are shown below: <br><br>


<table><tr>
    <td><h2 style="margin-right:10%;">Training</h2><img src="Images/IoU_Train_aug.svg" alt="Drawing" style="width: 100%;"/></td>
    <td><h2 style="margin-right:10%;">Validation</h2><img src="Images/IoU_Validation_aug.svg" alt="Drawing" style="width: 100%;"/></td>
</tr></table> <br>
<br><br>

<table><tr>
    <td><h2 style="margin-right:10%;">Validation loss</h2><img src="Images/loss_Validation_aug.svg" alt="Drawing" style="width: 100%;"/></td>
</tr></table> <br>

<table><tr>
    <td><h2 style="margin-right:10%;"></h2><img src="Images/mymodel_best0.5flip_30epochs.PNG" alt="Drawing" style="width: 100%;"/></td>
</tr></table> <br>


| | Final test IoU |
| --- | --- |
| Vertical flip (p = 0.5), 30 epochs | 0.9950 |

<br>

##### Gaussian blur

| Probability | Gaussian blur parameters | Test IoU | TensorBoard version |
| --- | --- | --- | --- |
| 0.3 | kernel size = 5, sigma = (0.1, 5) | 0.9887 | 191 |
| 0.5 | kernel size = 5, sigma = (0.01, 1) | 0.8563 | 192 |
| 0.8 | kernel size = 5, sigma = 0.5 | 0.9896 | 193 |

##### Vertical flip & Gaussian blur



| Flip probability | Gaussian blur parameters | Test IoU | TensorBoard version |
| --- | --- | --- | --- |
| 0.5 | p = 0.5, kernel size = 3, sigma = (0.1, 5) | 0.9258 | 188 |
| 0.5 | p = 0.3, kernel size = 3, sigma = (0.01, 1) | 0.9263 | 189 |
| 0.5 | p = 0.8, kernel size = 3, sigma = 0.5 | 0.9171 | 190 |
 
<br>
The conclusion from the results above is that vertical flip with a probability of 0.5 gave the best result after 15 epochs. By increasing the number of epochs for this model, I obtained a test IoU of 0.9950. Gaussian blur on its own performed reasonably well, but not as well as vertical flip. <br>
Combining both techniques noticeably decreased the performance of the model.  

***Question 1:*** Did data augmentation improve the model?<br>
***Answer 1:*** Yes, it did. Using RandomVerticalFlip with a probability of 0.5, the model reached a test IoU of 0.9950 after 30 epochs. <br>
***Question 2:*** What do you think has the greatest impact on performance, and why? <br>
***Answer 2:*** Introducing noise through augmentation is not always beneficial. It can help because it forces the model to learn the data distribution better rather than fitting noise in the training data. It can also decrease performance if the model starts to learn the noise instead of focusing on the actual data. Comparing hyperparameter tuning and augmentation, I would say that tuning had the greatest impact, while augmentation helped reach the final performance. Another factor I discovered during the lab is that the choice of metric and loss function has a big impact on how well the model performs. With accuracy and cross-entropy loss the model might look good, while in practice it mostly predicts the majority class, in this case the background. Metrics like IoU and loss functions like Dice loss give a better picture of how the model actually performs. <br>