# ENN583 Week 11 - Segmentation

This week's practical will explore semantic segmentation models available via PyTorch. You will analyse the predictions and associated confidences of multiple segmentation models, and investigate how combining the predictions of models (i.e. ensembling) can often improve performance.

In particular, we'll look at:
* a [Fully Convolutional Network](https://arxiv.org/abs/1411.4038) -- as we explored in the lecture
* [DeepLabv3](https://pytorch.org/vision/stable/models/deeplabv3.html) -- a more recent, high-performing architecture
* [Deep Ensembles](https://arxiv.org/abs/1612.01474) -- which we'll use to combine these two models!

## Step 1: Import necessary libraries

In [None]:
from torchvision.io.image import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights, deeplabv3_resnet50, DeepLabV3_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image
import torch

import matplotlib.pyplot as plt
import numpy as np

import matplotlib.patches as mpatches
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import matplotlib

import glob

## Step 2: Load the segmentation models from PyTorch

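PyTorch's torchvision library has a number of pre-trained models available, including semantic segmentation models that predict the 20 Pascal VOC categories plus background! (These weights were trained on a subset of COCO using only the Pascal VOC categories.)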

You can see these [here](https://pytorch.org/vision/stable/models.html#semantic-segmentation).

As mentioned, we will compare FCN and DeepLabv3. Compare the 'Mean IoU' of each model as listed on the linked PyTorch page: which model do you expect to perform best? Also consider each model's number of parameters and GFLOPS. Is there a big difference?
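Mean IoU is the intersection-over-union between predicted and ground-truth pixels of each class, averaged over classes. The NumPy sketch below is a simplified, per-image version on a toy label map. The names `mean_iou` and `ignore_index` are our own, the sketch assumes 255 marks 'ignore' pixels (as in Pascal VOC annotations), and benchmark figures such as those on the PyTorch page are usually accumulated over a whole validation set rather than averaged per image.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union between two integer label maps of equal shape."""
    valid = target != ignore_index            # drop 'ignore' pixels (assumed label 255)
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue                          # class absent from both maps: skip it
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 example with 2 classes (0 = background, 1 = object)
target = np.array([[0, 0, 1],
                   [0, 1, 1]])
pred = np.array([[0, 1, 1],
                 [0, 1, 1]])
print(mean_iou(pred, target, num_classes=2))  # ≈ 0.708, i.e. (2/3 + 3/4) / 2
```

A higher Mean IoU means the predicted regions overlap the true regions more tightly, with every class weighted equally regardless of how many pixels it covers.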

Below, we have built on the PyTorch example code to demonstrate how to use a segmentation model. This is very similar to the process that you followed with the object detection models. Note the key steps:
1. Initialise the model, load the weights and set it to 'eval' mode
2. Load the necessary data transforms -- these typically resize the image to the expected size and normalise it
3. Preprocess the data using the data transform -- pretty self-explanatory
4. Run the model on the preprocessed data
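To make step 2 concrete, here is a NumPy sketch of the normalisation part of such a transform. It assumes the ImageNet channel mean and standard deviation, which we believe are the defaults used by these torchvision segmentation weights; the resize step is omitted, and `normalize_chw` is a hypothetical helper rather than the torchvision API.

```python
import numpy as np

# Assumed ImageNet statistics, one value per RGB channel, shaped to broadcast over (C, H, W)
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize_chw(img_uint8):
    """Scale a (C, H, W) uint8 image to [0, 1], then standardise each channel."""
    img = img_uint8.astype(np.float32) / 255.0
    return (img - MEAN) / STD

img = np.full((3, 4, 5), 255, dtype=np.uint8)  # dummy all-white 4x5 image
batch = normalize_chw(img)[np.newaxis]         # add a batch axis, like .unsqueeze(0)
print(batch.shape)                             # (1, 3, 4, 5)
```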

Below, we're printing three things:
* the list of classes that the model was trained on
* the size of the input to the model -- the image after it goes through pre-processing
* the size of the output (prediction) from the segmentation model

What does the output represent? Why does it have different dimensions to the input?
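As a hint, the sketch below mimics the model output with a dummy NumPy array of random logits shaped (batch, classes, height, width). The channel dimension holds one score per class (21 here) rather than the input's 3 colour channels; a softmax over that dimension, as `prediction.softmax(dim=1)` does, turns the scores at each pixel into probabilities that sum to one. The `softmax` helper is our own illustration.

```python
import numpy as np

def softmax(logits, axis):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 21, 8, 8))  # dummy (batch, classes, H, W) scores
probs = softmax(logits, axis=1)          # same role as prediction.softmax(dim=1)

print(probs.shape)                           # (1, 21, 8, 8)
print(np.allclose(probs.sum(axis=1), 1.0))   # True: one distribution over 21 classes per pixel
```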

In [None]:
img = read_image("data/dog.jpg")

# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Run the model and convert its outputs to per-class probabilities
with torch.no_grad():
    prediction = model(batch)["out"]
    normalized_masks = prediction.softmax(dim=1)

print(f"Input size: {batch.size()}")
print(f"Output size: {normalized_masks.size()}")

class_list = list(weights.meta["categories"])  # copy, so the shared weights metadata isn't modified
class_list[0] = "background"
print(f"Classes: {class_list}")


## **Your turn!** 

Convert the output from the model into an image representing the model's prediction. This should be a 2D image where each pixel holds an integer between 0 and 20 (the predicted class index out of the model's 21 classes).

You can pass this image, alongside the original image, into the ```draw_seg_results``` function to visualise the prediction next to its input. 

Looking at the prediction, has the model segmented the image correctly? Where does it look like there may be errors?
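One way to approach this, sketched on a dummy NumPy array shaped like `normalized_masks`, i.e. (1, 21, H, W): take the index of the most probable class at each pixel with an argmax over the class dimension. With the real PyTorch tensor, the analogous call would be `normalized_masks[0].argmax(dim=0)`, followed by `.numpy()` before plotting; the random Dirichlet inputs below are stand-ins for real predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(21), size=(6, 7))  # (H, W, 21): dummy per-pixel distributions
probs = np.moveaxis(probs, -1, 0)[np.newaxis]    # -> (1, 21, H, W), like normalized_masks

class_preds = probs[0].argmax(axis=0)            # (H, W) map of class indices 0..20
print(class_preds.shape)                         # (6, 7)
```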

In [None]:
def draw_seg_results(img, class_preds):
    fig, ax = plt.subplots(1, 2, figsize = (12, 4))
    
    # visualise the input image: reorder (C, H, W) into the (H, W, C) layout that imshow expects
    img_vis = np.transpose(np.asarray(img), (1, 2, 0))
    ax[0].imshow(img_vis)

    # visualise the segmentation prediction
    # class_preds must be a 2D array (matching the width and height of the original image) where each element is the predicted class index of that pixel
    # colour map: dark grey for background (class 0), followed by the 20 'tab20' colours
    cmap = matplotlib.colormaps.get_cmap('tab20')
    new_colors = np.concatenate(([[0.2, 0.2, 0.2]], cmap.colors))
    cmap = ListedColormap(new_colors)

    ax[1].imshow(class_preds, cmap=cmap, vmin=0, vmax=20)

    patches = [mpatches.Patch(color=new_colors[i], label=class_list[i]) for i in range(len(class_list))]
    fig.legend(handles=patches, bbox_to_anchor=(0.2, 0.05), loc='upper left', borderaxespad=0.0, ncols=5)
    
    plt.show()

### Enter your code below


Observing the segmentation result above, you may notice two areas of error:
1. The back leg of the dog has been classified as 'cat'.
2. Pixels along the dog's boundary are assigned to a seemingly random mix of classes (notice the "rainbow" of colours?).

Let's look at the model's confidence in these predictions: are these confident mistakes or not?

## **Your turn!** 

Convert the ```normalized_masks``` into an image showing the confidence of the predicted class at each pixel. Add a third subplot to the ```draw_seg_results``` function to visualise this with ```plt.imshow()```, and observe how the confidence varies across the image. You can use ```plt.colorbar()``` to see the quantitative confidence of each pixel.

Compare the changes in confidence to the accuracy of the prediction.
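A common choice of per-pixel confidence (assumed here; it is not the only option) is the softmax probability of the predicted class, i.e. the maximum over the class dimension. The sketch below uses a small hand-made (classes, H, W) array; `confidence_map` is an illustrative helper. Since the probabilities at each pixel sum to one, with 21 classes this value always lies between 1/21 and 1.

```python
import numpy as np

def confidence_map(probs_chw):
    """Per-pixel confidence = softmax probability of the predicted (argmax) class."""
    return probs_chw.max(axis=0)

# Dummy 2-class, 1x3 image: a confident pixel, an uncertain pixel, a confident pixel
probs = np.array([[[0.95, 0.55, 0.10]],
                  [[0.05, 0.45, 0.90]]])  # shape (2, 1, 3) = (classes, H, W)
conf = confidence_map(probs)
print(conf)  # values 0.95, 0.55 and 0.90, shape (1, 3)
```

In `draw_seg_results`, a third panel could then show this map with `ax[2].imshow(conf, vmin=0, vmax=1)` and a colour bar via `fig.colorbar(...)`, after creating the figure with `plt.subplots(1, 3, ...)`.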

In [None]:
for file_name in glob.glob('data/Road_Anomalies/images/*'):
    img = read_image(file_name)

    # Step 1: Initialize model with the best available weights
    weights = FCN_ResNet50_Weights.DEFAULT
    model = fcn_resnet50(weights=weights)
    model.eval()

    # Step 2: Initialize the inference transforms
    preprocess = weights.transforms()

    # Step 3: Apply inference preprocessing transforms
    batch = preprocess(img).unsqueeze(0)

    # Step 4: Use the model and visualize the prediction
    with torch.no_grad():
        prediction = model(batch)["out"]
        normalized_masks = prediction.softmax(dim=1)

    # fix the code below...
    draw_seg_results(img, ..., ...)
        

## Compare with another segmentation model!
**Your turn:** In the loop above, add a sub-loop that also tests the DeepLabv3 ResNet50 model. Everything you need for this is already imported in the first cell; look there for hints on how to refer to the models.

Compare the results for the two different models -- is one clearly better? Do they work/fail in different ways?
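To check whether the two models fail in different ways, it can help to map where their predicted classes disagree. The sketch below uses two small hypothetical class maps (in the Pascal VOC class list, index 12 is 'dog' and 8 is 'cat'); `disagreement_mask` is an illustrative helper, not part of any library.

```python
import numpy as np

def disagreement_mask(class_preds_a, class_preds_b):
    """True where two models predict different classes for the same pixel."""
    return class_preds_a != class_preds_b

pred_fcn = np.array([[0, 12, 12],
                     [0, 12, 8]])       # hypothetical FCN map: one 'dog' pixel labelled 'cat'
pred_deeplab = np.array([[0, 12, 12],
                         [0, 12, 12]])  # hypothetical DeepLabv3 map
mask = disagreement_mask(pred_fcn, pred_deeplab)
print(mask.mean())  # fraction of pixels where the models disagree (1 of 6 here)
```

Plotting `mask` with `imshow` shows whether the disagreements cluster in particular regions, such as object boundaries.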

## Two is better than one: Ensembling

A common way to improve the performance of models on a task is to ***ensemble*** them: you pass an input through multiple models and then aggregate their predictions to produce a single output. This has several benefits:

1. Predictions are often more accurate
2. Confidence is often better calibrated, i.e. more meaningful: high for correct predictions and low for incorrect ones
3. Confidence is often better at identifying novel/anomalous objects that the models haven't seen before!
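Points 2 and 3 can be illustrated with a single pixel. In this hypothetical example, two models are each confident but disagree; averaging their probability vectors gives a flatter distribution, so the ensemble's confidence drops and the pixel is flagged as uncertain. The numbers are made up for illustration and are not outputs of FCN or DeepLabv3.

```python
import numpy as np

# Hypothetical softmax outputs of two models for ONE pixel over 3 classes
p_model_a = np.array([0.90, 0.05, 0.05])  # model A is confident in class 0
p_model_b = np.array([0.10, 0.85, 0.05])  # model B is confident in class 1

p_ensemble = (p_model_a + p_model_b) / 2    # -> [0.50, 0.45, 0.05]
print(p_ensemble.argmax(), p_ensemble.max())  # class 0, confidence 0.5
```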

<img src="ensemble.png" alt="drawing" width="500"/>

**Your turn:** In the loop above, draw the results of ensembling the two models. You can do this by averaging their predicted class probabilities (the softmax outputs), then taking the most likely class and its confidence at each pixel.
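A minimal sketch of the averaging step, assuming each model's `normalized_masks` has been converted to a NumPy array of the same shape (1, 21, H, W). Averaging the softmax probabilities rather than the raw logits follows the Deep Ensembles approach of treating the ensemble as a uniformly weighted mixture. `ensemble_predictions` is a hypothetical helper, and the random Dirichlet inputs stand in for real model outputs.

```python
import numpy as np

def ensemble_predictions(prob_maps):
    """Average per-pixel class probabilities from several models.

    prob_maps: list of arrays shaped (1, num_classes, H, W), all the same shape.
    Returns (class_preds, confidence), both shaped (H, W).
    """
    shapes = {p.shape for p in prob_maps}
    if len(shapes) != 1:
        raise ValueError(f"prediction shapes differ: {shapes}")
    mean_probs = np.mean(np.stack(prob_maps), axis=0)[0]  # (num_classes, H, W)
    return mean_probs.argmax(axis=0), mean_probs.max(axis=0)

rng = np.random.default_rng(2)
# Two dummy "models": (1, 4, 5, 21) Dirichlet samples moved to (1, 21, 4, 5)
fake = [np.moveaxis(rng.dirichlet(np.ones(21), size=(1, 4, 5)), -1, 1) for _ in range(2)]
class_preds, confidence = ensemble_predictions(fake)
print(class_preds.shape, confidence.shape)  # (4, 5) (4, 5)
```

The explicit shape check gives a clearer error message than `np.stack` would if, for example, the two models' outputs had been resized differently.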

Looking at the results, is this helpful for performance? What could be a downside of ensembling?
