<a href="https://colab.research.google.com/github/katek28/Deep-Learning-projects/blob/main/Deep_learning_for_Pneumothorax_segmenation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AMV Group Assignment: Deep learning for Pneumothorax segmentation




---


In this assignment, we experimented with deep learning models for image segmentation. We developed a model to segment pneumothorax on chest X-ray images, using state-of-the-art models and a public X-ray dataset.

# Getting started and setting things up
## Importing and installing modules
This notebook uses several third-party Python modules that have to be installed before they can be used. This is done in the code cell below.






In [None]:
#@title
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
import numpy as np
import pandas as pd
import random
import cv2
import copy
from PIL import Image
from glob import glob
import shutil
import tqdm
import zipfile

from sklearn.model_selection import train_test_split

from ipywidgets import interact
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models
from torch.utils.data import DataLoader, Dataset
import plotly.graph_objects as go

!pip install segmentation-models-pytorch
!pip install pytorch-lightning
!pip install albumentations

import segmentation_models_pytorch as smp
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import WandbLogger, TensorBoardLogger
from pytorch_lightning.callbacks import LearningRateMonitor
lr_monitor = LearningRateMonitor()

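# These metric classes are not used further in this notebook. Newer PyTorch Lightning
# releases no longer ship pytorch_lightning.metrics, so the import is guarded.
try:
    from pytorch_lightning.metrics.classification import AUROC, Accuracy, F1
except ImportError:
    pass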

from albumentations import (
    Compose, HorizontalFlip, RandomBrightness, RandomContrast, RandomGamma, OneOf,
    ToFloat, ShiftScaleRotate, RandomSizedCrop
)

def dice_coefficient(y_true, y_pred, empty_score=1.0, mode='hard'):
    # Function to calculate dice coefficient after thresholding
    if mode == 'hard':
        y_th = y_pred > 0.5
    elif mode == 'soft':
        y_th = y_pred
    else:
        raise ValueError('Invalid dice mode! Choose either "soft" or "hard"')
    im1 = y_true
    im2 = y_th
    if im1.shape != im2.shape:
        raise ValueError("Shape mismatch: im1 and im2 must have the same shape.")
    im_sum = im1.sum() + im2.sum()
    if im_sum == 0:
        return empty_score
    intersection = (im1*im2).sum()

    return (2. * intersection.sum()) / im_sum

%load_ext tensorboard

print('Module installations and imports completed successfully!')


In [None]:
print(pl.__version__)

### Mounting Google Drive to the notebook




In [1]:
from google.colab import drive
drive.mount('/gdrive')


### Copy data from drive to colab
In the code cell below we copy the data into the Colab runtime and extract the .zip file. Copying the data to the local runtime once, instead of reading it from Google Drive each time, makes training much faster.



In [None]:
#Copying the data to the local colab runtime

######################
zip_path = '/gdrive/My Drive/dataset.zip'
######################

print('Copying data to local runtime')
shutil.copyfile(zip_path, 'dataset.zip')
print('Copy complete. Extracting...')
if not os.path.exists('dataset'):
    os.makedirs('dataset')
!unzip -q dataset.zip -d dataset
print('Extraction complete.')
print('Dataset imported to Colab runtime successfully.')
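
The extraction step can also be done from Python with the standard-library `zipfile` module (already imported at the top of the notebook), instead of the shell `unzip` command. The following is a minimal sketch, not the code used in this assignment: `extract_archive` is a hypothetical helper, and the demo archive and its CSV header are made up so that the example is self-contained.

```python
import os
import tempfile
import zipfile

def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the sorted member names."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())

# Self-contained demonstration on a throwaway archive (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, 'demo.zip')
    with zipfile.ZipFile(demo_zip, 'w') as archive:
        archive.writestr('dataset.csv', 'image_path,mask_path,pneu,dataset\n')
    extracted = extract_archive(demo_zip, os.path.join(tmp, 'dataset'))
    csv_exists = os.path.exists(os.path.join(tmp, 'dataset', 'dataset.csv'))
    print(extracted, csv_exists)
```

In the notebook itself, the call would presumably look like `extract_archive('dataset.zip', 'dataset')`.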

## Definitions


### Defining the dataset

Here, the dataset class (`SIIMACR_kaggle`, a PyTorch `Dataset`) is defined. It assumes that the dataset was successfully copied to the local Colab runtime, so make sure you've executed the previous code cell without error.

Apart from loading the images, it also handles splitting the dataset into train, validation and test parts. Run the code cell below to define the dataset class.

In [2]:
#@title
class SIIMACR_kaggle(Dataset):
    '''
    Dataset Class for SIIM-ACR pneumothorax segmentation dataset from kaggle
    Dataset link - https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/data
    There is 1 class in the given labels.
    Image and mask paths are read from `dataset.csv` in `root_path`, and the rows
    are split into train / valid / test subsets.
    In `__getitem__`, images and masks are resized to the given `img_size`, and the
    given `transform` (if any) is applied to the image and the mask together, so that
    spatial augmentations keep them aligned.
    '''

    def __init__(self, root_path, split, img_size=(512, 512), 
                 transform=None, mode='segmentation', negative_fraction=1):
        self.img_size = img_size
        self.transform = transform
        self.split = split
        self.root = root_path
        self.df = pd.read_csv(os.path.join(root_path, 'dataset.csv'), index_col=0)
        self.mode = mode
        self.negative_fraction = negative_fraction

        trainingset = self.df[self.df['dataset']=='train']
        test = self.df[self.df['dataset']=='test']
        # Split between train, valid and test set
        train, validation = train_test_split(trainingset, test_size=0.1, random_state=42, stratify=trainingset['pneu'])

        datasets = {'train': train, 'valid': validation, 'test': test}
        self.samples = self.subsample_negative_samples(datasets[split])
        self.data_summary = self.create_data_summary()
        self.tensortransform = transforms.Compose([transforms.ToTensor()])
   
    def __len__(self):
        return(len(self.samples))

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, self.samples.iloc[idx]['image_path'].replace('\\', '/')))
        img = img.resize(self.img_size)
        img = np.array(img)

        mask = Image.open(os.path.join(self.root, self.samples.iloc[idx]['mask_path'].replace('\\', '/')))
        mask = mask.resize(self.img_size)
        mask = np.array(mask)
        mask[mask!=0] = 1
        mask = mask[:,:, 0].squeeze()

        if self.transform:
            augmented = self.transform(image=img, mask=mask)
        if self.mode == 'segmentation':
            if self.transform:
                return self.tensortransform(augmented['image']), augmented['mask']
            else:
                return img, mask
        elif self.mode == 'classification':
             return img, int(mask.any())
        elif self.mode == 'classification+segmentation':
             return img, mask, int(mask.any())

    def give_data_for_class(self, cclass, idx):
        if cclass == 'pneu':
            sample_idxs = np.nonzero(self.samples['pneu'].values)[0]
            return self.give_data(sample_idxs[idx])
        elif cclass == 'no_pneu':
            sample_idxs = np.nonzero(self.samples['pneu'].values != 1)[0]
            return self.give_data(sample_idxs[idx])

    def give_data(self, idx):
        img = Image.open(os.path.join(self.root, self.samples.iloc[idx]['image_path'].replace('\\', '/')))
        img = img.resize(self.img_size)
        img = np.array(img)

        mask = Image.open(os.path.join(self.root, self.samples.iloc[idx]['mask_path'].replace('\\', '/')))
        mask = mask.resize(self.img_size)
        mask = np.array(mask)
        mask[mask!=0] = 1
        mask = mask[:,:, 0].squeeze()
        return img, mask

    def create_data_summary(self):
        return {'pneu':len(self.samples[self.samples['pneu']==1]), 
                'no_pneu':len(self.samples[self.samples['pneu']==0])}
    
    def subsample_negative_samples(self, dataset):
        neg_samples = dataset[dataset['pneu']==0]
        pos_samples = dataset[dataset['pneu']==1]
        n_neg = int(np.clip(self.negative_fraction*len(neg_samples), 1, None))
        neg_samples_sampled = neg_samples.sample(n_neg, random_state=42)
        return pd.concat([pos_samples, neg_samples_sampled])

print('DataSet defined.')


#### Inspecting the data
A very important aspect of deep learning is making sure that your data is in a proper state before you pass it into your model. If something's wrong with the data, you will never be able to train a good model on it. Therefore, it is really important to visually inspect your data before you do anything with it.

Using the cell below, we could have a look at the data and the masks for the training, validation and test set. 

In [3]:
# You can change the value of 'dataset' to inspect the different splits
# Options are: 'train', 'valid' or 'test'
dataset = 'train'

dsc = SIIMACR_kaggle(root_path='dataset', split=dataset, transform=None, negative_fraction=1)

n_total = len(dsc)
n_pneu = dsc.data_summary['pneu']
n_nopneu = dsc.data_summary['no_pneu']

print('{} dataset contains {} ({:.1f}%) "pneu", and {} ({:.1f}%) "no_pneu" samples'.format(dataset, n_pneu, n_pneu/n_total*100, n_nopneu, n_nopneu/n_total*100))

@interact(show_class=['pneu', 'no_pneu'], index=(0, dsc.data_summary['no_pneu']-1, 1))
def plot_image_and_mask(show_class='pneu', index=0):
    image, mask = dsc.give_data_for_class(show_class, index)
    image_masked = copy.deepcopy(image)
    image_masked[:,:,0][mask!=0] = image_masked[:,:,0][mask!=0]*1
    image_masked[:,:,1][mask!=0] = image_masked[:,:,1][mask!=0]*0
    image_masked[:,:,2][mask!=0] = image_masked[:,:,2][mask!=0]*0

    display_image = np.hstack((image, image_masked))
    fig = plt.figure(figsize=(8,4))
    plt.imshow(display_image, cmap='bone')
    plt.axis('off')


### Defining the loss functions and metrics

For this segmentation problem we used the dice coefficient as a metric. It measures the overlap between two areas; in our case these are the labels and the predictions of the model. For perfect overlap, the dice coefficient is equal to 1. If there is no overlap, the dice is 0.

We experimented with three different loss functions: the dice loss (one minus the dice coefficient), the binary cross-entropy (BCE) and the combo loss, which is defined as the sum of the BCE loss and the dice loss.

The dice loss and the combo loss are defined below. BCE is one of the default loss functions available in PyTorch, so we don't have to define it manually here.

In [None]:
#@title
def DiceMetric(inputs, targets, smooth=1): 
    #flatten label and prediction tensors
    inputs = inputs.view(-1)
    targets = targets.view(-1)

    intersection = (inputs * targets).sum()                            
    dice = (2.*intersection + smooth)/(inputs.sum() + targets.sum() + smooth)  

    return dice

def DiceLoss(inputs, targets, smooth=1):
    return 1 - DiceMetric(inputs, targets, smooth)

def ComboLoss(inputs, targets, smooth=1):
    dice_contribution = DiceLoss(inputs, targets, smooth)
    bce_contribution = F.binary_cross_entropy(inputs, targets)
    return bce_contribution + dice_contribution

print('Loss functions defined.')
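
To make the arithmetic behind these definitions concrete, the sketch below recomputes the smoothed dice coefficient, the dice loss and the combo loss for four pixels in plain Python. It mirrors the formulas of `DiceMetric`, `DiceLoss` and `ComboLoss` above, but the function names `soft_dice` and `bce` and the pixel values are illustrative assumptions, not part of the notebook.

```python
import math

def soft_dice(pred, target, smooth=1.0):
    """Smoothed dice on flat lists of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

def bce(pred, target):
    """Mean binary cross-entropy over all pixels."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for p, t in zip(pred, target)]
    return sum(terms) / len(terms)

pred = [0.9, 0.8, 0.1, 0.2]   # hypothetical predicted probabilities
target = [1, 1, 0, 0]         # hypothetical ground-truth labels

dice = soft_dice(pred, target)              # (2*1.7 + 1) / (2 + 2 + 1)
dice_loss = 1.0 - dice
combo_loss = bce(pred, target) + dice_loss

# With smoothing, an empty prediction on an empty mask scores exactly 1.
empty_dice = soft_dice([0.0, 0.0], [0, 0])

print(round(dice, 3), round(dice_loss, 3), round(combo_loss, 3), empty_dice)
```

The `smooth` term keeps the ratio defined when both the prediction and the mask are empty, which matters here because many images contain no pneumothorax.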

### Defining the model

In the following code cell the model itself is defined. Model definition is still a bit complex because it requires not only the details of the model itself, but also procedures for calculating the loss and metrics during training, validation and testing. Furthermore, it also holds the configuration of the optimizers and hyperparameters used during training (e.g. learning rate, number of epochs, batch size, etc.).

In [4]:
#@title
class Model(pl.LightningModule):
    '''
    Model Module
    This is a basic module implemented with PyTorch Lightning.
    It is specific to the SIIMACR dataset, i.e. the dataloaders are for this thorax X-ray dataset.
    It uses a U-Net from segmentation_models_pytorch with a configurable encoder
    backbone (e.g. ResNet18).
    The Adam optimizer is used.
    '''

    def __init__(self, hparams):
        super().__init__()
        self.hparams.update(hparams)
        self.root_path = hparams['root']
        self.batch_size = 4
        self.epochs = hparams['epochs']
        self.learning_rate = hparams['lr']
        self.scheduler = hparams['lr_scheduler']
        self.loss_function = hparams['loss_function']
        self.mode='segmentation'
        self.negative_fraction=hparams['negative_fraction']
        self.augmentation = hparams['data_augmentation']

        decoder_channels = [256, 128, 64, 32, 16]
        self.net = smp.Unet(hparams['model_backbone'],
                              encoder_depth=hparams['encoder_depth'], 
                              encoder_weights = None,
                              classes=1, 
                              in_channels=3, 
                              activation='sigmoid', 
                              aux_params = None,
                              decoder_channels=decoder_channels[:hparams['encoder_depth']]) 

        self.transform_test = Compose([ToFloat(max_value=1)],p=1)

        if self.augmentation:
            self.transform_train = Compose([
                  HorizontalFlip(p=0.5),
                  ShiftScaleRotate(shift_limit=0.0825, scale_limit=0.2, rotate_limit=25),
                  OneOf([
                    RandomContrast(),
                    RandomGamma(),
                    RandomBrightness(),
                    ], p=0.3),
                  ToFloat(max_value=1)
                ],p=1)
        else:
            self.transform_train = self.transform_test
       
        self.trainset = SIIMACR_kaggle(self.root_path, split='train', transform=self.transform_train, mode=self.mode, negative_fraction=self.negative_fraction)
        self.validset = SIIMACR_kaggle(self.root_path, split='valid', transform=self.transform_test, mode=self.mode, negative_fraction=1)
        self.testset = SIIMACR_kaggle(self.root_path, split='test', transform=self.transform_test, mode=self.mode, negative_fraction=1)

        self.ntest = self.testset.__len__()
        self.nvalid = self.validset.__len__()
        self.ntrain = self.trainset.__len__()
        
        self.metric = 'dice'
        
        self.save_hyperparameters()
    
    def on_fit_start(self):
        metric_placeholder = {'test_{}'.format(self.metric): 0, 'val_{}'.format(self.metric): 0}
        self.logger.log_hyperparams(self.hparams, metrics=metric_placeholder)

    def forward(self, x):
        return self.net(x)

    def calculate_loss_and_metric(self, batch):
        img, mask = batch
        img = img.float()
        mask = mask.float().unsqueeze(1)
        out = self(img)
        if self.loss_function == 'dice':
            loss_val = DiceLoss(out, mask)
        elif self.loss_function == 'bce':
            loss_val = F.binary_cross_entropy(out, mask)
        elif self.loss_function == 'combo':
            loss_val = ComboLoss(out, mask)
        dice = DiceMetric(out, mask)
        return loss_val, dice

    def training_step(self, batch, batch_nb):
        loss_val, metric = self.calculate_loss_and_metric(batch)
        log_dict = {'train_loss': loss_val, 'train_{}'.format(self.metric): metric}
        self.log_dict(log_dict)
        return {'loss': loss_val, 'log': log_dict, 'progress_bar': log_dict}

    def validation_step(self, batch, batch_idx):
        loss_val, metric = self.calculate_loss_and_metric(batch)
        log_dict = {'val_loss': loss_val, 'val_{}'.format(self.metric): metric}
        self.log_dict(log_dict)
        return {'val_loss': loss_val, 'val_{}'.format(self.metric): metric}

    def test_step(self, batch, batch_idx):
        loss_val, metric = self.calculate_loss_and_metric(batch)
        log_dict = {'test_loss': loss_val, 'test_{}'.format(self.metric): metric}
        self.log_dict(log_dict)
        return {'test_{}'.format(self.metric): metric}

    def validation_epoch_end(self, outputs):
        loss_val = sum(output['val_loss'] for output in outputs) / len(outputs)
        metric_val = sum(output['val_{}'.format(self.metric)] for output in outputs) / len(outputs)
        log_dict = {'val_loss': loss_val, 'val_{}'.format(self.metric): metric_val}
        self.log_dict(log_dict)
        return {'log': log_dict, 'val_loss': log_dict['val_loss'], 'progress_bar': log_dict, 'val_{}'.format(self.metric): log_dict['val_{}'.format(self.metric)]}

    def test_epoch_end(self, outputs):
        metric_val = sum(output['test_{}'.format(self.metric)] for output in outputs) / len(outputs)
        log_dict = {'test_{}'.format(self.metric): metric_val}
        self.log_dict(log_dict)
        return {'log': log_dict, 'progress_bar': log_dict, 'test_{}'.format(self.metric): log_dict['test_{}'.format(self.metric)]}

    def configure_optimizers(self):
        opt = torch.optim.Adam(self.net.parameters(), lr=self.learning_rate)
        if self.scheduler == 'cosine':
            sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=self.epochs/3)
        else:
            lmbd = lambda epoch: 1
            sch = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lmbd)
        return [opt], [sch]

    def train_dataloader(self):
        return DataLoader(self.trainset, batch_size=self.batch_size, shuffle=True, num_workers=4)

    def val_dataloader(self):
        return DataLoader(self.validset, batch_size=self.batch_size, shuffle=False, num_workers=4)

    def test_dataloader(self):
        return DataLoader(self.testset, batch_size=self.batch_size, shuffle=False, num_workers=1)

print('Model class defined.')
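
The `'cosine'` option in `configure_optimizers` uses `CosineAnnealingLR` with `T_max` set to a third of the number of epochs. As a rough illustration of what that schedule looks like, the sketch below evaluates the closed-form cosine annealing expression from the PyTorch documentation in plain Python. The helper name `cosine_annealing_lr` and the choice of 12 epochs are assumptions made for the example; the scheduler inside PyTorch is implemented recursively.

```python
import math

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))

epochs = 12          # hypothetical number of epochs
base_lr = 1e-2       # same value as 'lr' in the hparams cell
t_max = epochs / 3   # mirrors T_max=self.epochs/3 in configure_optimizers

schedule = [cosine_annealing_lr(e, base_lr, t_max) for e in range(epochs + 1)]
for epoch, lr in enumerate(schedule):
    print('epoch {:2d}: lr = {:.5f}'.format(epoch, lr))
```

Because `T_max` is shorter than the training run, the learning rate under this formula does not decay monotonically: it falls to the minimum after a third of the epochs, rises back to the base value after two thirds, and falls again towards the end.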


# Model Training

Now that everything's set up, we get to the fun part: actually training and evaluating the model. 



## Specify hyperparameters
Here, you can set the hyperparameters and the model name. 

There are several new hyperparameters here that were not in the week 3 code assignment. The meaning of those parameters is explained in more detail at the bottom of this notebook (in the 'Assignment' section).

In [None]:
# Here, the hyperparameters for the model are defined. 

hparams = {
            'root': 'dataset',
            'epochs': 10,
            'encoder_depth': 5,
            'lr': 1e-2,
            'lr_scheduler': 'constant',
            'loss_function': 'dice',
            'model_backbone': 'resnet18',
            'negative_fraction': 0.0,
            'data_augmentation': False
           }

# The name of the model that will be trained. Make sure to change this before 
# starting a next training run, to avoid overwriting your previously trained 
# models!
model_name = 'resnet18_version_0'

## The training procedure

The cell below contains the code to run the model training. Once you run it, it will create folders to store the model weights. It will also show you how the training is progressing.

After each epoch, the model is evaluated on the validation set. If the dice coefficient on the validation set improved, a model 'checkpoint' is created, which just means that the current weights of the model are saved to disk. 

After the training completes and the model has been trained for the specified number of epochs, the latest model checkpoint is loaded (so the 'best possible' model that was trained is used) and that model is used to make predictions on the test set. The dice coefficient on the test set is then calculated.
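
The checkpointing rule described above (keep the weights only when the validation dice improves) can be sketched in a few lines of plain Python. This is a simplified illustration and not Lightning's actual `ModelCheckpoint` implementation; the function name `best_checkpoint` and the validation scores are hypothetical.

```python
def best_checkpoint(val_dice_per_epoch):
    """Return the best epoch, its dice, and the epochs at which a checkpoint is written."""
    best_epoch, best_dice = None, float('-inf')
    saved_epochs = []
    for epoch, dice in enumerate(val_dice_per_epoch):
        if dice > best_dice:  # monitor='val_dice', mode='max': higher is better
            best_epoch, best_dice = epoch, dice
            saved_epochs.append(epoch)  # with save_top_k=1 this replaces the previous file
    return best_epoch, best_dice, saved_epochs

val_dice = [0.21, 0.35, 0.33, 0.41, 0.40]  # hypothetical validation dice per epoch
best_epoch, best_dice, saved_epochs = best_checkpoint(val_dice)
print(best_epoch, best_dice, saved_epochs)
```

Since only one checkpoint is kept, the most recently written file is also the one with the highest validation dice, which is why loading the latest checkpoint gives the best model seen during training.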


In [None]:
# 1 INIT LIGHTNING MODEL

model = Model(hparams)

# 2 Create folder to save the models
checkpoint_path = os.path.join(os.getcwd(), 'pytorch_checkpoints', model_name)
if not os.path.exists(checkpoint_path):
    os.makedirs(checkpoint_path)

checkpoint=ModelCheckpoint(dirpath=checkpoint_path, filename=model_name+'_'+'{epoch}', 
                                        save_top_k=1, verbose=True, monitor='val_dice', mode='max')

# 3 INIT TRAINER
trainer = pl.Trainer(
    gpus=1,
    max_epochs=hparams['epochs'],
    checkpoint_callback=True,
    callbacks=[checkpoint,lr_monitor],
    )

# 4 START TRAINING
trainer.fit(model)

# 5 Evaluate model on test set
trainer.test()

## Monitoring the training

Execute the cell below to start TensorBoard (the app can take a minute to load).

Remember to set 'Smoothing' to zero. You don't have to rerun the cell to load new training results; just press the 'Refresh' button in TensorBoard.




In [None]:
#@title
%tensorboard --logdir lightning_logs

# Model Evaluation

To evaluate model performance, performance metrics are used. For a segmentation task such as here, a popular metric is the dice coefficient. Metrics are a very convenient way to measure performance because they can easily be averaged over the entire dataset (or the train / validation / test sets). They therefore allow you to summarize model performance with a single scalar. 

However, in order to get a feeling for model performance it is equally important to visually inspect the predictions from the model, to see if they make sense. In this section, we both visualized the predictions and calculated the mean dice coefficients for the different subsets.



## Visualizing the results

Execute the code cell below. This will bring up a widget in which you can select a model checkpoint and a dataset to evaluate. Use the dropdown boxes to select an appropriate checkpoint and dataset (for instance the test or validation set). 

Next, you can use the slider to walk through the images in the dataset and inspect the image (left), mask (middle) and model prediction (right) for each image.



In [None]:
#@title
# -----------------------------------
# HELPER FUNCTIONS FOR VISUALIZATION
# -----------------------------------

def predict_on_image(pytorch_model, index=0, dataset='test'):
    # Function to generate a prediction using pytorch_model,
    # on any of the images from the selected dataset split
    dataset_map = {'test': pytorch_model.testset,
                   'validation': pytorch_model.validset,
                   'train': pytorch_model.trainset}
    ds = dataset_map[dataset]
    img, mask = ds.__getitem__(index)
    pred = pytorch_model.eval()(img.float().cuda(device=0).unsqueeze(0))
    pred = pred.cpu().detach().numpy().squeeze()
    return img.cpu().squeeze().numpy()/255, mask, pred


def predict_and_plot(pytorch_model, index=0, dataset='test', dice_mode='soft'):
    img, mask, pred = predict_on_image(pytorch_model, index, dataset)
    dice = dice_coefficient(mask, pred, mode=dice_mode)
    print('Dice coefficient for {} image {} is: {:.3f}'.format(dataset, index, dice))
    image = np.moveaxis(img, 0, -1)
    image_masked = copy.deepcopy(image)
    image_predicted = copy.deepcopy(image)
    prediction = pred > 0.5

    image_masked[:,:,0][mask!=0] = image_masked[:,:,0][mask!=0]*1
    image_masked[:,:,1][mask!=0] = image_masked[:,:,1][mask!=0]*0
    image_masked[:,:,2][mask!=0] = image_masked[:,:,2][mask!=0]*0

    image_predicted[:,:,0][prediction!=0] = image_predicted[:,:,0][prediction!=0]*0
    image_predicted[:,:,1][prediction!=0] = image_predicted[:,:,1][prediction!=0]*0
    image_predicted[:,:,2][prediction!=0] = image_predicted[:,:,2][prediction!=0]*1

    display_image = np.hstack((image, image_masked, image_predicted))
    fig = plt.figure(figsize=(24,8))
    plt.imshow(display_image, cmap='bone')
    plt.axis('off')
    return dice, fig 


#cps = os.listdir(os.path.join(os.getcwd(), 'pytorch_checkpoints'))
cps = glob(os.path.join(checkpoint_path, '*.ckpt'))
datasets = ['train', 'validation', 'test']

@interact(checkpoint=cps, dataset=datasets)
def select_model_checkpoint(checkpoint=cps[0], dataset='validation'):
    global eval_model, ds, model_checkpoint
    model_checkpoint = checkpoint
    ckpth_path = checkpoint
    eval_model = Model.load_from_checkpoint(ckpth_path)
    eval_model.cuda(device=0)
    ds = dataset

dsmap = {'test':eval_model.ntest, 'validation':eval_model.nvalid, 'train':eval_model.ntrain}
@interact
def plot_sample(index=(0,dsmap[ds]-1,1)):
    print('Evaluating model from checkpoint: {}'.format(model_checkpoint))
    predict_and_plot(pytorch_model=eval_model, index=index, dataset=ds, dice_mode='hard')

## The mean dice coefficient
The code below calculates the dice coefficient for the model (loaded from the checkpoint we selected with the widget above) for the training, validation and test datasets. To do so, predictions were generated for all images in the datasets, so the code takes some time to execute. 

In addition to the means, a histogram is generated that shows the distribution of dice scores for the train, validation and test set. 


In [None]:
#@title
print('Evaluating model from checkpoint: {}'.format(model_checkpoint))
evaldices = pd.DataFrame({})
for dsname, dslen in dsmap.items():
    for index in range(dslen):
        _, mask, pred = predict_on_image(eval_model, index, dataset=dsname)
        evaldices.at[index, 'dice_{}'.format(dsname)] = dice_coefficient(mask, pred, mode='hard') 

    print('Mean dice coefficient on the {} set is {:.3f}'.format(dsname, evaldices['dice_{}'.format(dsname)].mean()))

fig = go.Figure()
for dsname in dsmap:
    fig.add_trace(go.Histogram(x=evaldices['dice_{}'.format(dsname)], name=dsname, xbins={'start':0.0, 'end': 1.05, 'size':0.05}))

# The layout only needs to be set once, after all traces have been added
fig.update_layout(
    title_text='Histogram of hard dice scores across datasets, calculated from model checkpoint: {}'.format(model_checkpoint),
    xaxis_title_text='Dice score',
    yaxis_title_text='Count',
    bargap=0.1,
    bargroupgap=0.025,
    hovermode="x"
)

fig.show()

At this point we noticed two things:


1.   The mean dice scores calculated here are different from what is shown during the training and in Tensorboard
2.   There are peaks in the distribution of dice coefficients at a dice score of 0 (large peak) and 1 (small peak).


Observation #1 has to do with the way the dice coefficient is calculated. During training, dice is calculated using the model predictions (i.e. the probability values per pixel) directly. That means that if a certain pixel gets a probability of for example 0.67, it also counts for only 0.67 of overlap with the ground truth (provided it is a pixel that is positive in the ground truth). 


However, this is not the way the dice coefficient is formally defined: it should be calculated on *binary* predictions, i.e. we should first apply a threshold to the predicted probabilities for all pixels. We applied a threshold of 0.5 to calculate the means and histograms above.

To differentiate between the two methods, the dice calculated directly from the predicted probabilities is known as the *soft dice coefficient*, whereas the dice calculated after applying a threshold is called the *hard dice coefficient*. While the soft dice is useful during training, the hard dice is the accepted measure to report model performance.
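
The difference between the two variants can be checked on a tiny example. The sketch below is a plain-Python re-implementation written for illustration (the name `dice` and the pixel values are assumptions); it follows the same conventions as `dice_coefficient` at the top of the notebook, including a score of 1 when both the mask and the prediction are empty.

```python
def dice(pred, target, threshold=None):
    """Dice on flat lists; if threshold is given, predictions are binarized first (hard dice)."""
    if threshold is not None:
        pred = [1.0 if p > threshold else 0.0 for p in pred]
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both empty: counted as perfect agreement
    intersection = sum(p * t for p, t in zip(pred, target))
    return 2.0 * intersection / total

pred = [0.67, 0.67, 0.4, 0.1]  # hypothetical predicted probabilities
target = [1, 1, 0, 0]          # hypothetical ground truth

soft = dice(pred, target)                 # uses the probabilities directly
hard = dice(pred, target, threshold=0.5)  # binarizes at 0.5 first

# Images without pneumothorax: the hard dice can only be 1 or 0.
empty_correct = dice([0.1, 0.2], [0, 0], threshold=0.5)
empty_false_positive = dice([0.9, 0.2], [0, 0], threshold=0.5)

print(round(soft, 3), hard, empty_correct, empty_false_positive)
```

The last two values may account for part of observation #2: for an image with an empty ground-truth mask, the hard dice is 1 if nothing is predicted and 0 as soon as a single pixel is predicted positive.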

## Exporting predictions and masks for the test set


In [None]:
# Make sure to set this path to the folder on your google drive in which you want 
# the exported files to show up. The results will be copied there as a .zip file.

google_drive_path = '/gdrive/My Drive/'

print('Generating predictions from checkpoint: {}'.format(model_checkpoint))
dices = pd.DataFrame({})

# model_checkpoint is an absolute path to a .ckpt file, so only its base name
# (without extension) is used to name the export folder and the archive.
export_name = os.path.splitext(os.path.basename(model_checkpoint))[0]
target_path = os.path.join('export', export_name)

if not os.path.exists(target_path):
    os.makedirs(target_path)

folders = ['images', 'predictions', 'ground_truth']

for folder in folders:
    target_folder = os.path.join(target_path, folder)
    if not os.path.exists(target_folder):
        os.makedirs(target_folder)

for index in range(eval_model.ntest):
    image, mask, pred = predict_on_image(eval_model, index, dataset='test')
    dice = dice_coefficient(mask, pred, mode='soft')
    filename = '{:04d}.png'.format(index)
    image = np.moveaxis(image, 0, -1)
    cv2.imwrite(os.path.join(target_path, 'images', filename), (255*image).astype('uint8'))
    cv2.imwrite(os.path.join(target_path, 'predictions', filename), (255*pred).astype('uint8'))
    cv2.imwrite(os.path.join(target_path, 'ground_truth', filename), (255*mask).astype('uint8'))
    dices.at[filename, 'dice'] = dice
    if (index+1) % 300 == 0:
        print('Generated predictions for {} images'.format(index+1))

print('Predictions completed. Compressing files....')
dices.to_csv(os.path.join(target_path, 'dice_coefficients.csv'))
shutil.make_archive('export/{}'.format(export_name), 'zip', target_path)

print('Copying archive to google drive')
shutil.copyfile('export/{}.zip'.format(export_name), os.path.join(google_drive_path, '{}.zip'.format(export_name)))
print('Copy completed.')

# Assignment

**We needed to improve the model, by experimentally determining better choices for the hyperparameters.** 

Use the experience and insights you already have from the week 3 coding assignment!

---


**Hyperparameter interpretation**

These are the hyperparameters:

*   Number of epochs
*   Learning rate
*   Negative fraction (value between 0 and 1)
*   Loss function (valid choices are 'bce', 'dice' or 'combo')
*   Model backbone (valid choices include 'resnet18', 'resnet34', 'efficientnet-b4', and many more)
*   Data augmentation (True or False). 

##### Negative fraction
The 'negative fraction' hyperparameter controls the number of images **without** any pneumothorax in the *training* dataset. It is the fraction of the available negative images that is kept: a value of 1 means that all images without pneumothorax are used, while a value of 0 means that (almost) only images with pneumothorax are used during training, because the code always keeps at least one negative image. Note that the test and validation sets always contain both images with and without pneumothorax.
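
As a quick illustration of how this hyperparameter changes the composition of the training set, the sketch below mirrors the counting rule of `subsample_negative_samples` in plain Python. The helper name `count_negatives_kept` and the image counts are hypothetical; the real counts depend on the dataset.

```python
def count_negatives_kept(n_negative, negative_fraction):
    """Number of negative images kept; at least one is always kept."""
    return max(1, int(negative_fraction * n_negative))

n_positive, n_negative = 200, 800  # hypothetical numbers of training images

summary = {}
for fraction in (0.0, 0.25, 1.0):
    kept = count_negatives_kept(n_negative, fraction)
    positive_share = n_positive / (n_positive + kept)
    summary[fraction] = (kept, positive_share)
    print('negative_fraction={}: {} negatives kept, {:.1%} of training images are positive'.format(
        fraction, kept, positive_share))
```

With these made-up counts, a fraction of 0.25 would give a balanced training set, while a fraction of 1 would leave positives as a minority.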

##### Model backbone
The 'model backbone' hyperparameter controls the structure of the model itself: ResNet18 is a simple, relatively shallow architecture that consists of 18 layers, while ResNet34 is deeper, consisting of 34 layers. Deeper models can lead to better results, but are also more prone to overfitting. For a full list of all available backbones, see https://github.com/qubvel/segmentation_models.pytorch . I recommend that you focus on one or two model families (for example, resnet and efficientnet) and experiment a bit with those. **Note:** If you're using different backbones, make sure to (roughly) explain in your methods sections what the differences are!

##### Data augmentation
When set to true, certain random transforms will be applied to the data at every epoch with a pre-defined probability. The transformations that are applied are: HorizontalFlip (p=0.5), Shifting, Scaling, Rotation (within certain limits) and a random Brightness, Contrast or Gamma adjustment (also within certain limits). This might help to make the model more generalizable, because it increases diversity in the data. 
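
For segmentation, a spatial transform such as a horizontal flip has to be applied to the image and its mask together, otherwise the labels no longer line up with the pixels. The sketch below shows the idea in plain Python on nested lists. It is an illustration only and not how albumentations is implemented; `horizontal_flip`, `random_flip_pair` and the toy arrays are hypothetical.

```python
import random

def horizontal_flip(rows):
    """Mirror a 2D array (list of rows) left to right."""
    return [list(reversed(row)) for row in rows]

def random_flip_pair(image, mask, p=0.5, rng=random):
    """With probability p, flip image and mask together so they stay aligned."""
    if rng.random() < p:
        return horizontal_flip(image), horizontal_flip(mask)
    return image, mask

image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 0, 1],
        [0, 1, 1]]

# p=1.0 forces the flip so that the result is deterministic for this demo
flipped_image, flipped_mask = random_flip_pair(image, mask, p=1.0)
untouched_image, untouched_mask = random_flip_pair(image, mask, p=0.0)
print(flipped_image, flipped_mask)
```

Intensity transforms (brightness, contrast, gamma) are different in this respect: they change only the image values and leave the mask as it is.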