# **AI for Health Deep Learning workshop**

Welcome to the AI for Health Deep Learning workshop! In the first part of this workshop, you will be training a convolutional neural network (CNN) to classify handwritten digits. 

In the second part, you will employ a CNN to determine the malignancy of cells in pathology images. Let's see if the neural network performs better than you!

# How to use this notebook
This notebook contains cells with snippets of code that help you load, process and visualize data, train CNNs and visualize the results. The code snippets are accompanied by text that explains what the code does. 

Some text cells also contain questions for you to answer and discuss. 

To execute a code cell, click inside it and press Ctrl+Enter. Some code cells build on the results of previous cells, so make sure that you have run all code cells preceding the current one. 

Feel free to experiment by altering the code and to ask any questions you have!

# MNIST: training a network to 'read' handwritten digits

## **Understanding the data** 

First we import the libraries needed for the code in this notebook. Several libraries make it easier to create and train deep learning networks without having to code things like backpropagation or convolutions yourself. Here we use PyTorch; another widely used library is TensorFlow.

In [4]:
from __future__ import print_function

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms

from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm

import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from math import sqrt, ceil
import pandas as pd
import seaborn as sn

MNIST is a famous dataset of images of handwritten digits, which we use as a toy example. Conveniently, PyTorch makes this dataset available through torchvision!

In [5]:
# ToTensor converts each image to a float tensor of shape (1, 28, 28) with values in [0, 1]
transform = transforms.Compose(
    [transforms.ToTensor()]
    )

batch_size = 128

# Download the data into a writable folder next to the notebook
mnist_train = torchvision.datasets.MNIST('./data/mnist', download=True, train=True, transform=transform)

mnist_test = torchvision.datasets.MNIST('./data/mnist', download=True, train=False, transform=transform)

In [None]:
print('Training set input size:', mnist_train.data.shape)
print('Training set output size:', mnist_train.targets.shape)
print('Testing set input size:', mnist_test.data.shape)
print('Testing set output size:', mnist_test.targets.shape)

*Exercise:* 
 - *What do you think these numbers mean?* 
 - *How many images does each set contain?* 
 - *How large are these images (in pixels)?*

We are going to split the training set into a smaller training set and a validation set. The validation set is used during training to monitor intermediate performance. We can also change the model architecture or hyperparameters and use the validation performance to check what worked best. This way the test set stays untouched during training and development, and serves as a real final test of how well our model works.

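To see what such a split does, here is a minimal sketch on a small hypothetical dataset of 100 items (not MNIST), assuming only PyTorch. It checks that `random_split` puts every item in exactly one of the two sets, and that a seeded generator gives the same split on every run.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for mnist_train: 100 items whose value is their own index
dataset = TensorDataset(torch.arange(100))

# Split 80/20; the seeded generator makes the split reproducible
train_part, val_part = random_split(dataset, [80, 20],
                                    generator=torch.Generator().manual_seed(44))

train_ids = set(train_part.indices)
val_ids = set(val_part.indices)
print(len(train_part), len(val_part))   # 80 20
print(train_ids.isdisjoint(val_ids))    # True: no item ends up in both sets

# Splitting again with the same seed gives exactly the same training items
again, _ = random_split(dataset, [80, 20], generator=torch.Generator().manual_seed(44))
print(again.indices == train_part.indices)   # True
```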
In [None]:
# Split the 60,000 training images into 50,000 for training and 10,000 for validation.
# The seeded generator makes the split the same every time this cell is run.
train_set, val_set = torch.utils.data.random_split(
    mnist_train, [50000, 10000], generator=torch.Generator().manual_seed(44))

In [None]:
print('Training set input length:', len(train_set))
print('Training set output length:', len(train_set))
print('Validation set input length:', len(val_set))
print('Validation set output length:', len(val_set))
print('Testing set input length:', len(mnist_test.data))
print('Testing set output length:', len(mnist_test.targets))

*Exercise:*

- *The following snippet of code plots the `i`-th image in the training set. Change the value of index `i` below (between 0 and `len(train_set) - 1`) to see different samples.*

In [None]:
i=5
plt.figure()
plt.imshow(train_set[i][0][0])
plt.title('label: {}'.format(train_set[i][1]));

## **Preparing the data**

In PyTorch, a batch of images that is fed to a convolutional neural network has 4 dimensions, in this order:
- The number of images
- The number of color channels
  - Color images have three channels: red, green and blue (RGB).
  - Grayscale (black-and-white) images have only 1 channel.
- The number of pixel rows (height)
- The number of pixel columns (width)



*Exercise:*
- *What should the dimensions of the train set be?*

PyTorch provides DataLoaders that feed the data to the network during training, validation and testing. As we will see later, you can loop over a DataLoader to get the images and their labels in batches. When creating a DataLoader, we define the batch size: the number of images (e.g. 128) that are fed to the model at the same time. The weights are updated based on the average loss over a batch, which gives more stable updates than using one image at a time, and processing many images at once makes efficient use of the hardware.

In [None]:
# Create DataLoaders

from torch.utils.data import DataLoader

batch_size = 128

trainloader = DataLoader(train_set, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

valloader = DataLoader(val_set, batch_size=batch_size,
                                          shuffle=False, num_workers=2)

testloader = DataLoader(mnist_test, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

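The DataLoaders above split each dataset into batches of `batch_size` images. As a quick sanity check of the arithmetic, the sketch below computes the number of batches for the 50,000 training images and then verifies the batch shapes on a small hypothetical dataset of 300 all-zero 28 × 28 images (stand-in data, not MNIST). The last batch is smaller when the dataset size is not a multiple of the batch size; passing `drop_last=True` to `DataLoader` would discard it.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

# Arithmetic for the MNIST training split
n_train, batch_size = 50000, 128
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size
print(n_batches, last_batch)   # 391 80

# Same check on a small stand-in dataset of 300 single-channel 28 x 28 images
small = TensorDataset(torch.zeros(300, 1, 28, 28), torch.zeros(300, dtype=torch.long))
loader = DataLoader(small, batch_size=batch_size, shuffle=True)
batch_shapes = [images.shape for images, labels in loader]
print(len(loader), batch_shapes[0], batch_shapes[-1])
# 3 torch.Size([128, 1, 28, 28]) torch.Size([44, 1, 28, 28])
```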

## **Building a convolutional neural network**

Now that the data is ready, let's build a convolutional neural network (CNN)! In PyTorch, a neural network is a class that inherits from `nn.Module`. In the `__init__` method you define all the layers your network has. The `forward` method defines the computations that happen when an image is fed through the network. 

In the network below we use 2D convolution layers (`Conv2d`), max pooling layers (`MaxPool2d`) and linear, or fully connected, layers (`Linear`). The convolution and linear layers (except the last one) are followed by an activation function; here we use the Rectified Linear Unit (ReLU).

In [None]:
import torch.nn as nn
import torch.nn.functional as F

# Define model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


*Exercise:*

- Check out the above neural network. Do you understand the steps that happen in the 'forward' function?
- Why is torch.flatten() needed, do you think?

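To check your answers, you can trace the shape of a batch through the layers. The sketch below re-creates layers with the same settings as `conv1`, `conv2` and `pool` in `Net` (so it runs on its own) and feeds a hypothetical all-zero batch of 4 MNIST-sized images through them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer settings as in Net, re-created here so the sketch is self-contained
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)
pool = nn.MaxPool2d(2, 2)

x = torch.zeros(4, 1, 28, 28)        # hypothetical batch of 4 grayscale 28 x 28 images
x = pool(F.relu(conv1(x)))
print(x.shape)                        # torch.Size([4, 6, 12, 12])
x = pool(F.relu(conv2(x)))
print(x.shape)                        # torch.Size([4, 16, 4, 4])
x = torch.flatten(x, 1)               # keep the batch dimension, merge all others
print(x.shape)                        # torch.Size([4, 256]), matching nn.Linear(256, 120)
```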
Next, we prepare the network for training by choosing a loss function and an optimizer.
- The optimizer is the algorithm that updates the weights of the network during training: it uses the gradients of the loss to decide how much each weight should be increased or decreased. We use stochastic gradient descent (SGD) with momentum.   
- The loss function measures how far the model's predictions are from the true labels; for classification we use the cross-entropy loss (`nn.CrossEntropyLoss`). The larger the loss, the worse the performance; zero means perfect predictions.

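To make the cross-entropy loss concrete, the sketch below computes it for two hypothetical model outputs, once with `nn.CrossEntropyLoss` and once by hand. `nn.CrossEntropyLoss` takes the raw output scores (logits) and applies the softmax internally, and it expects the labels as integer class indices (dtype `torch.long`). The loss is the average negative log-probability that the model assigns to the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw outputs (logits) of a 3-class model for 2 images
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])            # true class index per image (dtype long)

loss = nn.CrossEntropyLoss()(logits, labels)

# By hand: softmax -> probability of the true class -> mean negative log
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), labels]).mean()
print(loss.item(), manual.item())        # the two values agree up to rounding
```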
GPUs are very important for speeding up the training of neural networks. A CPU has a small number of cores that work through operations largely one after another, whereas a GPU can perform thousands of similar computations in parallel. The computations in neural networks, especially the convolutions used in many deep learning architectures, are very well suited to this kind of parallel computation.

To make use of a GPU, we have to explicitly send both our model and our data to it. Below we first detect whether a GPU is available, and then instantiate the model and send it to that device (the GPU if available, otherwise the CPU).

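One detail that often causes errors: `nn.Module.to()` moves a model's parameters in place, but `Tensor.to()` returns a new tensor and leaves the original where it was, so you must reassign the result. A minimal sketch with a tiny hypothetical model (it runs on the CPU if no GPU is present):

```python
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)      # tiny hypothetical model
model.to(device)             # modules are moved in place

x = torch.randn(3, 4)
x.to(device)                 # returns a NEW tensor; on its own this line does not move x
x = x.to(device)             # so reassign the result

print(next(model.parameters()).device, x.device)
out = model(x)               # works because model and data are on the same device
```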
In [None]:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

# Initiate model and send model to device
model = Net()
model.to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

## **Training and testing a convolutional neural network**

Before we start training the network, let's see what happens if we let the current, randomly initialized model predict the digits in the test set.
Run the next two cells to see the accuracy. The first cell defines a helper function, `predict_output`, that loops over a DataLoader and collects the predictions. The second cell compares these predictions with the actual labels to compute the accuracy.

In [None]:
# Define function to predict the labels of all images in a DataLoader
def predict_output(dataloader):

    x_test = []
    y_test_pred = []

    # switch to evaluation mode (matters for layers such as dropout and batch normalization)
    model.eval()

    # since we're not training, we don't need to calculate the gradients for our outputs
    with torch.no_grad():
        for data in dataloader:
            # get the inputs; data is a list of [inputs, labels] (or just [inputs])
            inputs = data[0].to(device)
            # calculate outputs by running images through the network
            outputs = model(inputs)
            # the class with the highest score is what we choose as prediction
            predicted = outputs.argmax(dim=1)

            x_test.extend(inputs.cpu().numpy())
            y_test_pred.extend(predicted.cpu().numpy())

    return np.array(x_test), np.array(y_test_pred)

In [None]:
# Get predictions
x_test, y_test_pred = predict_output(testloader)

# Get labels
y_test = []
for i, data in enumerate(testloader):
    y_test.extend(data[1].data.numpy())
y_test = np.array(y_test)

# Calculate accuracy
correct = (y_test_pred == y_test)
print(correct)
print(f'Accuracy of the network on the 10000 test images: {100 * np.sum(correct)/len(correct)} %')

We can also visualize the predictions. Below is a useful function, `visualize_predictions`. You do not have to understand exactly what it does. Run the next two cells to get a nice visualization of the predictions.

In [None]:
# Define function to visualize predictions

def visualize_predictions(x, y_true, y_pred, correct, label_to_name=None):
    # Arrange the images in a roughly square grid
    per_row = int(0.5 + sqrt(len(x)))
    nr_rows = int(ceil(len(x) / float(per_row)))
    # squeeze=False guarantees a 2D array of axes, even for a single row
    fig, grid = plt.subplots(nr_rows, per_row, figsize=(10, 10), squeeze=False)
    # x has shape (N, C, H, W); single-channel images are shown in grayscale
    cmap = "gray" if x.shape[1] == 1 else None
    for r in range(nr_rows):
        for c in range(per_row):
            ax = grid[r, c]
            idx = r * per_row + c
            if idx >= len(x):
                ax.axis('off')
                continue
            # (C, H, W) -> (H, W, C) for imshow
            ax.imshow(np.squeeze(np.moveaxis(x[idx], 0, 2)), cmap=cmap)
            ax.set_xticks([])
            ax.set_yticks([])
            true_label, pred_label = y_true[idx], y_pred[idx]
            if label_to_name:
                true_label, pred_label = label_to_name[true_label], label_to_name[pred_label]
            ax.set_xlabel(f"Label = {true_label}\nPredicted = {pred_label}")
            # Green frame for a correct prediction, red frame for a mistake
            color = "green" if correct[idx] else "red"
            for side in ['bottom', 'top', 'left', 'right']:
                ax.spines[side].set_color(color)
                ax.spines[side].set_linewidth(2)
    plt.tight_layout()

In [None]:
# Select some random images
random_indices = np.random.randint(0,len(x_test), 16)

# Visualize predictions
visualize_predictions(x_test[random_indices], y_test[random_indices], y_test_pred[random_indices], correct[random_indices])

*What do you think, how does the model do?*

To train our model we use the `train_model` function below. It contains nested loops. At the highest level, it loops over epochs (one epoch is one full pass over the training data). In every epoch, the model first loops over the training batches and then over the validation batches. After every epoch, the average training and validation loss are printed, and the loss and accuracy of both sets are stored so that we can plot them later.

To train the network, the loss is calculated for every training batch with the loss function, i.e. `criterion(outputs, labels)`. Next, the gradients are calculated with backpropagation by `loss.backward()`. Finally, the weights are updated based on these gradients by `optimizer.step()`. Before each of these steps, `optimizer.zero_grad()` resets the gradients, because PyTorch accumulates them by default. These steps are at the core of training every deep learning model.

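The same four steps can be seen in isolation on a toy problem. This is a minimal sketch, assuming hypothetical 2D data (class 1 if x + y > 0) and a single `nn.Linear` layer as a stand-in for the CNN; the loss should go down as the weights are updated.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: 2D points, class 1 if x + y > 0, otherwise class 0
inputs = torch.randn(200, 2)
labels = (inputs.sum(dim=1) > 0).long()

model = nn.Linear(2, 2)                    # tiny stand-in for the CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

losses = []
for step in range(50):
    optimizer.zero_grad()                  # clear gradients from the previous step
    outputs = model(inputs)                # forward pass
    loss = criterion(outputs, labels)      # compare predictions with labels
    loss.backward()                        # backpropagation: compute gradients
    optimizer.step()                       # update the weights
    losses.append(loss.item())

print(f'first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}')
```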
In [None]:
def train_model(model, criterion, optimizer, trainloader, valloader, epochs):
    train_loss, val_loss = [], []
    train_acc, val_acc = [], []

    for epoch in range(epochs):  # loop over the dataset multiple times

        # --- training step ---
        model.train()
        running_loss, running_acc, total = 0.0, 0.0, 0

        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # accumulate statistics
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)
            running_acc += (predicted == labels).sum().item()
            total += labels.shape[0]

        train_loss.append(running_loss / len(trainloader))  # mean loss per batch
        train_acc.append(running_acc / total)

        # --- validation step: no weight updates, so no gradients needed ---
        model.eval()
        running_vloss, running_vacc, vtotal = 0.0, 0.0, 0

        with torch.no_grad():
            for inputs, labels in valloader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)  # forward pass only
                running_vloss += criterion(outputs, labels).item()
                predicted = outputs.argmax(dim=1)
                running_vacc += (predicted == labels).sum().item()
                vtotal += labels.shape[0]

        val_loss.append(running_vloss / len(valloader))
        val_acc.append(running_vacc / vtotal)

        print(f'epoch: {epoch + 1}, loss: {train_loss[-1]:.3f}, acc: {train_acc[-1]:.3f}, '
              f'vloss: {val_loss[-1]:.3f}, vacc: {val_acc[-1]:.3f}')

    print('Finished Training')

    return train_loss, train_acc, val_loss, val_acc

Now we are ready to train our model! We use 5 epochs here to get some quick results. Training for more epochs probably gives you better results, but at some point you will start to overfit.

In [None]:
epochs = 5

train_loss, train_acc, val_loss, val_acc = train_model(model,criterion,optimizer,trainloader,valloader,epochs)

We can visualize what happened during the training. Here we plot the accuracy and loss after each epoch.

In [None]:
# Plot training & validation accuracy values
plt.plot(train_acc)
plt.plot(val_acc)
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(train_loss)
plt.plot(val_loss)
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

*Exercise:*
- *What happens to the model loss and model accuracy during training? Why?*
- *How can differences between the train and validation performance be explained?*

Let's check out that accuracy and some of the results again after having trained the model!

In [None]:
# Let the model predict the test set labels
x_test,y_test_pred = predict_output(testloader)

y_test = []
for i, data in enumerate(testloader):
    y_test.extend(data[1].data.numpy())
y_test = np.array(y_test)

correct = (y_test_pred == y_test)

print(f'Accuracy of the network on the 10000 test images: {100 * np.sum(correct)/len(correct)} %')

random_indices = np.random.randint(0,x_test.shape[0], 16)
visualize_predictions(x_test[random_indices], y_test[random_indices], y_test_pred[random_indices], correct[random_indices])


A nice way to visualize your results on a classification task is to create a confusion matrix. This shows for every class how often it is correctly classified or confused with another class.

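A small hand-checkable example shows how to read a confusion matrix. This sketch uses hypothetical labels for 6 images of 3 classes and scikit-learn's `confusion_matrix`: rows are the true classes, columns the predicted classes, and the diagonal counts the correct predictions. With `normalize='true'`, each row is divided by the number of images of that true class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for 6 images of 3 classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm_norm)   # every row now sums to 1
```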
In [None]:
def plt_confusion_matrix(y_test, y_test_pred, labels, normalize=None, log_cmap=False):
    # Rows are true labels, columns are predicted labels
    cm = confusion_matrix(y_test, y_test_pred, normalize=normalize)
    if log_cmap:
        # A log colour scale cannot show zeros, so give empty cells a small value (for colouring only)
        cm2 = cm.copy().astype(float)
        cm2[cm == 0] = cm[cm > 0].min() / 10
        norm = LogNorm(vmin=cm2.min(), vmax=cm2.max())
    else:
        cm2 = cm
        norm = None
    df_cm = pd.DataFrame(cm2, index=labels, columns=labels)
    plt.figure(figsize=(len(labels), len(labels)))
    # Colours follow cm2; the printed numbers are the real values in cm
    sn.heatmap(df_cm, annot=cm, norm=norm, cbar=False, fmt='.4g')
    plt.xlabel("Predicted label")
    plt.ylabel("True label")
    plt.title("Confusion matrix")


plt_confusion_matrix(y_test, y_test_pred, labels=range(10), log_cmap=True)

*Exercise:*

*It is always good to look at the outcomes and errors, as a sanity check that nothing went wrong.*
- *Which digits got confused by the neural network most often? Does that make sense to you?*

*Bonus exercise:*

*The architecture of the CNN we trained is not the only one that could work. Variants on it could also work well for the task of classifying digits.* 
- *What variables could you change in the training process and what effects do you expect these changes to have?*
 - *E.g.: What are the trade-offs to keep in mind when increasing/decreasing the size or complexity of the model?*
- *Try changing the model by altering the number of filters in the convolutional (`Conv2d`) layers, or by adding or removing layers, and see what happens!* 
- *Play around with other variables such as number of epochs, learning rate (lr) and others!* 

*(Note: If you change the code in a cell, the code cells below it have to be re-executed to work with the changes. For example, don't forget to re-create the model and the optimizer and to re-train the model after changing its architecture!)*

# PatchCamelyon: Classifying pathology patches as benign or malignant

Now that we know how to train and test a convolutional neural network, let's use the same methods to determine malignancy of pathology patches! Will the neural network beat you at it?

## **Understanding the data**

First, we download the pathology data. Again, we have a training, validation and test set. Because of time and memory constraints, we use only 5% of the original dataset. If you are interested, you can download the complete open-source dataset here: https://patchcamelyon.grand-challenge.org/. We also train for a limited number of epochs so that you are not kept waiting for too long.


In [None]:
# load dataset
!pip install --upgrade --no-cache-dir gdown
import h5py

os.makedirs('./data', exist_ok=True)

def download_pathology_data(identifier):
    # Download the gzipped HDF5 file from Google Drive and unpack it
    !gdown $identifier -O ./data/temp.h5.gz
    !gunzip -f ./data/temp.h5.gz
    # Read images (x) and labels (y) into memory, then close and delete the file
    with h5py.File('./data/temp.h5', 'r') as f:
        x = f['x'][()]
        y = f['y'][()].flatten()
    os.remove('./data/temp.h5')

    return x, y

images_train, labels_train = download_pathology_data('1tLw33To0BplzqTx8OVG8okjmXaJjVE_K')
images_val, labels_val = download_pathology_data('1iEouljzAbk8AtwoufRt8WK0aqxPO5MYd')
images_test,  labels_test  = download_pathology_data('14eQ32RHVbj112zf7ud01hndyqVrAITw1')

We use the same patches for the test set as were used in the reader study earlier today. This way, in the end you can compare your performance with that of the neural network.

In [None]:
# print data shape
print('Training set input size:', images_train.shape)
print('Training set output size:', labels_train.shape)
print('Validation set input size:', images_val.shape)
print('Validation set output size:', labels_val.shape)
print('Testing set input size:', images_test.shape)
print('Testing set output size:', labels_test.shape) 

*Exercise:* 
- *What differences do you spot between the dimensions of the MNIST data and the dimensions of the pathology data?*
- *What changes would we need to make to the neural network we used earlier to account for these differences?*

*Exercise:*
- *Again, we should take a look at the input images to make sure we have done everything right so far. Change the index `i` to view different patches.*

In [None]:
i=0
plt.figure()
plt.imshow(images_train[i])
plt.title('label: {}'.format(labels_train.flatten()[i]));

*Exercise:* 
- *What does the image label (title above the plot) mean?*

Below we create the dataset and DataLoader objects. A `TensorDataset` is a simple way to organize data that is already loaded into memory: it keeps each image together with its label. It does not transform the data itself.

If you want to modify your data before it is used to train the network, for example with normalization or augmentation, you can create your own custom `Dataset` class.

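A custom `Dataset` only needs `__len__` and `__getitem__`. The sketch below is a hypothetical example (not used elsewhere in this notebook) that scales uint8 RGB patches to [0, 1], moves the channel axis first, and randomly flips patches horizontally as a simple augmentation. The usage lines at the bottom run it on random stand-in data.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class PatchDataset(Dataset):
    """Hypothetical custom Dataset for RGB patches stored as a NumPy array of shape (N, H, W, 3)."""

    def __init__(self, images, labels, augment=False):
        self.images = images      # uint8 values in [0, 255]
        self.labels = labels      # integer class labels, shape (N,)
        self.augment = augment

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # (H, W, C) uint8 -> (C, H, W) float32, scaled to [0, 1]
        image = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        # Example augmentation: flip horizontally with probability 0.5
        if self.augment and torch.rand(1).item() < 0.5:
            image = torch.flip(image, dims=[2])
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return image, label

# Usage with random stand-in data: 8 patches of 96 x 96 pixels
images = np.random.randint(0, 256, size=(8, 96, 96, 3), dtype=np.uint8)
labels = np.random.randint(0, 2, size=8)
loader = DataLoader(PatchDataset(images, labels, augment=True), batch_size=4, shuffle=True)
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape, batch_images.dtype, batch_labels.dtype)
```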
In [None]:
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

batch_size = 128  

# Images: (N, H, W, C) -> (N, C, H, W) as float32. Labels: int64 (long), as nn.CrossEntropyLoss expects.
patchcam_train = TensorDataset(torch.tensor(np.moveaxis(images_train, 3, 1), dtype=torch.float32), torch.tensor(labels_train, dtype=torch.long))
patchcam_val = TensorDataset(torch.tensor(np.moveaxis(images_val, 3, 1), dtype=torch.float32), torch.tensor(labels_val, dtype=torch.long))
patchcam_test = TensorDataset(torch.tensor(np.moveaxis(images_test, 3, 1), dtype=torch.float32), torch.tensor(labels_test, dtype=torch.long))

trainloader = DataLoader(patchcam_train, batch_size=batch_size, shuffle=True)
valloader = DataLoader(patchcam_val, batch_size=batch_size, shuffle=False)
testloader = DataLoader(patchcam_test, batch_size=batch_size, shuffle=False)

## **Training and testing a convolutional neural network**

### **Building the model** 

Execute the following two cells to build the same model architecture that we trained on MNIST, adapted to the pathology patches, and to set up the loss function and optimizer.

In [None]:
# define model

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(7056, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x



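The number 7056 in `nn.Linear(7056, 120)` follows from the output-size formula for convolution and pooling layers, `(size + 2 * padding - kernel_size) // stride + 1`. The sketch below applies it to both the 28 × 28 MNIST images and the 96 × 96 patches, then cross-checks the result with a dummy forward pass through layers with the same settings as in `Net` (the helper function names are hypothetical).

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (square input, dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

def flattened_size(image_size, out_channels=16):
    """Number of features after conv1 -> pool -> conv2 -> pool, as in Net."""
    size = conv_output_size(image_size, kernel_size=5)        # conv1
    size = conv_output_size(size, kernel_size=2, stride=2)    # pool
    size = conv_output_size(size, kernel_size=5)              # conv2
    size = conv_output_size(size, kernel_size=2, stride=2)    # pool
    return out_channels * size * size

print(flattened_size(28), flattened_size(96))   # 256 7056

# Cross-check with a dummy forward pass through the same convolutional layers
features = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU(), nn.MaxPool2d(2, 2),
                         nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2, 2))
print(features(torch.zeros(1, 3, 96, 96)).flatten(1).shape)   # torch.Size([1, 7056])
```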
In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

model = Net()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

### **Training and testing**

Execute the following two cells to confirm that, before training, the model again performs no better than chance.

In [None]:
# Let the model predict the test set labels
x_test,y_test_pred = predict_output(testloader)

y_test = labels_test
correct = (y_test_pred == y_test)

print(f'Accuracy of the network on the {len(correct)} test images: {100 * np.sum(correct) / len(correct):.2f} %')

In [None]:
cm = confusion_matrix(y_test, y_test_pred, normalize = None)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=['benign' , 'malignant'])
disp.plot(values_format='d'); 

Now let's train our malignancy classification model!

In [None]:
epochs = 5

train_loss, train_acc, val_loss, val_acc = train_model(model,criterion,optimizer,trainloader,valloader,epochs)

Just like when training the MNIST model, we visualize the training process here.

In [None]:
# Plot training & validation accuracy values
plt.plot(train_acc)
plt.plot(val_acc)
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(train_loss)
plt.plot(val_loss)
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

*Exercise:*
- *What is happening here? Is the model learning/getting better just as with the MNIST dataset?*
- *What could be reasons that the model is performing this way?*

When you understand what is happening, continue to the next section.

## **A different neural network**

So our first try did not work. Run the next two cells to train a different network architecture. Run them first, then read the code and the text around it while you wait for the model to learn ;)

In [None]:
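model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)

# The pretrained model predicts 1000 ImageNet classes; replace its last layer with a 2-class layer
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)

model.to(device)
criterion = nn.CrossEntropyLoss()
# Create the optimizer after replacing the layer, so the new weights are included
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)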

In [None]:
epochs = 20

train_loss, train_acc, val_loss, val_acc = train_model(model,criterion,optimizer,trainloader,valloader,epochs)

The previous cell trains the MobileNetV2 model, a widely used standard network architecture. This model has more layers, and different kinds of layers, than our previous model; run `print(model)` to list them.

Instead of training it from scratch with random initial weights, we train it using transfer learning. As our starting point for training, we use the architecture and weights of the MobileNetV2 model that was trained on a completely different image dataset (ImageNet). This way, we only need to fine-tune the weights to make the model work for our malignancy classification problem. 

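A common variant of transfer learning, different from what this notebook does, is to freeze the pretrained layers and train only a new output layer ("feature extraction"). The sketch below shows the idea on a tiny hypothetical stand-in model with the same layout as torchvision's MobileNetV2 (a `features` backbone followed by `classifier = Sequential(Dropout, Linear)`), so it runs without downloading weights. With `freeze_backbone=False` it reduces to what the notebook does: fine-tune all weights.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class TinyPretrained(nn.Module):
    """Hypothetical stand-in with a MobileNetV2-like layout (not real pretrained weights)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(8, 1000))

    def forward(self, x):
        return self.classifier(self.features(x))

def prepare_for_transfer(model, num_classes, freeze_backbone):
    """Swap the last classifier layer for a new one with `num_classes` outputs."""
    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False        # keep the pretrained weights fixed
    old_layer = model.classifier[-1]
    # Created after freezing, so the new layer's weights remain trainable
    model.classifier[-1] = nn.Linear(old_layer.in_features, num_classes)
    return model

model = prepare_for_transfer(TinyPretrained(), num_classes=2, freeze_backbone=True)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.01, momentum=0.9)   # only the new layer is updated

print(sum(p.numel() for p in trainable))        # 18 = 8 * 2 weights + 2 biases
print(model(torch.zeros(4, 3, 96, 96)).shape)   # torch.Size([4, 2])
```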
When the model is done training, you can run the next code cell to visualize the training process.

In [None]:
# Plot training & validation accuracy values
plt.plot(train_acc)
plt.plot(val_acc)
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(train_loss)
plt.plot(val_loss)
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

That's looking better! 

*Exercise:*

- *Is there a difference between the performance on the train and validation set? If so, how can you explain this difference?* 

## **The neural network's performance on the reader study**

So far, the model has only been trained and evaluated on the training and validation sets. 

Now let's see how well the model performs on the data you scored during the reader study earlier today!

In [None]:
# Let the model predict the test set labels
x_test,y_test_pred = predict_output(testloader)

y_test = labels_test
correct = (y_test_pred == y_test)

print(f'Accuracy of the network on the {len(correct)} test images: {100 * np.sum(correct) / len(correct):.2f} %')


In [None]:
# Plot confusion matrix
cm = confusion_matrix(y_test, y_test_pred, normalize = None)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=['benign', 'malignant'])
disp.plot(values_format='d');

*Exercise:*
- *So who would be better suited as a pathologist: you or the trained neural network? (If you forgot your own score, go to https://grand-challenge.org/reader-studies/)*
- *What kinds of mistakes were made? If you did not want the network to miss any malignancies, which number in the confusion matrix should be the lowest of them all?*

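The network outputs two scores (logits) per patch, one for benign and one for malignant. The softmax function converts them into probabilities that sum to 1, and the class with the highest probability is the prediction: benign (0) or malignant (1). For two classes, this is the same as thresholding the probability of malignancy at 0.5.
Check out some examples of your model's predictions.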

In [None]:
# Run again if you want to see a different set of example images
random_indices = np.random.randint(0, len(x_test), 16)
visualize_predictions(np.array(x_test)[random_indices]/255, np.array(y_test)[random_indices], np.array(y_test_pred)[random_indices], np.array(correct)[random_indices])

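To see how the two output scores relate to a probability and to the final class, here is a small sketch with hypothetical logits for three patches. It shows that taking the class with the highest score (as `predict_output` does) gives the same decision as thresholding the softmax probability of malignancy at 0.5.

```python
import torch
import torch.nn.functional as F

# Hypothetical network outputs (logits) for three patches: [benign score, malignant score]
logits = torch.tensor([[2.0, -1.0],
                       [0.2, 0.4],
                       [-3.0, 1.5]])

probs = F.softmax(logits, dim=1)       # every row now sums to 1
p_malignant = probs[:, 1]              # probability of class 1 (malignant)

pred_from_argmax = logits.argmax(dim=1)              # what predict_output returns
pred_from_threshold = (p_malignant > 0.5).long()     # same decision via a 0.5 threshold
print(p_malignant)
print(pred_from_argmax, pred_from_threshold)         # tensor([0, 1, 1]) tensor([0, 1, 1])
```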
To show the distribution of the predicted probability scores, we plot histograms.

In [None]:
plt.figure()
plt.hist(np.squeeze(y_test_pred), bins=10)
plt.title('Histogram of all predictions')
plt.ylabel('Number of images')
plt.xlabel('Probability')
plt.figure()
plt.hist(np.squeeze(y_test_pred)[correct], bins=10, color='green')
plt.title('Histogram of correct predictions')
plt.ylabel('Number of images')
plt.xlabel('Probability')
plt.figure()
plt.hist(np.squeeze(y_test_pred)[~correct], bins=10, color='red')
plt.title('Histogram of incorrect predictions')
plt.ylabel('Number of images')
plt.xlabel('Probability');

*Exercise:*
- *In what probability range were most mistakes made?*
- *Think of measures you could take, as an AI software developer or clinician, to make fewer mistakes based on the probability values. How can clinicians and machines work together?*

*Bonus exercise:*
- *Within the PatchCamelyon section, go back to the first model architecture block (the one that is the same as we used for MNIST). Can you create an architecture yourself that performs better? Think of changing layers, using dropout, etc. You could also play around with parameters such as batch size and learning rate. NB: Make sure it performs well not only on the training set but also on the validation set!*

## Camelyon heatmap code

The network we trained predicts, for a 96 × 96 pixel patch, whether or not malignant tissue is present. In practice, however, pathologists look at much larger images. Here we load a larger tile of an original image and use our network to locate the malignant tissue.

In [None]:
import zipfile
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image

# Download the zipped tiles with gdown (installed earlier in this notebook)
!gdown 1G7rBoPoma5cmsQSPGF05NoS4x4SQ4Qa1 -O ./data/camelyon-tiles.zip
with zipfile.ZipFile('./data/camelyon-tiles.zip', "r") as zip_ref:
    zip_ref.extractall("./data")
os.remove('./data/camelyon-tiles.zip')

npz = np.load("./data/camelyon-test-tile-1.npz")

tile = npz['tile']
annotations = np.asarray(Image.open("./data/camelyon-test-tile-1.png"))

In [None]:
plt.figure(figsize=(30,30))
plt.subplot(1, 3, 1); plt.imshow(tile); plt.axis('off');
plt.title('1. Larger pathology image')
plt.subplot(1, 3, 2); plt.imshow(annotations); plt.axis('off');
plt.title('2. Manual annotations for malignancy')
plt.subplot(1, 3, 3); plt.imshow(tile); plt.ylim([tile.shape[0], 0]); plt.xlim([0, tile.shape[0]])
plt.vlines(range(96,tile.shape[0], 96), ymin=0, ymax=tile.shape[0], linestyle='-', linewidth=1)
plt.hlines(range(96,tile.shape[0], 96), xmin=0, xmax=tile.shape[0], linestyle='-', linewidth=1)
plt.title('3. Grid showing the patch size we trained the network on')
plt.show()

Above we see the original image (1) and the manual annotation of the malignancy (2). To get a feel for the scale, the rightmost image (3) has a grid overlay in which each square is 96 × 96 pixels, the same size as the patches we trained the network on.

To get predictions for the larger tile, we slide a 96 × 96 window over the image and make a prediction for each patch. The patches overlap: the window moves 16 pixels at a time (`stride=16`), which gives a more detailed map.

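The patch extraction can also be written without Python loops. This sketch, assuming NumPy 1.20 or newer for `sliding_window_view`, takes every `stride`-th window of an (H, W, C) array and checks the result against a plain loop on a small stand-in array. Note that it includes the last window position that still fits, whereas the loop in the next cell stops one position earlier.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(tile, patch_size, stride):
    """All patch_size x patch_size patches of an (H, W, C) tile, taken every `stride` pixels, row by row."""
    channels = tile.shape[2]
    # Windows at every pixel offset; keep every `stride`-th one in both directions
    windows = sliding_window_view(tile, (patch_size, patch_size, channels))[::stride, ::stride]
    # windows has shape (rows, cols, 1, patch_size, patch_size, channels)
    return windows.reshape(-1, patch_size, patch_size, channels)

def extract_patches_loop(tile, patch_size, stride):
    """Reference version with explicit loops, for comparison."""
    rows = range(0, tile.shape[0] - patch_size + 1, stride)
    cols = range(0, tile.shape[1] - patch_size + 1, stride)
    return np.array([tile[r:r + patch_size, c:c + patch_size] for r in rows for c in cols])

# Small stand-in "tile": 10 x 10 pixels with 3 channels
tile = np.arange(10 * 10 * 3).reshape(10, 10, 3)
patches = extract_patches(tile, patch_size=4, stride=2)
print(patches.shape)   # (16, 4, 4, 3)
```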
![Animation of a window sliding over a grid](https://raw.githubusercontent.com/iamaaditya/iamaaditya.github.io/master/images/conv_arithmetic/full_padding_no_strides_transposed.gif)

In [None]:
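tile_size = tile.shape[0]   # the tile is square
patch_size = 96
stride = 16

# Top-left corners of the patches. The last position that would still fit is left out,
# so that the upsampled and padded heatmap below has the same size as the tile
# when (tile_size - patch_size) is a multiple of the stride.
positions = range(0, tile_size - patch_size, stride)

# Slide over the large tile, row by row
patches = [tile[row:row + patch_size, col:col + patch_size]
           for row in positions for col in positions]

patches_test = TensorDataset(torch.tensor(np.moveaxis(np.array(patches), 3, 1), dtype=torch.float32))

patchloader = DataLoader(patches_test, batch_size=batch_size, shuffle=False)

# Predict the probability of malignancy (class 1) for all patches in the tile
model.eval()
patch_prob = []
with torch.no_grad():
    for (inputs,) in patchloader:
        outputs = model(inputs.to(device))
        patch_prob.extend(F.softmax(outputs, dim=1)[:, 1].cpu().numpy())

In [None]:
# Some resizing to make the heatmap overlay the original tile
import scipy.ndimage
heatmap_size = len(positions)
heat_map = np.array(patch_prob).reshape([heatmap_size, heatmap_size])
# Upsample: neighbouring patches are `stride` pixels apart
heat_map = scipy.ndimage.zoom(heat_map, stride, order=1)
# Pad by half a patch on each side, so each value lies roughly at the centre of its patch
heat_map_pad = np.pad(heat_map, patch_size // 2, mode='constant')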

In [None]:
# Plotting the results
plt.figure(figsize=(30,30))
plt.subplot(1, 2, 1); plt.imshow(tile); plt.axis('off');
plt.imshow(heat_map_pad, 'jet', alpha=0.5); 
plt.title('Heatmap of probability values for malignancy likelihood predicted by the network')
plt.subplot(1, 2, 2); plt.imshow(annotations); plt.axis('off');
plt.title('Manual annotations for malignancy')
plt.show()

This shows how a neural network trained on small images (a simpler task) can aid in the classification or even segmentation of larger images.

# Wrap up

Great! You did it!
You can download the notebook to your own computer or save it to your drive to continue playing with it at home. Try your own data or different networks and see if you can get better at it!