# Day 2 - Morning

# Deep Learning

In this section of the course we will discuss **Deep Learning**, and how to apply deep learning to your data.

Deep Learning is fundamentally different to 'classical' machine learning in a number of ways, which we will discuss in due course. 

These days, deep learning is used mostly in 

- image data
    - image classification
    - object detection (counting objects within images)
    - image segmentation (lesions in MRI for example)
- generative models, for example LLMs like ChatGPT

In Deep Learning, in contrast to what we saw yesterday, you are normally not performing much feature engineering. We generally do not select or engineer features by hand; instead we allow the network to learn which features are important all by itself.

Normally, in the problems we discussed yesterday, we carefully selected features, removed features, or acquired datasets where features were curated in order for us to be able to train a model that works well. In deep learning, we often leave this to the network itself to work out.

I will illustrate this with a demo:

<https://playground.tensorflow.org>

In the playground, we will

1. Mimic feature engineering (all except $\sin$) and train a network using these features and 8 neurons on the swiss roll data
2. Remove all engineered features and instead use multiple layers, achieving the same thing (ensure the use of ReLU activation function)

As we saw, adding layers upon layers allows the network to learn very complex functions. 

Later we will look at the MNIST (Modified National Institute of Standards and Technology) dataset, which demonstrates this very well.

Imagine you had a house price dataset. Well the features are quite easy to understand, and you could probably easily remove features that were not important, such as house colour, or street number. 

This also means there is far less domain knowledge required to train neural networks. You might not necessarily need to know the details of your data in order to train something. For example in medicine, you might not have to know the details of the RNA sequence data that you wish to classify.

## Why *Deep* Learning?

![simple-network](./img/net-single-layer.png)

*Source:* <http://neuralnetworksanddeeplearning.com>

So, neural networks have been around for decades, and for many years were actually ignored. It is only recently that we have seen a resurgence of interest in neural networks and specifically the rise in **deep** learning.

This has happened for 2 main reasons

1. We have become very good at gathering lots of data
2. Hardware has become very powerful at training neural networks

This has led to networks getting larger and larger, meaning they can perform ever more complex tasks. In many fields, such as image classification, they are in a class of their own in terms of capabilities, as is the case with machine translation, image generation, image segmentation, and so on.

![deep-network](./img/net-deep-layer.png)

*Source:* <http://neuralnetworksanddeeplearning.com>

This brings up a further point: hardware. Neural networks require lots of computing power to train, and it turns out that the devices that are best suited for performing the calculations required to train a neural network are in fact GPUs (graphics processing units) and not CPUs. Most computers, especially laptops, do not have a dedicated GPU that is powerful enough to train neural networks. It is still possible to train networks using a PC's CPU, but it is orders of magnitude slower.

However, for smaller datasets, as we will see here, training can be performed on a CPU. For example, a small network that has to classify small images can be trained on a normal PC or laptop.

There are also ways to get a GPU for free to perform training tasks. For example, Google Colab offers GPUs for free, and I will demo this later.

![feature-hierarchy](./img/feature-hierarchy.png)

Image source: Raphael, Dubinsky, Iluz, Netanyahu, (2020). Neural Network Recognition of Marine Benthos and Corals. Diversity. 12 (29).

### Areas Where Deep Learning is Used

Neural Networks and Deep Learning algorithms can be applied to nearly any type of problem, but they are more suited to some problems than others. For example, they tend not to be used for tabular data, such as the datasets we discussed yesterday, where algorithms like random forests can still outperform them. See for example the paper: Grinsztajn L, Oyallon E, Varoquaux G. Why do tree-based models still outperform deep learning on tabular data? arXiv: <http://arxiv.org/abs/2207.08815>

So in this section we will concentrate on deep learning for image analysis, and specifically image classification. In medicine, deep learning is mostly used for image classification or image segmentation. In this seminar, we will demo image classification, but we will also discuss how segmentation is performed.  

Here are some of the fields and areas where deep learning is very successful

- NLP: In natural language processing (NLP), networks have been trained at answering questions, speech recognition, document summarisation, document classification 
- Computer vision: Satellite imagery interpretation, face recognition, image captioning, reading traffic signs, highlighting and recognising pedestrians and vehicles in autonomous vehicles 
- Medicine: Finding anomalies in radiology images, including CT, MRI, and X-ray images; counting features in pathology slides; measuring features in ultrasounds; diagnosing diabetic retinopathy
- Image segmentation: recent developments in networks such as U-Net have made deep neural networks state of the art at performing image segmentation, which is particularly used in medicine
- Biology: protein folding (DeepMind), classifying proteins, genomics tasks, protein/protein interactions
- Image generation: Colorising images, increasing image resolution (super resolution), de-noising/removing noise from images, converting images to art in the style of famous artists
- Recommendation systems: product recommendations, film recommendations, music recommendations
- Playing games: Chess, Go, Atari video games, and many real-time strategy games
- Robotics: Boston Dynamics robots that can run, climb, and so on.

(Some of the points above from: <https://course.fast.ai> which provides some very good introductory material on Deep Learning.)

Also see this dataset: <https://www.robots.ox.ac.uk/~vgg/data/pets/>

## Funny Joke

Deep learning has moved very fast in the past few years. Here is an example from the comic xkcd:

<img src="./img/task.png" width="400"/>

*Source*: <https://xkcd.com/1425/>

This comic is not particularly old, and the text suggests that to build a system that can recognise birds would be 'virtually impossible', yet what we will see now is that in fact, we can perform a classification of this type very quickly, and we will do so during this course.

## How Neural Networks Work

Here we will describe briefly the concepts behind neural networks. 

They are called neural networks because they are based on the idea of neurons and loosely mimic the neurons in the (human) brain.

![neuron](./img/chapter7_neuron.png)

*Source:* <https://colab.research.google.com/github/fastai/fastbook/blob/master/01_intro.ipynb>

The neuron in a neural network is based loosely on the biological neuron. In a neural network, we have many interconnected neurons organised in layers, where each neuron can fire or not fire based on its inputs and its weights. By 'fires' we mean that it outputs a value. Whether a neuron fires is decided by a threshold, which is applied by something called an activation function, and the weights control how strongly each input pushes the neuron towards that threshold. These weights are adjusted (or learned) during the training of the network. Ultimately, a network's ability to perform some function is based on the firing of these neurons, so what the network learns is which neurons should fire and which should not.

These weights are updated when you train the network. During training you pass data through the network, and the weights are updated based on the error or loss of the network. If the error is very high for a given prediction during training, these weights are nudged more than if the error was very small.
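
The idea of nudging weights in proportion to the error can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not part of the course material: a single neuron with a ReLU activation, a squared-error loss, and made-up inputs, weights and learning rate. Real frameworks such as PyTorch compute these updates automatically.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs passed through a ReLU activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def training_step(inputs, weights, bias, target, lr=0.1):
    """One gradient-descent update on the squared error of a single neuron."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    output = max(0.0, z)
    error = output - target
    # Derivative of (output - target)**2 with respect to z. ReLU passes
    # the gradient through only when z is positive.
    grad_z = 2.0 * error * (1.0 if z > 0 else 0.0)
    new_weights = [w - lr * grad_z * x for x, w in zip(inputs, weights)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias


# Made-up starting point for illustration
inputs, weights, bias = [1.0, 2.0], [0.5, 0.5], 0.0

# Same starting neuron, two different targets: one close to the current
# output (small error) and one far away (large error).
w_small, b_small = training_step(inputs, weights, bias, target=1.4)
w_large, b_large = training_step(inputs, weights, bias, target=0.0)

print("update after a small error:", w_small, b_small)
print("update after a large error:", w_large, b_large)
```

The weights move much further from their starting values when the error is large than when it is small, which is the behaviour described above.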

Hopefully you now have some understanding of how neural networks work, and how they are trained. Check the TensorFlow Playground website again if you wish to visually verify any of these concepts.

## Today's Topics

**Morning**:
- Deep Learning Frameworks
- PyTorch
- Training a simple network
- Training a deep network
- Medical Image Segmentation
- Walkthrough of a Medical Image classification algorithm
- Exercise 1

**Afternoon**:
- Deploying a deep learning model
- Training with GPUs and Exercise 2

---
# PyTorch

There are a number of frameworks available for neural network development:

- PyTorch (which we will use during this course), from Facebook
- TensorFlow, from Google
- Keras (front-end for TensorFlow and PyTorch, higher level API)
- FastAI (high level API, good for quick experimentation, and getting a model trained quickly and easily)

As mentioned, we will be using PyTorch during this course. PyTorch is probably the most popular framework for deep learning currently. 

Unfortunately, training neural networks requires a lot of boilerplate code in order to get running. 

However, there are easier approaches and I will demonstrate these after we have covered how a network is trained generally.

First some imports:

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.optim as optim
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import itertools
import torchinfo
import seaborn as sns

## Classifying Hand Written Digits

In this example, we are going to train a simple neural network to learn how to classify hand-written digits. 

MNIST is included in PyTorch, but there are many datasets to choose from, see the PyTorch documentation for details: <https://pytorch.org/vision/stable/datasets.html>

The MNIST dataset consists of 60,000 training images and 10,000 testing images. The images are only 28x28 pixels in size, which makes it a very popular dataset for benchmarking and demonstrating neural networks.

![MNIST](./img/MnistExamplesModified.png)

By Suvanjanprasai - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=132282871

So, the aim of this task is to train a neural network to recognise the digits. In other words, once trained, the network should be able to predict the digit contained in an image passed to the network.

Learning to recognise digits is a **supervised** machine learning task: the vast majority of neural network training is supervised. Neural networks can be used for both classification and regression. In the case of MNIST, it is a supervised, multi-class classification.

Training a neural network in PyTorch is a relatively complex task. However, later we will discuss much easier ways to do this.

For now, however, we will demonstrate how this is done by first defining a neural network, training it over a number of **epochs** and then testing it on some unseen data. An epoch is one pass through the entire training set.

The general procedure for training a neural network is as follows:

1. Define your dataset, and any transformations you wish to perform on the data
2. Define the neural network structure itself. This requires piecing together the layers and defining the network's shape and structure, and so on
3. Define the network's optimiser. You might remember from yesterday that algorithms in scikit-learn do not normally require you to define the optimisation algorithm. This is not true for PyTorch, as an optimiser must be defined explicitly.
4. Define the network's loss: this depends on your problem, e.g. classification requires a certain type of loss function, etc. 
5. Train the network: you train it over a number of epochs that you decide on

Note: this is the procedure when you want to train a network from scratch. There are easier options, and we will see these later.

In [None]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# Set download to true if the data is not already downloaded.
train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)
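
To make the relationship between batches and epochs concrete, here is a small back-of-the-envelope calculation. It assumes the `batch_size=64` used above, the 60,000 MNIST training images, the 3 epochs used later in this notebook, and the `DataLoader` default of keeping the final, smaller batch.

```python
import math

n_train_images = 60_000   # size of the MNIST training set
batch_size = 64           # as passed to the DataLoader above
epochs = 3                # number of passes over the training set

# Each batch triggers one weight update, so this is also the number of
# optimiser steps per epoch.
batches_per_epoch = math.ceil(n_train_images / batch_size)
last_batch_size = n_train_images - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(f"{batches_per_epoch} batches per epoch")
print(f"{last_batch_size} images in the final batch")
print(f"{total_updates} weight updates over {epochs} epochs")
```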

## Creating a Simple Network

In [None]:
# Step 2: Define the Neural Network Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 500)  # Size must match the input
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x
    
net = Net()   

torchinfo.summary(net)
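
The parameter count shown in the summary can be checked by hand. The sketch below assumes only the layer sizes defined above: a fully connected layer has one weight per input/output pair plus one bias per output. The helper `linear_params` is a name made up for this example.

```python
def linear_params(n_in, n_out):
    """Number of learnable values in a fully connected layer (weights + biases)."""
    return n_in * n_out + n_out


fc1 = linear_params(28 * 28, 500)  # input pixels -> hidden layer
fc2 = linear_params(500, 10)       # hidden layer -> 10 digit classes

print(f"fc1: {fc1:,}  fc2: {fc2:,}  total: {fc1 + fc2:,}")
```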

## Define Loss and Optimizer

Next we define our loss function and optimizer. The loss function is something which gives us a numeric value for how well or poorly the network has made a prediction. In other words,  it measures how far a neural network's predictions are from the correct answers. The higher the loss, the worse the prediction. 

An optimiser is the algorithm that updates and adjusts the neural network's weights to reduce this loss. It decides how to adjust these values to make the network produce a better prediction next time it sees the data, helping the network learn from its mistakes.

We won't discuss the details of loss functions or optimisers in this course, only to say that for classification tasks Cross Entropy Loss is very commonly used, and common optimisers include SGD (Stochastic Gradient Descent) or Adam.
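
For intuition only, the following plain-Python sketch shows roughly what a cross-entropy loss computes for a single example: the raw network outputs are turned into probabilities with a softmax, and the loss is the negative log of the probability assigned to the correct class. The scores are made up for illustration, and PyTorch's `nn.CrossEntropyLoss` performs an equivalent calculation on whole batches in a numerically more careful way.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Negative log-probability of the correct class for one example."""
    return -math.log(softmax(scores)[target])


scores = [4.0, 0.5, 0.1]  # made-up raw outputs for a 3-class problem

loss_if_correct = cross_entropy(scores, target=0)  # network favours the true class
loss_if_wrong = cross_entropy(scores, target=1)    # network favours another class
loss_uniform = cross_entropy([0.0] * 10, target=3) # no preference among 10 classes

print(loss_if_correct, loss_if_wrong, loss_uniform)
```

A confident correct prediction gives a loss close to zero, a confident wrong prediction gives a large loss, and a network with no preference among 10 classes gives a loss of ln(10), roughly 2.3.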

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

## Train

Now, we will train our network over 3 epochs. 

That means we will pass all our training data through the network 3 times in total.

Execute this code to do so:

In [None]:
def train(net, train_loader, optimizer, criterion, epochs=5):
    for epoch in tqdm(range(epochs)):
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

train(net, train_loader, optimizer, criterion, 3)

## Evaluate the Model

Now that the model is trained, we can evaluate it. 

This means, we will take our test set data, and ask the network to make predictions for each of the images in the test set.

It will then output the accuracy, that is, the percentage of test images for which the network's prediction was correct.

Run the following code to evaluate the network:

In [None]:
def evaluate(net, test_loader):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    return accuracy

accuracy = evaluate(net, test_loader)
print(f"Model accuracy: {accuracy}%")

So we can see the network was correct in over 95% of its predictions, given new images from the test set.

Later, we will see how to evaluate neural networks more carefully, including looking at the classification per class, and we will see how merely looking at the accuracy is not sufficient most of the time. However for now, we just want to demonstrate how to train a simple network. 

Now, we will show some predictions from the network. Running the following code will do so:

In [None]:
iterator = iter(test_loader)

# Get the next batch from the iterator
#images, labels = next(iterator)

# Or loop over the first batch only (use "for images, labels in iterator:" for all of them)
for images,labels in itertools.islice(iterator, 1):

    for image, label in zip(images, labels):
        image_for_pred = image.unsqueeze(0)  # Add a batch dimension
        
        with torch.no_grad():
            output = net(image_for_pred)
            predicted_label = output.argmax(dim=1, keepdim=True)

        # Show only correct predictions; use '!=' here to show the misclassified images instead
        if predicted_label.item() == label.item():
            plt.imshow(image.squeeze(), cmap='gray')
            plt.title(f'Actual Label: {label.item()}')
            plt.show()
            print(f'Predicted Label: {predicted_label.item()}')

# CIFAR10

Let's move on to a more difficult example, where we have images of natural objects from 10 distinct classes. The images are also in colour. 

In this section we will take a look at more classification metrics, and how to better understand how well your network is performing.

For this tutorial, we will use the CIFAR10 dataset. It has the classes: 'plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'. The images in CIFAR10 are of size 3x32x32, in other words 3 channel (RGB) images, 32x32 pixels in size.

Here are some examples of the images contained in the CIFAR10 dataset:

![CIFAR-Examples](./img/cifar10.png)

Let's get the data, which we will do using Torch's `datasets` module:

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.utils import make_grid
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

import torchinfo
from tqdm.notebook import tqdm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# These are transformations applied to every image in the dataset
# They can include image resizing, converting to greyscale, and so on
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 32

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

One thing that always makes sense is to take a look at the distribution of the data, first with the training set:

In [None]:
unique, counts = np.unique(trainset.targets, return_counts=True)
pd.DataFrame(np.asarray((unique, counts)).T, columns=['class id', 'count'], index=classes)

You can see that the classes are perfectly evenly distributed. This is by no means always the case. Often you will get datasets with a very uneven class distribution. We will see examples of this later, and how this affects how we evaluate the network.

Let's also take a look at the distribution of the test set:
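
To see why the class distribution matters for evaluation, consider this small sketch with made-up labels (not CIFAR10): 95 examples of one class, 5 of another, and a "model" that always predicts the majority class.

```python
# Made-up, heavily imbalanced two-class problem
true_labels = [0] * 95 + [1] * 5
predictions = [0] * 100  # a "model" that always predicts class 0

accuracy = sum(t == p for t, p in zip(true_labels, predictions)) / len(true_labels)

# Recall for class 1: of the true class-1 examples, how many were found?
found = sum(t == 1 and p == 1 for t, p in zip(true_labels, predictions))
recall_class_1 = found / true_labels.count(1)

print(f"accuracy: {accuracy:.2f}, recall for class 1: {recall_class_1:.2f}")
```

The accuracy looks excellent even though the minority class is never detected, which is why per-class metrics are needed when classes are unevenly distributed.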

In [None]:
unique, counts = np.unique(testset.targets, return_counts=True)
pd.DataFrame(np.asarray((unique, counts)).T, columns=['class id', 'count'], index=classes)

We can preview some of the images from the dataset and also see their labels:

In [None]:
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
data_iterator = iter(trainloader)
images, labels = next(data_iterator)

# show images
imshow(make_grid(images))

# print labels
labels_t = [f'{classes[labels[x]]}' for x in range(batch_size)]
print(labels_t)
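
The `img / 2 + 0.5` line in `imshow` undoes the `Normalize((0.5, ...), (0.5, ...))` transform defined earlier. A quick plain-Python check on single pixel values, assuming a mean and standard deviation of 0.5 as above (the helper names are made up for this example):

```python
def normalize(pixel, mean=0.5, std=0.5):
    """What transforms.Normalize does to each pixel value: (x - mean) / std."""
    return (pixel - mean) / std


def unnormalize(value):
    """The inverse used in imshow: maps [-1, 1] back to [0, 1]."""
    return value / 2 + 0.5


for pixel in (0.0, 0.25, 1.0):
    normalized = normalize(pixel)
    print(pixel, "->", normalized, "->", unnormalize(normalized))
```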

Let's now define a network. 

This is a more complex network than the one above, as we will see when we print its structure and see how many parameters it has.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

torchinfo.summary(net)
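
The `16 * 5 * 5` input size of `fc1` follows from how each layer changes the image size. The sketch below traces a 32x32 image through the network, assuming the `nn.Conv2d` defaults used above (stride 1, no padding) and 2x2 max pooling with stride 2. The helper names are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer."""
    return (size - kernel) // stride + 1


size = 32                   # CIFAR10 images are 32x32
size = conv_out(size, 5)    # conv1 with a 5x5 kernel
after_conv1 = size
size = pool_out(size)       # first 2x2 pooling
size = conv_out(size, 5)    # conv2 with a 5x5 kernel
size = pool_out(size)       # second 2x2 pooling

channels = 16               # output channels of conv2
flattened = channels * size * size
print(f"final feature map: {channels} x {size} x {size} = {flattened} values")
```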

Again, we define an optimiser and a loss:

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

And train over 5 epochs (the data is passed through the network 5 times):

In [None]:
for epoch in range(5):  # loop over the dataset multiple times

    running_loss = 0.0
    
    for i, data in enumerate(tqdm(trainloader), 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        #running_loss += loss.item()
        #if i % 2000 == 1999:    # print every 2000 mini-batches
        #    print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
        #    running_loss = 0.0

print('Finished Training')

Now that we have finished training, we can test the network on the held out test set! 

First let's look at a batch from the test set, along with their labels:

In [None]:
data_iterator = iter(testloader)
images, labels = next(data_iterator)

# Create a grid of images
grid_img = torchvision.utils.make_grid(images)

# Denormalize: Convert from [-1,1] to [0,1] range
grid_img = (grid_img + 1) / 2

# Convert to the format matplotlib expects
plt.figure(figsize=(12, 6))
plt.imshow(grid_img.permute(1, 2, 0).cpu().numpy())
plt.axis('off')
plt.title('GroundTruth: ' + ' '.join(f'{classes[labels[j]]:5s}' for j in range(len(labels))))
plt.show()

And now check out the predictions:

In [None]:
outputs = net(images)
# outputs[0]
# classes[np.argmax(outputs[0])]

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]}'
                              for j in range(batch_size)))

We can get the accuracy across all images in the test set:

In [None]:
correct = 0
total = 0

# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

This is much better than random guessing, which would be around 10%. And we trained for about 1 minute, and we were using massively downsized images.

## Evaluating a Neural Network

To properly evaluate the network, we cannot rely on just the average accuracy above, especially when we have 10 classes or more (later we will see a 1000 class classification problem).

Let's see the accuracy by class:

In [None]:
# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

cr_true_labels = []
cr_pred_labels = []

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            cr_true_labels.append(int(label))
            cr_pred_labels.append(int(prediction))
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class {classname:5s} is {accuracy:.1f} %')

We can use the `classification_report()` function from scikit-learn that we used several times yesterday to print a report. This works even though we did not use scikit-learn to train the model at all (as it just requires lists of predictions and labels):

In [None]:
from sklearn.metrics import classification_report
print(classification_report(cr_true_labels, cr_pred_labels, target_names=classes))

And we can also plot a confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(cr_true_labels, cr_pred_labels)

In [None]:
plt.figure(figsize=(10, 8))
# sns.set(font_scale=1.2)  # Adjust font size
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=classes, 
            yticklabels=classes)

plt.xlabel('Predicted Labels')
plt.xticks(rotation=45)
plt.ylabel('True Labels')
plt.yticks(rotation=0)
plt.title('Confusion Matrix')
plt.tight_layout()

Here you see the confusion across classes of the model's predictions. 

For example, you can quickly see that cat has a lot of confusion, as it is often confused with dog, or that truck is often confused for car.
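
To make clear how such a matrix is built, here is a minimal plain-Python sketch using a handful of made-up labels for three classes. It follows the same layout as the plot above (rows are true labels, columns are predicted labels), and the function name is hypothetical.

```python
def build_confusion_matrix(true_labels, pred_labels, n_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1
    return matrix


class_names = ['cat', 'dog', 'truck']      # small subset for illustration
true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]      # made-up true labels
pred = [0, 0, 1, 1, 1, 1, 0, 2, 2, 2]      # made-up predictions

cm_small = build_confusion_matrix(true, pred, len(class_names))

for name, row in zip(class_names, cm_small):
    correct = row[class_names.index(name)]  # diagonal entry
    print(f"{name:5s} {row}  correct: {correct}/{sum(row)}")
```

The diagonal holds the correct predictions, and every off-diagonal entry is a specific kind of mistake, such as a cat predicted as a dog.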

## Saving a Model

Once you have trained a model, you can save it to be used later. 

In essence what you are really doing is saving the model's weights. That means that the larger the network, the bigger the saved model. 

In the case of a network where each weight is a 32-bit precision number (float32), as is generally the default, 4 bytes are required per weight. 

So, for a model with 6 million parameters, we have 4 x 6 million bytes = 24 million bytes, which is 24MB. 

You can also see how large some of the large language models are. For example, Llama from Facebook has 65 billion parameters, which is 260GB of weights at 32-bit precision. Often, however, you will see 16-bit floats (2 bytes per parameter, so 130GB) or even lower precision. Even at 1 byte per parameter, this still results in a 65GB file and 65GB of GPU memory required to run the model.
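
The arithmetic above is easy to reproduce. This sketch assumes, as the text does, that 1MB is one million bytes and 1GB is one billion bytes, and it counts only the weights themselves, ignoring any file-format overhead. The function name is made up for this example.

```python
def weights_size_bytes(n_params, bits_per_param=32):
    """Storage needed for the weights alone at a given precision."""
    return n_params * bits_per_param // 8


small_model = weights_size_bytes(6_000_000)  # float32 by default
print(f"6 million parameters: {small_model / 1e6:.0f} MB")

llm_params = 65_000_000_000
for bits in (32, 16, 8):
    size_gb = weights_size_bytes(llm_params, bits) / 1e9
    print(f"65 billion parameters at {bits}-bit: {size_gb:.0f} GB")
```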

Luckily, our trained model is much smaller:

In [None]:
torchinfo.summary(net, (batch_size, 3, 32, 32))

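We have 62,006 parameters, which makes 248,024 bytes, which is only 0.25MB. The estimated total size of about 2.31 MB reported by the summary is larger because it also includes the memory needed for the input batch and the intermediate values of a forward/backward pass, not just the weights.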

To save the model, we do the following:

In [None]:
torch.save(net.state_dict(), './cifar10-model.pt')

We can have a look at the size of the file:

In [None]:
%whos

In [None]:
! ls -lAhF ./cifar10-model.pt

A very small model in fact! 

To load a model, it is a matter of a few lines of code also:

```python
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
```
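
As a concrete, self-contained illustration of this pattern, the sketch below saves and reloads the weights of a tiny stand-in network. `TinyNet` is a hypothetical class made up for this example, and an in-memory buffer is used instead of a file path purely so that the example leaves no files behind. With a real model you would pass a path such as `'./cifar10-model.pt'` instead.

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the model class you trained."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


original = TinyNet()

# Save only the weights (the state dict), as done above for the CIFAR10 model
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# Re-create the architecture first, then load the saved weights into it
restored = TinyNet()
restored.load_state_dict(torch.load(buffer))
restored.eval()

# Both networks should now give identical outputs for the same input
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print("Outputs identical:", same)
```

Note that the class definition has to be available when loading, because the saved file contains only the weights and not the network structure.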

We will load a pretrained model in an example later.

---

# PyTorch Hub

PyTorch Hub lets you download pre-trained models very easily.

What is a pretrained model? This is a model that has been trained already, perhaps on millions of images, and where the weights have been made available. 

Using a pre-trained model saves you the effort of training a model for a long time with data you don't have. 

These pre-trained models can be used as is, or can be fine-tuned to suit your specific task. We will discuss fine-tuning later.

Here we download ResNet18 and ask that we retrieve the pre-trained weights also. Note that it has been trained on ImageNet, a 1,000 class classification problem. We will look at the classes soon.

Download the model as follows:

In [None]:
import torchvision
from torchvision.models import ResNet18_Weights
import torch

# Get a ResNet18 pretrained model
# model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', weights=ResNet18_Weights.DEFAULT)

# Important to set the model to evaluation mode before trying to get predictions. 
model.eval()

Once downloaded and loaded, it will print this information.

Let's make it predict an image:

First, we download an example image:

In [None]:
# Download an example image from the pytorch website
import urllib.request
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

Then we convert this image into a format we can view, and preview it here in the notebook:

In [None]:
from PIL import Image
input_image = Image.open(filename)
input_image

Now we can ask the network to make its predictions based on the dog image above:

In [None]:
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(224),
    #transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
#if torch.cuda.is_available():
#    input_batch = input_batch.to('cuda')
#    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
    
# Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
# print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)

As you can see, the output is a list of 1,000 probabilities. Each probability is a confidence score of how likely the network thinks it is that the image belongs to that class.

These probabilities sum to 1, as we can see:

In [None]:
probabilities.sum()

Let's take the top 5 probabilities which would be the top 5 predictions of which class this image belongs to:

In [None]:
# Read the categories
with open("./data/imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# Show top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5)
for i in range(top5_prob.size(0)):
    print(categories[top5_catid[i]], top5_prob[i].item())

Looks like the network was right, even recognising the difference between a samoyed and a white wolf.

# Summary

- We use neural networks for a variety of tasks
- Building models from scratch requires quite a lot of in-depth knowledge of how neural networks operate
- Using pre-built and pre-trained models, we can quickly get started without needing in-depth knowledge
- You evaluate neural networks in much the same way as other algorithms such as random forests; however, because they are trained over epochs, you also need to watch out for overfitting and watch the network learn over time

# Large Language Models

In recent years, large language models (LLMs) have gained a lot of attention due to products such as ChatGPT. These are a type of network known as generative models. 

Generative models also exist in the area of imaging, however we will not cover this. The technlogy behind most of the image generation technology are known as diffusion models, you can read more about this here: <https://en.wikipedia.org/wiki/Diffusion_model>

For this we will be using the Transformers package from Hugging Face. From its documentation:

> Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:
>
> - 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, code generation, summarization, translation, multiple choice, and text generation.
> - 🖼️ Computer Vision: image classification, object detection, and segmentation.
> - 🗣️ Audio: automatic speech recognition and audio classification.
> - 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
>
> *Source: <https://huggingface.co/docs/transformers/en/index>*

If we visit the link above, we will see that there are literally hundreds of models that you can download.

Most of these models can be fine-tuned to your needs, and come pre-trained.

LLMs are very versatile and can be used for a number of tasks. For example, they can be used to summarise text, or for sentiment analysis.

LLMs use tokenisers to split your text into tokens. These can more or less be considered words, although a tokeniser may split rare words into several sub-word pieces.

LLMs are trained on really large datasets of text, where the job of the LLM is to predict the next token given some text. 

Networks are trained using sequences of text, which are gathered from books, newspapers, journal articles, and so on (known as a corpus). Within this dataset, we have millions, if not billions, of sentences. For example, let's say that this sentence appears in our corpus:

```
The weather is very nice today
```

We can take a subset of that text and leave out the last word, which means we know the next token in the sentence (`today`).

So, we take the following:

```
The weather is very nice 
```

And we ask the network to predict the next word. We know the correct answer is `today`. Therefore if the network returns:

```
The weather is very nice yesterday
```

We can then tell the network this was incorrect, and update the network's weights accordingly.

If however, the network predicted the following:

```
The weather is very nice today
```

then we know this is correct, as we know that `today` is the correct answer. We can therefore update the network's weights accordingly. 
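
In practice, "telling the network whether it was correct" is done with a loss function, and a common choice for next-token prediction is cross-entropy. The following is a minimal sketch, assuming a made-up three-word vocabulary and made-up predicted probabilities. It only shows how the size of the loss differs between a wrong and a right guess, not how a real LLM is trained.

```python
import numpy as np

# Toy vocabulary of candidate next words (hypothetical, for illustration only)
vocab = ["today", "yesterday", "tomorrow"]
correct = vocab.index("today")

def cross_entropy(probs, target_idx):
    """Loss is the negative log of the probability given to the correct token."""
    return -np.log(probs[target_idx])

# Made-up prediction that favours the wrong word "yesterday"
wrong_guess = np.array([0.1, 0.8, 0.1])
# Made-up prediction that favours the correct word "today"
right_guess = np.array([0.8, 0.1, 0.1])

loss_wrong = cross_entropy(wrong_guess, correct)
loss_right = cross_entropy(right_guess, correct)

print(f"Loss when favouring 'yesterday': {loss_wrong:.3f}")
print(f"Loss when favouring 'today':     {loss_right:.3f}")
```

The wrong guess produces a much larger loss, and therefore a larger weight update, than the right guess.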

In [None]:
from transformers import pipeline
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import GPT2LMHeadModel, GPT2Tokenizer

Now we will load the tokeniser:

In [None]:
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

In [None]:
# Demonstrate tokenization
text = "Welcome to the ATSP 2025 course!"
tokens = tokenizer.tokenize(text)
token_ids = tokenizer.encode(text)

print("Original text:", text)
print("Tokens:", tokens)
print("Token IDs:", token_ids)

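We can go from token IDs back to text in a similar way. Note that this uncased tokeniser lower-cases the text, and that `encode` adds the special `[CLS]` and `[SEP]` markers: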

In [None]:
# Show token-to-text conversion
decoded = tokenizer.decode(token_ids)
print("Decoded text:", decoded)

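Why is tokenisation performed? Neural networks operate on numbers, not strings. The tokeniser maps text, whose words vary in length, to a sequence of integer IDs drawn from a fixed vocabulary. Inside the model, each ID is then looked up in an embedding table and becomes a vector of fixed length (768 values in the case of `distilbert-base-uncased`), which makes it much easier to train a neural network.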
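
The idea of turning words into IDs and then into fixed-length vectors can be sketched with plain NumPy. Everything here is a toy assumption: the five-entry `vocab`, the `embedding_dim` of 8, and the random `embedding_table` are invented for illustration, whereas a real model learns its table during training.

```python
import numpy as np

# Hypothetical five-entry vocabulary; real tokenisers have tens of thousands of entries
vocab = {"[UNK]": 0, "the": 1, "weather": 2, "is": 3, "nice": 4}
embedding_dim = 8  # chosen for illustration

rng = np.random.default_rng(seed=0)
# One row per vocabulary entry; in a real model these values are learned
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def encode(text):
    """Map each lower-cased word to its ID, using [UNK] for unknown words."""
    return [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]

token_ids = encode("The weather is nice")
vectors = embedding_table[token_ids]  # row lookup: one vector per token

print(token_ids)
print(vectors.shape)
```

However long or short a word is, it ends up as one row of the same width, which is what the network actually receives.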

## Sentiment Analysis

In [None]:
# Load sentiment analysis pipeline
sentiment_analyzer = pipeline('sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')

# Interactive examples
texts = [
    "I really like the ATSP 2025 class a lot.",
    "I'm not sure I understand the concepts Marcus is trying to teach me.",
    "Learning about LLMs and AI is exciting! But I am not sure how well it will help me in my day to day life."
]

for text in texts:
    result = sentiment_analyzer(text)
    print(f"Text: {text}")
    print(f"Sentiment: {result[0]['label']}")
    print(f"Confidence: {result[0]['score']:.2f}\n")

Sentiment analysis is not limited to positive or negative labels. With a suitably trained model it could also be used for, say, some kind of patient wellbeing analysis.

## Named Entity Recognition

A sub-category of natural language processing that is particularly useful in medicine is Named Entity Recognition (NER).

From the Wikipedia article:

> Named-entity recognition is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

For example, it might be useful to extract ICD codes from medical texts using such a technique. Often, such extraction techniques use rule-based approaches that are quite prone to failure.
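
To illustrate why rule-based extraction is fragile, here is a minimal sketch using a regular expression. The pattern is a simplified assumption about what ICD-10-CM style codes look like and is not a validated grammar. The short `note` is adapted from the example text used in this section.

```python
import re

# Simplified pattern: a letter, a digit, one more character, then optionally
# a dot and up to four further characters. Illustrative assumption only.
ICD_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")

note = (
    "She was cooking dinner (Y93.G3) when a cabinet fell on her (W20.8XXA). "
    "Headaches are not relieved by Advil (G44.311)."
)
codes = ICD_PATTERN.findall(note)
print(codes)

# The same rule also fires on text that merely looks like a code
false_alarm = "The patient takes vitamin B12 daily."
false_hits = ICD_PATTERN.findall(false_alarm)
print(false_hits)
```

The rule finds the three codes in the note, but it also reports `B12` as a code, and it would miss any code written in an unexpected format.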

Let's try this out using Transformers:

Example text:

> Mrs. Finley presents today after having a new cabinet fall on her last week (W20.8XXA), suffering a concussion, as well as some cervicalgia. She was cooking dinner (Y93.G3) at the home she shares with her husband. She did not seek treatment at that time. She states that the people that put in the cabinet in her kitchen (Y92.010) missed the stud by about two inches. Her husband, who was home with her at the time told her she was 'out cold' for about two minutes (S06.0X1A). The patient continues to have cephalgias since it happened, primarily occipital, extending up into the bilateral occipital and parietal regions. The headaches come on suddenly, last for long periods of time, and occur every day. They are not relieved by Advil (G44.311). She denies any vision changes, any taste changes, any smell changes. The patient has a marked amount of tenderness across the superior trapezius.
> 
> *Source: https://www.aapc.com/icd-10/icd-10-documentation-example.aspx*

Once extracted, we can look at specific tokens that it found:

In [None]:
tokeniser = AutoTokenizer.from_pretrained("ugaray96/biobert_ncbi_disease_ner")

model = AutoModelForTokenClassification.from_pretrained("ugaray96/biobert_ncbi_disease_ner")

ner_pipeline = pipeline("ner", model=model, tokenizer=tokeniser)

# Source: https://huggingface.co/ugaray96/biobert_ncbi_disease_ner
text = """The patient was diagnosed with lung cancer and started 
chemotherapy. He also has a history of diabetes and heart disease."""

# Source: https://www.aapc.com/icd-10/icd-10-documentation-example.aspx
long_text = """
Mrs. Finley presents today after having a new cabinet fall on her last week 
(W20.8XXA), suffering a concussion, as well as some cervicalgia. She was 
cooking dinner (Y93.G3) at the home she shares with her husband. She 
did not seek treatment at that time. She states that the people that 
put in the cabinet in her kitchen (Y92.010) missed the stud by about 
two inches. Her husband, who was home with her at the time told her 
she was out cold for about two minutes (S06.0X1A). The patient continues 
to have cephalgias since it happened, primarily occipital, extending 
up into the bilateral occipital and parietal regions. The headaches 
come on suddenly, last for long periods of time, and occur every day. 
They are not relieved by Advil (G44.311). She denies any vision changes, 
any taste changes, any smell changes. The patient has a marked amount 
of tenderness across the superior trapezius.
"""

result = ner_pipeline(long_text)

diseases = []

for entity in result:
    word = entity["word"].strip()
    # WordPiece marks pieces that continue the previous word with '##'
    is_subword = word.startswith("##")
    if is_subword:
        word = word[2:]

    if entity["entity"] == "Disease" and not is_subword:
        # Start of a new disease mention
        diseases.append(word)
    elif entity["entity"] in ("Disease", "Disease Continuation") and diseases:
        # Glue sub-word pieces directly; separate whole words with a space
        diseases[-1] += word if is_subword else f" {word}"

print(f"Diseases: {', '.join(diseases)}")

## Question Answering

In [None]:
qa_pipeline = pipeline("question-answering", model='distilbert/distilbert-base-cased-distilled-squad')

In [None]:
context = """Mrs. Finley presents today after having a new cabinet 
fall on her last week (W20.8XXA), suffering a concussion, as 
well as some cervicalgia. She was cooking dinner (Y93.G3) at 
the home she shares with her husband. She did not seek treatment 
at that time. She states that the people that put in the cabinet 
in her kitchen (Y92.010) missed the stud by about two inches. 
Her husband, who was home with her at the time told her she 
was out cold for about two minutes (S06.0X1A). The patient 
continues to have cephalgias since it happened, primarily 
occipital, extending up into the bilateral occipital and parietal 
regions. The headaches come on suddenly, last for long periods of 
time, and occur every day. They are not relieved by Advil (G44.311). 
She denies any vision changes, any taste changes, any smell changes. 
The patient has a marked amount of tenderness across the superior trapezius."""

questions = [
    "What happened to Mrs. Finley?",
    "What did her husband do?",
    "Has she taken medication?",
    "What are the ICD codes from this text?"
]
# Get answers to the questions
for question in questions:
    answer = qa_pipeline(question=question, context=context)
    print(f"Question: {question}")
    print(f"Answer: {answer['answer']}")
    print(f"Confidence: {answer['score']:.4f}\n")

## Text Generation

We can of course use LLMs to generate text also.

In [None]:
# Load GPT-2 model and tokenizer
generator = pipeline('text-generation', model='gpt2')
# GPT-2 has no padding token, so reuse its own end-of-sequence token to stop padding warnings
generator.generation_config.pad_token_id = generator.tokenizer.eos_token_id

# Generate text from prompts
prompts = [
    "Artificial Intelligence is",
    "The future of technology will",
    "Students learning about AI should"
]

for prompt in prompts:
    generated = generator(prompt, max_length=50, num_return_sequences=1, truncation=True)
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated[0]['generated_text']}\n")

## Translation

We can even use these models to translate text:

In [None]:
en_to_de = pipeline('translation_en_to_de', model='google-t5/t5-base')

german_result = en_to_de("""The patient 
continues to have cephalgias since it happened, primarily 
occipital, extending up into the bilateral occipital and parietal 
regions. The headaches come on suddenly, last for long periods of 
time, and occur every day. They are not relieved by Advil""")

Once this is complete, we can view the results:

In [None]:
print(german_result)

## Text Summarisation

In [None]:
summarise = pipeline("summarization", model='sshleifer/distilbart-cnn-12-6')

Now that this has loaded, we can summarise text:

In [None]:
long_text = """Mrs. Finley presents today after having a new cabinet 
fall on her last week, suffering a concussion, as 
well as some cervicalgia. She was cooking dinner at 
the home she shares with her husband. She did not seek treatment 
at that time. She states that the people that put in the cabinet 
in her kitchen missed the stud by about two inches. 
Her husband, who was home with her at the time told her she 
was out cold for about two minutes. The patient 
continues to have cephalgias since it happened, primarily 
occipital, extending up into the bilateral occipital and parietal 
regions. The headaches come on suddenly, last for long periods of 
time, and occur every day. They are not relieved by Advil. 
She denies any vision changes, any taste changes, any smell changes. 
The patient has a marked amount of tenderness across the superior trapezius."""

summary_short = summarise(long_text, max_length=75, min_length=30, do_sample=False)
summary_medium = summarise(long_text, max_length=150, min_length=50, do_sample=False)

print("Text Summarization Pipeline Results:")
print("Original text length:", len(long_text.split()))
print("\nShort Summary:")
print(summary_short[0]['summary_text'])

print("\nMedium Summary:")
print(summary_medium[0]['summary_text'])

print('\n' + '-' * 50)

And we could translate the summary in to German:

In [None]:
en_to_de(summary_short[0]['summary_text'])

# Medical Image Segmentation

In this section we will discuss the area of medicine where deep learning has probably had the largest impact: image segmentation.

![segmentation](./img/segmentation-numbered.png)

*Source:* University of Waterloo Faculty of Mathematics, <https://wiki.math.uwaterloo.ca/statwiki/index.php?title=Mask_RCNN>

As you can see from the image above, there are several types of problems that can be addressed using deep learning. We have seen image classification above, but there is also 1) Image Recognition, where several objects can be detected in an image (for example lung nodules), 2) Object Detection, where bounding boxes are placed around the objects, 3) Semantic Segmentation, where different classes of object are detected on a pixel by pixel basis, and 4) Instance Segmentation where classes and instances of classes are detected. 

In medical imaging, currently by far the most research effort is spent on tasks 3 and 4, Semantic Segmentation and Instance Segmentation, and in particular 3D image segmentation, such as CT or MR imaging in the area of Radiology. Of course, 2D segmentation is also a topic of interest, for example Whole Slide image analysis in the field of Pathology. 
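
The difference between semantic and instance segmentation is easiest to see in the masks themselves. Below is a minimal sketch on a hypothetical 4x6 image containing two separate nodules. The arrays are invented for illustration.

```python
import numpy as np

# Instance mask: 0 = background, 1 = first nodule, 2 = second nodule
instance_mask = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic mask: every nodule pixel gets the same class label (1 = nodule)
semantic_mask = (instance_mask > 0).astype(int)

n_instances = len(np.unique(instance_mask)) - 1  # do not count the background
nodule_pixels = int(semantic_mask.sum())

print(semantic_mask)
print(f"{n_instances} nodules covering {nodule_pixels} pixels")
```

The semantic mask can tell us how many pixels are nodule, but only the instance mask can tell us that there are two separate nodules.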

In this section we will look at some of the ways in which you could train such a network, and what kind of data is required to do so.

We will not train such a model live, as 3D segmentation models tend to require days or even weeks of training time.

## Foundation Models

Meta recently released SAM, the Segment Anything Model. This is a model known as a 'foundation model' and has been trained on more than ten million images from many different areas. Foundation models like this are therefore general segmentation models, and can perform segmentation on a wide variety of different images.

This contrasts with the very specific networks that you might train for a single task, such as segmenting one particular tumour type.

We can take a look at the SAM demo, and use it to segment a brain MRI.

https://sam2.metademolab.com/

Note that these foundation models, however, require some form of user input to work. They are not completely automated. For fully automatic segmentation, a specific model for a specific tumour type, for example, needs to be trained.

## U-Nets

U-Nets are a special type of deep network designed specifically for the task of image segmentation. We will not discuss them in detail, only mention that they are widely regarded as the state of the art for medical image segmentation.

If you wish to use a U-Net to perform segmentation, then by far the most straightforward way to do this is using nnUNet, a framework for automatically configuring U-Nets. See: <https://github.com/MIC-DKFZ/nnUNet>

## Data 

The biggest hurdle in developing segmentation models is getting **annotated** data. If you think about MR, like the brain MR we saw previously, each layer of the MR needs to be carefully annotated with segmentations by a radiologist. This is manual work and requires expertise and experience. For rare diseases, there may not be that many radiologists who are qualified to perform the annotations. Hence, getting enough data to train these deep segmentation models can be a challenging task.

There are a number of resources for finding annotated segmentation data:

### The Cancer Imaging Archive

The Cancer Imaging Archive is a repository of image data that is freely and openly available. 

As an example, here is a dataset regarding Soft Tissue Sarcoma: <https://www.cancerimagingarchive.net/collection/soft-tissue-sarcoma/>. This is a collection of MR and CT images of various sarcoma, including liposarcoma and fibrosarcoma. There are a total of 51 patients in the dataset, and the MR images have been annotated by radiologists, meaning the tumours have been highlighted and annotated manually for every layer of each MR. 

In order to navigate it, do the following:

- URL: https://www.cancerimagingarchive.net
- Navigate to "Access the Data" -> "Data Portals Dashboard"
- From here, you can access Radiology Portal or the Pathology Portal, as well as the Browse Collections feature

If you open the Radiology portal for example, you will eventually get here: <https://nbia.cancerimagingarchive.net/nbia-search/> 

The browser lets you filter by modality, such as MR, by anatomical site, such as prostate or pancreas, and by study. Studies are collections of images that might come from different anatomical sites and institutions, but are grouped due to being part of some study. Once you have narrowed down your search, click the "Search Results" tab in order to see the data.

To download data, you need to use the NBIA (National Biomedical Imaging Archive) Data Retriever. When you browse TCIA, you can add data to a cart, and once you are finished, you use the NBIA Data Retriever to download all the data in bulk.

---

# Medical Image Analysis: An Example

The following paper by Beck et al. describes this workflow:

![Patho](./img/patho-workflow.png)

*Source*: Beck, A.H., Sangoi, A.R., Leung, S., Marinelli, R.J., Nielsen, T.O., Van De Vijver, M.J., West, R.B., Van De Rijn, M. and Koller, D. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Science Translational Medicine, 3(108), 2011. <https://www.science.org/doi/10.1126/scitranslmed.3002564>

This highlights exactly the difference between the classical approach that we looked at yesterday, and the deep learning approach. 

The workflow from the paper above is time consuming, and requires a lot of expertise, effort, and domain knowledge.

With deep networks, you could likely skip the vast majority of that work, and simply train a network to do this from end to end, without worrying about any of these intermediate steps. It would require far less of the preprocessing, expertise, and domain knowledge that was needed to do the work in the paper above.

This is quite a recent development: the paper above is from 2011, before deep learning became mainstream. 

We can illustrate this now by training a network to predict skin lesion types.

# MedMNIST

We have seen MNIST, let's take a look at MedMNIST.

MedMNIST is a collection of pre-processed medical image datasets, designed specifically for benchmarking machine learning models in the context of healthcare. To demonstrate its use, we will walk you through the process of loading some of its datasets and previewing the data using Python.

Let's make a number of imports and define a function to view the images easily, as we will do this several times:

In [None]:
import medmnist
from medmnist import Evaluator
import numpy as np
import matplotlib.pyplot as plt
import random

# Let's also define a function to visualise images easily later
def visualise_medmnist(data, num_images=6):
    plt.figure(figsize=(12, 8))

    for i in range(num_images):
        plt.subplot(2, 3, i + 1)
        idx = random.randint(0, len(data) - 1)
        image, label = data[idx]
        # image = image.squeeze()  # Remove channel dimension if it exists
        plt.imshow(image, cmap='gray')
        plt.title(f'Label: {label}')
        plt.axis('off')

    plt.tight_layout()
    plt.show()

There are several datasets included in MedMNIST (see <https://medmnist.com>), covering a wide array of medical image types:

![MedMNIST](./img/MedMNIST.png)

*Source*: Yang J, Shi R, Wei D, Liu Z, Zhao L, Ke B, Pfister H, Ni B. MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Nature Scientific Data. 2023 Jan 19;10(1):41.

The data included in MedMNIST covers several types of medical imagery, from dermatology, to histopathology as well as 3D datasets such as the OrganMNIST3D abdominal CT dataset. 

Let's have a look at the Dermatology dataset.

The data can be downloaded in various image sizes, which you can specify with the `size` parameter (here `size=128`):

In [None]:
from medmnist import DermaMNIST

derma_dataset = DermaMNIST(split="train", download=True, size=128)  # Available sizes: 28, 64, 128, 224

The documentation for MedMNIST on the official website is actually not very good, however the package itself is well documented. 

Therefore, we can use `?` at any time, as we have seen several times:

In [None]:
DermaMNIST?

---

Let's preview the data:

In [None]:
derma_dataset.montage(10)

Using the `visualise_medmnist` function we defined above, we can preview this data. The function displays 6 random images from the dataset. Each time we call the function we will get another 6 random images.

In [None]:
visualise_medmnist(derma_dataset)

First thing to notice is that we are dealing with a dataset of skin lesions that were collected using a **dermatoscope**.

A dermatoscope is similar to a camera, and provides a non-invasive method to analyse skin lesions:

![Dermatoscope](./img/dermatoscope.jpeg)

*Source*: <https://commons.wikimedia.org/w/index.php?curid=2431174> 

The camera can fully enclose the lesion and therefore is unaffected by ambient light, and has its own powerful light that illuminates the lesion in a way that enhances the visibility of surface patterns and colours not seen by the naked eye. Dermatoscopes have a magnifier, typically around 10x. 

This close-up view provides detailed information about the morphology of skin lesions, such as melanomas and other types of skin cancer, moles, and so on.

Let's take a look at what `derma_dataset` contains:

In [None]:
derma_dataset

From the printed summary, we can see that there are 10,015 images in total. They are already split into train, validation, and test sets, as this is a benchmarking dataset (various groups might want to compare their results using the exact same test set).

The images are split up as follows:

| Split      | Number     |
|------------|------------|
| Train      | 7,007      |
| Validation | 1,003      |
| Test       | 2,005      |
| **Total**  | **10,015** |
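
As a quick check of the proportions, the counts in the table can be turned into fractions. This is a small sketch that uses only the numbers from the table above.

```python
# Split sizes copied from the table above
splits = {"Train": 7007, "Validation": 1003, "Test": 2005}
total = sum(splits.values())

fractions = {name: count / total for name, count in splits.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

This works out to a split of roughly 70% / 10% / 20%.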

The images are in colour, as indicated by the `Number of channels: 3` for the three channels of RGB.

Last, the images are of 7 distinct diseases, and this is therefore a multi-class classification problem.

The 7 diseases are shown in the `Meaning of labels` field and are as follows: 

0. actinic keratoses and intraepithelial carcinoma
1. basal cell carcinoma
2. benign keratosis-like lesions
3. dermatofibroma
4. melanoma
5. melanocytic nevi
6. vascular lesions

Maybe it would be interesting to see how each of the 7 different classes look. For example, are the melanoma vastly different in appearance to nevi? 

Note here that melanocytic nevi or nevi are benign lesions normally called moles, while melanoma are cancerous, malignant lesions that are harmful and can spread (metastasise). 

We can also take a look at how each class looks.

We can use `np.where()` to search for subsets of data by their labels:

In [None]:
labels = ['keratoses', 'basal', 'benign keratosis', 'dermatofibroma', 'melanoma', 'nevi', 'vascular']

keratoses_idx = np.where(derma_dataset.labels==0)[0]
basal_idx = np.where(derma_dataset.labels==1)[0]
benign_keratoses_idx = np.where(derma_dataset.labels==2)[0]
dermatofibroma_idx = np.where(derma_dataset.labels==3)[0]
melanoma_idx = np.where(derma_dataset.labels==4)[0]
nevi_idx = np.where(derma_dataset.labels==5)[0]
vascular_idx = np.where(derma_dataset.labels==6)[0]
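
To see what `np.where(...)[0]` returns, here is a minimal sketch on a made-up label array. It assumes the labels are stored as a single column of shape `(N, 1)`, which is why the training code later calls `squeeze()` on the targets. The array `toy_labels` is invented for illustration.

```python
import numpy as np

# Hypothetical labels stored as a column, shape (6, 1)
toy_labels = np.array([[5], [4], [5], [4], [0], [5]])

# For a 2D array, np.where returns one index array per dimension
rows, cols = np.where(toy_labels == 4)

# Taking [0] keeps only the row indices, i.e. which images have label 4
melanoma_rows = np.where(toy_labels == 4)[0]

print(rows, cols)
print(melanoma_rows)
```

The row indices are what we need in order to pick out the matching images from the dataset.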

Now we can preview each class to see if something interesting is going on:

In [None]:
visualise_medmnist([(derma_dataset.imgs[i], "Keratosis") for i in keratoses_idx])

In [None]:
visualise_medmnist([(derma_dataset.imgs[i], "Melanoma") for i in melanoma_idx])

In [None]:
visualise_medmnist([(derma_dataset.imgs[i], "Nevi") for i in nevi_idx])

## ABCDE

If we think back to how we might tackle the problem of classifying or diagnosing skin lesions based on features, it might make sense to understand how a dermatologist or doctor might make such a diagnosis. 

Upon visual examination by a dermatologist, the following characteristics are normally noted (ABCDE test):

- A: Asymmetry - melanoma often present as asymmetrical, non-uniform, and non-round. 
- B: Border - melanoma have less well-defined borders than non-cancerous moles. They appear to smudge between skin tissue and the lesion itself.
- C: Colour - melanoma are darker in appearance (melanoma start in the melanocytes, and these are the cells that give your skin its pigment) and have multiple shades of colour, while non-cancerous moles tend to have one colour.
- D: Diameter - melanoma are larger and will grow larger than 5 or 6 mm. 
- E: Evolution - melanoma tend to change shape, size, and colour over time, or may begin to itch or bleed, unlike benign moles.

See: <https://www.mayoclinic.org/diseases-conditions/melanoma/symptoms-causes/syc-20374884>

## Closer Look at the Data

Let's take a closer look at the data and determine how to train a network with it.

For example, we can take a look at the distribution of the various classes.

In [None]:
import pandas as pd

# Get the unique values and their counts/frequencies
unique, counts = np.unique(derma_dataset.labels, return_counts=True)

# Print nicely using a DataFrame
pd.DataFrame(np.asarray((unique, counts)).T, columns=['class id', 'freq.'], index=labels)

Notice the very large imbalance!

We will see how this might be an issue later.

First though, let's train a network.

First, define some values:

In [None]:
number_of_epochs = 3
batch_size = 128
lr = 0.001

# task = 'multi-task'
number_of_channels = 3
number_of_classes = 7

Load the relevant Torch libraries

In [None]:
import torch.utils.data as data
import torchvision.transforms as transforms
# from torchvision.transforms import v2 as transforms

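Apply transforms. Note that `RandomResizedCrop` takes a random crop of each image and rescales it to 28x28, so it acts as data augmentation rather than as a plain resize. Because the same transform is passed to the test set below, test results can vary slightly from run to run: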

In [None]:
# preprocessing
data_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomResizedCrop(size=(28, 28), antialias=True),
    #transforms.RandomHorizontalFlip(p=0.5),
    #transforms.RandomInvert(p=0.5),
    # transforms.Normalize(mean=[.5], std=[.5])
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),

])

In [None]:
train_dataset = DermaMNIST(split='train', transform=data_transform, download=True)
test_dataset = DermaMNIST(split='test', transform=data_transform, download=True)

In [None]:
# encapsulate data into dataloader form
train_loader = data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
train_loader_at_eval = data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=False)
test_loader = data.DataLoader(dataset=test_dataset, batch_size=2*batch_size, shuffle=False)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple CNN model
class Net(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(Net, self).__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3),
            nn.BatchNorm2d(16),
            nn.ReLU())

        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer3 = nn.Sequential(
            nn.Conv2d(16, 64, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU())
        
        self.layer4 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU())

        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.fc = nn.Sequential(
            nn.Linear(64 * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

model = Net(in_channels=number_of_channels, num_classes=number_of_classes)
    
# define loss function and optimizer
#if task == "multi-label, binary-class":
#    criterion = nn.BCEWithLogitsLoss()
#else:
criterion = nn.CrossEntropyLoss()
    
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
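
The `64 * 4 * 4` in the first `nn.Linear` layer comes from following the image size through the network. The sketch below walks through that arithmetic for a 28x28 input, using the standard output-size formula for convolution and pooling layers. The helper functions `conv_out` and `pool_out` are written here for illustration and are not part of PyTorch.

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Output width/height of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer."""
    return (size - kernel) // stride + 1

size = 28                                    # input images are 28x28
size = conv_out(size)                        # layer1: 28 -> 26
size = pool_out(conv_out(size))              # layer2: 26 -> 24 -> 12
size = conv_out(size)                        # layer3: 12 -> 10
size = conv_out(size)                        # layer4: 10 -> 8
size = pool_out(conv_out(size, padding=1))   # layer5: 8 -> 8 -> 4

flattened = 64 * size * size                 # 64 channels of 4x4 feature maps
print(size, flattened)
```

If you change the input image size or remove a layer, this number changes too, and the first `nn.Linear` layer has to be adjusted to match.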

We can check out the model's structure here:

In [None]:
import torchinfo

torchinfo.summary(model)

And now do the actual training:

In [None]:
from tqdm.notebook import tqdm

for epoch in range(number_of_epochs):
    train_correct = 0
    train_total = 0
    test_correct = 0
    test_total = 0
    
    model.train()
    for inputs, targets in tqdm(train_loader):
        # forward + backward + optimize
        optimizer.zero_grad()
        outputs = model(inputs)
        
        #if task == 'multi-label, binary-class':
        #    targets = targets.to(torch.float32)
        #    loss = criterion(outputs, targets)
        #else:
        targets = targets.squeeze().long()
        loss = criterion(outputs, targets)
        
        loss.backward()
        optimizer.step()

Evaluate:

In [None]:
from sklearn.metrics import classification_report

y_pred_cr = []
y_true_cr = []

def test(split):
    model.eval()
    y_true = torch.tensor([])
    y_score = torch.tensor([])
    
    data_loader = train_loader if split == 'train' else test_loader

    with torch.no_grad():
        for inputs, targets in data_loader:
            outputs = model(inputs)

            #if task == 'multi-label, binary-class':
            #    targets = targets.to(torch.float32)
            #    outputs = outputs.softmax(dim=-1)
            #else:
            targets = targets.squeeze().long()
            outputs = outputs.softmax(dim=-1)
            targets = targets.float().resize_(len(targets), 1)

            y_true = torch.cat((y_true, targets), 0)
            y_score = torch.cat((y_score, outputs), 0)

        
        y_true = y_true.numpy()
        y_score = y_score.detach().numpy()
        
        evaluator = Evaluator('dermamnist', split)
        metrics = evaluator.evaluate(y_score)
    
        # print('%s  auc: %.3f  acc:%.3f' % (split, *metrics))
        print(f"Model accuracy: {metrics[1]:.3f}")
        
        # For classification report
        for score in y_score:
            y_pred_cr.append(np.argmax(score))

        for score in y_true:
            y_true_cr.append(int(score[0]))
        
#test('train')
test('test')

At first, this looks pretty good, close to 70% accuracy. 

Let's break down the numbers. 

First we print a classification report:

In [None]:
print(classification_report(y_true_cr, y_pred_cr, zero_division=0.0, target_names=labels))

Let's try a confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

cm = confusion_matrix(y_true_cr, y_pred_cr)

plt.figure(figsize=(10, 8))
sns.set(font_scale=1.2)  # Adjust font size
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=labels, 
            yticklabels=labels)

plt.xlabel('Predicted Labels')
plt.xticks(rotation=45)
plt.ylabel('True Labels')
plt.title('Confusion Matrix for DermaMNIST (CNN)')
plt.tight_layout()
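
When reading a confusion matrix produced by scikit-learn, rows are the true classes and columns are the predicted classes. The following sketch shows how per-class recall and precision are read off such a matrix. The 3-class matrix `cm_toy` is entirely made up and has nothing to do with the DermaMNIST results.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted class)
cm_toy = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

correct = np.diag(cm_toy)                  # correctly classified samples per class
recall = correct / cm_toy.sum(axis=1)      # divide by row totals (true counts)
precision = correct / cm_toy.sum(axis=0)   # divide by column totals (predicted counts)
accuracy = correct.sum() / cm_toy.sum()

print("Recall:   ", recall)
print("Precision:", precision)
print(f"Accuracy:  {accuracy:.3f}")
```

These are the same per-class quantities that `classification_report` prints.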

## Exercise 

### Question 1

We got an average accuracy of nearly 70%.

Is this a good result?

Your answer here

### Question 2

How can the average accuracy look high even if the network is not performing well at all?

Your answer here

### Question 3

Interpret the confusion matrix and the classification report.

Your answer here

### Question 4

Would it be possible to restructure the test set to better evaluate model performance?

Your answer here

### Question 5

What might have caused the model to classify all lesions as one class (nevi)? For example, does the dataset contain some kind of class imbalance?

Your answer here

## End of Session

In the next session we will discuss the following:

- How to fine-tune pre-trained networks
- How to get more data
- How to save models
- How to publish and distribute models