In this notebook we will attempt to classify a set of neutrino interactions as CC $\nu_\mu$, CC $\nu_e$ or NC $\nu$ events using a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN).

In [None]:
import torch
import torchvision

print("torch version:", torch.__version__)
print("torchvision version:", torchvision.__version__)

Let's load the dataset. This is a sample of 30,000 images from a simple LArTPC simulation using GENIE input neutrino events, containing equal numbers of CC $\nu_\mu$, CC $\nu_e$ and NC $\nu$ interactions. The cell below downloads the archive and extracts the `.png` images into the `images` directory.

In [None]:
import os

# Load the neutrino dataset:
if not os.path.isfile('images/images.tgz'):
  !mkdir -p images
  !wget --no-check-certificate 'https://www.hep.phy.cam.ac.uk/~lwhitehead/genie_neutrino_images.tgz' -O images/images.tgz
  !tar -xzf images/images.tgz -C images/

# Work out the number of classes from the directory structure
root_dir = 'images/'
dir_contents = os.listdir(root_dir)
num_classes = sum(os.path.isdir(os.path.join(root_dir, item)) for item in dir_contents)

print('Dataset consists of', num_classes, 'classes')

class_names = ['CC numu', 'CC nue', 'NC']
for c in range(num_classes):
  print('Number of',class_names[c],'images:')
  !ls -1 images/$c/*.png | wc -l

We need to manipulate the input images a bit to get them into the preferred format. We also downsample them by a factor of two for convenience (to save time when training the networks).

In [None]:
import numpy as np

# We need to define a transform to resize and scale the images when loaded
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((112, 112)),   # reduce size (they are 224 x 224)
    torchvision.transforms.ToTensor(),           # convert to tensor [0,1]
    torchvision.transforms.Lambda(lambda x: x[2].unsqueeze(0)) # extract the w view
])

# Now we can use a torchvision dataset to load these images
dataset = torchvision.datasets.ImageFolder(root="images/", transform=transform)
print("Dataset classes:", dataset.classes)       # list of class names (sorted by folder name)

# Now we need to divide this into train and validation dataloader objects
np.random.seed(24601)
indices = np.arange(len(dataset))
np.random.shuffle(indices)

# Split the shuffled indices: the first 70% for training, the rest for validation
train_idx, val_idx = np.split(indices, [int(0.7*len(indices))])
print("Using", len(train_idx), "images for training and", len(val_idx), "for validation")

# Create samplers
train_sampler = torch.utils.data.SubsetRandomSampler(train_idx)
val_sampler = torch.utils.data.SubsetRandomSampler(val_idx)

# Create dataloaders
batch_size = 64
train_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler, num_workers=2)
val_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=val_sampler, num_workers=2)

It is always a good idea to visualise your data to make sure it looks how you expect it to. With image-based inputs this is especially easy to do!

In [None]:
import matplotlib.pyplot as plt

# ImageFolder orders the images by class folder, so with 10,000 images per
# class these three indices pick one event from each class
numu_event = dataset[0]
print('True class:', numu_event[1])
fig, axes = plt.subplots(1,3)
axes[0].imshow(numu_event[0][0])

nue_event = dataset[10000]
print('True class:', nue_event[1])
axes[1].imshow(nue_event[0][0])

nc_event = dataset[20000]
print('True class:', nc_event[1])
axes[2].imshow(nc_event[0][0])

Now that we have loaded our dataset and are happy that it looks as expected, we can move on and think about our networks.

Let's start with an MLP with three hidden layers of 256, 128 and 64 neurons, respectively. The final layer of the network has 3 neurons, as that is the number of classes we are trying to identify in the dataset. In order to keep the number of parameters sensible, we will also downsample the image again by a factor of two.

To fill in the blanks below, you'll need the following layers:
* `torch.nn.MaxPool2d(kernel_size)` for the downsampling
* `torch.nn.Flatten()` to go from 2D -> 1D
* `torch.nn.Linear(in_features, out_features)` for the fully-connected layers
* `torch.nn.ReLU()` for the activation function
* `torch.nn.Dropout(p)` to apply fraction `p` dropout and prevent (hopefully) overtraining

I've used the `torch.nn.Sequential` class here for convenience, but I'd typically define my model as a class that inherits from `torch.nn.Module`.
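
For comparison, here is a minimal sketch of the class-based style mentioned above. `TinyClassifier` and its layer sizes are hypothetical and chosen only for illustration; it is deliberately not the MLP from the exercise.

```python
import torch


class TinyClassifier(torch.nn.Module):
    """Hypothetical toy model, not the MLP from the exercise."""

    def __init__(self, in_features, hidden_features, n_outputs):
        super().__init__()
        # Layers are created once here and registered as sub-modules
        self.flatten = torch.nn.Flatten()
        self.hidden = torch.nn.Linear(in_features, hidden_features)
        self.activation = torch.nn.ReLU()
        self.output = torch.nn.Linear(hidden_features, n_outputs)

    def forward(self, x):
        # forward() describes how a batch flows through the layers
        x = self.flatten(x)
        x = self.activation(self.hidden(x))
        return self.output(x)  # raw scores (logits), no softmax


# Illustrative sizes only: a batch of two single-channel 8 x 8 images
toy_model = TinyClassifier(in_features=8 * 8, hidden_features=16, n_outputs=3)
toy_batch = torch.zeros(2, 1, 8, 8)
toy_scores = toy_model(toy_batch)
print(toy_scores.shape)
```

The class-based form becomes more useful once the forward pass is no longer a simple chain of layers, which `torch.nn.Sequential` cannot express.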

In [None]:
# Let's start with an MLP
mlp_model = torch.nn.Sequential(
    # Downsample by a factor two
    ...,
    # Flatten ready for input to the Linear layer
    ...,
    # Linear layer with 256 outputs. Calculate the number of inputs!
    ...,
    # Apply the ReLU activation function
    ...,
    # Apply 50% dropout
    ...,
    # Linear layer with 256 inputs and 128 outputs
    ...,
    # Apply the ReLU activation function
    ...,
    # Another 50% dropout
    ...,
    # Linear layer with 128 inputs and 64 outputs
    ...,
    # Apply the ReLU activation function
    ...,
    # Another 50% dropout
    ...,
    # Output Linear layer with 64 inputs and num_classes outputs
    ...
    # We might expect a final torch.nn.Softmax activation layer here, but
    # torch.nn.CrossEntropyLoss applies a log-softmax internally, so the
    # model should output the raw scores (logits).
)

# Check that the model looks how we expect
print(mlp_model)

# A little bit of code to calculate the number of trainable parameters
n_params = sum(p.numel() for p in mlp_model.parameters() if p.requires_grad)
print('Number of trainable parameters =', n_params)
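
The parameter count printed above can be cross-checked by hand. A fully-connected layer has one weight per input-output pair plus one bias per output (assuming the default `bias=True` of `torch.nn.Linear`), while the pooling, flatten, ReLU and dropout layers have no trainable parameters. The helper `linear_param_count` below is a hypothetical name used only for this sketch.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of one fully-connected layer."""
    return in_features * out_features + out_features


# The three layers whose sizes are stated in the text; the first layer is
# left out because its number of inputs is part of the exercise
layer_sizes = [(256, 128), (128, 64), (64, 3)]
counts = [linear_param_count(i, o) for i, o in layer_sizes]
print(counts)       # [32896, 8256, 195]
print(sum(counts))  # 41347
```

Adding the count for the first layer to this sum should reproduce the number reported by the notebook.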

Now we need to define the loss function and optimiser. We need to use categorical cross-entropy loss and we'll choose the AdamW optimiser (Adam with decoupled weight decay):
* `torch.nn.CrossEntropyLoss()`
* `torch.optim.AdamW(params, lr)` where `params` are the model weights and `lr` is the learning rate

In [None]:
# Set the learning rate to 1e-3
learning_rate = 0.001
# Use categorical cross-entropy loss
mlp_loss_fn = ...
# Use the AdamW optimiser and note that in the previous block of code we saw
# how to access the model parameters
mlp_optimiser = ...
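
To make the role of the softmax inside the loss concrete, here is a plain-Python sketch of categorical cross-entropy for a single event. The function names and the example scores are made up for illustration; `torch.nn.CrossEntropyLoss` does the equivalent on whole batches and, by default, averages over the batch.

```python
import math


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, true_class):
    """Negative log of the softmax probability assigned to the true class."""
    return -math.log(softmax(scores)[true_class])


# Hypothetical raw scores for one event with three classes
example_scores = [2.0, 0.5, -1.0]
print(softmax(example_scores))
print(cross_entropy(example_scores, 0))  # smaller loss: class 0 has the top score
print(cross_entropy(example_scores, 2))  # larger loss: class 2 has the lowest score
```

A network that has learned nothing and gives equal scores to all three classes has a loss of $\ln 3 \approx 1.10$, which is a useful reference value when reading the training output.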

In [None]:
# This block of code allows us to use a GPU if we have one available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print('Using device', device)
mlp_model.to(device)

We are now ready to train our network. The block of code is a bit lengthy, so please refer to the comments in the code below about what we need to do. That said, here are a couple of important things:
* We run the model just by calling it like a function: `outputs = some_model(inputs)`
* Similarly to calculate the loss: `loss = some_loss_function(outputs, labels)`

In [None]:
# Define the number of epochs that we want to train for
n_epochs = 10

for epoch in range(0, n_epochs):
  # We put the model into training mode (this enables the dropout layers)
  mlp_model.train()
  running_loss = 0.0
  # We iterate over each batch in the data loader
  for (images, labels) in train_loader:
    # Send the tensors to the correct device (CPU or GPU)
    images = images.to(device)
    labels = labels.to(device)

    # Forward pass - we get a prediction and calculate the loss. Much easier
    # than when we calculated it by hand in the lecture!
    outputs = ...
    loss = ...

    # Backward pass - we make sure the gradients have been cleared and then
    # do the back propagation
    mlp_optimiser.zero_grad()
    loss.backward()
    mlp_optimiser.step()

    running_loss += loss.item()

  # Now we put the network in evaluation mode for the validation sample
  running_val_loss = 0.0
  mlp_model.eval()
  # Disable gradients so that we don't waste time calculating them when not
  # in training mode
  with torch.no_grad():
    # Iterate over the batches as before
    for (images, labels) in val_loader:
      images = images.to(device)
      labels = labels.to(device)

      # All we need to do now is make the prediction and calculate the loss
      outputs = ...
      loss = ...
      running_val_loss += loss.item()

  print("Epoch", epoch, "training loss:", running_loss/len(train_loader),
        "validation loss:", running_val_loss/len(val_loader))

This next block just contains a couple of functions to allow us to find out which events we incorrectly classified and then have a look at them.

In [None]:
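# Make a list of incorrect classifications
def get_incorrect_classifications(model, dataloader):
  incorrect_indices = []
  n_images = 0
  # Make sure dropout is disabled while we make the predictions
  model.eval()
  with torch.no_grad():
    for (images, labels) in dataloader:
      images = images.to(device)
      predictions = model(images).cpu().numpy()
      n_images += len(labels)

      for i in range(len(labels)):
        # The predicted class is the one with the highest score
        prediction = int(np.argmax(predictions[i]))
        truth = labels[i].item()
        if prediction != truth:
          # Store the image as (height, width, channel) for plotting
          image = images[i].cpu().numpy()
          image = image.transpose([1,2,0])
          incorrect_indices.append([image, prediction, truth])

  # Count the images actually seen rather than relying on the global val_idx
  print('Accuracy =',1 - len(incorrect_indices)/n_images)
  return incorrect_indices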

def draw_event(incorrect_indices, index):
  image_to_plot = incorrect_indices[index][0]
  image_to_plot = np.clip(image_to_plot, 0.0, 1.0)
  fig, ax = plt.subplots(1, 1)
  print('Incorrect classification for image',index,
        ': predicted =',incorrect_indices[index][1],
        'with true =',incorrect_indices[index][2])
  ax.imshow(image_to_plot)

Here we get the incorrectly classified images, along with their predicted and true classes.

In [None]:
# Get the failures
incorrect_indices = get_incorrect_classifications(mlp_model, val_loader)

And finally we can draw some examples. The index here goes from zero to `len(incorrect_indices) - 1`.

In [None]:
draw_event(incorrect_indices, 3)
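
Looking at individual failures can be complemented by a confusion matrix, which counts how often each true class is predicted as each other class. The sketch below is plain Python with made-up labels; `confusion_matrix` is a hypothetical helper and not part of the notebook. The predicted and true classes stored in each entry of `incorrect_indices` only cover the failures, so a full matrix would also need the correctly classified events.

```python
def confusion_matrix(truths, predictions, n_classes):
    """Return matrix[t][p] = number of events of true class t predicted as p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truths, predictions):
        matrix[t][p] += 1
    return matrix


# Made-up labels purely for illustration (0, 1, 2 are class indices)
toy_truths = [0, 0, 1, 1, 2, 2]
toy_predictions = [0, 1, 1, 1, 2, 0]
toy_matrix = confusion_matrix(toy_truths, toy_predictions, n_classes=3)
for row in toy_matrix:
    print(row)

# Accuracy is the diagonal sum divided by the number of events
toy_accuracy = sum(toy_matrix[i][i] for i in range(3)) / len(toy_truths)
print('Accuracy =', toy_accuracy)
```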

Right, now that we've played a little with the MLP, let's get on with building a CNN. Many of the layers below will look familiar, but the one new one is:
* `torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)`

There are some additional arguments, but these are the important ones for now.

I have chosen the values of the stride ($s$) and padding ($p$) such that the size of the image ($m$) is reduced by a factor of exactly two after the convolution ($n$). The full equation for calculating the dimension after a convolutional layer with kernel size $k$ is $n = \left\lfloor \frac{m - k + 2p}{s} \right\rfloor + 1$, where the floor reflects the integer division.
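
As a quick check of the output-size formula, it can be written as a small function using integer division. `conv_output_size` is a hypothetical helper name; the kernel, stride and padding values are the ones used for the convolutions in this notebook.

```python
def conv_output_size(m, k, s, p):
    """Output size n for input size m, kernel k, stride s and padding p."""
    return (m - k + 2 * p) // s + 1


# A 5x5 kernel with stride 2 and padding 2 halves the 112-pixel input
print(conv_output_size(112, k=5, s=2, p=2))  # 56
# A 3x3 kernel with stride 2 and padding 1 also halves an even-sized input
print(conv_output_size(56, k=3, s=2, p=1))   # 28
```

Applying the function once per convolutional layer gives the image size that reaches the flatten layer.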

In [None]:
# Now let's define our CNN using the torch.nn.Sequential class
cnn_model = torch.nn.Sequential(
    # The 1st convolution has 32 (5x5) filters with stride 2 and padding 2
    ...,
    # ReLU activation function
    ...,
    # The 2nd convolution has 32 (3x3) filters with stride 2 and padding 1
    ...,
    # ReLU activation function
    ...,
    # The 3rd convolution has 32 (3x3) filters with stride 2 and padding 1
    ...,
    # ReLU activation function
    ...,
    # We need to flatten our tensor before the final layers
    ...,
    # Apply some dropout with probability 0.5
    ...,
    # The final classification layer with 3 outputs. The number of inputs needs
    # to be calculated... think how big the 112 x 112 image is after the
    # strided convolutions, and remember that we had 32 filters
    ...
    # Again, the final softmax is implicit in the loss function
)

print(cnn_model)

n_params = sum(p.numel() for p in cnn_model.parameters() if p.requires_grad)
print('Number of trainable parameters =', n_params)

Set up the loss function and optimiser as we did before.

In [None]:
# Prepare for training
learning_rate = 0.001
cnn_loss_fn = torch.nn.CrossEntropyLoss()
cnn_optimiser = torch.optim.AdamW(cnn_model.parameters(), lr=learning_rate)

Send the model to the GPU if one is available.

In [None]:
cnn_model.to(device)

This block is the same as when we trained the MLP (if this weren't a quick tutorial, you'd probably write a function that does this generically for a given model, loss and optimiser to avoid repeating code).

In [None]:
# Training loop
n_epochs = 10

for epoch in range(0, n_epochs):
  cnn_model.train()
  running_loss = 0.0
  for (images, labels) in train_loader:
    images = images.to(device)
    labels = labels.to(device)

    # Forward pass
    outputs = cnn_model(images)
    loss = cnn_loss_fn(outputs, labels)

    # Backward pass and optimisation
    cnn_optimiser.zero_grad()
    loss.backward()
    cnn_optimiser.step()

    running_loss += loss.item()

  # Validation
  running_val_loss = 0.0
  cnn_model.eval()
  with torch.no_grad():
    for (images, labels) in val_loader:
      images = images.to(device)
      labels = labels.to(device)

      # Make the predictions
      outputs = cnn_model(images)
      loss = cnn_loss_fn(outputs, labels)
      running_val_loss += loss.item()

  print("Epoch", epoch, "training loss:", running_loss/len(train_loader), "validation loss:", running_val_loss/len(val_loader))
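
The generic function suggested earlier could look something like the sketch below. `run_epoch` and all of the `demo_` names are hypothetical, and the synthetic data only exists to show that the function runs; in the notebook it would be called with the real model, loaders, loss function and optimiser.

```python
import torch


def run_epoch(model, loader, loss_fn, device, optimiser=None):
    """Run one pass over loader; train if an optimiser is given, else evaluate.

    Hypothetical helper, not part of the original notebook.
    Returns the mean loss per batch.
    """
    training = optimiser is not None
    model.train(training)  # train(False) is equivalent to eval()
    running_loss = 0.0
    # Gradients are only tracked when we are going to update the weights
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)
            loss = loss_fn(model(images), labels)
            if training:
                optimiser.zero_grad()
                loss.backward()
                optimiser.step()
            running_loss += loss.item()
    return running_loss / len(loader)


# Tiny synthetic check with made-up data: 8 single-channel 4 x 4 "images"
demo_data = torch.utils.data.TensorDataset(
    torch.randn(8, 1, 4, 4), torch.randint(0, 3, (8,)))
demo_loader = torch.utils.data.DataLoader(demo_data, batch_size=4)
demo_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 3))
demo_optimiser = torch.optim.AdamW(demo_model.parameters(), lr=1e-3)
demo_loss_fn = torch.nn.CrossEntropyLoss()
demo_device = torch.device("cpu")

train_loss = run_epoch(demo_model, demo_loader, demo_loss_fn, demo_device,
                       demo_optimiser)
val_loss = run_epoch(demo_model, demo_loader, demo_loss_fn, demo_device)
print("training loss:", train_loss, "validation loss:", val_loss)
```

Passing the optimiser as an optional argument keeps the training and validation passes in one place, so the two cannot drift apart when the loop is edited.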

Let's have a look at what we got wrong.

In [None]:
# Get the failures
incorrect_indices = get_incorrect_classifications(cnn_model, val_loader)

And feel free to have a look at the failures!

In [None]:
draw_event(incorrect_indices, 0)