# DAML4 notes
## Week 10 - Convolutional neural networks

In [None]:
# Torch has an annoying tendency to crash on macOS
# This line helps, but please just run it on Notable instead!
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

<hr style="border:2px solid black"> </hr>

Last week, we looked at MLPs. These take in a vector input $\mathbf{x}\in\mathbb{R}^{D}$ and pass it through multiple layers. Each layer takes in some vector $\mathbf{h}^{(l-1)}\in\mathbb{R}^{H_{l-1}}$, multiplies it by a matrix, adds a bias, and then applies an elementwise nonlinearity:

$$\mathbf{h}^{(l)} = g(\mathbf{W}^{(l) }\mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}) \,\,\text{for}\,\,l=0,1,\dots$$
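
As a quick reminder of what one such layer looks like in code, here is a minimal sketch. The layer sizes (4 inputs, 3 outputs) and the choice of ReLU for $g$ are assumptions made purely for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration
H_prev, H_next = 4, 3

h_prev = torch.rand(H_prev)        # plays the role of h^(l-1)
layer = nn.Linear(H_prev, H_next)  # holds W^(l) (3 x 4) and b^(l) (3)

# One MLP layer: linear transformation followed by an elementwise nonlinearity
h_next = torch.relu(layer(h_prev))

# The same computation written out explicitly as g(W h + b)
h_manual = torch.relu(layer.weight @ h_prev + layer.bias)

print(h_next.shape)
print(torch.allclose(h_next, h_manual, atol=1e-6))
```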

An image is represented as a 3D tensor $\mathbf{x}\in\mathbb{R}^{C\times H \times W}$. By vectorising an image for use in an MLP we are actually throwing away lots of useful spatial information. In the lecture, we looked at convolutional neural networks (ConvNets). These take in images as 3D tensors, and use a mixture of convolutional, pooling, and dense (fully connected) layers.

### Convolutional layers

A convolutional layer is quite similar to a fully connected layer in an MLP, in that it consists of a linear transformation followed by a non-linearity. However, it takes in a 3D tensor $\mathbf{h}^{(l-1)}\in\mathbb{R}^{C_{l-1}\times H_{l-1}\times W_{l-1}}$ instead of a vector, and the linear transformation is a **2D convolution**:

$$\mathbf{h}^{(l)} = g(\mathbf{W}^{(l) }\ast \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)})\,\,\text{for}\,\,l=0,1,\dots$$

We can use [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) from PyTorch to perform this convolution (and add a bias). Let's first create a dummy minibatch of 100 images $\{\mathbf{x}^{(n)}\}_{n=0}^{100-1}$, where each image has 3 channels, height 32, and width 32, i.e. $\mathbf{x}^{(n)}\in\mathbb{R}^{3\times 32\times 32}$. We can store this whole minibatch in a 4D tensor $\mathbf{X}\in\mathbb{R}^{100\times 3\times 32\times 32}$.

In [None]:
import torch  # The whole pytorch package
from torch import nn  # the nn package for making neural networks
import torch.nn.functional as F  # Useful functions

# Create a minibatch of 100 dummy images, each with 3 channels, height 32 and width 32
X = torch.rand(size=(100, 3, 32, 32))
print("-------")
print(f"the shape of X is {X.shape}")
print("We store minibatches of images in 4D tensors")
print(f"We have {X.shape[0]} images")
print(f"Each has {X.shape[1]} channels")
print(f"Each has a height of {X.shape[2]}")
print(f"Each has a width of {X.shape[3]}")
print("-------")

Now we'll apply a (random) 2D convolution to each data point in this minibatch. Recall from the lecture that this consists of $C_{out}$ filters, each of size $C_{in}\times k\times k$. $C_{in}$ must match the number of input channels, and $k$ is the kernel size of the filter, which is usually 3. Let's create such a convolution where $C_{out}=5$. We can then look at the shape of its weights and bias.

In [None]:
# Create a 2D convolution
# Padding pads the input with some empty pixels. Don't worry too much about this
# It's just so the input and output have the same width and height
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, padding=1)

print("-------")
print(f"the shape of the conv weights is {conv.weight.shape}")
print("We store the conv weights in a 4D tensor")
print(f"There are {conv.weight.shape[0]} convolutional filters")
print(f"Each filter has {conv.weight.shape[1]} channels")
print(f"Each filter has a height of {conv.weight.shape[2]}")
print(f"Each filter has a width of {conv.weight.shape[3]}")
print("-------")


print("-------")
print(f"the shape of the conv bias is {conv.bias.shape}")
print("The bias adds one constant value to each output channel")
print("-------")
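
The shapes above tell us how many learnable parameters the layer has: $C_{out}\times C_{in}\times k\times k$ weights plus $C_{out}$ biases. The sketch below checks this count against PyTorch. The comparison with a fully connected layer between the flattened input and output is only arithmetic added for illustration; no such layer is built.

```python
from torch import nn

C_in, C_out, k = 3, 5, 3
conv = nn.Conv2d(in_channels=C_in, out_channels=C_out, kernel_size=k, padding=1)

n_weights = C_out * C_in * k * k  # 5 * 3 * 3 * 3 = 135
n_biases = C_out                  # one bias per output channel = 5
n_params = sum(p.numel() for p in conv.parameters())
print(n_weights + n_biases, n_params)  # 140 140

# For comparison (arithmetic only): a fully connected layer mapping a
# flattened 3x32x32 input to a flattened 5x32x32 output
fc_params = (3 * 32 * 32) * (5 * 32 * 32) + 5 * 32 * 32
print(fc_params)  # 15733760
```

The count does not depend on the image height or width, because the same filters are reused at every spatial position.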

Finally, let's apply the Conv2d layer to our input and look at the output shape.

In [None]:
# Apply a convolution
Y = conv(X)
print("-------")
print(f"the shape of the output is {Y.shape}")
print(f"We have {Y.shape[0]} 3D tensors")
print(f"Each has {Y.shape[1]} channels")
print(f"Each has a height of {Y.shape[2]}")
print(f"Each has a width of {Y.shape[3]}")
print("-------")
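
To see what the layer computes for a single output pixel, we can reproduce one value by hand. This is a minimal sketch on a smaller dummy batch; the batch size, image size, and the indices `n, c, i, j` are arbitrary choices for illustration. Note that PyTorch's `Conv2d` does not flip the filter (strictly speaking it computes a cross-correlation), so the filter is laid directly over the input patch.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

X_small = torch.rand(size=(2, 3, 8, 8))  # small dummy batch, not the one above
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, padding=1)
Y_small = conv(X_small)

# Pad height and width with one pixel of zeros on each side, matching padding=1
X_pad = F.pad(X_small, (1, 1, 1, 1))

n, c, i, j = 0, 2, 4, 6  # arbitrary image, output channel, row, and column

# The 3 x 3 x 3 input patch that output pixel (i, j) looks at
patch = X_pad[n, :, i:i + 3, j:j + 3]

# Multiply elementwise by filter c, sum everything up, and add that filter's bias
y_manual = (conv.weight[c] * patch).sum() + conv.bias[c]

print(torch.allclose(Y_small[n, c, i, j], y_manual, atol=1e-5))
```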

### Pooling layers

Pooling layers reduce the spatial resolution of their input. They build translation invariance into ConvNets, and allow us to keep computational cost at a similar level throughout the network by increasing the number of channels as the spatial resolution decreases. Let's use max pooling on our dummy input to halve its spatial resolution.

In [None]:
pool = nn.MaxPool2d(kernel_size=2)

print(f"Shape before pooling is {X.shape}")
Z = pool(X)
print(f"Shape after pooling is {Z.shape}")
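
The shapes alone don't show which values survive pooling. Here is a small sketch on a hand-written $4\times 4$ input (values chosen for illustration): with `kernel_size=2`, each non-overlapping $2\times 2$ window is replaced by its maximum.

```python
import torch
from torch import nn

# One image with one channel, height 4 and width 4
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2)  # the stride defaults to the kernel size
z = pool(x)

print(z.shape)               # torch.Size([1, 1, 2, 2])
print(z.squeeze().tolist())  # [[4.0, 8.0], [12.0, 16.0]]
```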

## A ConvNet for MNIST

Now that we have covered all the constituent parts of a ConvNet, I am going to provide code that trains the ConvNet introduced in the lecture on MNIST. Make sure you are happy with how this works before attending the lab.

First, let's load in MNIST, scale it to between 0 and 1, and reshape it so data points are 3D tensors (images) instead of vectors. Note that MNIST is greyscale so there is only 1 colour channel.

In [None]:
# Use sklearn to read in mnist
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

mnist = fetch_openml("mnist_784", as_frame=False, cache=False)

In [None]:
X = mnist.data.astype("float32")
y = mnist.target.astype("int64")

# Scale X to between 0 and 1
X /= 255.0


# X here is a 70000 x 784 matrix
# We want to store it as images, so it should be 70000 x 1 x 28 x 28
# The 1 is because the images are greyscale (a single colour channel)

X = X.reshape(-1, 1, 28, 28)

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Check shape
print(f"shape of X_train is {X_train.shape}")
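
If the reshape feels opaque, the following sketch on a tiny dummy array (3 made-up "images", not MNIST) shows what it does: the `-1` lets NumPy infer the number of images, and each row of 784 values is laid out row by row into a $28\times 28$ grid. The sketch assumes the flat vectors store pixels in row-major order.

```python
import numpy as np

# 3 fake flattened images whose pixel values are just a running count
X_flat = np.arange(3 * 784, dtype="float32").reshape(3, 784)

X_img = X_flat.reshape(-1, 1, 28, 28)
print(X_img.shape)  # (3, 1, 28, 28)

# Pixel (r, c) of image n is element 28 * r + c of that image's flat vector
n, r, c = 2, 5, 7
print(X_img[n, 0, r, c] == X_flat[n, 28 * r + c])  # True
```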

Now let's create the ConvNet we walked through in the lecture.

In [None]:
import torch  # The whole pytorch package
from torch import nn  # the nn package for making neural networks
import torch.nn.functional as F  # Useful functions
import numpy as np

torch.manual_seed(1)  # Fix RNG
np.random.seed(2)


class ConvNetModule(nn.Module):
    def __init__(self):
        super(ConvNetModule, self).__init__()

        self.conv0 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.maxpool0 = nn.MaxPool2d(kernel_size=2)

        self.conv1 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)

        self.linear = nn.Linear(3136, 10)  # 3136 = number of channels * height * width = 64 * 7 * 7

    def forward(self, x):

        h0 = torch.relu(self.conv0(x))
        h0_pool = self.maxpool0(h0)

        h1 = torch.relu(self.conv1(h0_pool))
        h1_pool = self.maxpool1(h1)

        # flatten over channel, height and width = 64 * 7 * 7 = 3136
        phi = h1_pool.view(-1, 3136)

        fx = self.linear(phi)

        return fx
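
It is worth checking where the 3136 comes from. For a layer with kernel size $k$, padding $p$ and stride $s$, an input of size $n$ gives an output of size $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below applies this to each layer and then confirms the result with the same sequence of layers; the helper `out_size` is a hypothetical function written for this illustration, and it assumes no dilation.

```python
import torch
from torch import nn


def out_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer (assumes dilation of 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


s = 28
s = out_size(s, 3, padding=1)  # conv0:    28 -> 28
s = out_size(s, 2, stride=2)   # maxpool0: 28 -> 14
s = out_size(s, 3, padding=1)  # conv1:    14 -> 14
s = out_size(s, 2, stride=2)   # maxpool1: 14 -> 7
print(s, 64 * s * s)  # 7 3136

# Confirm with the same sequence of layers on a dummy minibatch of 4 images
layers = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
out = layers(torch.rand(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 64, 7, 7])
```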

Let's use skorch to train the ConvNet. Unfortunately, this will take a few minutes on a CPU.

In [None]:
from skorch import NeuralNetClassifier

In [None]:
torch.manual_seed(0)

cnn = NeuralNetClassifier(
    ConvNetModule,
    max_epochs=5,
    lr=0.002,
    optimizer=torch.optim.Adam,
    criterion=nn.CrossEntropyLoss,
)
cnn.fit(X_train, y_train);
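
If you are curious what a training loop like this looks like without skorch, here is a rough sketch in plain PyTorch. It is a simplification and not a description of skorch's internals: it uses random dummy data instead of MNIST, an arbitrary minibatch size of 32, no validation split, and writes the network with `nn.Sequential` for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Dummy stand-in data: 64 random "images" with random labels (not MNIST)
X_dummy = torch.rand(64, 1, 28, 28)
y_dummy = torch.randint(0, 10, (64,))

# Same layer sequence as ConvNetModule, written compactly
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(3136, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
criterion = nn.CrossEntropyLoss()  # expects raw scores (logits) and integer labels

losses = []
for epoch in range(5):
    for start in range(0, len(X_dummy), 32):  # minibatches of 32 (arbitrary)
        xb = X_dummy[start:start + 32]
        yb = y_dummy[start:start + 32]

        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(xb), yb)  # forward pass and loss
        loss.backward()                  # backpropagate
        optimizer.step()                 # update the weights

    losses.append(loss.item())
    print(f"epoch {epoch}: loss on last minibatch = {losses[-1]:.3f}")
```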


Finally, we'll evaluate the accuracy of this ConvNet on the test set.

In [None]:
from sklearn.metrics import accuracy_score

y_pred = cnn.predict(X_test)
accuracy_score(y_test, y_pred)
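
Accuracy here is just the fraction of test points whose predicted label matches the true label. A minimal sketch with made-up labels (not real MNIST predictions):

```python
import numpy as np

# Hypothetical true and predicted labels for 8 data points
y_true = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y_hat = np.array([3, 1, 4, 0, 5, 9, 2, 7])

# 6 of the 8 predictions match
accuracy = np.mean(y_true == y_hat)
print(accuracy)  # 0.75
```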


<hr style="border:2px solid black"> </hr>

#### Written by Elliot J. Crowley and &copy; The University of Edinburgh 2022-23