# 3. Using the Convolution Operation in Neural Networks

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.1 (06/07/2023)

**Requirements:**
- Python 3 (tested on v3.11.4)
- Scipy (tested on v1.9.3)
- Torch (tested on v2.0.1+cu118)
- Torchvision (tested on v0.15.2+cu118)

### Imports and CUDA

In [1]:
# Torch
import torch
import torchvision
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Compose, Normalize
from torchvision.datasets import MNIST
import torch.nn.functional as F
import torch.nn as nn

In [2]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


### MNIST Dataset

We load the MNIST dataset in the same way as in the notebooks from the previous week.

In [3]:
# Define transform to convert images to tensors and normalize them
transform_data = Compose([ToTensor(),
                          Normalize((0.1307,), (0.3081,))])

# Load the data
batch_size = 256
train_dataset = MNIST(root='./mnist/', train = True, download = True, transform = transform_data)
test_dataset = MNIST(root='./mnist/', train = False, download = True, transform = transform_data)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = batch_size, shuffle = False)

### Using Conv2d in our model

As we have seen, the convolution operation is key when processing images. We could typically use it to detect edges, sharpen the image so it can be processed better, etc. Ultimately, we hope that it will allow the neural network to recognize shapes and features in the images, and then use this information to classify them.

We will therefore build our first Convolutional Neural Network (CNN). This network will follow the same logic as the DNN from last week, but it will replace the first few fully connected layers with Conv2d operations. We are hoping that, through backpropagation, the neural network will figure out which kernel values to use in order to detect features that are useful for classification. The final layers (here, two of them) will be fully connected layers like before, and will produce a vector of 10 values, one for each class.

Between the last Conv2d layer (whose output for each sample is a stack of 64 feature maps of size 28 by 28) and the first fully connected layer (which expects a 1D vector per sample), we will implement a **flattening** step, reshaping each sample's feature maps into a single 1D vector of 64\*28\*28 = 50176 values, using ```x = x.view(-1, 64*28*28)```.
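
As a minimal sketch of this flattening step, the snippet below uses a random tensor as a stand-in for the output of the last Conv2d layer (the batch size of 4 is an arbitrary assumption) and checks that `view` gives the same result as `torch.flatten` applied from dimension 1 onwards.

```python
import torch

# Hypothetical stand-in for the output of the last Conv2d layer:
# a batch of 4 samples, each with 64 feature maps of size 28x28.
x = torch.randn(4, 64, 28, 28)

# Flattening as done in this notebook: -1 lets PyTorch infer the batch size.
flat_view = x.view(-1, 64 * 28 * 28)

# Equivalent alternative: flatten every dimension except the batch dimension.
flat_alt = torch.flatten(x, start_dim=1)

print(flat_view.shape)                   # one row of 50176 values per sample
print(torch.equal(flat_view, flat_alt))  # both approaches give the same tensor
```

Using `-1` for the first dimension keeps the code independent of the batch size, which matters because the last batch of an epoch is often smaller than the others.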

Finally, you have probably noticed that the number of channels changes in the output of our convolutions. Our original images only have one channel (greyscale), but the output of the first convolution has 32 channels, as shown in the Conv2d call ```self.conv1 = nn.Conv2d(1, 32, kernel_size = 3, stride = 1, padding = 1)```.

In effect, this layer performs 32 convolution operations in parallel, each with its own learned kernel.
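
The idea of one kernel per output channel can be checked with a small sketch. It assumes a randomly initialized layer and a random input batch, and the channel index `k` is an arbitrary choice: applying only the k-th kernel with `F.conv2d` should reproduce the k-th channel of the full layer output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Same configuration as conv1 in this notebook.
conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)

# The weight tensor holds 32 kernels, each of shape (1, 3, 3).
print(conv1.weight.shape)

# Hypothetical input: a batch of 2 random greyscale 28x28 "images".
x = torch.randn(2, 1, 28, 28)
full_out = conv1(x)

# Apply only the k-th kernel (and its bias) by hand.
k = 5  # arbitrary channel index chosen for illustration
single_out = F.conv2d(x, conv1.weight[k:k + 1], conv1.bias[k:k + 1],
                      stride=1, padding=1)

# The k-th output channel of the layer matches the single-kernel convolution.
matches = torch.allclose(full_out[:, k:k + 1], single_out, atol=1e-5)
print(matches)
```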

Having a larger number of output channels in a Conv2D layer allows for the model to learn more complex and diverse features from the input data. With more channels, the model can learn a wider range of filters, which can in turn detect more nuanced features in the input. This can lead to improved performance on tasks such as image classification or object detection. However, it's important to note that increasing the number of channels also increases the number of parameters in the model, which can make it more computationally expensive and may also lead to overfitting if not properly regularized.
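
To make the parameter cost concrete, here is a rough sketch that counts the trainable parameters of a 3x3 Conv2d layer on a single input channel for a few output channel counts. The helper `count_parameters` and the list of channel counts are illustrative assumptions, not part of the original notebook.

```python
import torch.nn as nn

def count_parameters(layer):
    """Hypothetical helper: total number of trainable values in a layer."""
    return sum(p.numel() for p in layer.parameters() if p.requires_grad)

# Each output channel has one 3x3 kernel per input channel, plus one bias,
# so a layer with 1 input channel has out_channels * (1*3*3 + 1) parameters.
counts = {}
for out_channels in [8, 32, 64]:
    layer = nn.Conv2d(1, out_channels, kernel_size=3, stride=1, padding=1)
    counts[out_channels] = count_parameters(layer)
    print(out_channels, "output channels:", counts[out_channels], "parameters")
```

With 32 output channels, as in `conv1`, this gives 32 \* (9 + 1) = 320 parameters, and the count grows linearly with the number of output channels.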

To summarize, this will be our CNN.

Input $ \rightarrow $ Conv2D(1, 32, 3, 1, 1) $ \rightarrow $ Conv2D(32, 64, 3, 1, 1) $ \rightarrow $ Flatten $ \rightarrow $ Linear(50176, 128) $ \rightarrow $ Linear(128, 10) $ \rightarrow $ Output

The full network implementation is shown below.

In [4]:
class MNIST_CNN(nn.Module):
    def __init__(self):
        super(MNIST_CNN, self).__init__()
        
        # Two convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size = 3, stride = 1, padding = 1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size = 3, stride = 1, padding = 1)

        # Two fully connected layers
        self.fc1 = nn.Linear(64*28*28, 128) # 64*28*28 = 50176
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Display initial shape
        print("Initial: ", x.shape)
        
        # Pass input through first convolutional layer
        x = self.conv1(x)
        x = F.relu(x)
        print("After conv1: ", x.shape)

        # Pass output of first conv layer through
        # second convolutional layer
        x = self.conv2(x)
        x = F.relu(x)
        print("After conv2: ", x.shape)

        # Flatten output of second conv layer
        x = x.view(-1, 64*28*28)
        print("After flatten: ", x.shape)

        # Pass flattened output through first Linear layer
        x = self.fc1(x)
        x = F.relu(x)
        print("After FC1: ", x.shape)

        # Pass output of first Linear layer to second linear layer
        x = self.fc2(x)
        print("After FC2: ", x.shape)
        return x

We can then create and use our CNN model, and observe how the shape of our input changes after each operation. When designing a CNN, it is very important to spend some time checking the shape of the output produced by each layer, making sure it matches what the next layer expects.

In [5]:
model = MNIST_CNN()
print(model)

MNIST_CNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=50176, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)


In [6]:
for inputs, labels in train_loader:
    out = model(inputs)
    break

Initial:  torch.Size([256, 1, 28, 28])
After conv1:  torch.Size([256, 32, 28, 28])
After conv2:  torch.Size([256, 64, 28, 28])
After flatten:  torch.Size([256, 50176])
After FC1:  torch.Size([256, 128])
After FC2:  torch.Size([256, 10])
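
The shapes printed above can also be predicted before running the model. The sketch below uses the output-size formula given in the PyTorch Conv2d documentation, applied to one spatial dimension. The helper name `conv2d_output_size` is an assumption introduced here for illustration.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output height (or width) of a Conv2d layer.

    Implements floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Both convolutions in this notebook use kernel_size=3, stride=1, padding=1,
# so the 28x28 spatial size is preserved through each of them.
after_conv1 = conv2d_output_size(28, kernel_size=3, stride=1, padding=1)
after_conv2 = conv2d_output_size(after_conv1, kernel_size=3, stride=1, padding=1)

# Input size expected by the first Linear layer: channels * height * width.
flattened_size = 64 * after_conv2 * after_conv2

# For comparison: without padding, each 3x3 convolution would shrink the image.
no_padding = conv2d_output_size(28, kernel_size=3, stride=1, padding=0)

print(after_conv1, after_conv2, flattened_size, no_padding)
```

With `padding = 1` and a 3x3 kernel, the spatial size stays at 28, which is why the first Linear layer takes 64\*28\*28 = 50176 inputs. Dropping the padding would shrink each side to 26 after one convolution, and the Linear layer size would have to change accordingly.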


### What's next?

We have implemented a simple CNN and its forward method. In the next notebook, we will implement a training function (using backpropagation) for this model and compare its performance with that of a fully connected DNN model.