# Convolutional Neural Networks for Computer Vision

In a fully connected (dense) layer, every node in one layer is connected to every node in the next layer. A CNN exploits the spatial structure of the pixels to reduce the number of connections between two layers, which cuts the number of model parameters and speeds up training.

This is a fully connected network:

![cnn_image](../media/cnn.png)

This is a convolutional network:

![cnn_image](../media/cnn2.png)
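
To make the parameter savings concrete, the sketch below compares a dense layer and a convolution layer that both map a 3-channel 32 x 32 input to 16 feature maps of 32 x 32. These sizes and the helper `count_parameters` are assumptions chosen for illustration; they do not come from the figures above.

```python
import torch.nn as nn


def count_parameters(module: nn.Module) -> int:
    """Total number of learnable values (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())


# Assumed sizes: a 3-channel 32x32 input mapped to 16 feature maps of 32x32.
in_values = 3 * 32 * 32
out_values = 16 * 32 * 32

# A dense layer needs one weight per (input, output) pair plus one bias per output.
# Computed by hand to avoid allocating a weight matrix with ~50 million entries.
dense_params = in_values * out_values + out_values

# A convolution reuses the same 3x3 kernels at every spatial position.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv_params = count_parameters(conv)

print(f"dense: {dense_params:,} parameters")
print(f"conv:  {conv_params:,} parameters")
```

The convolution needs 16 x 3 x 3 x 3 weights plus 16 biases, regardless of the image size, whereas the dense count grows with the product of input and output sizes.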

A CNN extracts features from an input image using filters; with a sufficient number of filters, it can detect many different features in the image. Filters in later layers respond to increasingly complex features. Each filter is slid across the input to produce a feature map that records where its feature occurs.

In a convolution layer, we slide a filter matrix over the image matrix from left to right and from top to bottom. At each position, we take the dot product of the filter with the image patch it covers, summed over the input channels.
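
The sliding dot product can be written out by hand and compared with PyTorch's own result. This is a minimal sketch on an assumed single-channel 5 x 5 input with one 3 x 3 filter, no padding, and a stride of 1; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 5, 5)   # (batch, channels, height, width)
kernel = torch.randn(1, 1, 3, 3)  # (out_channels, in_channels, kH, kW)

out_h = image.shape[2] - kernel.shape[2] + 1
out_w = image.shape[3] - kernel.shape[3] + 1
manual = torch.zeros(out_h, out_w)

# Slide the filter left to right, top to bottom.
for i in range(out_h):
    for j in range(out_w):
        patch = image[0, 0, i:i + 3, j:j + 3]
        # Dot product of the filter with the patch it currently covers.
        manual[i, j] = (patch * kernel[0, 0]).sum()

reference = F.conv2d(image, kernel)[0, 0]
print(torch.allclose(manual, reference, atol=1e-5))
```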

In [2]:
import torch
import torch.nn as nn

Create a 2D convolution layer with 3 input channels, 16 output channels, and a 3 x 3 kernel:

In [3]:
nn.Conv2d(3, 16, 3)

Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))

Add padding of the desired size to the edge of an image:

In [8]:
nn.Conv2d(3, 16, 3, padding=1) # arguments: input channels, output channels (filters), kernel size

Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Create a non-square kernel (filter) by using the following code:

In [5]:
nn.Conv2d(3, 16, (3,4), padding=1)

Conv2d(3, 16, kernel_size=(3, 4), stride=(1, 1), padding=(1, 1))

Add stride to our convolution using the following code:

In [6]:
nn.Conv2d(3, 16, 3, stride=2)

Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2))

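Set stride and padding separately for the vertical and horizontal directions by passing tuples (here the stride happens to be equal in both directions, while the padding differs):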

In [7]:
nn.Conv2d(3, 16, (3,4), stride=(3,3), padding=(1,2))

Conv2d(3, 16, kernel_size=(3, 4), stride=(3, 3), padding=(1, 2))
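
The layers above are only constructed, not applied. The sketch below passes an assumed dummy batch (one 3-channel 32 x 32 image) through the last layer and compares the result with the usual output-size formula for a convolution with dilation 1, `(size + 2 * padding - kernel) // stride + 1`. The helper `conv_output_size` is hypothetical, written here for illustration.

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


x = torch.randn(1, 3, 32, 32)  # assumed dummy input, not a real image
conv = nn.Conv2d(3, 16, (3, 4), stride=(3, 3), padding=(1, 2))
out = conv(x)

expected_h = conv_output_size(32, kernel=3, stride=3, padding=1)
expected_w = conv_output_size(32, kernel=4, stride=3, padding=2)

print(tuple(out.shape))          # (batch, filters, height, width)
print((expected_h, expected_w))  # should match the last two dimensions
```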

## Exploring pooling

The pooling layer reduces the spatial dimensions of an input while preserving its depth. As we move from the initial layers to the later layers of a CNN, we care more about the conceptual content of the image than about pixel-by-pixel information, so we want to keep the key pieces of information from the input and throw away the rest. A pooling layer helps us do that.

In [9]:
max_pool = nn.MaxPool2d(3, stride=1)

Let's define a tensor to perform the pooling on: 

In [10]:
a = torch.FloatTensor(3,5,5).random_(0, 10)
a

tensor([[[2., 7., 1., 9., 3.],
         [8., 9., 5., 0., 7.],
         [6., 1., 8., 6., 5.],
         [0., 3., 4., 2., 8.],
         [4., 4., 5., 1., 8.]],

        [[4., 7., 9., 2., 6.],
         [7., 3., 3., 6., 5.],
         [9., 1., 2., 1., 6.],
         [6., 2., 2., 7., 2.],
         [7., 1., 7., 8., 3.]],

        [[8., 9., 7., 7., 6.],
         [2., 4., 4., 5., 9.],
         [6., 3., 5., 4., 4.],
         [6., 0., 9., 2., 3.],
         [0., 4., 0., 3., 6.]]])

Apply pooling to the tensor:

In [11]:
max_pool(a)

tensor([[[9., 9., 9.],
         [9., 9., 8.],
         [8., 8., 8.]],

        [[9., 9., 9.],
         [9., 7., 7.],
         [9., 8., 8.]],

        [[9., 9., 9.],
         [9., 9., 9.],
         [9., 9., 9.]]])

We can now try average pooling in a similar fashion:

In [13]:
avg_pool = nn.AvgPool2d(3, stride=1)
avg_pool(a)

tensor([[[5.2222, 5.1111, 4.8889],
         [4.8889, 4.2222, 5.0000],
         [3.8889, 3.7778, 5.2222]],

        [[5.0000, 3.7778, 4.4444],
         [3.8889, 3.0000, 3.7778],
         [4.1111, 3.4444, 4.2222]],

        [[5.3333, 5.3333, 5.6667],
         [4.3333, 4.0000, 5.0000],
         [3.6667, 3.3333, 4.0000]]])
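
To see where these numbers come from, the sketch below rebuilds the first channel of the tensor shown above and checks the top-left output value by hand: the maximum and the mean of the first 3 x 3 window. It also shows that the number of channels is unchanged while the spatial size shrinks from 5 x 5 to 3 x 3.

```python
import torch
import torch.nn as nn

# First channel of the tensor printed above, copied by hand.
channel = torch.tensor([[2., 7., 1., 9., 3.],
                        [8., 9., 5., 0., 7.],
                        [6., 1., 8., 6., 5.],
                        [0., 3., 4., 2., 8.],
                        [4., 4., 5., 1., 8.]])
x = channel.unsqueeze(0)  # shape (1, 5, 5): one channel

max_pool = nn.MaxPool2d(3, stride=1)
avg_pool = nn.AvgPool2d(3, stride=1)

window = channel[:3, :3]  # the first 3x3 patch the pooling window covers
pooled_max = max_pool(x)
pooled_avg = avg_pool(x)

print(window.max().item(), pooled_max[0, 0, 0].item())
print(window.mean().item(), pooled_avg[0, 0, 0].item())
print(tuple(x.shape), tuple(pooled_max.shape))
```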

## Exploring transforms

PyTorch models operate on tensors, so image pixels must first be converted into tensors. torchvision, a library specialized for vision and image-related tasks, provides a module called transforms, with APIs for converting images into tensors, normalizing (standard scaling), resizing, and so on.

In [14]:
from torchvision import transforms

In [15]:
transforms.ToTensor()

ToTensor()

Let's normalize the image tensor:

In [16]:
transforms.Normalize((0.5,),(0.5,))

Normalize(mean=(0.5,), std=(0.5,))

To resize an image:

In [17]:
transforms.Resize(10)

Resize(size=10, interpolation=bilinear, max_size=None, antialias=None)

To crop the image:

In [18]:
transforms.CenterCrop(10)

CenterCrop(size=(10, 10))

To pad the image tensors:

In [19]:
transforms.Pad(1, 0)

Pad(padding=1, fill=0, padding_mode=constant)

Chain multiple transforms:

In [20]:
transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

Compose(
    CenterCrop(size=(10, 10))
    ToTensor()
)

## Performing data augmentation

Data augmentation helps prevent a model from memorizing a limited amount of training data instead of generalizing from it. It increases the diversity of the training data by creating variations of the original images, without collecting new data.

In [21]:
import torchvision

Crop a section of the image at random:

In [22]:
transforms.RandomCrop(10)

RandomCrop(size=(10, 10), padding=None)

Flip the image horizontally:

In [23]:
transforms.RandomHorizontalFlip(p=0.3)

RandomHorizontalFlip(p=0.3)

Add brightness, contrast, saturation, and hue variations:

In [24]:
transforms.ColorJitter(0.25, 0.25, 0.25, 0.25)

ColorJitter(brightness=[0.75, 1.25], contrast=[0.75, 1.25], saturation=[0.75, 1.25], hue=[-0.25, 0.25])

Add rotational variation:

In [25]:
transforms.RandomRotation(10)

RandomRotation(degrees=[-10.0, 10.0], interpolation=nearest, expand=False, fill=0)

Compose all the transformations:

In [26]:
transforms.Compose([
        transforms.RandomRotation(10),
        transforms.ToTensor(),
])

Compose(
    RandomRotation(degrees=[-10.0, 10.0], interpolation=nearest, expand=False, fill=0)
    ToTensor()
)

## Loading image data

We will use the CIFAR-10 dataset, which consists of 60,000 color images of 32 x 32 pixels in 10 classes, with 6,000 images per class. Of these, 50,000 are training images and 10,000 are test images. The classes are Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck.

In [29]:
from torchvision import datasets
from torchvision import transforms

Create a transformation pipeline:

In [30]:
transformations = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5),(0.5, 0.5, 0.5))
])

Use the datasets module to create the training dataset:

In [31]:
train_data = datasets.CIFAR10('CIFAR10', train=True, download=True, transform=transformations)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to CIFAR10/cifar-10-python.tar.gz

Extracting CIFAR10/cifar-10-python.tar.gz to CIFAR10


In [32]:
test_data = datasets.CIFAR10('CIFAR10', train=False, download=True, transform=transformations)

Files already downloaded and verified


In [33]:
len(train_data), len(test_data)

(50000, 10000)

Create a validation set from our training set. For this, we import SubsetRandomSampler from torch.utils.data.sampler:

In [35]:
from torch.utils.data.sampler import SubsetRandomSampler
import numpy as np

In [36]:
validation_size = 0.2
training_size = len(train_data)
indices = list(range(training_size))
np.random.shuffle(indices)
index_split = int(np.floor(training_size * validation_size))

In [37]:
validation_indices, training_indices = indices[:index_split], indices[index_split:]

Use the subset random sampler from torch:

In [38]:
training_sample = SubsetRandomSampler(training_indices)
validation_sample = SubsetRandomSampler(validation_indices)

In [39]:
batch_size = 16

In [40]:
from torch.utils.data.dataloader import DataLoader

Create training, validation, and test dataset batches:

In [41]:
train_loader = DataLoader(train_data, batch_size=batch_size, sampler=training_sample)
valid_loader = DataLoader(train_data, batch_size=batch_size, sampler=validation_sample)
test_loader = DataLoader(test_data, batch_size=batch_size)
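
A quick sanity check on the split and the loaders can be sketched without downloading anything. The example below uses a small random `TensorDataset` as a stand-in for CIFAR-10 (an assumption for illustration) and follows the same steps: shuffle the indices, split off 20% for validation, and wrap each part in a `SubsetRandomSampler`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Stand-in dataset: 100 random "images" with labels in 0..9.
toy_data = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

indices = list(range(len(toy_data)))
np.random.shuffle(indices)
split = int(np.floor(len(toy_data) * 0.2))
validation_indices, training_indices = indices[:split], indices[split:]

train_loader = DataLoader(toy_data, batch_size=16,
                          sampler=SubsetRandomSampler(training_indices))
valid_loader = DataLoader(toy_data, batch_size=16,
                          sampler=SubsetRandomSampler(validation_indices))

# The two index sets must not overlap, or validation loss would be optimistic.
print(set(training_indices).isdisjoint(validation_indices))

images, labels = next(iter(train_loader))
print(tuple(images.shape), tuple(labels.shape))
print(len(train_loader.sampler), len(valid_loader.sampler))
```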

## Defining the CNN architecture

In [42]:
import torch.nn as nn
import torch.nn.functional as F

In [43]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB), 16 output channels, 3 x 3 kernel
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # 16 input channels, 32 output channels, 3 x 3 kernel
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # 32 input channels, 64 output channels, 3 x 3 kernel
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2) # kernel size of 2 and a stride of 2
        # 1,024 inputs (the 64 x 4 x 4 tensor after the last max pool) and 512 outputs
        self.linear1 = nn.Linear(64 * 4 * 4, 512)
        self.linear2 = nn.Linear(512, 10) # 512 inputs and 10 outputs (one per class)
        self.dropout = nn.Dropout(p=0.3)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 64 * 4 * 4) # to flatten the three dimensions of the tensor into one dimension
        x = self.dropout(x)
        x = F.relu(self.linear1(x)) 
        x = self.dropout(x)
        x = self.linear2(x) 
        return x

In [44]:
model = CNN()
model

CNN(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (linear1): Linear(in_features=1024, out_features=512, bias=True)
  (linear2): Linear(in_features=512, out_features=10, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
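
The `64 * 4 * 4` input size of the first linear layer follows from the shapes: each padded 3 x 3 convolution keeps the spatial size, and each 2 x 2 max pool halves it, so a 32 x 32 image becomes 16 x 16, then 8 x 8, then 4 x 4. The sketch below traces this with an assumed dummy input, using standalone layers configured like those in the class above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)  # dummy batch with one CIFAR-10-sized image
pool = nn.MaxPool2d(2, 2)
convs = [
    nn.Conv2d(3, 16, 3, padding=1),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.Conv2d(32, 64, 3, padding=1),
]

shapes = []
for conv in convs:
    x = pool(F.relu(conv(x)))  # same conv -> ReLU -> pool step as in forward()
    shapes.append(tuple(x.shape))
    print(tuple(x.shape))

flat = x.view(-1, 64 * 4 * 4)  # flatten for the first linear layer
print(tuple(flat.shape))
```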

## Training an image classifier

Import the optimizer module and check which device is available to run the model:

In [48]:
import torch.optim as optim

In [46]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device.type

'cpu'

Move the model to the available device:

In [49]:
model = model.to(device)

Add the cross-entropy loss and optimizer:

In [51]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
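
The model's last layer returns raw scores (logits) with no softmax, which is what `nn.CrossEntropyLoss` expects: it applies log-softmax internally before computing the negative log-likelihood. The sketch below checks this equivalence on assumed random logits and made-up targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # batch of 4 samples, 10 class scores each
targets = torch.tensor([1, 0, 4, 9])   # made-up class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Same computation spelled out: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
```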

In [53]:
n_epochs = 10

Start the training loop:

In [54]:
for epoch in range(1, n_epochs+1):
    train_loss = 0.0
    valid_loss = 0.0
    
    model.train() # set the model in train mode in the loop
    for batch_idx, (data, target) in enumerate(train_loader): #  loop through each batch
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data) # pass data to the model in the loop
        loss = criterion(output, target) #  get the loss
        loss.backward()
        optimizer.step() # update the model parameters
        train_loss += loss.item()*data.size(0) # update the total loss
        
    model.eval() #  switch the model into evaluation mode
    with torch.no_grad(): # gradients are not needed for validation
        for batch_idx, (data, target) in enumerate(valid_loader): # iterate through the validation set batches
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss = criterion(output, target)
            valid_loss += loss.item()*data.size(0)
        
    train_loss = train_loss/len(train_loader.sampler) #  calculate the loss per epoch
    valid_loss = valid_loss/len(valid_loader.sampler)
    # print the model performance in each epoch
    print(f'| Epoch: {epoch:02} | Train Loss: {train_loss:.3f} | Val. Loss: {valid_loss:.3f} |') 

| Epoch: 01 | Train Loss: 2.106 | Val. Loss: 1.851 |
| Epoch: 02 | Train Loss: 1.697 | Val. Loss: 1.519 |
| Epoch: 03 | Train Loss: 1.526 | Val. Loss: 1.428 |
| Epoch: 04 | Train Loss: 1.428 | Val. Loss: 1.343 |
| Epoch: 05 | Train Loss: 1.351 | Val. Loss: 1.251 |
| Epoch: 06 | Train Loss: 1.283 | Val. Loss: 1.193 |
| Epoch: 07 | Train Loss: 1.230 | Val. Loss: 1.138 |
| Epoch: 08 | Train Loss: 1.194 | Val. Loss: 1.115 |
| Epoch: 09 | Train Loss: 1.147 | Val. Loss: 1.048 |
| Epoch: 10 | Train Loss: 1.110 | Val. Loss: 1.049 |
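
The loop above reports loss only. One way to also measure classification accuracy is sketched below. The helper `evaluate_accuracy` is hypothetical, not part of the original notebook, and it is demonstrated on a tiny untrained stand-in model with random data so that the example runs on its own; in practice it would be called with the trained model and a held-out loader.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device):
    """Fraction of samples in `loader` whose top-scoring class matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            predictions = model(data).argmax(dim=1)  # index of the highest logit
            correct += (predictions == target).sum().item()
            total += target.size(0)
    return correct / total


# Stand-in model and data (assumptions for illustration only).
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
toy_loader = DataLoader(
    TensorDataset(torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))),
    batch_size=16,
)

accuracy = evaluate_accuracy(toy_model, toy_loader, device)
print(f"accuracy: {accuracy:.3f}")
```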
