## Convolutional Neural Networks

Let's now try a different dataset: **CIFAR10**.

CIFAR10 contains 60,000 32×32 color images of real-life objects, divided into 10 classes (50,000 images for training and 10,000 for testing). The `torchvision` package already supports downloading and loading the dataset.

In [None]:
import torch
from matplotlib import pyplot as plt
from torchvision import datasets
import torchvision.transforms.functional as TF

In [None]:
# Load CIFAR10 dataset
train_dataset = datasets.CIFAR10(root=".", download=True, train=True)
test_dataset = datasets.CIFAR10(root=".", download=True, train=False)

In [None]:
# Dataset interface: len
print(f"Num. training samples: {len(train_dataset)}")
print(f"Num. test samples:     {len(test_dataset)}")

In [None]:
# Compute dataset sizes
num_train = len(train_dataset)
num_test = len(test_dataset)

In [None]:
# Dataset classes
train_dataset.classes

In [None]:
# Get image size ("size" is a property of PIL.Image)
train_dataset[0][0].size

In [None]:
# Show the first training image whose label is 3 (the index of "cat" in train_dataset.classes)
train_dataset[train_dataset.targets.index(3)][0]

Let's split our data into training, validation and test sets.

In [None]:
# List of indexes on the training set
train_idx = list(range(num_train))

In [None]:
# List of indexes on the test set
test_idx = list(range(num_test))

In [None]:
# Import
import random

In [None]:
# Shuffle training set
random.shuffle(train_idx)

In [None]:
# Validation fraction
val_frac = 0.1
# Compute number of samples
num_val = int(num_train*val_frac)
num_train = num_train - num_val
# Split training set
val_idx = train_idx[num_train:]
train_idx = train_idx[:num_train]
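
The split logic can be checked on a small scale. The sketch below repeats the same steps on a hypothetical dataset of 1,000 samples (the size, the seed and the helper name `split_indices` are assumptions made for illustration) and verifies that the two index lists are disjoint and together cover every sample.

```python
import random

def split_indices(num_samples, val_frac, seed=0):
    """Shuffle the indices 0..num_samples-1 and split off a validation fraction."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    idx = list(range(num_samples))
    rng.shuffle(idx)
    num_val = int(num_samples * val_frac)
    num_train = num_samples - num_val
    return idx[:num_train], idx[num_train:]

train_idx_demo, val_idx_demo = split_indices(1000, 0.1)
print(len(train_idx_demo), len(val_idx_demo))  # 900 100

# No index appears in both splits, and no index is lost
assert set(train_idx_demo).isdisjoint(val_idx_demo)
assert sorted(train_idx_demo + val_idx_demo) == list(range(1000))
```

Using a seeded generator is a design choice of this sketch: it makes the split reproducible across runs, which the unseeded `random.shuffle` call does not.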

### Image dataset transforms

The `torchvision.transforms` module includes additional classes specific to image pre-processing. Some of them are:

- `Resize`: resizes an image;
- `RandomCrop`: randomly crops an image (data augmentation during training);
- `RandomHorizontalFlip`: randomly flips an image (data augmentation during training);
- `CenterCrop`: crops the central area of an image (used in testing, as counterpart to `RandomCrop`);
- `Normalize`: performs standardization, given per-channel means and standard deviations.

A common form of data augmentation is to crop each image to an area slightly smaller than its full size: a random crop during training, and a center crop of the same size at test time.

In [None]:
# Import module
import torchvision.transforms as T

In [None]:
# Define single transforms
resize = T.Resize(32) # This won't do anything, since images are already at that size
random_crop = T.RandomCrop(28) # train
random_hor_flip = T.RandomHorizontalFlip() # train
center_crop = T.CenterCrop(28) # test
to_tensor = T.ToTensor()
normalize = T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))

In [None]:
# Compose transforms
train_transform = T.Compose([resize, random_crop, random_hor_flip, to_tensor, normalize])
test_transform = T.Compose([resize, center_crop, to_tensor, normalize])

In [None]:
# Load CIFAR10 dataset with transforms
train_dataset = datasets.CIFAR10(root=".", download=True, train=True, transform=train_transform)
test_dataset = datasets.CIFAR10(root=".", download=True, train=False, transform=test_transform)

In [None]:
# Import
from torch.utils.data import DataLoader, Subset

In [None]:
# Split train_dataset into training and validation
# (both subsets wrap the same dataset, so validation images also go through train_transform)
val_dataset = Subset(train_dataset, val_idx)
train_dataset = Subset(train_dataset, train_idx)

In [None]:
# Define loaders
train_loader = DataLoader(train_dataset, batch_size=64, num_workers=4, shuffle=True)
val_loader   = DataLoader(val_dataset,   batch_size=64, num_workers=4, shuffle=False)
test_loader  = DataLoader(test_dataset,  batch_size=64, num_workers=4, shuffle=False)

In [None]:
# Define dictionary of loaders
loaders = {"train": train_loader,
           "val": val_loader,
           "test": test_loader}

### `train` and `eval` modes

Certain layers have different behaviours when used in training and in test/evaluation/production (*inference*).

#### Dropout

During training, dropout randomly zeroes neuron outputs with probability $p$. At test time, all neuron activations are used. As a result, a neuron in the next layer receives on average a fraction $1-p$ of the previous layer's activations during training, but all of them at test time, so its total input at test time is amplified by a factor of $\frac{1}{1-p}$ relative to training.

To compensate, at test time the output of a dropout layer is multiplied by $1-p$.

**Alternative**: scale the surviving activations by $\frac{1}{1-p}$ during training, so that nothing special has to be done at inference. This is the approach used by PyTorch's `nn.Dropout`.
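
PyTorch's `nn.Dropout` follows the alternative scheme: it rescales during training and acts as the identity in evaluation mode. The following sketch illustrates this on a vector of ones; the seed and the vector length are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
p = 0.5
dropout = nn.Dropout(p)
x = torch.ones(10000)

dropout.train()
y_train = dropout(x)  # each entry is either 0 or 1 / (1 - p) = 2
dropout.eval()
y_eval = dropout(x)   # identity: no zeroing and no scaling

print(y_train.unique())       # the values that occur in training mode
print(y_train.mean().item())  # close to 1: the expected activation is preserved
```

Because the scaling happens during training, the mean activation seen by the next layer is roughly the same in both modes.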

#### Batch normalization

During training, batch normalization normalizes each batch using its own statistics (per-channel mean and variance). The larger the batch size, the better, because the statistics are less noisy. However, this also means that you need a batch to provide as input. In training mode, PyTorch raises an error if a batch normalization layer receives only one value per channel, as happens when a single-element mini-batch of feature vectors is passed to it.

In inference, sometimes you only want to process a single input. In this case, the common approach is as follows:

- During training, keep track of the batch statistics by maintaining running averages of the per-batch means and variances.
- At test time, normalize the input using these running averages (not the statistics of the input batch).
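
The two steps can be observed directly on `nn.BatchNorm2d`, which stores its running statistics in the `running_mean` and `running_var` buffers. This is a minimal sketch with a made-up input batch; it relies on the layer's default settings (momentum 0.1, running mean initialized to 0).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
bn = nn.BatchNorm2d(3)               # one mean/variance pair per channel
x = torch.randn(8, 3, 4, 4) + 5.0    # batch of 8 samples, per-channel mean close to 5

bn.train()
y = bn(x)                            # normalized with the statistics of this batch
batch_mean = x.mean(dim=(0, 2, 3))
print(y.mean(dim=(0, 2, 3)))         # approximately 0 for each channel
print(bn.running_mean)               # 0.9 * 0 + 0.1 * batch_mean after one update

bn.eval()
single = x[:1]                       # a single sample is enough in evaluation mode
y_single = bn(single)                # normalized with the running statistics

# Reproduce the evaluation-mode output by hand (affine weight is 1 and bias is 0 at initialization)
expected = (single - bn.running_mean.view(1, 3, 1, 1)) / torch.sqrt(
    bn.running_var.view(1, 3, 1, 1) + bn.eps
)
```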

#### `train()` and `eval()`

In order to specify which behaviour a layer should have, the generic `nn.Module` class includes the `train()` and `eval()` methods, which alter a layer's behaviour accordingly.

For most layers (e.g. `nn.Linear`), there is no difference between the two modes, but for layers such as `Dropout2d` and `BatchNorm2d` it is important to set the layer to the correct mode.

In general, you can call `train()` and `eval()` on the full model, and it will take care of calling the corresponding functions on its layers and sub-modules.
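
As a quick check of this propagation, the sketch below builds a small made-up model with a nested sub-module and inspects the `training` flag that every `nn.Module` carries.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.Sequential(nn.Dropout(0.5), nn.ReLU()),  # nested sub-module
)

model.eval()
eval_flags = [m.training for m in model.modules()]   # model itself plus all sub-modules
model.train()
train_flags = [m.training for m in model.modules()]
print(eval_flags)
print(train_flags)
```

All flags are `False` after `eval()` and `True` after `train()`, including those of the layers inside the nested `nn.Sequential`.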


### CNN model

We will include **batch normalization** and **dropout**, using the `nn.BatchNorm2d` and `nn.Dropout` modules. `BatchNorm2d` takes the number of *input channels* as its argument. `Dropout` is usually applied to fully-connected layers and takes the dropout probability as its argument.

In [None]:
# Import
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [None]:
# Define class
class CNN(nn.Module):
    
    # Constructor
    def __init__(self):
        # Call parent constructor
        super().__init__()
        # Create convolutional layers
        self.conv_layers = nn.Sequential(
            # Layer 1
            nn.Conv2d(3, 64, kernel_size=3, padding=0, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            # Layer 2
            nn.Conv2d(64, 128, kernel_size=3, padding=0, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            # Layer 3
            nn.Conv2d(128, 128, kernel_size=3, padding=0, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Layer 4
            nn.Conv2d(128, 256, kernel_size=3, padding=0, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(256),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Create fully-connected layers
        self.fc_layers = nn.Sequential(
            # FC layer
            nn.Linear(4096, 1024),
            nn.Dropout(0.5),
            nn.ReLU(),
            # Classification layer
            nn.Linear(1024, 10)
        )

    # Forward
    def forward(self, x):
        x = self.conv_layers(x)
        x = x.view(x.size(0), -1)
        x = self.fc_layers(x)
        return x
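
The 4096 input features of the first `nn.Linear` layer follow from the spatial size of the feature maps after the last pooling layer, given the 28×28 crops produced by the transforms. The arithmetic can be sketched with a small helper (`conv_out` is a hypothetical name introduced here) that applies the standard output-size formula for convolution and pooling layers without dilation.

```python
def conv_out(size, kernel_size, padding=0, stride=1):
    """Output side length of a convolution or pooling layer (square input, no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                            # side length after the 28x28 crop
size = conv_out(size, 3)             # Layer 1 convolution -> 26
size = conv_out(size, 3)             # Layer 2 convolution -> 24
size = conv_out(size, 3)             # Layer 3 convolution -> 22
size = conv_out(size, 2, stride=2)   # max pooling -> 11
size = conv_out(size, 3)             # Layer 4 convolution -> 9
size = conv_out(size, 2, stride=2)   # max pooling -> 4
num_features = 256 * size * size     # 256 channels in the last convolutional layer
print(size, num_features)  # 4 4096
```

If the crop size or any kernel, padding or stride setting changes, the first argument of `nn.Linear` has to be recomputed in the same way.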

### Model training

In [None]:
# Select device
print(f"CUDA is available? {torch.cuda.is_available()}")
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(dev)

In [None]:
def train(epochs, dev, lr=0.001):
    try:
        # Create model
        model = CNN()
        model = model.to(dev)
        print(model)
        # Optimizer
        optimizer = optim.SGD(model.parameters(), lr=lr)
        # Initialize history
        history_loss = {"train": [], "val": [], "test": []}
        history_accuracy = {"train": [], "val": [], "test": []}
        # Process each epoch
        for epoch in range(epochs):
            # Initialize epoch variables
            sum_loss = {"train": 0, "val": 0, "test": 0}
            sum_accuracy = {"train": 0, "val": 0, "test": 0}
            # Process each split
            for split in ["train", "val", "test"]:
                # Set the mode: dropout and batch normalization behave differently in training and evaluation
                model.train(split == "train")
                # Process each batch
                for inputs, labels in loaders[split]:
                    # Move to the selected device
                    inputs = inputs.to(dev)
                    labels = labels.to(dev)
                    # Reset gradients
                    optimizer.zero_grad()
                    # Compute output, tracking gradients only on the training split
                    with torch.set_grad_enabled(split == "train"):
                        pred = model(inputs)
                        loss = F.cross_entropy(pred, labels)
                    # Update loss
                    sum_loss[split] += loss.item()
                    # Check parameter update
                    if split == "train":
                        # Compute gradients
                        loss.backward()
                        # Optimize
                        optimizer.step()
                    # Compute accuracy
                    _, pred_labels = pred.max(1)
                    batch_accuracy = (pred_labels == labels).sum().item() / inputs.size(0)
                    # Update accuracy
                    sum_accuracy[split] += batch_accuracy
            # Compute epoch loss/accuracy
            epoch_loss = {split: sum_loss[split]/len(loaders[split]) for split in ["train", "val", "test"]}
            epoch_accuracy = {split: sum_accuracy[split]/len(loaders[split]) for split in ["train", "val", "test"]}
            # Update history
            for split in ["train", "val", "test"]:
                history_loss[split].append(epoch_loss[split])
                history_accuracy[split].append(epoch_accuracy[split])
            # Print info
            print(f"Epoch {epoch+1}:",
                  f"TrL={epoch_loss['train']:.4f},",
                  f"TrA={epoch_accuracy['train']:.4f},",
                  f"VL={epoch_loss['val']:.4f},",
                  f"VA={epoch_accuracy['val']:.4f},",
                  f"TeL={epoch_loss['test']:.4f},",
                  f"TeA={epoch_accuracy['test']:.4f}")
    except KeyboardInterrupt:
        print("Interrupted")
    finally:
        # Plot loss
        plt.title("Loss")
        for split in ["train", "val", "test"]:
            plt.plot(history_loss[split], label=split)
        plt.legend()
        plt.show()
        # Plot accuracy
        plt.title("Accuracy")
        for split in ["train", "val", "test"]:
            plt.plot(history_accuracy[split], label=split)
        plt.legend()
        plt.show()

In [None]:
train(100, dev, 0.01)
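
For evaluating a trained model outside the training loop, a standalone helper along the following lines can be used. This is only a sketch: the `evaluate` function, the toy model and the random data are assumptions introduced for illustration and are not part of the training code above. It switches the model to evaluation mode, disables gradient tracking, and weights losses and accuracies by the number of samples, so that a smaller final batch does not distort the averages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def evaluate(model, loader, dev):
    """Return (mean loss, accuracy) computed over all samples in `loader`."""
    model.eval()  # disable dropout, use running batch-norm statistics
    total_loss, total_correct, total_samples = 0.0, 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(dev), labels.to(dev)
            pred = model(inputs)
            # Sum per-sample losses so that every sample has the same weight
            total_loss += F.cross_entropy(pred, labels, reduction="sum").item()
            total_correct += (pred.argmax(dim=1) == labels).sum().item()
            total_samples += labels.size(0)
    return total_loss / total_samples, total_correct / total_samples

# Usage with a hypothetical stand-in model and random data
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 10))
toy_loader = [(torch.randn(4, 3, 28, 28), torch.randint(0, 10, (4,))) for _ in range(3)]
toy_loss, toy_acc = evaluate(toy_model, toy_loader, torch.device("cpu"))
print(f"loss={toy_loss:.4f}, accuracy={toy_acc:.4f}")
```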