<a href="https://colab.research.google.com/github/spolivin/cifar10-website/blob/master/nn_dev/training_cnns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training Computer Vision models (CIFAR-10 Dataset)

This notebook is dedicated to creating and training a model that performs well on the CIFAR-10 dataset. The weights of the best model will later be uploaded to HuggingFace and used to convert the model to ONNX format.

## Preparing the environment

When training and validating models written in PyTorch, one needs to write the training loop by hand, paying particular attention to backpropagation. Metrics should be computed along the way in order to judge the quality of the model.
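For reference, a hand-written loop of this kind can be sketched as follows. This is a minimal illustration on synthetic data with a tiny linear model, not the code used later in this notebook; the data, the model and the helper name `train_one_epoch` are assumptions made only for this example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 64 examples, 8 features, 2 classes
features = torch.randn(64, 8)
labels = (features[:, 0] > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_one_epoch() -> tuple[float, float]:
    """Runs one pass over the loader and returns (mean loss, accuracy)."""
    model.train()
    total_loss, correct = 0.0, 0
    for x_batch, y_batch in loader:
        optimizer.zero_grad()               # clear gradients from the previous step
        outputs = model(x_batch)            # forward pass
        loss = criterion(outputs, y_batch)  # batch loss
        loss.backward()                     # backpropagation
        optimizer.step()                    # parameter update
        total_loss += loss.item()
        correct += (outputs.argmax(dim=1) == y_batch).sum().item()
    return total_loss / len(loader), correct / len(loader.dataset)


history = [train_one_epoch() for _ in range(20)]
print(f"first epoch: {history[0]}, last epoch: {history[-1]}")
```

Every model needs the same zero-grad, forward, backward, step sequence plus metric bookkeeping, which is the repetition a trainer class is meant to hide.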

To simplify training and validation of PyTorch models, I created my own Trainer package in Python that requires only minimal setup to launch training. My GitHub repository therefore contains two packages:

* `pytorch_trainer` => Python package with the `Trainer` class for training models coded in PyTorch;
* `pytorch_models` => Additional package containing ready-to-use custom models I built. There are currently two models (my own implementations): `EDNet` (short for *Encoder Decoder Network*) and `ResNet` (a more lightweight analog of PyTorch's *ResNet* model with some custom additions).

Let's first clone the repository with these packages and import them:

In [1]:
try:
  from pytorch_trainer import Trainer
  from pytorch_models import ednet4, resnet20
except ImportError:
  !git clone https://github.com/spolivin/cifar10-website.git
  %cd cifar10-website/nn_dev
  from pytorch_trainer import Trainer
  from pytorch_models import ednet4, resnet20
  %cd ../..

Cloning into 'cifar10-website'...
remote: Enumerating objects: 52, done.
remote: Counting objects: 100% (52/52), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 52 (delta 19), reused 39 (delta 10), pack-reused 0 (from 0)
Receiving objects: 100% (52/52), 17.60 KiB | 8.80 MiB/s, done.
Resolving deltas: 100% (19/19), done.
/content/cifar10-website/nn_dev
/content


Now let's import the other required libraries for running this notebook:

In [2]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm import tqdm

# Setting batch size for PyTorch's Dataloaders
BATCH_SIZE = 128
# Setting per-channel means and standard deviations for normalizing image data
MEANS_NORMALIZATION = [0.4914, 0.4822, 0.4465]
STDEVS_NORMALIZATION = [0.247,  0.243,  0.261]

## Data preparation

Now we can move on to preprocessing the image dataset. First, let's define the transformations we will apply to the training, validation and test sets:

In [3]:
# Transformations for training data
train_transform = transforms.Compose(
    [
        # Data Augmentation
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        # Conversion to tensor
        transforms.ToTensor(),
        # Normalization
        transforms.Normalize(mean=MEANS_NORMALIZATION, std=STDEVS_NORMALIZATION),
    ]
)

# Transformations for validation/test data
test_transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(mean=MEANS_NORMALIZATION, std=STDEVS_NORMALIZATION),
    ]
)
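To make the normalization step concrete: `transforms.Normalize` applies `(x - mean) / std` to each channel, after `ToTensor()` has scaled pixel values to the range [0, 1]. The following is a minimal pure-Python sketch of that arithmetic for a single RGB pixel. The helper `normalize_pixel` is a hypothetical name used only for illustration and is not part of torchvision.

```python
MEANS_NORMALIZATION = [0.4914, 0.4822, 0.4465]
STDEVS_NORMALIZATION = [0.247, 0.243, 0.261]


def normalize_pixel(pixel: list[float]) -> list[float]:
    """Applies (x - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [
        (value - mean) / std
        for value, mean, std in zip(pixel, MEANS_NORMALIZATION, STDEVS_NORMALIZATION)
    ]


# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel([0.4914, 0.4822, 0.4465]))
# A pure white pixel ends up roughly two standard deviations above the mean
print(normalize_pixel([1.0, 1.0, 1.0]))
```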

Now we load the data, specifying the transformations we want to apply:

In [4]:
# Downloading training data
training_set = torchvision.datasets.CIFAR10(
    root="./data",
    train=True,
    transform=train_transform,
    download=True,
)

# Downloading test data
testing_set = torchvision.datasets.CIFAR10(
    root="./data",
    train=False,
    transform=test_transform,
    download=True,
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:03<00:00, 49.1MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


The next step is to split the original test set into validation and test sets and to divide each dataset into batches using PyTorch's DataLoaders:

In [5]:
# Dividing test set into validation and test sets
validation_set = torch.utils.data.Subset(testing_set, torch.arange(1000, len(testing_set))) # The other testing examples -> validation set
testing_set_2 = torch.utils.data.Subset(testing_set, torch.arange(1000)) # The first 1000 testing examples -> new test set

# Create data loaders for the datasets
training_dataloader = torch.utils.data.DataLoader(
    training_set, batch_size=BATCH_SIZE, shuffle=True
)
validation_dataloader = torch.utils.data.DataLoader(
    validation_set, batch_size=BATCH_SIZE, shuffle=False
)
testing_dataloader = torch.utils.data.DataLoader(
    testing_set_2, batch_size=BATCH_SIZE, shuffle=False
)

print(f"Training set: Dataset ({len(training_dataloader.dataset)} examples), Dataloader ({len(training_dataloader)} batches)")
print(f"Validation set: Dataset ({len(validation_dataloader.dataset)} examples), Dataloader ({len(validation_dataloader)} batches)")
print(f"Testing set: Dataset ({len(testing_dataloader.dataset)} examples), Dataloader ({len(testing_dataloader)} batches)")

Training set: Dataset (50000 examples), Dataloader (391 batches)
Validation set: Dataset (9000 examples), Dataloader (71 batches)
Testing set: Dataset (1000 examples), Dataloader (8 batches)
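The batch counts printed above follow directly from the split sizes and the batch size. As a quick sanity check, the arithmetic can be reproduced in plain Python, assuming the `DataLoader` default of keeping the last, smaller batch (`drop_last=False`). The variable names below are chosen only for this sketch.

```python
import math

BATCH_SIZE = 128
TEST_SPLIT_SIZE = 1000  # first 1000 examples of the original test set

original_test_size = 10_000
split_sizes = {
    "training": 50_000,
    "validation": original_test_size - TEST_SPLIT_SIZE,
    "testing": TEST_SPLIT_SIZE,
}

# The last batch may be smaller than BATCH_SIZE, hence the ceiling
batches = {name: math.ceil(size / BATCH_SIZE) for name, size in split_sizes.items()}
print(batches)  # {'training': 391, 'validation': 71, 'testing': 8}
```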


## Application of Custom Trainer for PyTorch

In this section we will look at the functionality of the trainer mentioned above. We will train two model architectures:

* Encoder-Decoder
* ResNet

### Encoder-Decoder model - `EDNet`

We can use the `ednet4()` function from `pytorch_models` to load the prebuilt Encoder-Decoder model with 4 layers:

In [6]:
encoder_decoder_model = ednet4()
encoder_decoder_model.eval()

EDNet(
  (encoder): Encoder(
    (encoder_blocks): Sequential(
      (0): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
        (2): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      )
      (1): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU()
        (2): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      )
    )
  )
  (decoder): Decoder(
    (decoder_blocks): Sequential(
      (0): Sequential(
        (0): Linear(in_features=576, out_features=250, bias=True)
        (1): Sigmoid()
      )
    )
    (last): Linear(in_features=250, out_features=10, bias=True)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
)
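The `in_features=576` of the first `Linear` layer can be derived from the printed architecture. The sketch below traces the spatial size of a 32x32 CIFAR-10 image through the two encoder blocks using the standard output-size formula for convolution and pooling layers. The helper `conv_output_size` is a name introduced here for illustration.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a convolution or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32  # CIFAR-10 images are 32x32
for _ in range(2):  # two encoder blocks
    side = conv_output_size(side, kernel=3, stride=1, padding=1)  # Conv2d keeps the size
    side = conv_output_size(side, kernel=3, stride=3, padding=0)  # MaxPool2d shrinks it

channels = 64  # output channels of the second Conv2d
flattened_features = channels * side * side
print(side, flattened_features)  # 3 576
```

The spatial size goes 32 -> 10 -> 3, so flattening 64 channels of 3x3 feature maps gives the 576 inputs expected by the decoder.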

Next, we set the loss function appropriate for the image classification problem at hand as well as the optimizer:

In [7]:
# Defining the loss function
loss_func = nn.CrossEntropyLoss()

# Defining optimizer
optimizer_params = {
    "lr": 0.25,
    "momentum": 0.75,
}
optimizer = torch.optim.SGD(encoder_decoder_model.parameters(), **optimizer_params)

Now we can instantiate an object of the `Trainer` class by specifying:

* Built model => `model`
* Loss function => `criterion`
* Optimizer => `optimizer`
* Training dataloader => `train_loader`
* Validation dataloader => `valid_loader`
* CPU or GPU-based training => `train_on`

In [8]:
trainer_ed = Trainer(
    model=encoder_decoder_model,
    criterion=loss_func,
    optimizer=optimizer,
    train_loader=training_dataloader,
    valid_loader=validation_dataloader,
    train_on="cuda",
)

We would also like to adjust the learning rate during training using a PyTorch scheduler. The Trainer can handle this as well: we pass it the scheduler class, its parameters, and the level at which the scheduler should be applied (here, `epoch`):

In [9]:
# Setting scheduler
scheduler_params = {
    "step_size": 4,
    "gamma": 0.5,
}
trainer_ed.set_scheduler(
    scheduler_class=torch.optim.lr_scheduler.StepLR,
    scheduler_params=scheduler_params,
    level="epoch",
)
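With these parameters, `StepLR` multiplies the learning rate by `gamma` once every `step_size` scheduler steps. Assuming the trainer steps the scheduler once after each epoch (which is what `level="epoch"` suggests), the resulting schedule can be sketched in plain Python. The function `step_lr` is a hypothetical helper written for this example, not a PyTorch API.

```python
def step_lr(initial_lr: float, step_size: int, gamma: float, epoch: int) -> float:
    """Learning rate used in a 1-based epoch under a step decay schedule."""
    return initial_lr * gamma ** ((epoch - 1) // step_size)


schedule = {
    epoch: step_lr(0.25, step_size=4, gamma=0.5, epoch=epoch)
    for epoch in range(1, 11)
}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:2d}: lr={lr}")
```

Under this assumption the learning rate is 0.25 for epochs 1 to 4, 0.125 for epochs 5 to 8 and 0.0625 for epochs 9 and 10.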

Now we can finally launch the trainer. We will train for 10 epochs with the random number generator seed set to 42 and a progress bar showing the status of training and validation. At the end of each validation step we will see the loss and accuracy in the following format:

`loss=(  {train_loss}, {valid_loss}  )`

`acc=(  {train_acc}, {valid_acc}  )`

In [10]:
trainer_ed.run(
    epochs=10,
    seed=42,
    enable_tqdm=True,
)

Epoch 1/10 [Training]: 100%|██████████| 391/391 [00:23<00:00, 16.44batches/s, lr=0.25]
Epoch 1/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 24.89batches/s, loss=(1.678, 1.4434), acc=(0.3788, 0.4912)]
Epoch 2/10 [Training]: 100%|██████████| 391/391 [00:23<00:00, 16.59batches/s, lr=0.25]
Epoch 2/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 29.64batches/s, loss=(1.2959, 1.093), acc=(0.5341, 0.6143)]
Epoch 3/10 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.20batches/s, lr=0.25]
Epoch 3/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 30.18batches/s, loss=(1.1408, 1.0461), acc=(0.5955, 0.6283)]
Epoch 4/10 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.25batches/s, lr=0.25]
Epoch 4/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 29.77batches/s, loss=(1.0663, 0.9913), acc=(0.6232, 0.6514)]
Epoch 5/10 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.03batches/s, lr=0.125]
Epoch 5/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 29.21batches

Based on the training progress, we have reached about 75% accuracy on the validation set. Let's save this checkpoint in order to be able to quickly come back to it if necessary. The `save_checkpoint()` method saves the last trained epoch, the model state and the optimizer state.

In [11]:
trainer_ed.save_checkpoint(path="ednet4_ckpt_10.pt")
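The internal format of the file written by `save_checkpoint()` is not shown here. As a rough sketch of the general pattern in plain PyTorch, a checkpoint holding those three pieces could be saved and restored as follows. The dictionary keys, the file name and the tiny stand-in model are assumptions for illustration, not the Trainer's actual layout.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.25, momentum=0.75)

# Hypothetical checkpoint layout; the keys used by Trainer may differ
checkpoint = {
    "epoch": 10,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_ckpt.pt")
    torch.save(checkpoint, path)
    restored = torch.load(path, map_location="cpu")

# Restoring into freshly created objects
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.25, momentum=0.75)
new_model.load_state_dict(restored["model_state_dict"])
new_optimizer.load_state_dict(restored["optimizer_state_dict"])
last_epoch = restored["epoch"]
print(f"Resuming after epoch {last_epoch}")
```

Storing the optimizer state alongside the weights matters for SGD with momentum, since the momentum buffers would otherwise restart from zero when training resumes.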

#### Continuation of training of ED Net

Let's see whether the metrics improve if we continue training for 5 additional epochs without the scheduler:

In [12]:
# Turning off scheduler
trainer_ed.reset_scheduler()

When training continues, the trainer remembers the last trained epoch and conveniently shows the subsequent epochs in the progress bar:

In [13]:
trainer_ed.run(
    epochs=5,
    seed=42,
    enable_tqdm=True,
)

Epoch 11/15 [Training]: 100%|██████████| 391/391 [00:21<00:00, 17.90batches/s, lr=0.0625]
Epoch 11/15 [Validation]: 100%|██████████| 71/71 [00:03<00:00, 23.03batches/s, loss=(0.7408, 0.7165), acc=(0.7415, 0.7517)]
Epoch 12/15 [Training]: 100%|██████████| 391/391 [00:21<00:00, 17.96batches/s, lr=0.0625]
Epoch 12/15 [Validation]: 100%|██████████| 71/71 [00:03<00:00, 22.86batches/s, loss=(0.7218, 0.7346), acc=(0.7497, 0.7466)]
Epoch 13/15 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.57batches/s, lr=0.0625]
Epoch 13/15 [Validation]: 100%|██████████| 71/71 [00:03<00:00, 23.23batches/s, loss=(0.7078, 0.6998), acc=(0.7532, 0.7582)]
Epoch 14/15 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.72batches/s, lr=0.0625]
Epoch 14/15 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 24.19batches/s, loss=(0.6889, 0.7082), acc=(0.759, 0.7547)]
Epoch 15/15 [Training]: 100%|██████████| 391/391 [00:22<00:00, 17.57batches/s, lr=0.0625]
Epoch 15/15 [Validation]: 100%|██████████| 71/71 [00:02

### ResNet model with 20 layers - `ResNet`

We will now do exactly the same but with the second model:

In [14]:
resnet_20 = resnet20()
resnet_20.eval()

ResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=

We can see that we basically have 3 layers here, each consisting of 3 blocks with 2 convolution layers inside. In other words, the blocks contain 3 x 3 x 2 = 18 convolution layers, plus an additional convolution layer at the very beginning and an FC layer at the end, totalling 20 layers.
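The layer count can be written out as a small calculation. This is only the arithmetic from the paragraph above, and it matches the common `6n + 2` depth rule for CIFAR-style ResNets with `n = 3` blocks per stage. The variable names are chosen for this sketch.

```python
stages = 3            # groups of residual blocks
blocks_per_stage = 3  # residual blocks in each group
convs_per_block = 2   # 3x3 convolutions in each block

block_convs = stages * blocks_per_stage * convs_per_block
total_layers = 1 + block_convs + 1  # first convolution + block convolutions + final FC layer
print(block_convs, total_layers)  # 18 20
```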

I opted for this more lightweight model because the out-of-the-box ResNet models from PyTorch (ResNet18 or ResNet50) are tailored to 224x224x3 image data and may not be well suited to the CIFAR-10 dataset with its 32x32x3 images. With my implementation, we can preserve more information when going through the convolution layers and achieve better results, potentially improving on the Encoder-Decoder model above.

In [15]:
loss_func = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(resnet_20.parameters(), **optimizer_params)

In [16]:
trainer_resnet = Trainer(
    model=resnet_20,
    criterion=loss_func,
    optimizer=optimizer,
    train_loader=training_dataloader,
    valid_loader=validation_dataloader,
    train_on="cuda",
)

trainer_resnet.set_scheduler(
    scheduler_class=torch.optim.lr_scheduler.StepLR,
    scheduler_params=scheduler_params,
    level="epoch",
)

In [17]:
trainer_resnet.run(
    epochs=10,
    seed=42,
    enable_tqdm=True,
)

Epoch 1/10 [Training]: 100%|██████████| 391/391 [00:33<00:00, 11.77batches/s, lr=0.25]
Epoch 1/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 26.03batches/s, loss=(1.7261, 1.8801), acc=(0.3528, 0.385)]
Epoch 2/10 [Training]: 100%|██████████| 391/391 [00:32<00:00, 12.02batches/s, lr=0.25]
Epoch 2/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 26.39batches/s, loss=(1.3062, 1.2349), acc=(0.5244, 0.5636)]
Epoch 3/10 [Training]: 100%|██████████| 391/391 [00:32<00:00, 12.11batches/s, lr=0.25]
Epoch 3/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 26.59batches/s, loss=(1.0489, 1.0369), acc=(0.6277, 0.6384)]
Epoch 4/10 [Training]: 100%|██████████| 391/391 [00:32<00:00, 11.91batches/s, lr=0.25]
Epoch 4/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 25.35batches/s, loss=(0.8597, 1.081), acc=(0.6977, 0.6583)]
Epoch 5/10 [Training]: 100%|██████████| 391/391 [00:33<00:00, 11.64batches/s, lr=0.125]
Epoch 5/10 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 24.72batches

In [18]:
trainer_resnet.save_checkpoint("resnet20_ckpt_10.pt")

#### Continuation of training of ResNet-20

In [19]:
trainer_resnet.reset_scheduler()

In [20]:
trainer_resnet.run(
    epochs=5,
    seed=42,
    enable_tqdm=True,
)

Epoch 11/15 [Training]: 100%|██████████| 391/391 [00:32<00:00, 12.02batches/s, lr=0.0625]
Epoch 11/15 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 25.85batches/s, loss=(0.4295, 0.5848), acc=(0.8506, 0.8039)]
Epoch 12/15 [Training]: 100%|██████████| 391/391 [00:32<00:00, 11.95batches/s, lr=0.0625]
Epoch 12/15 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 25.19batches/s, loss=(0.4203, 0.4963), acc=(0.8538, 0.8368)]
Epoch 13/15 [Training]: 100%|██████████| 391/391 [00:32<00:00, 11.88batches/s, lr=0.0625]
Epoch 13/15 [Validation]: 100%|██████████| 71/71 [00:02<00:00, 25.53batches/s, loss=(0.4001, 0.5117), acc=(0.8606, 0.8353)]
Epoch 14/15 [Training]: 100%|██████████| 391/391 [00:32<00:00, 12.17batches/s, lr=0.0625]
Epoch 14/15 [Validation]: 100%|██████████| 71/71 [00:03<00:00, 21.51batches/s, loss=(0.3869, 0.5834), acc=(0.8654, 0.8092)]
Epoch 15/15 [Training]: 100%|██████████| 391/391 [00:31<00:00, 12.26batches/s, lr=0.0625]
Epoch 15/15 [Validation]: 100%|██████████| 71/71 [00:0

## Testing

From the training results above, we can see that the ResNet-20 model significantly outperforms the EDNet model. Let's now check both models on the test set:

In [21]:
@torch.no_grad()
def test_torch_model(
    trainer: Trainer,
    dataloader: torch.utils.data.DataLoader = testing_dataloader,
    device: str = "cuda",
) -> dict[str, float]:
    """Tests the model saved as Trainer attribute on a set.

    Args:
      trainer (Trainer): Instance of Trainer class.
      dataloader (torch.utils.data.DataLoader, optional): Torch Dataloader.
      device (str, optional): Indicator of CPU or CUDA-based testing.

    Returns:
      dict[str, float]: Dict of loss and accuracy.
    """
    # Switching the model to evaluation mode (BatchNorm uses running statistics)
    trainer.model.eval()
    # Setting counters
    test_loss = 0.0
    correct_ans = 0
    for x_batch, y_batch in tqdm(dataloader):
      # Moving examples and labels to device
      x_batch, y_batch = x_batch.to(device), y_batch.to(device)
      # Computing batch loss and outputs of the last layer
      loss, outputs = trainer(x_batch, y_batch)

      # Adding batch loss (as a Python float) to counter
      test_loss += loss.item()
      # Computing number of correct predictions
      predictions = torch.argmax(outputs, dim=1)
      correct_ans += (predictions == y_batch).sum().item()

    # Computing epoch-level metrics
    test_loss /= len(dataloader)
    test_accuracy = correct_ans / len(dataloader.dataset)

    test_metrics = {
        "test_loss": round(test_loss, 4),
        "test_accuracy": round(test_accuracy, 4)
    }

    return test_metrics

In [22]:
test_torch_model(trainer=trainer_ed)

100%|██████████| 8/8 [00:00<00:00, 29.90it/s]


{'test_loss': 0.6721, 'test_accuracy': 0.774}

In [23]:
test_torch_model(trainer=trainer_resnet)

100%|██████████| 8/8 [00:00<00:00, 21.19it/s]


{'test_loss': 0.4681, 'test_accuracy': 0.848}

## Saving best model weights

In [24]:
best_model = trainer_resnet.model

torch.save(best_model.state_dict(), "resnet20_weights.pth")