# Dog Breed Classifier

## Accessing the Dog Breed Recognition dataset

I created a directory called "dog-breed-recognition" and placed the dataset inside it, in a directory called "dogs". Only the samples in its "train" directory are used here; they are later split into training, validation and test sets.

In [1]:
from google.colab import drive
drive.mount('/content/drive/')
root = '/content/drive/My Drive/Colab Notebooks/dog-breed-recognition'

Mounted at /content/drive/


## Importing basic Python libraries

In [2]:
import os
import sys
import tqdm
import random
import copy

from PIL import Image
import numpy as np

## Importing PyTorch library

For GPU usage, go to "Edit > Notebook Settings" and make sure the hardware accelerator is set to GPU.

In [3]:
import torch
import torchvision
from torchvision import transforms

# Create a PyTorch device so that inputs, outputs and models are moved to the
#   GPU when one is available (falling back to the CPU otherwise)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Splitting dataset into training, validation and test

Given a split ratio for these three sets, the instances of each class (dog breed) are randomly distributed among them.

`dataset_labels[<PHASE>]` is a list of `(class_index, instance_index)` pairs, where:
- `class_index` is the index of the dog breed;
- `instance_index` is the index of the instance within the list of instances of that dog breed.
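
To make the pair structure concrete, here is a minimal sketch that builds such pairs from a toy directory tree. The breed names and file names are hypothetical and exist only for illustration; the only assumption taken from the notebook is that class indices follow the sorted directory names.

```python
import os
import tempfile

# Hypothetical toy dataset: two breed directories holding empty "image" files
layout = {'beagle': ['a.jpg', 'b.jpg'], 'akita': ['x.jpg', 'y.jpg', 'z.jpg']}

with tempfile.TemporaryDirectory() as dataset_path:
    for breed, files in layout.items():
        os.makedirs(os.path.join(dataset_path, breed))
        for name in files:
            open(os.path.join(dataset_path, breed, name), 'w').close()

    # Class indices follow the alphabetical order of the directory names
    classes = sorted(os.listdir(dataset_path))

    # One (class_index, instance_index) pair per file in each class directory
    pairs = []
    for i_class, breed in enumerate(classes):
        n_instances = len(os.listdir(os.path.join(dataset_path, breed)))
        pairs += [(i_class, i_instance) for i_instance in range(n_instances)]

print(classes)
print(pairs)
```

Because both the class list and the instance list are sorted, the same pair always refers to the same file as long as the directory contents do not change.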

In [4]:
def get_dataset_split_labels(dataset_path, split_ratio):
  '''
  Splits the instances of each class into training, validation and test sets.

  Parameters
  ----------
  dataset_path : str
    root of the dataset
  split_ratio : list<float>
    ratios for the dataset splitting

  Returns
  -------
  dataset_labels : dict
    list of instance labels for each set
  '''

  # Attribute `split_ratio` to each set individually
  train_ratio, val_ratio, test_ratio = split_ratio

  # `dataset_labels` encodes the list of instance labels for each set
  dataset_labels = { 'train': [], 'val': [], 'test': [] }

  # `dataset_path` contains one directory per class (dog breed). Listing the
  #   directories in `dataset_path` therefore gives `classes`, the list of dog
  #   breeds present in the dataset
  classes = sorted(os.listdir(dataset_path))

  # Iterate through each existing class (`curr_class`) and its index (`i_class`)
  #   (the dataset splitting is done for each class individually)
  for i_class, curr_class in enumerate(classes):

    # `class_path` appends the root of the dataset (`dataset_path`) to the
    #   current class' directory name
    class_path = os.path.join(dataset_path, curr_class)

    # `instances` lists the image filenames in the current class directory
    instances = sorted(os.listdir(class_path))

    # `n_instances` computes the number of instances of the presented class
    n_instances = len(instances)

    # `labels` is a list of (class index, instance index) pairs, which will be
    #   used when loading the dataset later
    labels = [(i_class, i_instance) for i_instance in range(n_instances)]

    # Shuffle the labels so that the split below is random
    random.shuffle(labels)

    # Calculate the number of instances of each split for the current class
    train_l = int(n_instances * train_ratio)
    val_l = int(n_instances * val_ratio)
    test_l = int(n_instances * test_ratio)

    # Slice the current labels list (`labels`) according to the number of
    #   instances of each split to obtain the labels of each split
    #   (these three sets are disjoint)
    curr_train_labels = labels[:train_l]
    curr_val_labels = labels[train_l:train_l + val_l]
    curr_test_labels = labels[train_l + val_l:train_l + val_l + test_l]
    
    # Append the current labels lists to the final lists covering all classes
    dataset_labels['train'] += curr_train_labels
    dataset_labels['val'] += curr_val_labels
    dataset_labels['test'] += curr_test_labels

  return dataset_labels
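
One consequence of the `int(...)` truncation above is that the three split sizes of a class do not always add up to its number of instances. The sketch below reproduces only that arithmetic; `split_sizes` is a hypothetical helper written for this illustration, not part of the notebook.

```python
def split_sizes(n_instances, split_ratio):
    # Same arithmetic as the per-class split: each size is truncated with int()
    return [int(n_instances * ratio) for ratio in split_ratio]

n_instances = 10
sizes = split_sizes(n_instances, [0.7, 0.15, 0.15])

# Instances left out of every split because of the truncation
unused = n_instances - sum(sizes)

print(sizes, unused)
```

With ten instances, one of them ends up in no split. This is harmless for the procedure, but it means the three sets together can be slightly smaller than the full dataset.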

## Creating the dataset loader

To read a dataset entry given its index:
- `class_index` and `instance_index` are obtained from the previously generated labels;
- The image path (`img_path`) is looked up from these two indexes;
- The image is read and converted to RGB (`img`), in case the original image has a transparency channel (which is not used) or is in grayscale;
- The network input (`x`) is generated by preprocessing the image. The preprocessing depends on the current phase (training, validation or test), since the training phase applies data augmentation;
- The network output (`y`) is a vector of `self.n_classes` zeros, where `self.n_classes` is the number of classes (dog breeds), with the value at position `class_index` set to 1 (one-hot encoding).
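
As a small illustration of the last step, the sketch below builds a one-hot vector and recovers the class index from it. It uses plain Python lists standing in for tensors, and `one_hot` and `argmax` are hypothetical helpers for this example only.

```python
def one_hot(class_index, n_classes):
    # Vector of zeros with a single 1 at the position of the class
    y = [0] * n_classes
    y[class_index] = 1
    return y

def argmax(values):
    # Index of the largest value, i.e. the encoded class for a one-hot vector
    return max(range(len(values)), key=values.__getitem__)

y = one_hot(2, 5)
print(y, argmax(y))
```

The training loop in this notebook performs the same recovery on batches with `torch.max(y_gt, 1)`.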

In [5]:
class ImageDataset(torch.utils.data.Dataset):
  """
  A class to read the dataset instances.

  Attributes
  ----------
  labels : list<tuple<int,int>>
    list of dataset instances that can be accessed
  transform : torchvision.transforms
    input preprocessing pipeline
  classes : list<str>
    existing classes (dog breeds) on the dataset
  n_classes : int
    number of existing classes (dog breeds) on the dataset
  instances_path : list<list<str>>
    Path to the instances of the dataset presented in `labels`

  Data descriptors
  ----------------
  __getitem__
    Gets the model's input and output from a dataset instance's index.

  __len__
    Gets the number of samples presented in the dataset.
  """

  def __init__(self, dataset_path, labels, transform):
    '''
    Constructs all the attributes for the dataset object.

    Parameters
    ----------
    dataset_path : str
      root of the dataset
    labels : list<tuple<int,int>>
      list of dataset instances that can be accessed
    transform : torchvision.transforms
      input preprocessing pipeline
    '''
    
    self.labels = labels
    self.transform = transform

    # `dataset_path` contains one directory per class (dog breed). Listing the
    #   directories in `dataset_path` therefore gives `self.classes`, the list
    #   of dog breeds present in the dataset
    self.classes = sorted(os.listdir(dataset_path))
    self.n_classes = len(self.classes)

    # `classes_path` appends the root of the dataset (`dataset_path`) to the
    #   directory name of all classes (`classes`)
    classes_path = [os.path.join(dataset_path, c) for c in self.classes]
    self.instances_path = [[os.path.join(class_path, instance)
        for instance in sorted(os.listdir(class_path))]
        for class_path in classes_path]

  def __getitem__(self, index):
    '''
    Gets the model's input and output from a dataset instance's index.

    Parameters
    ----------
    index : int
      index of the instance to be accessed

    Returns
    -------
    x : torch.Tensor
      tensor holding the preprocessed sample, used as the model input
    y : torch.Tensor
      one-hot tensor encoding the class (dog breed), used as the model target
    '''
    
    # Access the indexes of the class (`class_index`) and instance
    #   (`instance_index`) present in the labels
    class_index, instance_index = self.labels[index]

    # `img_path` is the filepath of the image identified by (`class_index`,
    #   `instance_index`)
    img_path = self.instances_path[class_index][instance_index]

    # Read image (`img`) and convert to red-green-blue channels (RGB), ensuring
    #   the input will have 3 channels
    img = Image.open(img_path).convert('RGB')

    # `x` refers to the image when the preprocessing pipeline (`self.transform`)
    #   is applied to the image (`img`)
    x = self.transform(img)
    
    # `y` is the output tensor in one-hot encoding format, created as a vector
    #   of `self.n_classes` zeros
    y = torch.zeros(self.n_classes, dtype=torch.long)

    # The value at the `class_index`'th position is set to 1, indicating that
    #   this sample belongs to the `class_index` class
    y[class_index] = 1
    
    return x, y

  def __len__(self):
    '''
    Gets the number of samples presented in the dataset.
    
    Returns
    -------
    l : int
      the length of the dataset
    '''
    l = len(self.labels)

    return l

## Creating the CNN model architecture

The model takes a fixed-size input with 3 channels (red, green and blue) and outputs a vector of size `n_classes`, where each value is the confidence that the input image (a dog) belongs to the corresponding class (dog breed).

A ResNet50-based model is used; its last layer (a fully connected layer) is replaced by another fully connected layer whose output size corresponds to `n_classes`.

In [6]:
def classifier_model(n_classes):
  '''
  Generates a new CNN ResNet50-based model.

  Parameters
  ----------
  n_classes : int
    number of classes (dog breeds) to be outputted

  Returns
  -------
  x : torch.nn
    the model
  '''
  
  # First, `x` is a new ResNet50 CNN model, containing pre-trained weights from
  #   ImageNet
  x = torchvision.models.resnet50(pretrained=True)

  # Change the final fully connected layer so that the output size matches the
  #   desired `n_classes` size. Also, apply sigmoid function
  x.fc = torch.nn.Sequential(
      torch.nn.Linear(2048, n_classes),
      torch.nn.Sigmoid())

  return x
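
A note on how this output interacts with the loss used later: `torch.nn.CrossEntropyLoss` applies a log-softmax to whatever scores it receives, so scores that have already passed through a sigmoid are confined to the interval (0, 1) and the loss cannot approach zero. The sketch below computes that floor in plain Python. The class count of 120 is an assumption for illustration (the notebook does not state `n_classes`), and `cross_entropy` is a hypothetical helper mirroring the formula, not the PyTorch implementation.

```python
import math

def cross_entropy(scores, target):
    # Negative log-softmax of the target score
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

n_classes = 120  # assumption: not stated in the notebook

# Uninformative prediction: every class receives the same score
uniform_loss = cross_entropy([0.5] * n_classes, 0)

# Best case for sigmoid outputs: target score near 1, all others near 0
floor_loss = cross_entropy([1.0] + [0.0] * (n_classes - 1), 0)

print(round(uniform_loss, 2), round(floor_loss, 2))
```

Under this assumption, the loss would start near `ln(n_classes)` and level off a little above the floor even as accuracy keeps improving, so accuracy is the more informative number to watch with this architecture.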

## Model training and validation algorithm

In [7]:
def train(model, criterion, optimizer, scheduler, n_epochs):
  '''
  Trains the model.
  
  Parameters
  ----------
  model : torch.nn
    the model to be trained
  criterion : torch.nn
    the model's loss metric
  optimizer : torch.optim
    optimization algorithm
  scheduler : torch.optim
    updater of model's training parameters
  n_epochs : int
    number of iterations of the training

  Returns
  -------
  model : torch.nn
    the trained model
  best_acc : float
    the best validation accuracy achieved by the model during training
  '''

  # Keep track of the best achieved accuracy and the corresponding model weights
  best_weights = copy.deepcopy(model.state_dict())
  best_acc = 0.0

  # The model iterates a number of times (`n_epochs`)
  for epoch in range(n_epochs):

    # Each epoch includes the model weights tuning phase (indicated by the
    #   `train` flag) and the model validation (indicated by the `val` flag)
    for phase in ['train', 'val']:

      # Change the model mode (training or validation)
      if phase == 'train':
        model.train()
      else:
        model.eval()

      # `epoch_loss` computes the loss sum (according to the used `criterion`)
      #   of the model iteratively as batches are read
      epoch_loss = 0.0

      # `epoch_acc` computes the accuracy of the model when predicting the class
      #   of the input iteratively as batches are read 
      epoch_acc = 0.0

      # `n_seen_samples` computes the number of input-output samples that were
      #   already read in the current epoch-phase
      n_seen_samples = 0

      # Using tqdm to iteratively keep track on the number of iterated batches
      #   on the console
      dataloader = tqdm.tqdm(dataloaders[phase], total=len(dataloaders[phase]),
          position=0, leave=True)
      
      # The dataloader of the current phase is iterated in order to access all
      #   input-output pairs, denoted by `(x, y_gt)`
      for x, y_gt in dataloader:

        # `curr_batch_size` computes the number of samples in the current batch
        #   (this may vary when the current batch is the last one)
        curr_batch_size = x.shape[0]

        # Increment the number of seen samples on the current epoch-phase
        n_seen_samples += curr_batch_size

        # Reset current gradients
        optimizer.zero_grad()

        # Pass the input and output tensors to the used device (GPU or CPU)
        x = x.to(device)
        y_gt = y_gt.to(device)
        
        # Calculate the prediction (`y_pred`) of the current model according to
        #   the input `x`
        y_pred = model(x)

        # `label_gt` denotes the class indexes of each instance from the batch
        #   of ground-truths `y_gt` by getting the index of its max value
        _, label_gt = torch.max(y_gt, 1)

        # `label_pred` denotes the class indexes of each instance from the batch
        #   of predictions `y_pred` by getting the index of its max value
        _, label_pred = torch.max(y_pred, 1)

        # `loss` encodes the difference between the predictions and the
        #   ground-truth of the current batch, according to the used `criterion`
        loss = criterion(y_pred, label_gt)

        # Update the model weights, if the current phase is `train` (if the 
        #   current phase is `val`, the model weights are not changed)
        if phase == 'train':
          loss.backward()
          optimizer.step()

        # Add the current batch's loss to `epoch_loss`
        epoch_loss += loss.item() * curr_batch_size
        
        # Add the current batch's correct predictions to `epoch_acc`
        epoch_acc += torch.sum(label_pred == label_gt).item()

        # Calculate the current epoch-phase's average loss and correct
        #   predictions rate (`curr_loss` and `curr_acc`)
        curr_loss = epoch_loss / n_seen_samples
        curr_acc = epoch_acc / n_seen_samples

        # Iteratively print on console the number of iterated batches, as well
        #   as the current loss and accuracy
        dataloader.set_postfix(Epoch='%s/%s' % (epoch+1, n_epochs),
            Loss=curr_loss, Acc=curr_acc, refresh=True)

      # Update the learning rate once per epoch, after the `train` phase
      if phase == 'train':
        scheduler.step()
      
      # Calculate the final loss and accuracy of the current epoch
      epoch_loss /= len(datasets[phase])
      epoch_acc /= len(datasets[phase])

      # Save new weights if the best accuracy is achieved in the `val` phase
      if phase == 'val' and epoch_acc > best_acc:
        best_acc = epoch_acc
        best_weights = copy.deepcopy(model.state_dict())

  # Apply best weights to the model
  model.load_state_dict(best_weights)

  return model, best_acc
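
The loop above multiplies each batch loss by `curr_batch_size` before accumulating it. The sketch below shows why, using made-up batch values: the last batch of an epoch is usually smaller, and averaging the per-batch means directly would give it too much weight. `running_average` is a hypothetical helper for this illustration.

```python
def running_average(batches):
    # `batches` holds (mean loss of the batch, number of samples in the batch)
    total, n_seen = 0.0, 0
    for batch_mean, batch_size in batches:
        total += batch_mean * batch_size
        n_seen += batch_size
    return total / n_seen

# Two full batches and a smaller last batch with a higher loss (made-up values)
batches = [(2.0, 32), (2.0, 32), (4.0, 8)]

weighted = running_average(batches)
naive = sum(mean for mean, _ in batches) / len(batches)

print(weighted, naive)
```

The weighted value is the true per-sample average, while the naive mean of batch means is pulled upward by the small final batch.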

## Model testing algorithm

In [8]:
@torch.no_grad()  # no gradients are needed during evaluation
def test(model):
  '''
  Tests the model on an unseen set of samples.

  Parameters
  ----------
  model : torch.nn
    the model to be evaluated

  Returns
  -------
  acc : float
    the final accuracy of the model
  '''

  # Set the model mode to `eval`, as the model weights are not updated
  model.eval()

  # Keep track of the current accuracy (`acc`)
  acc = 0.0

  # `n_seen_samples` computes the number of input-output samples that were
  #   already read
  n_seen_samples = 0

  # Using tqdm to iteratively keep track on the number of iterated batches on
  #   the console
  dataloader = tqdm.tqdm(dataloaders['test'], total=len(dataloaders['test']),
      position=0, leave=True)
  
  # The test dataloader is iterated in order to access all input-output pairs,
  #   denoted by `(x, y_gt)`
  for x, y_gt in dataloader:
  
    # `curr_batch_size` computes the number of samples in the current batch
    #   (this may vary when the current batch is the last one)
    curr_batch_size = x.shape[0]

    # Increment the number of seen samples
    n_seen_samples += curr_batch_size

    # Pass the input and output tensors to the used device (GPU or CPU)
    x = x.to(device)
    y_gt = y_gt.to(device)

    # Calculate the prediction (`y_pred`) of the current model according to the
    #   input `x`
    y_pred = model(x)


    # `label_gt` denotes the class indexes of each instance from the batch of
    #   ground-truths `y_gt` by getting the index of its max value
    _, label_gt = torch.max(y_gt, 1)
    
    # `label_pred` denotes the class indexes of each instance from the batch of
    #   predictions `y_pred` by getting the index of its max value
    _, label_pred = torch.max(y_pred, 1)

    # Add the current batch's correct predictions to `acc`
    acc += torch.sum(label_pred == label_gt).item()

    # Calculate the current correct predictions rate (`curr_acc`)
    curr_acc = acc / n_seen_samples
    
    # Iteratively print on console the number of iterated batches, as well as
    #   the current accuracy
    dataloader.set_postfix(Acc=curr_acc, refresh=True)

  # Calculate the final accuracy of the test
  acc /= len(datasets['test'])

  return acc

## Sets of preprocessing operations

There is one set of operations per phase (training, validation and test):

- For the training phase, a random region of each sample is cropped and resized to a fixed (224,224) size, randomly flipped horizontally, converted to a torch.Tensor and normalized according to a set of RGB mean and standard deviation values;

- For the validation phase, samples are centrally cropped to a fixed (224,224) size, converted to a torch.Tensor and normalized according to a set of RGB mean and standard deviation values;

- For the testing phase, samples are resized so that their shorter side is 256 pixels, centrally cropped to a fixed (224,224) size, converted to a torch.Tensor and normalized according to a set of RGB mean and standard deviation values.
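
The normalization step shared by all three phases is a per-channel `(value - mean) / std`. The sketch below applies it to single pixels in plain Python, assuming channel values already scaled to the [0, 1] range (which is what `ToTensor` produces for ordinary 8-bit images). `normalize_pixel` is a hypothetical helper; the mean and standard deviation values are the ones used in this notebook.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    # Per-channel normalization: subtract the mean, divide by the std
    return [(value - mean) / std for value, mean, std in zip(rgb, MEAN, STD)]

# A pixel equal to the channel means maps to zero in every channel
mean_pixel = normalize_pixel(MEAN)

# A pure white pixel maps to a bit more than two standard deviations
white_pixel = normalize_pixel([1.0, 1.0, 1.0])

print(mean_pixel, white_pixel)
```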

In [9]:
dataset_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    'val': transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) }

## Setting training parameters

In [10]:
# Path containing the dog breed training dataset
dataset_path = os.path.join(root, 'dogs', 'train')

# Split ratio of the training, validation and test portions (these portions
#   sum up to 1.0)
split_ratio = [0.7, 0.15, 0.15]

# Get dataset labels, split for each phase
dataset_labels = get_dataset_split_labels(dataset_path, split_ratio)

# Creating PyTorch dataset instance for each phase (training, validation and
#   test)
datasets = {
    'train': ImageDataset(dataset_path, dataset_labels['train'],
        dataset_transforms['train']),
    'val': ImageDataset(dataset_path, dataset_labels['val'],
        dataset_transforms['val']),
    'test': ImageDataset(dataset_path, dataset_labels['test'],
        dataset_transforms['test']) }

# Dog breeds in the training dataset
classes = datasets['train'].classes

# Number of dog breeds in the training dataset
n_classes = datasets['train'].n_classes

# Number of epochs for the model to be trained
n_epochs = 10

# Batch size for each phase
batch_size = 32

# Number of workers for multiprocessing the data loading
n_workers = 8

# Instantiate CNN classifier model
model = classifier_model(n_classes)
model = torch.jit.script(model).to(device)

# Use Cross Entropy loss
criterion = torch.nn.CrossEntropyLoss()

# Use SGD as the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Use a learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

# Create data loaders for the training, validation and testing steps
#   (Also shuffling samples for unbiased performance)
dataloaders = {
    'train': torch.utils.data.DataLoader(datasets['train'],
        batch_size=batch_size, num_workers=n_workers, shuffle=True),
    'val': torch.utils.data.DataLoader(datasets['val'], batch_size=batch_size,
        num_workers=n_workers, shuffle=True),
    'test': torch.utils.data.DataLoader(datasets['test'], batch_size=batch_size,
        num_workers=n_workers, shuffle=True) }

# Path where to save the trained model
trained_model_ckpt_path = os.path.join(root, 'models', 'classifier.pth')

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
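
To see what the scheduler settings above imply over the 10 epochs, the following sketch evaluates the step-decay rule in plain Python rather than through PyTorch. `step_lr` is a hypothetical helper; it assumes the scheduler is stepped once per epoch, as in the training loop of this notebook.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    # Step decay: multiply the rate by `gamma` once every `step_size` epochs
    return base_lr * gamma ** (epoch // step_size)

# Same values as in the cell above: lr=0.001, step_size=7, gamma=0.1
schedule = [step_lr(0.001, epoch, 7, 0.1) for epoch in range(10)]

for epoch, lr in enumerate(schedule, start=1):
    print(epoch, lr)
```

With these settings the first seven epochs run at the initial rate and the last three at one tenth of it.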


## Executing training

In [None]:
# Generate a trained model (`trained_model`), as well as its accuracy
#   (`val_acc`)
trained_model, val_acc = train(model, criterion, optimizer, scheduler, n_epochs)

# Save model to `trained_model_ckpt_path`
torch.save({
    'state_dict': trained_model.state_dict(),
    'acc': val_acc,
    'classes': classes,
    'n_classes': n_classes }, trained_model_ckpt_path)

100%|██████████| 379/379 [02:10<00:00,  2.91it/s, Acc=0.132, Epoch=1/10, Loss=4.5]
100%|██████████| 80/80 [00:14<00:00,  5.51it/s, Acc=0.469, Epoch=1/10, Loss=4.32]
100%|██████████| 379/379 [02:05<00:00,  3.02it/s, Acc=0.496, Epoch=2/10, Loss=4.27]
100%|██████████| 80/80 [00:14<00:00,  5.58it/s, Acc=0.709, Epoch=2/10, Loss=4.09]
100%|██████████| 379/379 [02:06<00:00,  2.99it/s, Acc=0.634, Epoch=3/10, Loss=4.11]
100%|██████████| 80/80 [00:14<00:00,  5.54it/s, Acc=0.75, Epoch=3/10, Loss=3.97]
100%|██████████| 379/379 [02:07<00:00,  2.98it/s, Acc=0.671, Epoch=4/10, Loss=4.02]
100%|██████████| 80/80 [00:14<00:00,  5.62it/s, Acc=0.756, Epoch=4/10, Loss=3.9]
100%|██████████| 379/379 [02:06<00:00,  2.99it/s, Acc=0.681, Epoch=5/10, Loss=3.97]
100%|██████████| 80/80 [00:14<00:00,  5.54it/s, Acc=0.772, Epoch=5/10, Loss=3.87]
100%|██████████| 379/379 [02:06<00:00,  3.00it/s, Acc=0.69, Epoch=6/10, Loss=3.93]
100%|██████████| 80/80 [00:14<00:00,  5.51it/s, Acc=0.777, Epoch=6/10, Loss=3.84]
100%|███

## Executing testing

In [None]:
# Instantiating the architecture of the model
trained_model = classifier_model(n_classes)

# Load weights from the trained model
trained_model.load_state_dict(torch.load(trained_model_ckpt_path)['state_dict'])
trained_model.eval()
trained_model = torch.jit.script(trained_model).to(device)

# Perform testing and getting test accuracy (`test_acc`)
test_acc = test(trained_model)

100%|██████████| 80/80 [00:17<00:00,  4.50it/s, Acc=0.849]
