# Dog Breed Classifier

## Accessing the Dog Breed Recognition dataset

I created a directory called "dog-breed-recognition" and placed the dataset in its "dogs" subdirectory. Only the samples in the "train" directory are used; they are later split into training, validation and test subsets.

In [None]:
from google.colab import drive
drive.mount('/content/drive/')
root = '/content/drive/My Drive/Colab Notebooks/dog-breed-recognition'

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


## Importing basic Python libraries

In [None]:
import os
import sys
import tqdm
import random
import copy

from PIL import Image
import numpy as np
import cv2
import matplotlib.pyplot as plt

## Importing PyTorch library

For GPU usage, go to "Edit > Notebook Settings" and make sure the hardware accelerator is set to GPU.

In [None]:
import torch
import torchvision
from torchvision import transforms

# Remember to activate GPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

if torch.cuda.is_available():
  torch.cuda.get_device_name(0)

'Tesla P4'

## Splitting dataset into training, validation and test

Given a split ratio for these three subsets, the instances of each class (dog breed) are shuffled and divided among them independently, so every breed is represented in each subset in roughly the same proportion.

`dataset_labels[<PHASE>]` is a list of `(class_index, instance_index)` pairs, where:
- `class_index` is the index of the dog breed;
- `instance_index` is the index of the image within that breed's directory listing.
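A minimal, filesystem-free sketch of the same per-class (stratified) split is shown below. It assumes the number of images per breed is already known (the hypothetical `class_counts` list stands in for the `os.listdir` calls) and uses a seeded `random.Random` for reproducibility, which the notebook itself does not do. It also shows how `int()` truncation can leave a few images per breed unassigned.

```python
import random

def split_labels_from_counts(class_counts, split_ratio, seed=0):
    """Stratified split into (class_index, instance_index) pairs.

    class_counts is hypothetical: class_counts[i] is the number of images
    of breed i, replacing the directory listing used in the notebook.
    """
    train_ratio, val_ratio, test_ratio = split_ratio
    rng = random.Random(seed)  # local, seeded RNG so the split is reproducible
    labels_per_phase = {'train': [], 'val': [], 'test': []}

    for i_class, n_instances in enumerate(class_counts):
        labels = [(i_class, i) for i in range(n_instances)]
        rng.shuffle(labels)

        # int() truncates, so a few instances per class may stay unused
        train_l = int(n_instances * train_ratio)
        val_l = int(n_instances * val_ratio)
        test_l = int(n_instances * test_ratio)

        labels_per_phase['train'] += labels[:train_l]
        labels_per_phase['val'] += labels[train_l:train_l + val_l]
        labels_per_phase['test'] += labels[train_l + val_l:train_l + val_l + test_l]

    return labels_per_phase

splits = split_labels_from_counts([10, 7], [0.7, 0.15, 0.15])
print({phase: len(labels) for phase, labels in splits.items()})
# → {'train': 11, 'val': 2, 'test': 2}  (2 of the 17 images are left out)
```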

In [None]:
def get_dataset_split_labels(dataset_path, split_ratio):
  train_ratio, val_ratio, test_ratio = split_ratio

  dataset_labels = { 'train': [], 'val': [], 'test': [] }

  classes = os.listdir(dataset_path)
  for i_class, curr_class in enumerate(classes):
    class_path = os.path.join(dataset_path, curr_class)
    instances = os.listdir(class_path)
    n_instances = len(instances)

    labels = [(i_class, label) for label in list(range(n_instances))]
    random.shuffle(labels)

    train_l = int(n_instances * train_ratio)
    val_l = int(n_instances * val_ratio)
    test_l = int(n_instances * test_ratio)

    curr_train_labels = labels[:train_l]
    curr_val_labels = labels[train_l:train_l + val_l]
    curr_test_labels = labels[train_l + val_l:train_l + val_l + test_l]
    
    dataset_labels['train'] += curr_train_labels
    dataset_labels['val'] += curr_val_labels
    dataset_labels['test'] += curr_test_labels

  return dataset_labels

## Creating the dataset loader

To read an entry of the dataset given an index:
- `class_index` and `instance_index` are obtained from the previously generated labels;
- The image path (`img_path`) is looked up;
- The image is read and converted to RGB (`img`), in case the original image has an (unused) transparency channel or is grayscale;
- The network input (`x`) is generated by preprocessing the image. The preprocessing depends on the current phase (training, validation or test), since only the training phase applies data augmentation;
- The network output (`y`) is created as a vector of `self.n_classes` zeros, where `self.n_classes` is the number of classes (dog breeds), with a 1 at position `class_index` (one-hot encoding).
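The one-hot target and its inverse can be illustrated in plain Python. `one_hot` and `argmax` are hypothetical helpers standing in for the tensor operations in `ImageDataset.__getitem__` and for `torch.max(y_gt, 1)` in `train()`; recovering the index matters because `torch.nn.CrossEntropyLoss` is given class indices, not one-hot vectors.

```python
def one_hot(class_index, n_classes):
    # Vector of n_classes zeros with a 1 at class_index,
    # mirroring how ImageDataset.__getitem__ builds y
    y = [0] * n_classes
    y[class_index] = 1
    return y

def argmax(values):
    # Index of the largest value; plays the role of torch.max(y, 1)[1]
    return max(range(len(values)), key=lambda i: values[i])

y = one_hot(3, 5)
print(y)          # [0, 0, 0, 1, 0]
print(argmax(y))  # 3
```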

In [None]:
class ImageDataset(torch.utils.data.Dataset):
  def __init__(self, dataset_path, labels, transform):
    self.dataset_path = dataset_path
    self.labels = labels
    self.transform = transform

    classes = os.listdir(self.dataset_path)
    self.n_classes = len(classes)

    self.classes_path = [os.path.join(self.dataset_path, c) for c in classes]
    self.instances_path = [[os.path.join(class_path, instance)
        for instance in os.listdir(class_path)]
      for class_path in self.classes_path]

  def __getitem__(self, index):
    class_index, instance_index = self.labels[index]
    img_path = self.instances_path[class_index][instance_index]
    img = Image.open(img_path).convert('RGB')

    x = self.transform(img)
    
    # One-hot target: zeros everywhere except at the true class
    y = torch.zeros(self.n_classes, dtype=torch.long)
    y[class_index] = 1

    return x, y

  def __len__(self):
    return len(self.labels)

## Creating the CNN model architecture

The model takes a fixed-size input with 3 channels (red, green and blue) and outputs a vector of size `n_classes`, where each value is an unnormalized score (logit) of how likely the input image (a dog) is to belong to each class (dog breed).
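Since `torch.nn.CrossEntropyLoss` applies log-softmax internally, the raw model outputs are logits rather than probabilities. The sketch below converts made-up logits into per-class probabilities with a numerically stable softmax; in PyTorch the equivalent call on a batch would be `torch.softmax(y_pred, dim=1)`.

```python
import math

def softmax(logits):
    # Subtracting the max logit avoids overflow in exp(); the result is unchanged
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits for 3 breeds
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The ranking of the classes is the same before and after softmax, which is why `torch.max` on the logits is enough to pick the predicted breed.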

Two models were used: a ResNet50-based one (larger) and a MobileNetV2-based one (more efficient). Both start from ImageNet-pretrained weights, and in both the last fully connected layer is replaced by a new one whose output size is `n_classes`.
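Replacing the last layer adds `in_features * n_classes + n_classes` trainable parameters (weights plus biases), where `in_features` is 2048 for ResNet50 and 1280 for MobileNetV2, as used in the code below. The sketch assumes a hypothetical count of 120 breeds purely for illustration.

```python
def head_parameters(in_features, n_classes):
    # A Linear(in_features, n_classes) layer has a weight matrix of shape
    # (n_classes, in_features) plus one bias per class
    return in_features * n_classes + n_classes

example_n_classes = 120  # hypothetical number of breeds, for illustration only
print(head_parameters(2048, example_n_classes))  # ResNet50 head: 245880
print(head_parameters(1280, example_n_classes))  # MobileNetV2 head: 153720
```

In PyTorch the same number can be checked with `sum(p.numel() for p in layer.parameters())` on the new `Linear` layer.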

In [None]:
def resnet50_based_model(n_classes):
  x = torchvision.models.resnet50(pretrained=True)
  x.fc = torch.nn.Linear(2048, n_classes)

  return x

def mobilenetv2_based_model(n_classes):
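  x = torchvision.models.mobilenet_v2(pretrained=True)
  # MobileNetV2 has no `fc` attribute: its head is `classifier`, a
  # Sequential(Dropout, Linear(1280, 1000)), so replace the Linear layer
  x.classifier[1] = torch.nn.Linear(1280, n_classes)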

  return x

## Training the model



In [None]:
def train(model, criterion, optimizer, scheduler, n_epochs):
  best_weights = copy.deepcopy(model.state_dict())
  best_acc = 0.0

  for epoch in range(n_epochs):
    for phase in ['train', 'val']:
      if phase == 'train':
        model.train()
      else:
        model.eval()

      epoch_loss = 0.0
      epoch_acc = 0.0
      n_seen_samples = 0

      dataloader = tqdm.tqdm(dataloaders[phase], total=len(dataloaders[phase]),
          position=0, leave=True)
      
      # Iterate through every pair of (input, output) from the dataset
      for x, y_gt in dataloader:
        curr_batch_size = x.shape[0]
        n_seen_samples += curr_batch_size

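        optimizer.zero_grad()

        x = x.to(device)
        y_gt = y_gt.to(device)

        # Track gradients only during training; validation does not need them
        with torch.set_grad_enabled(phase == 'train'):
          y_pred = model(x)

          # CrossEntropyLoss expects class indices, so recover them from the
          # one-hot targets (and from the predicted scores)
          _, label_gt = torch.max(y_gt, 1)
          _, label_pred = torch.max(y_pred, 1)

          loss = criterion(y_pred, label_gt)

          if phase == 'train':
            loss.backward()
            optimizer.step()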

        epoch_loss += loss.item() * curr_batch_size
        epoch_acc += torch.sum(label_pred == label_gt).item()

        curr_loss = epoch_loss / n_seen_samples
        curr_acc = epoch_acc / n_seen_samples

        dataloader.set_postfix(Epoch='%s/%s' % (epoch+1, n_epochs),
            Loss=curr_loss, Acc=curr_acc, refresh=True)

      if phase == 'train':
        scheduler.step()
      
      epoch_loss /= len(datasets[phase])
      epoch_acc /= len(datasets[phase])

      if phase == 'val' and epoch_acc > best_acc:
        best_acc = epoch_acc
        best_weights = copy.deepcopy(model.state_dict())

  model.load_state_dict(best_weights)

  return model, best_acc
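`torch.nn.CrossEntropyLoss` returns the mean loss over each batch, and the last batch is usually smaller than `batch_size`. That is why `train()` multiplies `loss.item()` by `curr_batch_size` before dividing by the number of samples. The plain-Python sketch below, with made-up per-sample losses, shows that this weighting gives the exact per-sample mean while naively averaging batch means does not.

```python
def epoch_mean(batch_means, batch_sizes):
    # Re-weight each batch mean by its size (as train() does with
    # loss.item() * curr_batch_size), then divide by the total sample count
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Hypothetical per-sample losses split into batches of 4, 4 and 2
losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
batches = [losses[0:4], losses[4:8], losses[8:10]]
batch_means = [sum(b) / len(b) for b in batches]  # [2.5, 6.5, 9.5]
batch_sizes = [len(b) for b in batches]           # [4, 4, 2]

print(epoch_mean(batch_means, batch_sizes))  # 5.5, the true per-sample mean
print(sum(batch_means) / len(batch_means))   # about 6.17, biased toward the small last batch
```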

## Testing the model

In [None]:
def test(model):
  model.eval()
  acc = 0.0
  n_seen_samples = 0

  dataloader = tqdm.tqdm(dataloaders['test'], total=len(dataloaders['test']),
      position=0, leave=True)
  # Gradients are not needed for evaluation
  with torch.no_grad():
    for x, y_gt in dataloader:
      curr_batch_size = x.shape[0]
      n_seen_samples += curr_batch_size

      x = x.to(device)
      y_gt = y_gt.to(device)
      y_pred = model(x)

      # Compare predicted and ground-truth class indices
      _, label_gt = torch.max(y_gt, 1)
      _, label_pred = torch.max(y_pred, 1)

      acc += torch.sum(label_pred == label_gt).item()

      curr_acc = acc / n_seen_samples

      dataloader.set_postfix(Acc=curr_acc, refresh=True)

  acc /= len(datasets['test'])

  return acc

## Sets of preprocessing operations

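Each set corresponds to one phase (training, validation and test). The normalization uses the ImageNet channel means and standard deviations, since both backbones were pretrained on ImageNet.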

In [None]:
dataset_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    'val': transforms.Compose([
        # Resize before cropping, exactly as in the test pipeline
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) }
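The evaluation preprocessing (`Resize(256)` followed by `CenterCrop(224)`, as in the `'test'` entry) and the normalization can be worked through by hand. The sketch below assumes a hypothetical 500 x 375 input and the truncating/rounding conventions of recent torchvision releases; it illustrates the arithmetic and is not torchvision's implementation.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def resized_size(width, height, size=256):
    # Resize(int) scales the shorter side to `size` and keeps the aspect
    # ratio; the longer side is truncated with int() (assumed torchvision behaviour)
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, crop=224):
    # (left, top, right, bottom) of a centred crop, rounding the offsets
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop

def normalize_pixel(rgb_uint8):
    # ToTensor maps 0..255 to 0..1, then Normalize computes (x - mean) / std per channel
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD)]

w, h = resized_size(500, 375)
print(w, h)                    # 341 256
print(center_crop_box(w, h))   # (58, 16, 282, 240)
print([round(c, 3) for c in normalize_pixel([255, 255, 255])])  # [2.249, 2.429, 2.64]
```

A white pixel thus ends up around +2.2 to +2.6 in every channel, the value range the pretrained weights expect.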

In [None]:
# Path containing the dog breed training dataset
dataset_path = os.path.join(root, 'dogs', 'train')

# Split ratio of the training, validation and test portions (these portions
#   must sum up to 1.0)
split_ratio = [0.7, 0.15, 0.15]

# Number of epochs for the model to be trained
n_epochs = 5

# Batch size for the training, validation and test steps
batch_size = 32

# Number of worker processes used by each DataLoader
n_workers = 8

# Get dataset labels, split for each phase (training, validation and test)
dataset_labels = get_dataset_split_labels(dataset_path, split_ratio)

# One dataset per phase, each with its own preprocessing
datasets = {
    'train': ImageDataset(dataset_path, dataset_labels['train'],
        dataset_transforms['train']),
    'val': ImageDataset(dataset_path, dataset_labels['val'],
        dataset_transforms['val']),
    'test': ImageDataset(dataset_path, dataset_labels['test'],
        dataset_transforms['test']) }

# Number of dog breeds in the training dataset (needs `datasets` to exist)
n_classes = datasets['train'].n_classes

model = resnet50_based_model(n_classes)
# model = mobilenetv2_based_model(n_classes)
model = torch.jit.script(model).to(device)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Multiply the learning rate by 0.1 every 7 epochs (with n_epochs = 5 it never decays)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

dataloaders = {
    'train': torch.utils.data.DataLoader(datasets['train'],
        batch_size=batch_size, num_workers=n_workers, shuffle=True),
    # Evaluation does not need shuffling
    'val': torch.utils.data.DataLoader(datasets['val'], batch_size=batch_size,
        num_workers=n_workers, shuffle=False),
    'test': torch.utils.data.DataLoader(datasets['test'], batch_size=batch_size,
        num_workers=n_workers, shuffle=False) }

trained_model_ckpt_path = os.path.join(root, 'models',
    'resnet50_based_classifier.pth')

Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
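With `scheduler.step()` called once per epoch, as in `train()`, `StepLR` follows the closed form `lr = base_lr * gamma ** (epoch // step_size)`. Evaluating it for this configuration (with `step_lr` as a hypothetical helper) shows that with `step_size=7` and `n_epochs = 5` the learning rate stays at 0.001 for the whole run.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    # StepLR multiplies the learning rate by gamma once every step_size epochs
    return base_lr * gamma ** (epoch // step_size)

print([step_lr(0.001, e) for e in range(5)])  # [0.001, 0.001, 0.001, 0.001, 0.001]
print(step_lr(0.001, 7))                      # first decayed value, about 1e-4
```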


In [None]:
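# Make sure the checkpoint directory exists before saving
os.makedirs(os.path.dirname(trained_model_ckpt_path), exist_ok=True)

trained_model, val_acc = train(model, criterion, optimizer, scheduler, n_epochs)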
torch.save({
    'state_dict': trained_model.state_dict(),
    'acc': val_acc,
    'n_classes': n_classes }, trained_model_ckpt_path)

100%|██████████| 379/379 [06:52<00:00,  1.09s/it, Acc=0.249, Epoch=1/5, Loss=3.85]
100%|██████████| 80/80 [01:45<00:00,  1.32s/it, Acc=0.496, Epoch=1/5, Loss=1.92]
100%|██████████| 379/379 [01:27<00:00,  4.31it/s, Acc=0.48, Epoch=2/5, Loss=1.99]
100%|██████████| 80/80 [00:12<00:00,  6.21it/s, Acc=0.596, Epoch=2/5, Loss=1.47]
100%|██████████| 379/379 [01:27<00:00,  4.33it/s, Acc=0.543, Epoch=3/5, Loss=1.69]
100%|██████████| 80/80 [00:12<00:00,  6.19it/s, Acc=0.634, Epoch=3/5, Loss=1.29]
100%|██████████| 379/379 [01:27<00:00,  4.32it/s, Acc=0.597, Epoch=4/5, Loss=1.5]
100%|██████████| 80/80 [00:12<00:00,  6.22it/s, Acc=0.658, Epoch=4/5, Loss=1.21]
100%|██████████| 379/379 [01:27<00:00,  4.34it/s, Acc=0.62, Epoch=5/5, Loss=1.4]
100%|██████████| 80/80 [00:12<00:00,  6.37it/s, Acc=0.686, Epoch=5/5, Loss=1.11]
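The progress bars show 379 training batches and 80 validation batches per epoch with `batch_size = 32`. Since the DataLoader keeps the last partial batch (`drop_last` defaults to `False`), each batch count pins the split size to a range of 32 values; `n_batches` and `sample_range` below are hypothetical helpers for that arithmetic.

```python
import math

def n_batches(n_samples, batch_size):
    # With drop_last=False a DataLoader yields ceil(n / batch_size) batches
    return math.ceil(n_samples / batch_size)

def sample_range(n_batches_seen, batch_size):
    # Smallest and largest split sizes that give exactly n_batches_seen batches
    return batch_size * (n_batches_seen - 1) + 1, batch_size * n_batches_seen

print(sample_range(379, 32))  # (12097, 12128) -> training split
print(sample_range(80, 32))   # (2529, 2560)  -> validation split
```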


In [None]:
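# Rebuild the architecture and load the best weights saved above
checkpoint = torch.load(trained_model_ckpt_path, map_location=device)
trained_model = resnet50_based_model(checkpoint['n_classes'])
trained_model.load_state_dict(checkpoint['state_dict'])
trained_model.eval()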
trained_model = torch.jit.script(trained_model).to(device)

test_acc = test(trained_model)

100%|██████████| 80/80 [00:17<00:00,  4.70it/s, Acc=0.736]
