## Pawpularity Transfer Learning Approach in PyTorch

This notebook implements a ResNet-50 architecture with pre-trained weights, replacing the final layer and retraining the last set of convolutional layers to predict a Pawpularity score bounded between 0 and 100. Everything is implemented from scratch in PyTorch.


- A custom PyTorch dataset class is implemented to attach scores, as well as the annotations, to each image file. Currently only the images are being used to train the model.
- The model is a ResNet-50 architecture where the final fully connected layer is replaced with two fully connected layers that output a single value.
- The model starts with pretrained weights for all of the convolutional layers, and the final set of layers in the model (layer4) is unfrozen to allow it to learn a feature representation more specific to this task. Training refreezes layer4 after 3 epochs and continues to train only the fully connected layers after that to reduce overfitting.
- The final activation is a sigmoid to bound the output between 0 and 1, and the output is multiplied by 100 in the training loop so that it is bounded between 0 and 100, which matches the range of Pawpularity scores.
- The model is optimised for mean squared error (MSE), using AdamW (Adam with decoupled weight decay) to reduce overfitting.
- The final competition evaluation metric is the square root of MSE, or
$ \textrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $
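
As a minimal sketch of the metric above, the snippet below computes RMSE two ways: directly from the formula, and by accumulating per-batch sums of squared errors and normalising once at the end, which is the pattern a summed MSE loss allows. It uses only the standard library; the function names and the toy scores are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmse_from_batches(batches):
    """Accumulate summed squared error per batch, then normalise once."""
    total_sq_err, count = 0.0, 0
    for y_true, y_pred in batches:
        total_sq_err += sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        count += len(y_true)
    return math.sqrt(total_sq_err / count)

# Toy scores, not taken from the competition data
y_true = [30.0, 50.0, 100.0, 20.0, 45.0]
y_pred = [35.0, 40.0, 60.0, 25.0, 45.0]

direct = rmse(y_true, y_pred)
batched = rmse_from_batches([(y_true[:2], y_pred[:2]), (y_true[2:], y_pred[2:])])
print(direct, batched)
```

Both routes give the same value even though the two batches have different sizes, because the division by the sample count happens only once.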


### Load Dependencies

In [None]:
import pandas as pd
import numpy as np
import math
import time
import os
from skimage import io, transform
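import PIL.Image  # import the submodule explicitly; a bare "import PIL" does not guarantee PIL.Image is loaded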

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torchvision import datasets, transforms, models

from torch import nn, optim
import torch.nn.functional as F
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import Dataset
from torch.utils.data.sampler import SubsetRandomSampler

In [None]:
# Config
data_dir = '../input/petfinder-pawpularity-score/'
model_dir = '../input/resnet50-pretrained-model/'
working_dir = './'
global_batch_size = 64
workers = 2
np.random.seed(10)
print(os.listdir(data_dir))
print(os.listdir(f'{data_dir}train')[0:4])

### Load and Explore data

**Look at the annotations**

In [None]:
train_df = pd.read_csv(f'{data_dir}train.csv')

In [None]:
train_df.head()

In [None]:
train_df.info()

In [None]:
# Annotations
np.array(train_df.iloc[2, 1:13])

In [None]:
# Scores
train_df.iloc[2, 13]

In [None]:
n, bins, patches = plt.hist(train_df.iloc[:, 13], 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Pawpularity')
plt.ylabel('Frequency')
plt.title('Pawpularity Histogram')
plt.xlim(0, 100)
# plt.ylim(0, 0.03)
plt.grid(True)
plt.show()

**Custom dataset class to attach annotations and scores to the images**

This step attaches the scores and annotations to the image files so that everything can be fed through a PyTorch DataLoader.

In [None]:
class PawpularityDataset(Dataset):
    """Dataset connecting animal images to the score and annotations"""

    def __init__(self, csv_file, img_dir, transform=transforms.ToTensor()):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            img_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """

        self.annotations_csv = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations_csv)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.img_dir,
                                self.annotations_csv.iloc[idx, 0])

        # load each image in PIL format for compatibility with transforms;
        # convert to RGB so every image has the 3 channels Normalize expects
        image = PIL.Image.open(img_name + '.jpg').convert('RGB')
        
        # Columns 1 to 12 contain the annotations
        annotations = np.array(self.annotations_csv.iloc[idx, 1:13])
        annotations = annotations.astype('float')
        # Column 13 has the scores
        score = np.array(self.annotations_csv.iloc[idx, 13])
        score = torch.tensor(score.astype('float')).view(1).to(torch.float32)

        # Apply the transforms
        image = self.transform(image)

        sample = [image, annotations, score]
        return sample
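
To make the dataset contract concrete, here is a stripped-down sketch of the same idea without PyTorch or image loading: a map-style dataset only needs `__len__` and `__getitem__`. The class name, the in-memory CSV, and its shortened column list are hypothetical stand-ins for `train.csv`.

```python
import csv
import io

# Hypothetical miniature of train.csv: an id, two metadata flags, and a score.
CSV_TEXT = """Id,Eyes,Face,Pawpularity
abc123,1,1,63
def456,0,1,42
"""

class TinyScoreDataset:
    """Map-style dataset sketch: only __len__ and __getitem__ are required."""

    def __init__(self, csv_text, img_dir):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))
        self.img_dir = img_dir

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        # Build the path only; no image file is opened in this sketch
        img_path = f"{self.img_dir}/{row['Id']}.jpg"
        annotations = [float(row["Eyes"]), float(row["Face"])]
        score = float(row["Pawpularity"])
        return img_path, annotations, score

dataset = TinyScoreDataset(CSV_TEXT, "train")
print(len(dataset), dataset[1])
```

The real class above follows the same shape, but opens the image, applies the transforms, and returns tensors.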

**Define global image transforms**

In [None]:
# Test out the transforms on an image (images need to be made the same size for the dataset to work)
# Apply some image augmentation on the training set (rotation, flip)
# Normalize using imagenet RGB mean and std

img_transforms = transforms.Compose([transforms.Resize(255),
                                     transforms.CenterCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.RandomRotation(20),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                          std=[0.229, 0.224, 0.225])])

img_transforms_valid = transforms.Compose([transforms.Resize(255),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                                std=[0.229, 0.224, 0.225])])

**Load and check out the datasets**

In [None]:
## Load and set up the final training and validation dataset (use different transforms)

train_data = PawpularityDataset(f'{data_dir}train.csv', f'{data_dir}train', transform=img_transforms)
valid_data = PawpularityDataset(f'{data_dir}train.csv', f'{data_dir}train', transform=img_transforms_valid)

np.random.seed(13)

# obtain random indices that will be used for the training/validation split
valid_size = 0.1
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=global_batch_size,
                                           sampler=train_sampler, num_workers=workers,
                                           pin_memory=True) 
# sample the validation set from a separate dataset that doesn't include the image augmentation transforms
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=global_batch_size,
                                           sampler=valid_sampler, num_workers=workers,
                                           pin_memory=True) 

print(len(train_loader)*global_batch_size)
print(len(valid_loader)*global_batch_size)
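
The split logic in the cell above can be sketched in isolation. This version uses the standard library's `random` instead of NumPy, and the helper name `split_indices` and the sample count of 1000 are assumptions for illustration.

```python
import random

def split_indices(n_samples, valid_fraction, seed):
    """Shuffle 0..n_samples-1 and carve off the first chunk for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    split = int(valid_fraction * n_samples)  # floor, as in the cell above
    return indices[split:], indices[:split]

train_idx_demo, valid_idx_demo = split_indices(n_samples=1000, valid_fraction=0.1, seed=13)
print(len(train_idx_demo), len(valid_idx_demo))

# The two index sets never overlap and together cover every sample
assert set(train_idx_demo).isdisjoint(valid_idx_demo)
```

Because both datasets read the same CSV and differ only in their transforms, keeping the two index lists disjoint is what prevents validation images from leaking into training.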

In [None]:
# Pull one batch (of size global_batch_size) and check the tensor shapes
images, annotations, scores = next(iter(train_loader))
print(images.shape)
print(scores.shape)
print(annotations.shape)

**Look at some images**

In [None]:
# Helper function to unnormalize and plot images
def im_convert(tensor):
    """ Display a tensor as an image. """
    
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
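    # undo the ImageNet normalisation per channel: x * std + mean
    image = image * np.array((0.229, 0.224, 0.225)).reshape(3, 1, 1) + np.array((0.485, 0.456, 0.406)).reshape(3, 1, 1)
    img = (np.clip(image, 0, 1) * 255).astype(np.uint8) # clip to [0, 1], then scale to 8-bit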
    

    return plt.imshow(np.transpose(img, (1, 2, 0)))
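
The unnormalisation in `im_convert` is the inverse of `transforms.Normalize`. As a small worked example on a single RGB pixel, using plain Python and a made-up pixel value (the helper names are illustrative only):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize(pixel):
    """Per-channel (x - mean) / std, as Normalize does."""
    return tuple((v - m) / s for v, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))

def unnormalize(pixel):
    """Per-channel x * std + mean, the inverse of normalize."""
    return tuple(v * s + m for v, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))

def to_uint8(pixel):
    """Clamp to [0, 1] before scaling so rounding error cannot leave the 0-255 range."""
    return tuple(int(round(min(max(v, 0.0), 1.0) * 255)) for v in pixel)

original = (0.2, 0.4, 0.8)  # one RGB pixel in [0, 1], made up for the example
restored = unnormalize(normalize(original))
print(restored, to_uint8(restored))
```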

In [None]:
im_numpy = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(20, 10))
# display 8 images
for idx in np.arange(8):
    ax = fig.add_subplot(2, 4, idx+1, xticks=[], yticks=[])
    im_convert(images[idx])
    ax.set_title(scores[idx].item())

### Set up the model structure

**Downloading the pretrained model**

To access the pretrained model in a Kaggle notebook without an internet connection, download it via PyTorch in a local notebook and save it using torch.save. Then upload the saved file to Kaggle as a dataset, which the notebook can load via torch.load.

In [None]:
# model = models.resnet50(pretrained=True)
# torch.save(model, 'resnet50_pretrained.pt')

**Load the model, replace the output layer, and choose which layers to freeze/train**

In [None]:
# Load the pretrained resnet50
model = torch.load(f'{model_dir}resnet50_pretrained.pt')

# Disable gradients on all model parameters to freeze the weights
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected resnet layer with a 2 fc layer network and sigmoid output
model.fc = nn.Sequential(nn.Linear(2048, 256),
                         nn.ReLU(),
                         nn.Linear(256, 1),
                         nn.Sigmoid())

for param in model.fc.parameters():
    param.requires_grad = True
    
# Unfreeze the final convolutional stage (layer4) so it can adapt to this task
for param in model.layer4.parameters():
    param.requires_grad = True

In [None]:
print(model)

In [None]:
torch.manual_seed(13)

criterion = nn.MSELoss(reduction='sum')

# AdamW: Adam with decoupled weight decay (not the same as an L2 penalty added to the loss)
optimizer = optim.AdamW(model.parameters(), lr=0.000025, weight_decay=50)

# Learning rate decay
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones = [1, 2, 6], gamma=0.5)
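
To see what the step-decay schedule above does to the learning rate, the sketch below reproduces the arithmetic in plain Python: the rate is the base rate times `gamma` raised to the number of milestones already reached. The helper `multistep_lr` is a hypothetical stand-in, and the mapping to loop epochs assumes `scheduler.step()` is called once at the end of every epoch.

```python
from bisect import bisect_right

def multistep_lr(base_lr, milestones, gamma, step_count):
    """Learning rate after `step_count` scheduler steps with step decay."""
    return base_lr * gamma ** bisect_right(sorted(milestones), step_count)

base_lr, milestones, gamma = 0.000025, [1, 2, 6], 0.5

# Loop epoch e (1-based) sees the rate after e - 1 scheduler steps
schedule = {epoch: multistep_lr(base_lr, milestones, gamma, epoch - 1)
            for epoch in range(1, 11)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:2d}: lr = {lr:.3e}")
```

Under that assumption the rate halves going into epochs 2, 3 and 7, and stays flat in between.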

In [None]:
# Test out the forward pass on a single batch

images, annotations, scores = next(iter(train_loader))
with torch.no_grad():
    output = model(images)*100 # convert sigmoid output to pawpularity scale
    loss = criterion(output, scores)
    # criterion sums the squared errors, so divide by the batch size before the square root
    start_rmse = math.sqrt(loss.item()/len(scores))

print(scores.dtype)
print(output.dtype)
print('Starting RMSE: ', start_rmse)
print('Mean Prediction: ', torch.mean(output).item())
print('Prediction Standard Deviation: ', torch.std(output).item())

### Train the model

In [None]:
# check if CUDA is available and set the training device

train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    # only query the device name when a GPU is present; the call fails otherwise
    device = torch.cuda.get_device_name()
    print(f'CUDA is available!  Training on GPU {device}...')

**Model training loop**

Run the training and validation steps for a fixed number of epochs, and save the model anytime the validation loss decreases.  

In [None]:
# Training and validation loop

if train_on_gpu:
    model.cuda()

n_epochs = 10

valid_loss_min = np.inf # track change in validation loss (np.Inf was removed in NumPy 2.0)

train_losses, valid_losses = [], []

for epoch in range(1, n_epochs+1):
    
    start = time.time()
    current_lr = scheduler.get_last_lr()[0]
    
    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    
    # Stop training the convolutional layers after a certain point
    if epoch > 3:
        for param in model.layer4.parameters():
            param.requires_grad = False
    
    ###################
    # train the model #
    ###################
    # put in training mode (batch norm layers use per-batch statistics)
    model.train()
    for images, annotations, scores in train_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(images)*100 # multiply by 100 the sigmoid output to 0-100 pawpularity scale
        # print(output.dtype)
        # print(scores.dtype)
        # calculate the batch loss
        loss = criterion(output, scores)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()
        
    ######################    
    # validate the model #
    ######################
    # eval mode (batch norm layers use their running statistics)
    model.eval()
    with torch.no_grad():
        for images, annotations, scores in valid_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(images)*100 # multiply by 100 the sigmoid output to 0-100 pawpularity scale
            # calculate the batch loss
            loss = criterion(output, scores)
            # accumulate the summed squared error for this batch
            valid_loss += loss.item()
    
    # calculate RMSE
    train_loss = math.sqrt(train_loss/len(train_loader.sampler))
    valid_loss = math.sqrt(valid_loss/len(valid_loader.sampler))
    
    train_losses.append(train_loss)
    valid_losses.append(valid_loss)
        
    # increment learning rate decay
    scheduler.step()
    
    # print training/validation statistics
    print('Epoch: {}, time: {:.1f}s, lr: {:.7f} \tTraining Loss: {:.3f} \tValidation Loss: {:.3f}'.format(
        epoch, float(time.time() - start), current_lr, train_loss, valid_loss))
    
    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.3f} --> {:.3f}).  Saving model ...'.format(
        valid_loss_min,
        valid_loss))
        torch.save(model.state_dict(), f'{working_dir}pawpularity_best_model.pt')
        valid_loss_min = valid_loss    
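
Two pieces of control flow in the loop above are easy to check in isolation: the `epoch > 3` test that freezes layer4, and the rule that saves a checkpoint whenever the validation loss does not get worse. The sketch below mirrors both in plain Python. The helper names are hypothetical, and the loss values are invented for the example rather than results from this notebook.

```python
def layer4_is_trainable(epoch, freeze_after=3):
    """layer4 trains through epoch `freeze_after`, then stays frozen."""
    return epoch <= freeze_after

def best_checkpoints(valid_losses):
    """Return the 1-based epochs at which a checkpoint would be saved."""
    best, saved = float("inf"), []
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss <= best:  # same comparison as the loop above
            best = loss
            saved.append(epoch)
    return saved, best

# Hypothetical validation RMSE values, not results from this notebook
demo_losses = [20.4, 19.1, 18.7, 18.9, 18.6, 18.8]
saved_epochs, best_loss = best_checkpoints(demo_losses)
trainable = [layer4_is_trainable(e) for e in range(1, 7)]
print(trainable)
print(saved_epochs, best_loss)
```

Only the last saved epoch matters in practice, since each save overwrites the same file.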

### Diagnostics and performance

In [None]:
# Load the best performing model on the validation set
model.load_state_dict(torch.load(f'{working_dir}pawpularity_best_model.pt'))

In [None]:
# get the distribution of predictions

predictions = []
score_list = []

model.eval()
with torch.no_grad():
    for images, annotations, scores in valid_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(images)*100
        predictions.extend(list(output.cpu().detach().numpy().reshape(len(output),)))
        score_list.extend(list(scores.cpu().detach().numpy().reshape(len(scores),)))
        

preds_df = pd.DataFrame({'preds': predictions})
preds_df.describe()

In [None]:
# Manually Check RMSE

diffs = np.array(score_list) - np.array(predictions)
print(math.sqrt((diffs @ diffs)/len(valid_loader.sampler)))


In [None]:
# Check that manually increasing the variance doesn't help

preds_arr = np.array(predictions)
mean = np.mean(preds_arr)
stddev = np.std(preds_arr)
print(mean, stddev)
# push each prediction 1.5 z-scores further away from the mean
updated_normalized = 1.5*(preds_arr-mean)/stddev
new_predictions = updated_normalized+preds_arr

diffs = np.array(score_list) - np.array(new_predictions)
print(math.sqrt((diffs @ diffs)/len(valid_loader.sampler)))
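
The adjustment in the cell above adds 1.5 times each prediction's z-score back onto the prediction. A small sketch with made-up numbers suggests what that does to the distribution: the mean is unchanged and the population standard deviation grows by exactly 1.5. The helper name `stretch` is an assumption for illustration.

```python
import statistics

def stretch(preds, shift_in_std_units=1.5):
    """Add a multiple of each prediction's z-score back onto the prediction."""
    mean = statistics.mean(preds)
    std = statistics.pstdev(preds)  # population std, matching np.std's default
    return [p + shift_in_std_units * (p - mean) / std for p in preds]

demo_preds = [30.0, 35.0, 40.0, 45.0, 50.0]  # made-up predictions
stretched = stretch(demo_preds)

mean_after = statistics.mean(stretched)
std_gain = statistics.pstdev(stretched) - statistics.pstdev(demo_preds)
print(mean_after, std_gain)
```

Widening the spread this way does not change the ordering of the predictions, so it can only help RMSE if the model's errors are mostly a matter of scale.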

In [None]:
# Histogram of validation predictions - if this is too narrow that's an issue

n, bins, patches = plt.hist(predictions, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Pawpularity')
plt.ylabel('Frequency')
plt.title('Predicted Pawpularity Histogram')
plt.xlim(0, 100)
plt.ylim(0, .2)
plt.grid(True)
plt.show()

The range could still be greater, and the model is failing completely at predicting the highest-ranked images that get a score of 100. That said, it is producing a much wider range of predictions and feels like it would give useful information to users submitting photos.

In [None]:
# Histogram of validation set actual scores

n, bins, patches = plt.hist(train_df.iloc[valid_idx, 13], 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Pawpularity')
plt.ylabel('Frequency')
plt.title('Actual Pawpularity Histogram')
plt.xlim(0, 100)
plt.ylim(0, .2)
plt.grid(True)
plt.show()

In [None]:
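# Plot the losses (the first epoch is skipped so the later epochs are easier to compare)
fig = plt.figure()
ax = plt.axes()
ax.plot(list(range(2, len(train_losses) + 1)), train_losses[1:], label='train')
ax.plot(list(range(2, len(valid_losses) + 1)), valid_losses[1:], label='validation')
ax.set_xlabel('Epoch')
ax.set_ylabel('RMSE')
ax.legend()
print(f'best score: {valid_loss_min}')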

### Show examples of images and predicted vs. actual scores

In [None]:
images, annotations, scores = next(iter(valid_loader))
# move tensors to GPU only if CUDA is available
if train_on_gpu:
    images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()

In [None]:
# run inference without tracking gradients
model.eval()
with torch.no_grad():
    output_plot = model(images).cpu()*100
images, annotations, scores = images.cpu(), annotations.cpu(), scores.cpu()

In [None]:
# plot the images in the batch, along with the corresponding labels and predictions

fig = plt.figure(figsize=(20, 10))
# display 12 images
for idx in np.arange(12):
    ax = fig.add_subplot(3, 4, idx+1, xticks=[], yticks=[])
    im_convert(images[idx])
    ax.set_title(f'Act: {round(scores[idx].item())} Pred: {round(output_plot[idx].item())}')

### Use the model to predict the test dataset

In [None]:
test_df = pd.read_csv(f'{data_dir}test.csv')
test_df.head(10)

In [None]:
# Load the best performing model on the validation set
model.load_state_dict(torch.load(f'{working_dir}pawpularity_best_model.pt'))

In [None]:
class PawpularityTestDataset(Dataset):
    """Dataset connecting test images to their annotations (the test set has no scores)"""

    def __init__(self, csv_file, img_dir, transform=transforms.ToTensor()):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            img_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """

        self.annotations_csv = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations_csv)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.img_dir,
                                self.annotations_csv.iloc[idx, 0])

        # load each image in PIL format for compatibility with transforms
        image = PIL.Image.open(img_name + '.jpg')

        annotations = np.array(self.annotations_csv.iloc[idx, 1:13])
        annotations = annotations.astype('float')

        # Apply the transforms
        image = self.transform(image)

        sample = [image, annotations]
        return sample

In [None]:
## Load the test dataset (careful to use validation transforms without img augmentation)

test_data = PawpularityTestDataset(f'{data_dir}test.csv', f'{data_dir}test', transform=img_transforms_valid)

batch_size = min(len(test_data), global_batch_size)

test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=workers) 

In [None]:
# Step through with a reasonable batch size and build up the output dataset

model.eval()
outputs = []
# no gradients are needed at inference time, which saves memory
with torch.no_grad():
    for images, annotations in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            images, annotations = images.cuda(), annotations.cuda()
        test_output = model(images)*100
        outputs.extend(list(test_output.cpu().numpy().reshape(len(test_output),)))
    
img_names = list( test_df.iloc[:, 0].values)
outputs = [round(x, 2) for x in outputs]

output_df = pd.DataFrame({'Id': img_names, 'Pawpularity': outputs})
output_df.head(10)

In [None]:
# Write the output in the required format
output_df.to_csv('submission.csv', index=False)
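
For reference, the file written above is a two-column table with `Id` and `Pawpularity` headers. The sketch below writes the same layout to an in-memory buffer with the standard library's `csv` module, so the exact text can be inspected. The helper name, ids and scores are hypothetical.

```python
import csv
import io

def write_submission(ids, scores, handle):
    """Write an Id,Pawpularity table with scores rounded to two decimals."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Id", "Pawpularity"])
    for image_id, score in zip(ids, scores):
        writer.writerow([image_id, f"{score:.2f}"])

buffer = io.StringIO()
write_submission(["abc123", "def456"], [38.123, 41.5], buffer)  # hypothetical ids and scores
print(buffer.getvalue())
```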