## Pawpularity ConvNeXt Transfer Learning Approach in PyTorch

This notebook adapts the ConvNeXt architecture ([A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545)), starting from pretrained weights, to predict a Pawpularity score bounded between 0 and 100. It replaces the output layer and retrains the last stage of convolutional layers. Everything is implemented in PyTorch. Training is reasonably fast: about 7 epochs per fold are enough to reach the best performance on the validation set.

- A custom PyTorch `Dataset` class attaches the score and the 12 metadata annotations to each image file. Currently only the images are used to train the model.
- The model is a ConvNeXt architecture whose final fully connected layer is replaced with two fully connected layers that output a single value.
- The model starts from pretrained weights for all of the convolutional layers, and the final stage (`stages[3]`, the fourth and last stage) is unfrozen so it can learn a feature representation more specific to this task.
- The final activation is a sigmoid, which bounds the output between 0 and 1. The training loop multiplies the output by 100, giving a bounded output between 0 and 100 that matches the range of Pawpularity scores.
- The model optimises mean squared error (MSE) using AdamW (Adam with decoupled weight decay) to reduce overfitting.
- Training uses 10-fold cross-validation: 10 models are trained, and for each one the weights from the epoch that performed best on its validation fold are kept. The test set is predicted by running it through each of the 10 models and averaging their score predictions.
- The final competition evaluation metric is the square root of the MSE, $\textrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$, where $y_i$ is the true score and $\hat{y}_i$ the predicted score for image $i$.
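The training loop below accumulates `nn.MSELoss(reduction='sum')` batch by batch, then divides by the total number of samples and takes the square root. A minimal numpy sketch, using made-up scores rather than competition data, checking that this sum-then-divide approach reproduces the RMSE formula above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """RMSE as defined above: square root of the mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_from_batch_sums(batches):
    """Accumulate summed squared errors per batch (like reduction='sum'),
    then divide by the total sample count and take the square root."""
    total_sq_err, n = 0.0, 0
    for y_true, y_pred in batches:
        diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        total_sq_err += float(diff @ diff)
        n += len(diff)
    return float(np.sqrt(total_sq_err / n))

# Illustrative scores only, split into two uneven "batches"
y_true = [30, 45, 12, 88, 50]
y_pred = [35, 40, 20, 70, 52]
batches = [(y_true[:3], y_pred[:3]), (y_true[3:], y_pred[3:])]
print(rmse(y_true, y_pred), rmse_from_batch_sums(batches))
```

Averaging the two per-batch RMSE values instead would give a different number, which is why the loop sums squared errors before dividing.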


### Load Dependencies

In [None]:
import pandas as pd
import numpy as np
import math
import time
import os
import PIL.Image  # PIL.Image.open is used by the dataset classes

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torchvision import datasets, transforms, models

from torch import nn, optim
import torch.nn.functional as F
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import Dataset
from torch.utils.data.sampler import SubsetRandomSampler

In [None]:
# Config
data_dir = '../input/petfinder-pawpularity-score/'
model_dir = '../input/convnext-pretrained-model-v2/'
weights_dir = '../input/convnext-pretrained-weights-2-stage/'
working_dir = './'
global_batch_size = 64
workers = 2
np.random.seed(10)
torch.manual_seed(10)  # the samplers and random augmentations draw from torch's RNG
print(os.listdir(data_dir))
print(os.listdir(f'{data_dir}train')[0:4])

### Load and Explore data

**Look at the annotations**

In [None]:
train_df = pd.read_csv(f'{data_dir}train.csv')

In [None]:
train_df.head()

In [None]:
train_df.info()

In [None]:
# Annotations
np.array(train_df.iloc[2, 1:13])

In [None]:
# Scores
train_df.iloc[2, 13]

In [None]:
n, bins, patches = plt.hist(train_df.iloc[:, 13], 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Pawpularity')
plt.ylabel('Density')  # density=True normalises the histogram
plt.title('Pawpularity Histogram')
plt.xlim(0, 100)
# plt.ylim(0, 0.03)
plt.grid(True)
plt.show()

**Custom dataset class to attach annotations and scores to the images**

This is a critical step: it attaches the scores and annotations to the image files so the data can be fed to a PyTorch `DataLoader`.

In [None]:
class PawpularityDataset(Dataset):
    """Dataset connecting animal images to the score and annotations"""

    def __init__(self, csv_file, img_dir, transform=transforms.ToTensor()):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            img_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """

        self.annotations_csv = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations_csv)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.img_dir,
                                self.annotations_csv.iloc[idx, 0])

        # load each image in PIL format for compatibility with transforms;
        # convert to RGB so every image has 3 channels
        image = PIL.Image.open(img_name + '.jpg').convert('RGB')

        # Columns 1 to 12 contain the annotations
        annotations = np.array(self.annotations_csv.iloc[idx, 1:13])
        annotations = annotations.astype('float')
        # Column 13 has the score; shape (1,) so batches match the (B, 1) model output
        score = torch.tensor([float(self.annotations_csv.iloc[idx, 13])], dtype=torch.float32)

        # Apply the transforms
        image = self.transform(image)

        sample = [image, annotations, score]
        return sample
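`DataLoader`'s default collation stacks each `[image, annotations, score]` sample element-wise, which is why the score is stored as a one-element tensor: the batched targets then have shape `(B, 1)`, matching the model output. A self-contained sketch using a hypothetical `FakePawpularity` dataset that returns random data in the same layout (3×224×224 image, 12 annotation flags as a float numpy array, 1 score):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class FakePawpularity(Dataset):
    """Hypothetical stand-in returning samples shaped like PawpularityDataset's."""
    def __init__(self, n=10):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        image = torch.rand(3, 224, 224)
        # numpy float64 array, like the real class's annotations
        annotations = np.random.randint(0, 2, size=12).astype('float')
        score = torch.tensor([float(idx * 10)], dtype=torch.float32)  # shape (1,)
        return [image, annotations, score]

demo_loader = DataLoader(FakePawpularity(), batch_size=4)
demo_images, demo_annotations, demo_scores = next(iter(demo_loader))
print(demo_images.shape, demo_annotations.shape, demo_annotations.dtype, demo_scores.shape)
```

Because the annotations arrive as `float64`, they would need `.float()` before being fed to a `float32` layer if they were ever used in the model.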

**Define global image transforms**

In [None]:
# Define the transforms (images must be resized/cropped to the same size so they can be batched)
# Training set: add image augmentation (random horizontal flip, rotation up to 20 degrees)
# Both sets: normalize using the ImageNet RGB mean and std

img_transforms = transforms.Compose([transforms.Resize(255),
                                     transforms.CenterCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.RandomRotation(20),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                          std=[0.229, 0.224, 0.225])])

img_transforms_valid = transforms.Compose([transforms.Resize(255),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                                std=[0.229, 0.224, 0.225])])
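`transforms.Normalize` computes `(x - mean) / std` per channel, and `im_convert` further down undoes it with `x * std + mean`. A small numpy sketch of that round trip on a random 3×4×4 array standing in for an image already scaled to [0, 1] by `ToTensor`; the clip before the `uint8` conversion is a defensive extra, not something the notebook requires:

```python
import numpy as np

# ImageNet channel statistics used by the notebook's Normalize transform
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize(x):
    """Per-channel (x - mean) / std on a (C, H, W) array in [0, 1]."""
    return (x - MEAN) / STD

def unnormalize(x):
    """Inverse of normalize: x * std + mean, as im_convert does."""
    return x * STD + MEAN

rng = np.random.default_rng(0)
img = rng.random((3, 4, 4))  # fake image in ToTensor's [0, 1] range
restored = unnormalize(normalize(img))
as_uint8 = (np.clip(restored, 0, 1) * 255).astype(np.uint8)  # display format
print(np.abs(restored - img).max(), as_uint8.dtype)
```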

**Load and check out the datasets and create kfold dataloaders**

In [None]:
# Load and set up the final training and validation dataset (use different transforms)
# Return a list of train/valid dataloaders with different train/test splits for cross validation
from sklearn.model_selection import KFold

# Create two versions of the dataset with and without image augmentation
augmented_data = PawpularityDataset(f'{data_dir}train.csv', f'{data_dir}train', transform=img_transforms)
base_transform_data = PawpularityDataset(f'{data_dir}train.csv', f'{data_dir}train', transform=img_transforms_valid)

def get_cv_dataloaders(augmented_data, base_transform_data, folds=5, cv_shuffle=True, rands=10):
    
    num_images = len(augmented_data)
    indices = list(range(num_images))
    
    dataloaders = []
    
    # use sklearn kfold to split into random training/validation indices
    cv = KFold(n_splits=folds, random_state=rands, shuffle=cv_shuffle)
    for train_idx, valid_idx in cv.split(indices):
        # define samplers for obtaining training and validation batches
        train_sampler = SubsetRandomSampler(train_idx)
        valid_sampler = SubsetRandomSampler(valid_idx)

        # create dataloaders using the cv indexes
        train_loader = torch.utils.data.DataLoader(augmented_data, batch_size=global_batch_size,
                                                   sampler=train_sampler, num_workers=workers,
                                                   pin_memory=True) 
        # sample the validation dataset from a separate dataset that doesn't include the image aug transformations.
        valid_loader = torch.utils.data.DataLoader(base_transform_data, batch_size=global_batch_size,
                                                   sampler=valid_sampler, num_workers=workers,
                                                   pin_memory=True) 

        # print('Train length: ', len(train_loader)*global_batch_size)
        # print('Valid length: ', len(valid_loader)*global_batch_size)
        
        dataloaders.append((train_loader, valid_loader))
        
    return dataloaders
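Because both datasets are built from the same CSV, the indices that `KFold` produces point at the same rows in each, so augmentation only ever touches the training split. A short sketch with the same `KFold` arguments on a small, illustrative dataset size (23 rows, 5 folds), checking that train and validation indices never overlap within a fold and that every row is used for validation exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold

num_images = 23  # illustrative size, not the real dataset
indices = list(range(num_images))
cv = KFold(n_splits=5, shuffle=True, random_state=10)  # same arguments as get_cv_dataloaders

folds = list(cv.split(indices))
for k, (train_idx, valid_idx) in enumerate(folds):
    # within a fold, train and validation indices never overlap
    assert set(train_idx).isdisjoint(valid_idx)
    print(k, len(train_idx), len(valid_idx))

# across folds, every index lands in exactly one validation split
all_valid = np.concatenate([valid_idx for _, valid_idx in folds])
print(sorted(all_valid.tolist()) == indices)
```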

        


In [None]:
cv_dataloaders = get_cv_dataloaders(augmented_data=augmented_data,
                                    base_transform_data=base_transform_data,
                                    folds=3,
                                    cv_shuffle=True)

In [None]:
# Check the shapes of one validation batch (batch size 64)
tl = cv_dataloaders[0][1]
images, annotations, scores = next(iter(tl))
print(images.shape)
print(scores.shape)
print(annotations.shape)

**Look at some images**

In [None]:
# Helper function to unnormalize and plot images
def im_convert(tensor):
    """ Display a tensor as an image. """
    
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image * np.array((0.229, 0.224, 0.225)).reshape(3, 1, 1) + np.array((0.485, 0.456, 0.406)).reshape(3, 1, 1)  # unnormalize
    img = (image.clip(0, 1) * 255).astype(np.uint8)  # scale to 0-255 for display
    

    return plt.imshow(np.transpose(img, (1, 2, 0)))

In [None]:
# plot the images in the batch, with their scores as titles
fig = plt.figure(figsize=(20, 10))
# display 8 images
for idx in np.arange(8):
    ax = fig.add_subplot(2, 4, idx+1, xticks=[], yticks=[])
    im_convert(images[idx])
    ax.set_title(scores[idx].item())

### Set up the model structure

In [None]:
import torch.nn as nn
import torch.nn.functional as func
import torch.optim as optim

**Downloading the pretrained model**

To use the pretrained model in a Kaggle notebook without an internet connection, download it with PyTorch in a local notebook and save it with `torch.save`. Then upload the file to Kaggle as a dataset, which the notebook can load with `torch.load`.

For ConvNeXt, I saved the [model definition](https://github.com/facebookresearch/ConvNeXt/blob/dc7823d8a2ecc554fcd57ff6cdb7748011bcdedd/models/convnext.py) (which includes the URLs of the pretrained weights) to a file, and uploaded it as a dataset to use in my Kaggle notebook.

I'm also using a [timm dataset](https://www.kaggle.com/kozodoi/timm-pytorch-image-models) to load the `timm` module, which provides the `trunc_normal_`, `DropPath` and `register_model` helpers that the ConvNeXt code imports.
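Saving the whole model with `torch.save(model)` pickles a reference to the `ConvNeXt` class, so the class definition below must exist before `torch.load` is called. A sketch of the alternative of saving only the `state_dict` (tensors only) and rebuilding the architecture in code, using a hypothetical `TinyNet` instead of ConvNeXt. Newer PyTorch releases (2.6 onward) default `torch.load` to `weights_only=True`, which accepts a state dict but refuses a fully pickled model unless `weights_only=False` is passed.

```python
import os
import tempfile

import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical small model standing in for ConvNeXt."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 4)
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        return self.head(torch.relu(self.body(x)))

torch.manual_seed(0)
source = TinyNet()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tiny_state_dict.pt')
    # Save only parameters and buffers, not the pickled class
    torch.save(source.state_dict(), path)

    # Rebuild the architecture in code, then load the weights into it
    restored = TinyNet()
    restored.load_state_dict(torch.load(path, map_location='cpu'))

x = torch.randn(2, 8)
print(torch.equal(source(x), restored(x)))
```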

In [None]:
# Pytorch implementation of convnext
# Source: https://github.com/facebookresearch/ConvNeXt/blob/dc7823d8a2ecc554fcd57ff6cdb7748011bcdedd/models/convnext.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
sys.path.append('../input/timm-pytorch-image-models/pytorch-image-models-master')
from timm.models.layers import trunc_normal_, DropPath
from timm.models.registry import register_model

class Block(nn.Module):
    r""" ConvNeXt Block. There are two equivalent implementations:
    (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
    (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    We use (2) as we find it slightly faster in PyTorch
    
    Args:
        dim (int): Number of input channels.
        drop_path (float): Stochastic depth rate. Default: 0.0
        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim) # depthwise conv
        self.norm = LayerNorm(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim) # pointwise/1x1 convs, implemented with linear layers
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)), 
                                    requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1) # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2) # (N, H, W, C) -> (N, C, H, W)

        x = input + self.drop_path(x)
        return x

class ConvNeXt(nn.Module):
    r""" ConvNeXt
        A PyTorch impl of : `A ConvNet for the 2020s`  -
          https://arxiv.org/pdf/2201.03545.pdf
    Args:
        in_chans (int): Number of input image channels. Default: 3
        num_classes (int): Number of classes for classification head. Default: 1000
        depths (tuple(int)): Number of blocks at each stage. Default: [3, 3, 9, 3]
        dims (int): Feature dimension at each stage. Default: [96, 192, 384, 768]
        drop_path_rate (float): Stochastic depth rate. Default: 0.
        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
        head_init_scale (float): Init scaling value for classifier weights and biases. Default: 1.
    """
    def __init__(self, in_chans=3, num_classes=1000, 
                 depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], drop_path_rate=0., 
                 layer_scale_init_value=1e-6, head_init_scale=1.,
                 ):
        super().__init__()

        self.downsample_layers = nn.ModuleList() # stem and 3 intermediate downsampling conv layers
        stem = nn.Sequential(
            nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),
            LayerNorm(dims[0], eps=1e-6, data_format="channels_first")
        )
        self.downsample_layers.append(stem)
        for i in range(3):
            downsample_layer = nn.Sequential(
                    LayerNorm(dims[i], eps=1e-6, data_format="channels_first"),
                    nn.Conv2d(dims[i], dims[i+1], kernel_size=2, stride=2),
            )
            self.downsample_layers.append(downsample_layer)

        self.stages = nn.ModuleList() # 4 feature resolution stages, each consisting of multiple residual blocks
        dp_rates=[x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] 
        cur = 0
        for i in range(4):
            stage = nn.Sequential(
                *[Block(dim=dims[i], drop_path=dp_rates[cur + j], 
                layer_scale_init_value=layer_scale_init_value) for j in range(depths[i])]
            )
            self.stages.append(stage)
            cur += depths[i]

        self.norm = nn.LayerNorm(dims[-1], eps=1e-6) # final norm layer
        self.head = nn.Linear(dims[-1], num_classes)

        self.apply(self._init_weights)
        self.head.weight.data.mul_(head_init_scale)
        self.head.bias.data.mul_(head_init_scale)

    def _init_weights(self, m):
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            trunc_normal_(m.weight, std=.02)
            nn.init.constant_(m.bias, 0)

    def forward_features(self, x):
        for i in range(4):
            x = self.downsample_layers[i](x)
            x = self.stages[i](x)
        return self.norm(x.mean([-2, -1])) # global average pooling, (N, C, H, W) -> (N, C)

    def forward(self, x):
        x = self.forward_features(x)
        x = self.head(x)
        return x

class LayerNorm(nn.Module):
    r""" LayerNorm that supports two data formats: channels_last (default) or channels_first. 
    The ordering of the dimensions in the inputs. channels_last corresponds to inputs with 
    shape (batch_size, height, width, channels) while channels_first corresponds to inputs 
    with shape (batch_size, channels, height, width).
    """
    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = eps
        self.data_format = data_format
        if self.data_format not in ["channels_last", "channels_first"]:
            raise NotImplementedError 
        self.normalized_shape = (normalized_shape, )
    
    def forward(self, x):
        if self.data_format == "channels_last":
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        elif self.data_format == "channels_first":
            u = x.mean(1, keepdim=True)
            s = (x - u).pow(2).mean(1, keepdim=True)
            x = (x - u) / torch.sqrt(s + self.eps)
            x = self.weight[:, None, None] * x + self.bias[:, None, None]
            return x


model_urls = {
    "convnext_tiny_1k": "https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth",
    "convnext_small_1k": "https://dl.fbaipublicfiles.com/convnext/convnext_small_1k_224_ema.pth",
    "convnext_base_1k": "https://dl.fbaipublicfiles.com/convnext/convnext_base_1k_224_ema.pth",
    "convnext_large_1k": "https://dl.fbaipublicfiles.com/convnext/convnext_large_1k_224_ema.pth",
    "convnext_base_22k": "https://dl.fbaipublicfiles.com/convnext/convnext_base_22k_224.pth",
    "convnext_large_22k": "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth",
    "convnext_xlarge_22k": "https://dl.fbaipublicfiles.com/convnext/convnext_xlarge_22k_224.pth",
}

@register_model
def convnext_tiny(pretrained=False, **kwargs):
    model = ConvNeXt(depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], **kwargs)
    if pretrained:
        url = model_urls['convnext_tiny_1k']
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model

@register_model
def convnext_small(pretrained=False, **kwargs):
    model = ConvNeXt(depths=[3, 3, 27, 3], dims=[96, 192, 384, 768], **kwargs)
    if pretrained:
        url = model_urls['convnext_small_1k']
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model

@register_model
def convnext_base(pretrained=False, in_22k=False, **kwargs):
    model = ConvNeXt(depths=[3, 3, 27, 3], dims=[128, 256, 512, 1024], **kwargs)
    if pretrained:
        url = model_urls['convnext_base_22k'] if in_22k else model_urls['convnext_base_1k']
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model

@register_model
def convnext_large(pretrained=False, in_22k=False, **kwargs):
    model = ConvNeXt(depths=[3, 3, 27, 3], dims=[192, 384, 768, 1536], **kwargs)
    if pretrained:
        url = model_urls['convnext_large_22k'] if in_22k else model_urls['convnext_large_1k']
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model

@register_model
def convnext_xlarge(pretrained=False, in_22k=False, **kwargs):
    model = ConvNeXt(depths=[3, 3, 27, 3], dims=[256, 512, 1024, 2048], **kwargs)
    if pretrained:
        assert in_22k, "only ImageNet-22K weights are listed in model_urls for convnext_xlarge; set in_22k=True"
        url = model_urls['convnext_xlarge_22k']
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model

**Load the model, replace the output layer, and choose which layers to freeze/train**

I'm replacing the final fully connected layer (`head`) of the ConvNeXt model with my own feed-forward network, and also unfreezing the last stage of ConvNeXt so its weights can be trained. Currently I'm not using the annotations in the model at all.
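The freeze-then-unfreeze pattern can be checked on a small model. This sketch uses a hypothetical `ToyConvNeXt` that only mimics the `stages` and `head` attribute names of the class above; in the real model the head input size is `dims[-1]` (1024 for `convnext_base`). Modules created after the freeze loop have `requires_grad=True` by default, so the new head is trainable even before it is explicitly unfrozen.

```python
import torch
from torch import nn

class ToyConvNeXt(nn.Module):
    """Hypothetical stand-in exposing `stages` and `head` like ConvNeXt."""
    def __init__(self, width=16, num_classes=1000):
        super().__init__()
        self.stages = nn.ModuleList(nn.Linear(width, width) for _ in range(4))
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        for stage in self.stages:
            x = torch.relu(stage(x))
        return self.head(x)

toy = ToyConvNeXt()
for p in toy.parameters():  # freeze everything
    p.requires_grad = False

width = toy.head.in_features  # plays the role of dims[-1]
# new layers are trainable by default
toy.head = nn.Sequential(nn.Linear(width, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
for p in toy.stages[3].parameters():  # unfreeze the last stage only
    p.requires_grad = True

trainable_stages = [i for i, s in enumerate(toy.stages)
                    if all(p.requires_grad for p in s.parameters())]
n_trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
toy_scores = toy(torch.randn(5, width)) * 100  # sigmoid output rescaled to 0-100
print(trainable_stages, n_trainable, toy_scores.shape)
```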

In [None]:
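# Load the pretrained ConvNeXt-Base model from a file (saved whole with torch.save,
# so the ConvNeXt class definition above must already be defined)

def load_model(path):

    # On PyTorch >= 2.6, loading a whole pickled model needs torch.load(path, weights_only=False)
    model = torch.load(path, map_location='cpu')

    # Disable gradients on all model parameters to freeze the weights
    for param in model.parameters():
        param.requires_grad = False

    # Replace the ConvNeXt classification head with a 2-layer fully connected network
    # and a sigmoid output (1024 = dims[-1] for convnext_base). The annotations are not used.
    model.head = nn.Sequential(nn.Linear(1024, 256),
                               nn.ReLU(),
                               nn.Linear(256, 1),
                               nn.Sigmoid())
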
    for param in model.head.parameters():
        param.requires_grad = True

    # Unfreeze the last stage
    for param in model.stages[3].parameters():
        param.requires_grad = True
    
    return model

model = load_model(f'{model_dir}convnext_base_pretrained_v2.pt')

In [None]:
print(model)

In [None]:
def initialize_optimizer(starting_lr, lambd):
    
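    # Summed squared error; the training loop divides by the sample count and takes the sqrt to report RMSE
    criterion = nn.MSELoss(reduction='sum')

    # AdamW: Adam with decoupled weight decay (lambd); optimizes the global `model`
    optimizer = optim.AdamW(model.parameters(), lr=starting_lr, weight_decay=lambd)

    # Learning rate decay: halve the learning rate after epochs 1, 2 and 6
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[1, 2, 6], gamma=0.5)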
    
    return criterion, optimizer, scheduler

criterion, optimizer, scheduler = initialize_optimizer(0.000025, 100)
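With `milestones=[1, 2, 6]` and `gamma=0.5`, the learning rate is halved each time the scheduler passes one of those epochs. A sketch that steps a `MultiStepLR` eight times on a dummy parameter and records the rate used in each epoch, assuming the fold-training value `starting_lr = 0.00005`; the `demo_` names avoid overwriting the notebook's global `optimizer` and `scheduler`:

```python
import torch
from torch import nn, optim

demo_param = nn.Parameter(torch.zeros(1))  # dummy parameter for the optimizer
demo_opt = optim.AdamW([demo_param], lr=5e-5, weight_decay=7)
demo_sched = optim.lr_scheduler.MultiStepLR(demo_opt, milestones=[1, 2, 6], gamma=0.5)

lrs = []
for epoch in range(8):
    lrs.append(demo_sched.get_last_lr()[0])  # lr used during this epoch
    demo_opt.step()    # training would happen here (no grads, so no update)
    demo_sched.step()  # called once per epoch, after validation

print([round(lr / 5e-5, 3) for lr in lrs])
# [1.0, 0.5, 0.25, 0.25, 0.25, 0.25, 0.125, 0.125]
```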

In [None]:
# Test out the forward pass on a single batch

images, annotations, scores = next(iter(cv_dataloaders[0][1]))
model.eval()
with torch.no_grad():
    output = model(images)*100 # convert sigmoid output to pawpularity scale
    loss = criterion(output, scores)
    RMSE = math.sqrt(loss.item()/len(scores))  # criterion sums over the batch

print(scores.dtype)
print(output.dtype)
print('Starting Prediction: ', torch.mean(output))
print('Starting RMSE: ', RMSE)
print('Prediction Standard Deviation: ', torch.std(output))

### Train the model

**Model training loop**

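Run the training and validation steps for at least `n_epochs` epochs and at most `n_epochs + 9`. The weights are saved whenever the validation RMSE matches or beats the best so far. Once the minimum number of epochs is reached, training stops early if the validation RMSE is more than 0.04 above the best.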
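The checkpointing and early-stopping rules can be traced without a model. This pure-Python sketch replays them over a list of per-epoch validation RMSE values; `select_and_stop` is a hypothetical helper and the curve is invented for illustration, not a result from this notebook:

```python
def select_and_stop(valid_losses, n_epochs, tolerance=0.04):
    """Mimic the checkpoint / early-stop rules of train_validation_loop.

    valid_losses: validation RMSE per epoch (epochs are numbered from 1).
    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss, best_epoch = float('inf'), None
    epochs_run = 0
    for epoch in range(1, n_epochs + 10):  # same range as the training loop
        if epoch > len(valid_losses):
            break
        loss = valid_losses[epoch - 1]
        epochs_run = epoch
        if loss <= best_loss:  # the "save model" step
            best_loss, best_epoch = loss, epoch
        if loss > best_loss + tolerance and epoch >= n_epochs:
            break  # early stop
    return best_epoch, best_loss, epochs_run

# Hypothetical RMSE curve: improves until epoch 4, then drifts up
curve = [19.0, 18.2, 17.9, 17.7, 17.75, 17.8, 17.9, 18.0, 18.1]
print(select_and_stop(curve, n_epochs=7))
```

Here the epoch-4 weights are kept and training stops at epoch 7, the first epoch where the minimum-epoch condition is met while the RMSE is more than 0.04 above the best.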

In [None]:
# Training and validation loop

def train_validation_loop(fold, model, train_loader, valid_loader, train_on_gpu, n_epochs=6):

    if train_on_gpu:
        model.cuda()

    valid_loss_min = np.inf # track change in validation loss

    train_losses, valid_losses = [], []

    # run at least n_epochs epochs and at most n_epochs + 9 (see the early-stopping check below)
    for epoch in range(1, n_epochs+10):

        start = time.time()
        current_lr = scheduler.get_last_lr()[0]

        # keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        # Stop training the convolutional layers after a certain point
        #if epoch > 4:
        #    for param in model.layer3.parameters():
        #        param.requires_grad = False
        
        #if epoch > 5:
        #    for param in model.layer4.parameters():
        #        param.requires_grad = False

        ###################
        # train the model #
        ###################
        # put in training mode (enable dropout)
        model.train()
        for images, annotations, scores in train_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(images)*100 # multiply by 100 the sigmoid output to 0-100 pawpularity scale
            # print(output.dtype)
            # print(scores.dtype)
            # calculate the batch loss
            loss = criterion(output, scores)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # update training loss
            train_loss += loss.item()

        ######################    
        # validate the model #
        ######################
        # eval mode (no dropout)
        model.eval()
        with torch.no_grad():
            for images, annotations, scores in valid_loader:
                # move tensors to GPU if CUDA is available
                if train_on_gpu:
                    images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(images)*100 # multiply by 100 the sigmoid output to 0-100 pawpularity scale
                # calculate the batch loss
                loss = criterion(output, scores)
                # accumulate the summed validation loss
                valid_loss += loss.item()

        # calculate RMSE
        train_loss = math.sqrt(train_loss/len(train_loader.sampler))
        valid_loss = math.sqrt(valid_loss/len(valid_loader.sampler))

        train_losses.append(train_loss)
        valid_losses.append(valid_loss)

        # increment learning rate decay
        scheduler.step()

        # print training/validation statistics 
        # print(f'Epoch: {e}, {float(time.time() - start):.3f} seconds, lr={optimizer.lr}')
        print('Epoch: {}, time: {:.1f}s, lr: {:.7f} \tTraining Loss: {:.3f} \tValidation Loss: {:.3f}'.format(
            epoch, float(time.time() - start), current_lr, train_loss, valid_loss))

        # save model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.3f} --> {:.3f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            model_name = f'{working_dir}pawpularity_best_model_fold{fold}.pt'
            torch.save(model.state_dict(), model_name)
            valid_loss_min = valid_loss
        
        # Stop early if the min epochs is satisfied and score isn't improving
        if valid_loss > valid_loss_min+0.04 and epoch >= n_epochs:
            break
    
    # Plot the losses
    fig = plt.figure()
    ax = plt.axes()
    ax.plot(list(range(0, len(train_losses))), train_losses[0:])
    ax.plot(list(range(0, len(valid_losses))), valid_losses[0:]);
    print(f'best score: {valid_loss_min}')
        
    return (model_name, valid_loss_min)

In [None]:
# check if CUDA is available and set the training device

train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    device = torch.cuda.get_device_name()  # only valid when CUDA is available
    print(f'CUDA is available!  Training on GPU {device}...')

**Skip the training step in this notebook and use model files already uploaded**

This makes the submission run much faster. The training code is kept below inside a string literal so it can be re-enabled.

In [None]:
# Train for all of the cv folds

cv_folds = 10
epochs = 7
starting_lr = 0.00005
lambd = 7 # AdamW weight decay (regularization)

cv_dataloaders = get_cv_dataloaders(augmented_data=augmented_data, 
                                    base_transform_data=base_transform_data,
                                    folds=cv_folds, 
                                    cv_shuffle=True)

# Skip training in the submission notebook
'''
saved_models = []

for i, (train_loader, valid_loader) in enumerate(cv_dataloaders):
    print('Starting Fold', i)
    
    # Reset the model and schedulers for each new dataset
    model = load_model(f'{model_dir}convnext_base_pretrained.pt')
    criterion, optimizer, scheduler = initialize_optimizer(starting_lr=starting_lr, lambd=lambd)
    
    # Run the training loop and add the best model's filepath
    saved_models.append(train_validation_loop(i, model, train_loader, valid_loader, train_on_gpu, n_epochs=epochs))
    
    #if i >= 2:
    #   break
    print('\n')
'''

### Diagnostics and performance

In [None]:
# Load the best performing model, from fold 4 (lowest validation RMSE in model_list below)
model = load_model(f'{model_dir}convnext_base_pretrained_v2.pt')
model.load_state_dict(torch.load(f'{weights_dir}pawpularity_best_model_fold4.pt', map_location='cpu'))

if train_on_gpu:
    model.cuda()

In [None]:
# get the distribution of predictions

predictions = []
score_list = []

model.eval()
with torch.no_grad():
    for images, annotations, scores in cv_dataloaders[4][1]: # fold 4, validation dataset
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            images, annotations, scores = images.cuda(), annotations.cuda(), scores.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(images)*100
        predictions.extend(list(output.cpu().detach().numpy().reshape(len(output),)))
        score_list.extend(list(scores.cpu().detach().numpy().reshape(len(scores),)))
        

preds_df = pd.DataFrame({'preds': predictions})
preds_df.describe()

In [None]:
# Manually Check RMSE

diffs = np.array(score_list) - np.array(predictions)
print(math.sqrt((diffs @ diffs)/len(cv_dataloaders[4][1].sampler)))

In [None]:
# Histogram of validation predictions - if this is too narrow that's an issue

n, bins, patches = plt.hist(predictions, 50, density=True, facecolor='g', alpha=0.75)

plt.xlabel('Pawpularity')
plt.ylabel('Density')  # density=True normalises the histogram
plt.title('Predicted Pawpularity Histogram')
plt.xlim(0, 100)
plt.ylim(0, .2)
plt.grid(True)
plt.show()

### Show examples of images and predicted vs. actual scores

In [None]:
with torch.no_grad():
    output_plot = model(images).cpu()*100  # last validation batch from the loop above
images, annotations, scores = images.cpu(), annotations.cpu(), scores.cpu()

In [None]:
# plot the images in the batch, along with the corresponding labels and predictions

fig = plt.figure(figsize=(20, 10))
# display 12 images
for idx in np.arange(12):
    ax = fig.add_subplot(3, 4, idx+1, xticks=[], yticks=[])
    im_convert(images[idx])
    ax.set_title(f'Act: {round(scores[idx].item())} Pred: {round(output_plot[idx].item())}')

### Use the model to predict the test dataset

Run inference with the best model from each fold, then average the 10 predictions for each image in the test set.
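The averaging step stacks each model's prediction vector as a column and takes the mean across columns. A numpy sketch with invented predictions from three hypothetical fold models for four test images; it only works because every model sees the test images in the same order, which holds here since the test `DataLoader` is not shuffled:

```python
import numpy as np

# Hypothetical predictions from three fold models for four test images
fold_preds = [np.array([30.0, 40.0, 50.0, 60.0]),
              np.array([32.0, 38.0, 55.0, 61.0]),
              np.array([28.0, 42.0, 45.0, 59.0])]

# Same reduction as the notebook: images x models, then average across models
stacked = np.stack(fold_preds, axis=1)
ensemble = np.mean(stacked, axis=1)
print(stacked.shape, ensemble)
```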

In [None]:
test_df = pd.read_csv(f'{data_dir}test.csv')
test_df.head(10)

In [None]:
class PawpularityTestDataset(Dataset):
    """Test dataset connecting pet images to their annotations (no scores)"""

    def __init__(self, csv_file, img_dir, transform=transforms.ToTensor()):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            img_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """

        self.annotations_csv = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations_csv)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = os.path.join(self.img_dir,
                                self.annotations_csv.iloc[idx, 0])

        # load each image in PIL format for compatibility with transforms;
        # convert to RGB so every image has 3 channels
        image = PIL.Image.open(img_name + '.jpg').convert('RGB')

        annotations = np.array(self.annotations_csv.iloc[idx, 1:13])
        annotations = annotations.astype('float')

        # Apply the transforms
        image = self.transform(image)

        sample = [image, annotations]
        return sample

In [None]:
## Load the test dataset (careful to use validation transforms without img augmentation)

test_data = PawpularityTestDataset(f'{data_dir}test.csv', f'{data_dir}test', transform=img_transforms_valid)

batch_size = min(len(test_data), global_batch_size)

# no shuffling, so predictions stay in test.csv order
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=False, num_workers=workers)

In [None]:
# Step through with a reasonable batch size and build up the output dataset


def test_predictions(model, test_loader, model_path, test_df):
    
    # Load a set of trained weights into the model
    model.load_state_dict(torch.load(model_path, map_location='cpu'))
    
    # Move to gpu if necessary
    if train_on_gpu:
        model.cuda()
    
    # Put in evaluation mode
    model.eval()
    
    # Run predictions for each batch in the test dataset (no gradients needed)
    preds = []
    with torch.no_grad():
        for images, annotations in test_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                images, annotations = images.cuda(), annotations.cuda()
            # get predictions on the 0-100 scale
            test_pred = model(images)*100
            # add predictions from the current batch
            preds.extend(list(test_pred.cpu().numpy().reshape(len(test_pred),)))

    # Get the list of image filenames from the test annotations file
    img_names = np.array(test_df.iloc[:, 0].values)
    
    # Round the outputs
    preds = [round(x, 2) for x in preds]
    
    # return a tuple with the image names and model predictions
    return (img_names, np.array(preds))

In [None]:
print('Using pre uploaded saved models')
model_list = [(f'{weights_dir}pawpularity_best_model_fold0.pt', 17.75550977922519),
              (f'{weights_dir}pawpularity_best_model_fold1.pt', 17.055042532540423),
              (f'{weights_dir}pawpularity_best_model_fold2.pt', 17.723917203277402),
              (f'{weights_dir}pawpularity_best_model_fold3.pt', 17.506054498891437),
              (f'{weights_dir}pawpularity_best_model_fold4.pt', 16.693078011850073),
              (f'{weights_dir}pawpularity_best_model_fold5.pt', 18.116734056509916),
              (f'{weights_dir}pawpularity_best_model_fold6.pt', 17.94325075614874),
              (f'{weights_dir}pawpularity_best_model_fold7.pt', 17.84976622654215),
              (f'{weights_dir}pawpularity_best_model_fold8.pt', 17.56859209356934),
              (f'{weights_dir}pawpularity_best_model_fold9.pt', 18.29225659309586)]
model_list
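The listed per-fold scores can be summarised to get a single cross-validation estimate; their mean is about 17.65, with fold 4 the best. Each value is the best epoch selected on that same validation fold, so the summary is likely slightly optimistic and is not a test-set score. A stdlib-only sketch:

```python
import statistics

# Best validation RMSE per fold, copied from model_list above
fold_rmse = [17.75550977922519, 17.055042532540423, 17.723917203277402,
             17.506054498891437, 16.693078011850073, 18.116734056509916,
             17.94325075614874, 17.84976622654215, 17.56859209356934,
             18.29225659309586]

mean_rmse = statistics.mean(fold_rmse)
spread = statistics.stdev(fold_rmse)  # sample standard deviation across folds
best_fold = min(range(len(fold_rmse)), key=fold_rmse.__getitem__)
print(f'mean CV RMSE {mean_rmse:.3f}, stdev {spread:.3f}, best fold {best_fold}')
```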

In [None]:
# Set up the model structure
model = load_model(f'{model_dir}convnext_base_pretrained_v2.pt')

# Iterate through saved models and calculate predictions for each one on the test data
outputs = []
for best_model, rmse in model_list:
    img_names, preds = test_predictions(model, test_loader, best_model, test_df)
    outputs.append(preds)
    
# Get image names
img_list = img_names

# Take the average of the outputs from each model's preds
mean_outputs = np.mean(np.stack(outputs, axis=1), axis=1)

# Join into a dataset
output_df = pd.DataFrame({'Id': img_list,'Pawpularity' : mean_outputs})
output_df.head(10)

In [None]:
# Write the output in the required format
output_df.to_csv('submission.csv', index=False)