Hi to everyone who decides to read this! 

This is my first experience with computer vision and fine-tuning pre-trained models. I couldn't pass up such a nice dataset with cute animals :)

There is no cross-validation here, and I don't yet know how to use the meta-information either, so this is a very simple beginner's notebook. I'm very glad that it works.

Importing the necessary libraries and ensuring that timm works without an internet connection:

In [1]:
import os
import sys

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import cv2

import albumentations as A
from albumentations.pytorch import ToTensorV2

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import ToTensor

from sklearn.model_selection import train_test_split

sys.path.append("../input/timm-pytorch-image-models")
import timm

Loading the train and test metadata DataFrames and adding the file path of each image:

In [1]:
df_train = pd.read_csv('../input/petfinder-pawpularity-score/train.csv')
df_train['Img_path'] = df_train['Id'].apply(lambda i: '../input/petfinder-pawpularity-score/train/' + i + ".jpg")
df_test = pd.read_csv('../input/petfinder-pawpularity-score/test.csv')
df_test['Pawpularity'] = 0
df_test['Img_path'] = df_test['Id'].apply(lambda j: '../input/petfinder-pawpularity-score/test/' + j + ".jpg")
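
Since the paths are built by string concatenation, a typo in the folder name would only surface later, when the first image fails to load. A minimal sketch of an early sanity check, assuming a hypothetical helper `find_missing` (not part of the notebook) and a temporary folder standing in for the image directory:

```python
import tempfile
from pathlib import Path


def find_missing(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Hypothetical stand-in for the image folder: one file exists, one does not.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = Path(tmp_dir) / 'a.jpg'
    existing.write_bytes(b'')
    missing = find_missing([str(existing), str(Path(tmp_dir) / 'b.jpg')])

print(len(missing))
```

In the notebook, `find_missing(df_train['Img_path'])` should then return an empty list.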

In [1]:
df_test.head()

In [1]:
df_train.head(10)

In [1]:
df_train.info()

Plotting the distribution of the Pawpularity values:

In [1]:
plt.figure(figsize=(15,6))
plt.hist(df_train['Pawpularity'], 
         bins=50,
         color='lightsteelblue',
         edgecolor='black')
plt.ylabel('Count')
plt.xlabel('Pawpularity')
plt.title('Pawpularity distribution')

The distribution looks roughly normal. Values of Pawpularity=100 are not treated as outliers here, given the specifics of the data.

Defining some configuration variables:

In [1]:
image_size = 224
batch_size = 64
num_epochs = 10
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

Defining three groups of transformations.
For validation, only Resize, Normalize and ToTensorV2 are applied. For visualizing a batch of images, on the contrary, Normalize and ToTensorV2 are skipped so that the images stay displayable.
The mean and std are the ImageNet statistics, and max_pixel_value=255.0 is the maximum value of an 8-bit image.

In [1]:
image_transforms = {'train_transform': A.Compose([A.Resize(image_size, image_size), 
                                                  A.HorizontalFlip(p=0.5), 
                                                  A.VerticalFlip(p=0.5), 
                                                  A.ToSepia(p=0.1), 
                                                  A.Normalize(mean=(0.485, 0.456, 0.406), 
                                                              std=(0.229, 0.224, 0.225), 
                                                              max_pixel_value=255.0, 
                                                              p=1.0), 
                                                  ToTensorV2()]),
                    
                   'validation_transform': A.Compose([A.Resize(image_size, image_size), 
                                                      A.Normalize(mean=(0.485, 0.456, 0.406), 
                                                                  std=(0.229, 0.224, 0.225), 
                                                                  max_pixel_value=255.0, 
                                                                  p=1.0), 
                                                      ToTensorV2()]),
                   'visualization_transform': A.Compose([A.Resize(image_size, image_size), 
                                                         A.HorizontalFlip(p=0.5), 
                                                         A.VerticalFlip(p=0.5),
                                                         A.ToSepia(p=0.1)])}
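
To make the normalization step concrete, here is a small NumPy sketch of the arithmetic that `A.Normalize` performs with these settings: each pixel is divided by `max_pixel_value`, then the channel mean is subtracted and the result is divided by the channel std. The `normalize` function and the 2x2 image are made up for illustration.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
max_pixel_value = 255.0


def normalize(image_uint8):
    """Scale an HWC uint8 RGB image to [0, 1], then standardize each channel."""
    scaled = image_uint8.astype(np.float32) / max_pixel_value
    return (scaled - mean) / std


# Hypothetical 2x2 image: a black row on top, a white row below.
image = np.array([[[0, 0, 0], [0, 0, 0]],
                  [[255, 255, 255], [255, 255, 255]]], dtype=np.uint8)
normalized = normalize(image)

print(normalized[0, 0])  # black pixel: -mean / std
print(normalized[1, 0])  # white pixel: (1 - mean) / std
```

So a black pixel becomes negative and a white pixel positive in every channel, which is the input range the pretrained weights were trained on.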

Creating a custom dataset class:

In [1]:
class ImageDataset(Dataset):
    def __init__(self, image_labels, image_dir, transform=None, target_transform=None):
        self.image_labels = image_labels
        self.image_dir = image_dir
        self.transform = transform
        self.target_transform = target_transform
        
        
    def __len__(self):
        return len(self.image_labels)
    
    
    def __getitem__(self, index):
        image_path = self.image_dir.iloc[index]
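        image = cv2.imread(image_path)
        if image is None:  # cv2.imread returns None instead of raising on a bad path
            raise FileNotFoundError(f'Could not read image: {image_path}')
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR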
        label = self.image_labels.iloc[index]
        if self.transform:
            image = self.transform(image=image)['image']
        if self.target_transform:
            label = self.target_transform(label)
        return image, label

Splitting the data and creating train, validation, test and visualization instances of the dataset class:

In [1]:
train_target = df_train['Pawpularity']
train_features = df_train.drop(['Pawpularity'], axis=1)

test_target = df_test['Pawpularity']
test_features = df_test.drop(['Pawpularity'], axis=1)

# A fixed random_state makes the train/validation split reproducible between runs
X_train, X_val, y_train, y_val = train_test_split(train_features, train_target, test_size=0.2, random_state=42)

In [1]:
train_dataset = ImageDataset(y_train, X_train['Img_path'], transform=image_transforms['train_transform'])
val_dataset = ImageDataset(y_val, X_val['Img_path'], transform=image_transforms['validation_transform'])
test_dataset = ImageDataset(test_target, test_features['Img_path'], transform=image_transforms['validation_transform'])
visual_train_dataset =  ImageDataset(y_train, X_train['Img_path'], transform=image_transforms['visualization_transform'])

Creating DataLoader instances for all of the datasets:

In [1]:
test_batch_size = batch_size  # regular batches; a single batch with the whole test set may not fit in memory

In [1]:
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=False)
visual_loader = DataLoader(visual_train_dataset, batch_size=batch_size, shuffle=True)

In [1]:
visual_loader

Checking that the size of the data in the batch matches the requirements of the neural network:

In [1]:
visual_train_f, visual_train_t = next(iter(visual_loader))
print(f'Feature batch shape: {visual_train_f.size()}')
print(f'Target batch shape: {visual_train_t.size()}')

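Yes, it's OK. Note that this visualization batch is channels-last, (64, 224, 224, 3), because ToTensorV2 is not applied to it; the batches from train_loader that go into the network are channels-first, (64, 3, 224, 224).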
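
The layout change that `ToTensorV2` adds for the training and validation images can be sketched with plain NumPy. The zero-filled batch below is a hypothetical stand-in shaped (batch, height, width, channels), like a batch of images that has only been resized:

```python
import numpy as np

# Hypothetical batch in channels-last layout: (batch, height, width, channels).
batch_hwc = np.zeros((64, 224, 224, 3), dtype=np.uint8)

# Move the channel axis in front of height and width: (batch, channels, height, width).
batch_chw = np.transpose(batch_hwc, (0, 3, 1, 2))

print(batch_hwc.shape)
print(batch_chw.shape)
```

Matplotlib's `imshow` expects the channels-last form, which is why the visualization transform leaves the images in that layout.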

Now just look at these augmented cuties from the same batch. I would take them all home if I could :)

In [1]:
def plot_batch(features, target, batch_size=batch_size):
    '''Shows one batch of augmented images'''
    plt.figure(figsize=(20, 60))
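    for i in range(min(batch_size, len(features))):  # a batch can be smaller than batch_size
        img = features[i]
        label = target[i]

        plt.subplot(16, 4, i+1)
        plt.title(f'Pawpularity: {int(label)}')  # plain number instead of tensor(...)
        plt.imshow(img)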
    plt.show()

In [1]:
plot_batch(visual_train_f, visual_train_t)

Creating the EfficientNet-B0 model and loading the pretrained weights from a local file:

In [1]:
pretrained_path = '../input/timmefficientnet/tf_efficientnet_b0_ns-c0e6a31c.pth'

model = timm.create_model('tf_efficientnet_b0_ns', pretrained=False, in_chans=3)
model.load_state_dict(torch.load(pretrained_path, map_location=device))

for param in model.parameters():
    param.requires_grad = False

print(model)

Replacing the final classifier layer with a small head for the regression task (the new layers are trainable, while the frozen backbone is not):

In [1]:
model.classifier = nn.Sequential(nn.Linear(1280, 1000, bias=True), 
                                 nn.SiLU(inplace=True),
                                 nn.Linear(1000, 1, bias=True))

print(model)
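
Because the backbone was frozen before the head was replaced, only the new layers should receive gradient updates. As a rough check, the sketch below rebuilds the same head on its own (without the backbone) and counts its parameters; the variable names are chosen for illustration.

```python
from torch import nn

# Same head as above: 1280 backbone features -> 1000 hidden units -> 1 output.
head = nn.Sequential(nn.Linear(1280, 1000, bias=True),
                     nn.SiLU(inplace=True),
                     nn.Linear(1000, 1, bias=True))

# Weights plus biases of both Linear layers; SiLU has no parameters.
expected = (1280 * 1000 + 1000) + (1000 * 1 + 1)
trainable = sum(p.numel() for p in head.parameters() if p.requires_grad)

print(trainable, expected)
```

Running the same `sum(...)` over `model.parameters()` in the notebook should give the same number if the freezing worked as intended.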

Defining the optimizer and loss function:

In [1]:
model.to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Training the model with a training loop:

In [1]:
def training_loop(model, training_loader, validation_loader, criterion, optimizer, epochs=num_epochs):
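    '''Trains the model and evaluates it on the validation set after every epoch'''
    for epoch in range(1, epochs+1):
        model.train()  # back to training mode; the model.eval() call below would otherwise persist
        train_loss = 0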
        for image, target in training_loader:
            image = image.to(device)
            target = target.to(device)
            target = target.unsqueeze(1)
            optimizer.zero_grad()
            outputs = model(image)
            loss = torch.sqrt(criterion(outputs.float(), target.float()))
            
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            
        with torch.no_grad():
            model.eval()
            valid_loss = 0
            for val_image, val_target in validation_loader:
                val_image = val_image.to(device)
                val_target = val_target.to(device)
                val_target = val_target.unsqueeze(1)
                val_outputs = model(val_image)
                val_loss = torch.sqrt(criterion(val_outputs.float(), val_target.float()))
                
                valid_loss += val_loss.item()
                
        print(f'Epoch: {epoch} Training loss: {train_loss/len(training_loader)}  Val loss: {valid_loss/len(validation_loader)}')
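
`model.train()` and `model.eval()` switch layers such as dropout and batch normalization between training and inference behaviour, and the chosen mode stays active until it is changed again. A minimal sketch with a standalone `nn.Dropout` layer (not part of the notebook's model) shows the difference:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # about half of the values are zeroed, the rest are scaled by 1 / (1 - p)

layer.eval()
eval_out = layer(x)   # dropout is disabled, so the input passes through unchanged

print(layer.training)
print(torch.equal(eval_out, x))
```

Since the mode is sticky, a loop that calls `model.eval()` for validation needs a `model.train()` call before the next training epoch.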

In [1]:
training_loop(model, train_loader, val_loader, criterion, optimizer, epochs=num_epochs)
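
One thing to keep in mind when reading the printed losses: they are averages of per-batch RMSE values, which is not the same as the RMSE computed over the whole set at once. A small sketch with made-up numbers (the `rmse` helper is hypothetical) shows how far apart the two can be:

```python
import numpy as np


def rmse(pred, target):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Hypothetical targets and predictions, split into two batches of two.
target = np.array([10.0, 20.0, 30.0, 100.0])
pred = np.array([10.0, 20.0, 30.0, 60.0])

batch_rmses = [rmse(pred[:2], target[:2]), rmse(pred[2:], target[2:])]
mean_of_batch_rmse = float(np.mean(batch_rmses))
overall_rmse = rmse(pred, target)

print(mean_of_batch_rmse)  # about 14.14
print(overall_rmse)        # 20.0
```

With equal-sized batches the batch average can only be lower than or equal to the overall value, so the printed numbers are best read as an optimistic estimate.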

And finally, predictions:

In [1]:
model.eval()
preds = []
with torch.no_grad():  # gradients are not needed for inference
    for image, _ in test_loader:  # the test targets are only placeholders
        image = image.to(device)
        test_pred = model(image)
        preds.extend(test_pred.cpu().numpy().reshape(-1).tolist())
    
imgs = list(df_test['Id'].values)
preds = [round(x, 2) for x in preds]

pred_df = pd.DataFrame({'Id': imgs, 'Pawpularity': preds})
pred_df.head(10)
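
The regression head is unbounded, so some predictions can fall outside the range of the target. One optional post-processing step, assuming valid Pawpularity scores lie between 1 and 100, is to clip the predictions. The values below are made up for illustration:

```python
import numpy as np

# Hypothetical raw model outputs, some of them outside the assumed 1..100 range.
raw_preds = np.array([-3.2, 0.4, 37.85, 99.9, 104.6])

# Values below 1 become 1, values above 100 become 100, the rest are unchanged.
clipped = np.clip(raw_preds, 1.0, 100.0)

print(clipped)
```

In the notebook this would correspond to applying `np.clip` to `preds` before building `pred_df`.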

In [1]:
# pred_df.to_csv('submission.csv', index=False)

Thank you for reading!