# T8 - SVR-Boost - Meta-Swin-Transformer - Pawpularity

This is an ongoing CV-focused Kaggle contest (about 3 months to go as of October 2021), and it comes with the chance to win a cash prize!

In this contest, you will help the website [PetFinder.my](https://petfinder.my/) give "Pawpularity" scores to pet photos, which will help the pets find homes faster.

The "Pawpularity" scores in the training set are derived from each pet profile's page view statistics at the listing pages, using an algorithm that normalizes the traffic data across different pages, platforms and various metrics.

`Metadata`
* For each image, you are provided optional metadata, manually labeling each photo for key visual quality and composition parameters.

* These labels are not used for deriving our Pawpularity score, but it may be beneficial for better understanding the content and co-relating them to a photo's attractiveness. Our end goal is to deploy AI solutions that can generate intelligent recommendations (i.e. show a closer frontal pet face, add accessories, increase subject focus, etc) and automatic enhancements (i.e. brightness, contrast) on the photos, so we are hoping to have predictions that are more easily interpretable.

* You may use these labels as you see fit, and optionally build an intermediate / supplementary model to predict the labels from the photos. If your supplementary model is good, we may integrate it into our AI tools as well.

* In our production system, new photos that are dynamically scored will not contain any photo labels. If the Pawpularity prediction model requires photo label scores, we will use an intermediary model to derive such parameters, before feeding them to the final model.

`Evaluation Metrics`

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$$
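
As a quick illustration of the metric above, here is a minimal NumPy sketch. The helper name `rmse` and the four scores are made up for the example; they are not competition data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# made-up scores on the 1-100 Pawpularity scale
y_true = [30, 50, 70, 90]
y_pred = [40, 50, 60, 90]

# errors are -10, 0, 10, 0, so the mean squared error is 50
score = rmse(y_true, y_pred)
print(score)
```

Because the errors are squared before averaging, a few large misses cost more than many small ones.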

[PetFinder.my - Pawpularity Contest](https://www.kaggle.com/c/petfinder-pawpularity-score/overview/description)

[Reference](https://www.kaggle.com/cdeotte/rapids-svr-boost-17-8)


### Update

In this version, I combine the predictions of the Swin Transformer and an SVR (please see the reference for more details).

# How to Add RAPIDS SVR Head
There are 3 steps to building a double-headed model. The first step is to train the image NN backbone and head. The next step is to train a RAPIDS SVR head on embeddings extracted from the frozen image NN backbone. Lastly, we infer with both heads and average the predictions.

The SVR here is imported from a package called [cuML](https://docs.rapids.ai/api/cuml/stable/), which is developed by the [RAPIDS project](https://rapids.ai/). cuML is a suite of libraries that implement machine learning algorithms on the GPU, with an API that closely follows sklearn's.
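
The three steps can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: `sklearn.svm.SVR` is used as a CPU stand-in for `cuml.svm.SVR`, the "embeddings" are random numbers rather than Swin features, and `nn_val` is a made-up substitute for the NN head's predictions.

```python
import numpy as np
from sklearn.svm import SVR  # CPU stand-in for cuml.svm.SVR (assumption)

rng = np.random.default_rng(0)
n_train, n_val, dim = 200, 50, 16

# Step 1 (stand-in): pretend these are embeddings from a frozen backbone.
w_true = rng.normal(size=dim)
X_train = rng.normal(size=(n_train, dim))
X_val = rng.normal(size=(n_val, dim))

def fake_score(X):
    # synthetic target on the 1-100 scale, loosely tied to the embedding
    raw = 50 + 10 * (X @ w_true) / np.sqrt(dim) + rng.normal(size=len(X))
    return np.clip(raw, 1, 100)

y_train, y_val = fake_score(X_train), fake_score(X_val)

# hypothetical NN-head predictions for the validation rows
nn_val = np.clip(y_val + rng.normal(scale=5, size=n_val), 0, 100)

# Step 2: fit the SVR head on the extracted embeddings.
svr = SVR(C=5.0)
svr.fit(X_train, y_train)

# Step 3: infer with both heads and average the predictions.
svr_val = np.clip(svr.predict(X_val), 0, 100)
ensemble = 0.5 * nn_val + 0.5 * svr_val
print(ensemble[:5])
```

The plain 50/50 average is one choice; the notebook's later cells use a weighted mix instead.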


![](https://raw.githubusercontent.com/cdeotte/Kaggle_Images/main/Oct-2021/st1.png)
![](https://raw.githubusercontent.com/cdeotte/Kaggle_Images/main/Oct-2021/st2.png)
![](https://raw.githubusercontent.com/cdeotte/Kaggle_Images/main/Oct-2021/st3.png)

[1]: https://www.kaggle.com/abhishek/tez-pawpular-swin-ference
[2]: https://www.kaggle.com/cdeotte/rapids-svr-boost-17-8?scriptVersionId=76282086

In [None]:
# install packages by adding datasets to the notebook
# python-box is uploaded here: https://www.kaggle.com/zhicongliang/pythonbox
# timm is available here: https://www.kaggle.com/kozodoi/timm-pytorch-image-models
# add both datasets to your notebook, then install from the local files
!pip install ../input/pythonbox/python_box-5.4.1-py3-none-any.whl
!pip install ../input/timm-pytorch-image-models/pytorch-image-models-master

In [None]:
import os
import pandas as pd
import numpy as np
import tqdm
from PIL import Image
import copy

from sklearn.model_selection import StratifiedKFold

from box import Box

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as T
# https://rwightman.github.io/pytorch-image-models/
import timm

# SVR from RAPIDS
from cuml.svm import SVR

import matplotlib.pyplot as plt
plt.style.use('seaborn')

## Step 0. Configuration

Here we define a dictionary to store our parameters.
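
The cell below wraps the dictionary in `Box` so that keys can be read as attributes (`config.seed` instead of `config['seed']`). A minimal sketch of the same idea, using the standard library's `types.SimpleNamespace` as a stand-in for `Box` (an assumption, so the example runs without the python-box package) and only a subset of the keys:

```python
from types import SimpleNamespace

# a few of the notebook's parameters, kept in a plain dictionary
params = {'n_splits': 5, 'seed': 2021, 'learning_rate': 1e-5}

# stand-in for Box(params): expose each key as an attribute
cfg = SimpleNamespace(**params)

print(cfg.n_splits, cfg.seed, cfg.learning_rate)
```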

In [None]:
config = {
    'root': '../input/petfinder-pawpularity-score/',
    'device': 'cuda', # 'cpu' for cpu, 'cuda' for gpu
    'n_splits': 5,
    'seed': 2021,
    'train_batchsize': 64,
    'val_batchsize': 64,
    'epoch': 20,
    'learning_rate': 1e-5,
    'logger_interval': 1,
    'model_name': 'swin_tiny_patch4_window7_224',
    'pretrain_path': '../input/timmswin/swin_tiny_patch4_window7_224.pth',
    'eta_min': 1e-4,
    'T_0': 20
}

# transform key to attribute. it will be easier for us to refer to these parameters later
config = Box(config)

In [None]:
dense_features = [
    'Subject Focus', 'Eyes', 'Face', 'Near', 'Action', 'Accessory',
    'Group', 'Collage', 'Human', 'Occlusion', 'Info', 'Blur'
]

## Step 1. Load the data

If we are using a standard dataset such as CIFAR-10, MNIST or SVHN, we can directly use `torchvision.datasets`. However, to use our own data, we need to construct a custom `Dataset` that loads the data and performs some basic transformations.

The most important methods of a custom `Dataset` are `__len__` and `__getitem__`.

The `__len__` function will return the number of elements in this dataset, while `__getitem__` will return an image-label pair that can be accepted by pytorch given an index.
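
To make the two methods concrete, here is a toy sketch of the protocol. `ToyDataset` is a hypothetical name, and it deliberately does not subclass `torch.utils.data.Dataset` so that it runs with NumPy alone; the notebook's own class below does subclass it.

```python
import numpy as np

class ToyDataset:
    """Map-style dataset sketch built around __len__ and __getitem__."""

    def __init__(self, images, labels=None):
        self._images = images
        self._labels = labels  # None for an unlabeled (test) set

    def __len__(self):
        # number of elements in the dataset
        return len(self._images)

    def __getitem__(self, idx):
        image = self._images[idx]
        # return an image-label pair when labels exist, else just the image
        if self._labels is not None:
            return image, self._labels[idx]
        return image

images = np.zeros((10, 3, 8, 8), dtype=np.float32)  # 10 fake 3x8x8 images
labels = np.arange(10)

train_like = ToyDataset(images, labels)
test_like = ToyDataset(images)

image, label = train_like[3]  # indexing calls __getitem__(3)
print(len(train_like), image.shape, label)
```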

In [None]:
# define Custom Dataset with pytorch
class PetfinderDataset(Dataset):

    def __init__(self, df, dense_features, image_size=224, transform=None):
        self._X = df["Id"].values
        self._features = df[dense_features].values
        self._y = None
        if "Pawpularity" in df.keys():
            self._y = df["Pawpularity"].values
        if not transform:
            # we resize all the image to the same size
            self._transform = T.Compose([
                T.Resize([image_size, image_size]),
                T.ToTensor(), # transform the PIL image type to torch.tensor
            ])
        else:
            self._transform = transform

    def __len__(self):
        return len(self._X)

    def __getitem__(self, idx):
        image_path = self._X[idx]
        # given the index(path), read the raw image, and then transform it
        # image = read_image(image_path)  # this require the latest torchvision version
        image = Image.open(image_path).convert('RGB')  # force 3 channels in case a file is grayscale or RGBA
        image = self._transform(image)
        
        features = self._features[idx, :]
        
        # if we have label, then we return the image-label pair (for training)
        # if not, we directly return the image (for testing)
        if self._y is not None:
            label = self._y[idx]
            return image, features, label
        return image, features

In [None]:
df = pd.read_csv(os.path.join(config.root, 'train.csv'))
df['Id'] = df["Id"].apply(lambda x: os.path.join(config.root, "train", x + ".jpg")) # we transform the Id to its image path

train_val_set = PetfinderDataset(df, dense_features=dense_features)

print('# of data:', len(df))
print('range of label [{}, {}]'.format(df['Pawpularity'].min(), df['Pawpularity'].max()))

In [None]:
# we show some images here
plt.figure(figsize=(12, 12))
for idx in range(16):
    image, features, label = train_val_set[idx]
    plt.subplot(4, 4, idx+1)
    plt.imshow(image.permute(1, 2, 0));
    plt.axis('off')
    plt.title('Pawpularity: {}'.format(label))

## Step 2. Define Swin-Transformer



In [None]:
class Model(nn.Module):
    def __init__(self, name):
        super(Model, self).__init__()
        self.backbone = timm.create_model(name, 
                                          pretrained=False, # it would be very easy to set it to true
                                                            # but in kaggle we could not use internet to download it
                                          num_classes=0, 
                                          in_chans=3)
        
        state_dict = torch.load(config.pretrain_path, map_location=config.device)['model']
        del state_dict['head.weight'] # in the model, we don't have these two parameters actually
        del state_dict['head.bias']
        
        self.backbone.load_state_dict(state_dict)
        self.backbone.head = nn.Linear(self.backbone.num_features, 128)
        self.dropout = nn.Dropout(0.1)
        self.dense1 = nn.Linear(140, 61)
        self.dense2 = nn.Linear(61, 1)
        
    def forward(self, x, features):
        x1 = self.backbone(x) # image embedding
        
        x = self.dropout(x1)
        x = torch.cat([x, features], dim=1)
        x = self.dense1(x)
        x = self.dense2(x)
        
        # return the intermediate result
        x = torch.cat([x, x1, features], dim=1)
        return x
    

In [None]:
model = Model(config.model_name)
model

## Step 3. Train Our Model with Cross Validation
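
The cells below split the data with `StratifiedKFold`, treating the integer Pawpularity score as a class label so that every fold sees a similar label distribution. A minimal sketch of that behaviour on synthetic scores (the random labels and the dummy feature matrix are assumptions for illustration, not the competition data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(2021)

# synthetic integer scores in [1, 100]; the real labels come from train.csv
y = rng.integers(1, 101, size=2000)
X = np.zeros((len(y), 1))  # features do not affect the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=2021)

fold_means = []
val_indices = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    fold_means.append(y[val_idx].mean())
    val_indices.append(val_idx)
    print(fold, len(val_idx), round(fold_means[-1], 2))
```

Each sample lands in exactly one validation fold, and the validation means stay close to each other, which keeps the per-fold RMSE values comparable.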

In [None]:
def test(model, test_loader):
    model.eval() # turn model into evaluation mode
    test_loss = 0
    with torch.no_grad():
        for data, features, target in test_loader:
            data, features, target = data.to(config.device),features.float().to(config.device), target.float().to(config.device)/100.
            output = model(data, features)
            # the model returns [prediction, embedding, metadata]; only column 0 is the prediction
            test_loss += F.mse_loss(output[:, 0].sigmoid().view(-1), 
                                    target.view(-1), reduction='sum').item()  # sum up batch loss

    test_loss /= len(test_loader.dataset)
    return np.sqrt(test_loss) # RMSE 

In [None]:
def mixup(x, features, y, alpha=1):
    assert alpha > 0, "alpha should be larger than 0"
    assert x.size(0) > 1, "Mixup cannot be applied to a single instance."
    
    lam = np.random.beta(alpha, alpha)
#     for the shape of lam, run the following two lines
#     import seaborn as sns
#     sns.distplot(np.random.beta(0.5,0.5, 1000), bins=100)
    rand_index = torch.randperm(x.size()[0]) # random permutation of images in the batch x
    mixed_x = lam * x + (1-lam) * x[rand_index, :]
    mixed_features = lam * features + (1-lam) * features[rand_index, :]
    target_a, target_b = y, y[rand_index]
    return mixed_x, mixed_features, target_a, target_b, lam
    

In [None]:
# we split the dataset into folds for cross-validation
# "Pawpularity" is numerical, but here we treat it as categorical
# so that StratifiedKFold can balance the label distribution across folds
skf = StratifiedKFold(
    n_splits=config.n_splits, shuffle=True, random_state=config.seed
)

In [None]:
# my pretrained weights: https://www.kaggle.com/zhicongliang/pawpularitymetaswintransformer
## load one trained model per fold
best_model_fold = []
for fold in range(config.n_splits):
    model = Model(config.model_name).to(config.device)
    model.load_state_dict(torch.load('../input/pawpularitymetaswintransformer/meta_swin_transformer_fold_{}.h5'.format(fold), map_location=torch.device(config.device)))
    model.eval() # inference only: disable dropout
    best_model_fold.append(model)

In [None]:
# we keep record of the training in each fold
super_final_prediction_nn = []
super_final_prediction_svr = []
super_final_oof_prediction_nn = []
super_final_oof_prediction_svr = []
super_final_oof_true = []

svrs = []

for fold, (train_idx, val_idx) in enumerate(skf.split(df["Id"], df["Pawpularity"])):
    
    print('================================ CV fold {} ================================'.format(fold))
    
    train_df = df.loc[train_idx].reset_index(drop=True)
    val_df = df.loc[val_idx].reset_index(drop=True)
    
    # we apply some random transformations to our training data so that
    # our model is more robust to different patterns in out-of-sample data
    train_transform = T.Compose([
        T.Resize([224, 224]), # resize the image to 3*224*224
        T.RandomHorizontalFlip(), # random flip the image horizontally
        T.RandomVerticalFlip(), # random flip the image vertically
        T.RandomAffine(15, translate=(0.1, 0.1), scale=(0.9, 1.1)), # Random affine transformation of the image keeping center invariant.
        T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1), # randomly changes the brightness, saturation, and other properties of an image
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    # for the validation set, we only resize, convert to a tensor and normalize
    val_transform = T.Compose([
        T.Resize([224, 224]),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    
    train_set = PetfinderDataset(train_df, transform=train_transform, dense_features=dense_features)
    val_set = PetfinderDataset(val_df, transform=val_transform, dense_features=dense_features)
    
    # then we define the dataloader for training and validation
    # it tells the machine how to sample from our training/validation set
    train_loader = DataLoader(train_set, batch_size=config.train_batchsize, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_set, batch_size=config.val_batchsize, num_workers=4)
    
    # get pretrained best model
    model = best_model_fold[fold]
    
    ### Training SVR over training set
    # get model heads; train_loader shuffles, so we collect the targets
    # in the same order as the embeddings to keep them aligned
    model.eval()
    train_pred, train_target = [], []
    with torch.no_grad():
        for data, features, target in train_loader:
            data, features = data.float().to(config.device), features.float().to(config.device)
            train_pred.append(model(data, features))
            train_target.append(target)
    train_pred = torch.cat(train_pred, dim=0)
    train_target = torch.cat(train_target, dim=0).numpy().astype('int32')
    
    # get the embedding feature of the neural network
    train_embed = train_pred[:,1:]
    
    ## fit RAPIDS SVR
    clf = SVR(C=5.)
    clf.fit(train_embed, train_target)
    svrs.append(clf)
    
    ### validate the model over validation set
    # get model heads
    for batch_idx, (data, features, target) in enumerate(val_loader):
        data, features = data.float().to(config.device), features.float().to(config.device)
        output = model(data, features)
        output = output.detach()
        if batch_idx == 0:
            val_pred = output
        else:
            val_pred = torch.cat([val_pred, output], dim=0)
    
    val_embed = val_pred[:,1:]
    
    final_oof_prediction_nn = val_pred[:,0].sigmoid().to('cpu').numpy() * 100
    final_oof_prediction_svr = clf.predict(val_embed).get().clip(0,100)
    
    super_final_oof_prediction_nn.append(final_oof_prediction_nn)
    super_final_oof_prediction_svr.append(final_oof_prediction_svr)
    
    final_oof_true = val_df['Pawpularity'].values
    super_final_oof_true.append(final_oof_true)
    
    # COMPUTE RMSE
    rmse = np.sqrt( np.mean( (super_final_oof_true[-1] - super_final_oof_prediction_nn[-1])**2.0 ) )
    print('NN RMSE =',rmse,'\n')
    rmse = np.sqrt( np.mean( (super_final_oof_true[-1] - super_final_oof_prediction_svr[-1])**2.0 ) )
    print('SVR RMSE =',rmse,'\n')
    
    w = 0.1
    oof2 = (1-w) * super_final_oof_prediction_nn[-1] + w * super_final_oof_prediction_svr[-1]
    rmse = np.sqrt( np.mean( (super_final_oof_true[-1] - oof2)**2.0 ) )
    print('Ensemble RMSE =',rmse,'\n')
    
    

## Step 4. Make Submission
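
The submission cell averages each head over the folds and then mixes the two heads with a weight `w`. A minimal NumPy sketch of that arithmetic, with a hypothetical helper `blend` and made-up predictions for 2 folds and 3 test images:

```python
import numpy as np

def blend(nn_fold_preds, svr_fold_preds, w=0.1):
    """Average each head over folds, then mix: (1 - w) * NN + w * SVR."""
    nn_mean = np.mean(nn_fold_preds, axis=0)
    svr_mean = np.mean(svr_fold_preds, axis=0)
    return (1 - w) * nn_mean + w * svr_mean

# made-up predictions: rows are folds, columns are test images
nn_fold_preds = np.array([[40.0, 60.0, 20.0],
                          [44.0, 56.0, 24.0]])
svr_fold_preds = np.array([[30.0, 70.0, 30.0],
                           [34.0, 66.0, 26.0]])

# fold means are [42, 58, 22] (NN) and [32, 68, 28] (SVR)
final = blend(nn_fold_preds, svr_fold_preds, w=0.1)
print(final)
```

With `w = 0.1` the NN head dominates and the SVR head acts as a small correction.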

In [None]:
df_test = pd.read_csv(os.path.join(config.root, 'test.csv'))
test_id = df_test.index
df_test['Id'] = df_test["Id"].apply(lambda x: os.path.join(config.root, "test", x + ".jpg")) # we transform the Id to its image path

test_transform = T.Compose([
    T.Resize([224, 224]),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_set = PetfinderDataset(df_test, dense_features=dense_features, transform=test_transform)
test_loader = DataLoader(test_set, batch_size=config.val_batchsize, num_workers=4)

# get the testing prediction
test_pred_nn = np.zeros((df_test.shape[0]))
test_pred_svr = np.zeros((df_test.shape[0]))

for (model, clf) in zip(best_model_fold, svrs):
    model.eval()
    pred_nn, pred_svr = [], []
    with torch.no_grad():
        for data, features in test_loader:
            data, features = data.float().to(config.device), features.float().to(config.device)
            output = model(data, features)

            # NN prediction
            pred_nn.append(output[:,0].sigmoid().to('cpu').numpy() * 100)

            # SVR prediction
            pred_svr.append(clf.predict(output[:,1:]).get().clip(0,100))

    # each batch is 1-D, so concatenate (vstack fails when the last batch is smaller)
    test_pred_nn += np.concatenate(pred_nn)
    test_pred_svr += np.concatenate(pred_svr)

# take the average over folds
w = 0.1
test_pred = (1-w) * test_pred_nn / len(best_model_fold) + w * test_pred_svr / len(svrs)


# build the submission from the ensemble prediction
submission = pd.read_csv(os.path.join(config.root, 'test.csv'))[['Id']]
submission['Pawpularity'] = test_pred
submission

In [None]:
submission.to_csv('submission.csv', index=False)