# T5 - AlexNet - Pawpularity

This is an ongoing CV-focused Kaggle contest (about 3 months to go as of Oct. 2021), and there is a cash prize to be won!

In this contest, you will help the website [PetFinder.my](https://petfinder.my/) assign "Pawpularity" scores to pet photos, which will help the pets find homes faster.

The "Pawpularity" scores in the training set are derived from each pet profile's page view statistics on the listing pages, using an algorithm that normalizes the traffic data across different pages, platforms, and various metrics.

`Metadata`
* For each image, you are provided optional metadata, manually labeling each photo for key visual quality and composition parameters.

* These labels are not used for deriving our Pawpularity score, but it may be beneficial for better understanding the content and co-relating them to a photo's attractiveness. Our end goal is to deploy AI solutions that can generate intelligent recommendations (i.e. show a closer frontal pet face, add accessories, increase subject focus, etc) and automatic enhancements (i.e. brightness, contrast) on the photos, so we are hoping to have predictions that are more easily interpretable.

* You may use these labels as you see fit, and optionally build an intermediate / supplementary model to predict the labels from the photos. If your supplementary model is good, we may integrate it into our AI tools as well.

* In our production system, new photos that are dynamically scored will not contain any photo labels. If the Pawpularity prediction model requires photo label scores, we will use an intermediary model to derive such parameters, before feeding them to the final model.

`Evaluation Metrics`

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$$
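
As a minimal sketch of the metric above, the following pure-Python function computes the RMSE of two equal-length sequences. The function name `rmse` and the sample numbers are made up for illustration and are not contest data. The last lines check that dividing labels and predictions by 100 and multiplying the result by 100 gives the same value, which is the rescaling this notebook relies on later when it trains on labels in [0, 1].

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# hypothetical Pawpularity labels and predictions, for illustration only
y_true = [30, 50, 70, 90]
y_pred = [32, 46, 70, 94]

score = rmse(y_true, y_pred)

# RMSE is linear in scale: computing it on values / 100 and multiplying by 100
# recovers the score on the original [1, 100] scale (up to rounding error)
rescaled = rmse([t / 100 for t in y_true], [p / 100 for p in y_pred]) * 100

print(score, rescaled)
```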

[PetFinder.my - Pawpularity Contest](https://www.kaggle.com/c/petfinder-pawpularity-score/overview/description)

[Reference](https://www.kaggle.com/phalanx/train-swin-t-pytorch-lightning/notebook)

In this tutorial, I will build the pipeline and use AlexNet as a demo. When you submit the code to Kaggle, you may encounter an error even though the notebook runs and saves successfully. Please refer to the contest's Discussion page for more information.

## AlexNet (2012) 
https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

#### History

AlexNet is a convolutional neural network (CNN) designed primarily by Alex Krizhevsky and published with Ilya Sutskever and Krizhevsky’s doctoral advisor, Geoffrey Hinton.

AlexNet shot to fame after competing in the ImageNet Large Scale Visual Recognition Challenge in 2012. It achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up.

The primary result of the original paper was that the depth of the model was essential for its high performance. Training such a deep model was computationally expensive, but it was made feasible by GPUs (Graphics Processing Units).



#### Architecture

![architecture](https://miro.medium.com/max/1400/1*5bnqbGcBSLzaNMsz5dHkfg.png)

#### Technique

1. ReLU non-linearity:

    With the ReLU non-linearity, AlexNet shows that deep CNNs can be trained much faster than with saturating activation functions such as tanh or sigmoid. The figure below shows that a four-layer CNN with ReLUs (solid curve) reaches a 25% training error rate six times faster than an equivalent network with tanh (dotted curve). This was tested on the CIFAR-10 dataset.
    
    ![ReLU v.s. tanh](https://lh3.googleusercontent.com/yFU1a9lC3c9crWLNm_48DHbkUaWCP2ikC-GdndD58mseiZ4qQVLLnXTSoWJu8cEFTjv8xVMmjNOLz9h8y88-J1dog0vOdZdjNRsDjI1PGeXfx_-zmcnMf9XRGiMhqJWeDu80hDyy)
    

2. Overlapping Pooling:  

    CNNs traditionally “pool” the outputs of neighboring groups of neurons without overlap. When the authors introduced overlap, they saw the top-1 and top-5 error rates drop by 0.4% and 0.3%, respectively, and found that models with overlapping pooling are slightly harder to overfit. (The figure below is an example of pooling. AlexNet's overlapping pooling uses kernel size 3 and stride 2.)

    ![max-pool](https://lh4.googleusercontent.com/zAsVIGQRrXN-RQxroXCDXrdhSAMim7MsAdUJja2JV3j5zZAFT7TobX_F85SF2m3y9ifLJaNv8x3LztDvRg4TW30HzX1kQ1PQoZNNEXSDS46jd6nnmNJyLEjxmxZDtI2_Lh4nV8g_)
    

3. Data Augmentation: 

    Showing a neural net different variations of the same image helps prevent overfitting. It effectively generates additional training data and pushes the network to learn the key features instead of memorizing individual images.
    
    Random Flip: 
    ![Random Flip](https://lh5.googleusercontent.com/FyGSIFFcz5Rn6KIpxBoIXeDd5zeSjDSbW5uijKPF26vlVHICeVUQ5FEHryWYTnzFdc4UjWrvtLRBAcqOhgpbQ60cinAZnCX8uTKEcvLESDo4fG9VSumOIlXZyiC9FY-JcwLNjYb7)
    
    Random Crop: 
    ![Random Crop](https://lh4.googleusercontent.com/LsJ4ckx-M2t-21f-d0gL0UxjvO7EWWuyrktRtwYhQd19naspFYHWF_uoYwYzWqfAkgM-isJpsYmyeVRiOGyyfKBq7X84_PL1qX5bc-dG6Tz4CbF-FIOFXa_562iunhnSWWNJkXGH)
    
    ColorJitter: 
    ![ColorJitter](https://pytorch.org/vision/stable/_images/sphx_glr_plot_transforms_006.png)
    
    RandomAffine:
    ![RandomAffine](https://pytorch.org/vision/stable/_images/sphx_glr_plot_transforms_010.png)
    
4. Dropout: During training, each neuron is dropped from the neural network with a probability of `rate`. A dropped neuron does not contribute to forward or backward propagation, so every input effectively goes through a different network architecture, as shown in the image below. As a result, the learned weights are more robust and do not overfit as easily.

    ![Dropout](https://lh4.googleusercontent.com/i4wnkHE5-KTEYAW4M8SbCMNtzYGirIpkG1XaY1t9tqBbrHTLzHeELOij2_ySJ0sfCMdPwGK2wZXr9_bsnBNhES7mvDmRB2q-8keTwyuC9pk1CFyLswH7ciajlgydPFNaoR4sQPIW)
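
The dropout step described in item 4 can be sketched in plain Python. This sketch assumes the "inverted dropout" convention that PyTorch's `nn.Dropout` follows, where surviving activations are scaled by `1 / (1 - rate)` during training so that nothing needs to change at evaluation time. The function name `dropout` and the sample input are illustrative only.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout on a list of floats.

    Each element is zeroed with probability `rate`; survivors are scaled
    by 1 / (1 - rate) so the expected value of each element is unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # evaluation mode: dropout is the identity
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(2021)  # fixed seed so the run is repeatable
x = [1.0] * 10000

dropped = dropout(x, rate=0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)  # should be close to the rate
mean_after = sum(dropped) / len(dropped)           # should stay close to 1.0

print(zero_fraction, mean_after)
```

In evaluation mode (`training=False`) the function returns its input unchanged, which mirrors what `model.eval()` does to the dropout layers of a PyTorch model.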
    
    
reference: https://www.mygreatlearning.com/blog/alexnet-the-first-cnn-to-win-image-net/

reference: https://towardsdatascience.com/alexnet-the-architecture-that-challenged-cnns-e406d5297951
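
To make the overlapping pooling from item 2 concrete, here is a small sketch that max-pools a single row of numbers. It is one-dimensional for brevity, whereas the real layer (`nn.MaxPool2d(kernel_size=3, stride=2)`) works on 2-D feature maps; the helper `max_pool_1d` and the sample row are hypothetical.

```python
def max_pool_1d(values, kernel_size, stride):
    """Max-pool a 1-D list with no padding (only windows that fit entirely)."""
    out = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        out.append(max(values[start:start + kernel_size]))
    return out

row = [1, 5, 2, 3, 9, 4, 6]

# kernel_size == stride: windows share no elements
non_overlapping = max_pool_1d(row, kernel_size=2, stride=2)

# kernel_size > stride: neighbouring windows share one element
overlapping = max_pool_1d(row, kernel_size=3, stride=2)

print(non_overlapping)  # [5, 3, 9]
print(overlapping)      # [5, 9, 9]
```

With overlap, the value 9 falls inside two neighbouring windows and therefore shows up twice in the output.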

In [None]:
# install package by adding dataset
# you can find the torchsummary package in dataset: https://www.kaggle.com/truthr/torchsummary
# and I also upload the python-box package in dataset: https://www.kaggle.com/zhicongliang/pythonbox
# then you can add these datasets to your notebook
!pip install ../input/torchsummary/torchsummary-1.5.1-py3-none-any.whl
!pip install ../input/pythonbox/python_box-5.4.1-py3-none-any.whl

In [None]:
import os
import pandas as pd
import numpy as np
import tqdm
from PIL import Image
import copy

from sklearn.model_selection import StratifiedKFold

from box import Box

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as T
# from torchvision.io import read_image  # this requires a recent torchvision version

import matplotlib.pyplot as plt
plt.style.use('seaborn')

## Step 0. Configuration

Here we define a dictionary to store our parameters.

In [None]:
config = {
    'root': '../input/petfinder-pawpularity-score/',
    'device': 'cuda', # 'cpu' for cpu, 'cuda' for gpu
    'n_splits': 5,
    'seed': 2021,
    'train_batchsize': 128,
    'val_batchsize': 128,
    'epoch': 20,
    'learning_rate': 0.01,
    'logger_interval': 1,
    'milestone': [10, 15],
    'gamma': 0.1,
}

# transform key to attribute. it will be easier for us to refer to these parameters later
config = Box(config)
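
If `python-box` is not available, attribute-style access to a flat dictionary can be approximated with the standard library. This is a sketch of an alternative, not what this notebook uses: `SimpleNamespace` does not convert nested dictionaries and does not support `cfg['key']` indexing the way Box does. The names `settings` and `cfg` are chosen here to avoid clashing with `config`.

```python
from types import SimpleNamespace

# a subset of the parameters above, for illustration
settings = {
    'n_splits': 5,
    'seed': 2021,
    'learning_rate': 0.01,
    'milestone': [10, 15],
}

# each dictionary key becomes an attribute
cfg = SimpleNamespace(**settings)

print(cfg.learning_rate, cfg.milestone)
```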

## Step 1. Load the data

If we are using a dataset like CIFAR-10, MNIST, or SVHN, we can directly use torchvision.datasets. However, if we would like to use our own data, we need to construct a custom Dataset that loads the data and performs some basic transformations.

The most important functions of a custom Dataset are `__len__` and `__getitem__`.

The `__len__` function will return the number of elements in this dataset, while `__getitem__` will return an image-label pair that can be accepted by pytorch given an index.
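
The two-method protocol can be tried out without PyTorch. The toy class below is hypothetical and deliberately does not subclass `torch.utils.data.Dataset`, so it runs anywhere; the real dataset in the next cell implements the same two methods and returns image-label pairs instead of numbers.

```python
class SquaresDataset:
    """Toy map-style dataset: item i is the pair (i, i * i)."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        # number of elements in the dataset
        return self._size

    def __getitem__(self, idx):
        # raising IndexError past the end lets plain iteration terminate
        if not 0 <= idx < self._size:
            raise IndexError(idx)
        return idx, idx * idx

toy = SquaresDataset(4)
n_items = len(toy)      # calls __len__
last_item = toy[3]      # calls __getitem__
all_items = list(toy)   # iteration falls back to __getitem__ until IndexError

print(n_items, last_item, all_items)
```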

In [None]:
# define Custom Dataset with pytorch
class PetfinderDataset(Dataset):

    def __init__(self, df, image_size=224, transform=None):
        self._X = df["Id"].values
        self._y = None
        if "Pawpularity" in df.keys():
            self._y = df["Pawpularity"].values
        if not transform:
            # we resize all the image to the same size
            self._transform = T.Compose([
                T.Resize([image_size, image_size]),
                T.ToTensor(), # transform the PIL image type to torch.tensor
            ])
        else:
            self._transform = transform

    def __len__(self):
        return len(self._X)

    def __getitem__(self, idx):
        image_path = self._X[idx]
        # given the index, look up the image path, read the raw image, and then transform it
        # image = read_image(image_path)  # this requires a recent torchvision version
        image = Image.open(image_path)
        image = self._transform(image)
        # if we have label, then we return the image-label pair (for training)
        # if not, we directly return the image (for testing)
        if self._y is not None:
            label = self._y[idx]
            return image, label
        return image

In [None]:
df = pd.read_csv(os.path.join(config.root, 'train.csv'))
df['Id'] = df["Id"].apply(lambda x: os.path.join(config.root, "train", x + ".jpg")) # we transform the Id to its image path

train_val_set = PetfinderDataset(df)

print('# of data:', len(df))
print('range of label [{}, {}]'.format(df['Pawpularity'].min(), df['Pawpularity'].max()))

In [None]:
# we show some images here
plt.figure(figsize=(12, 12))
for idx in range(16):
    image, label = train_val_set[idx]  # indexing calls __getitem__
    plt.subplot(4, 4, idx+1)
    plt.imshow(image.permute(1, 2, 0))  # (C, H, W) -> (H, W, C) for matplotlib
    plt.axis('off')
    plt.title('Pawpularity: {}'.format(label))

## Step 2. Define AlexNet



In [None]:
# AlexNet model architecture from the "One weird trick..." paper:
# https://arxiv.org/abs/1404.5997
class AlexNet(nn.Module):

    def __init__(self, droprate=0.2):
        super(AlexNet, self).__init__()

        self.features = nn.Sequential(
            ## conv2d output_shape = (image_shape-kernel_shape+2*padding)/stride + 1
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2)
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(droprate),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(droprate),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 1),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def extract_features(self, inputs):
        """ Returns output of the final convolution layer """
        x = self.features(inputs)
        return x

    def forward(self, inputs):
        # conv features -> fixed 6x6 average pooling -> flatten -> regression head
        x = self.features(inputs)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

In [None]:
import torchsummary
# here we show the summary of the model
model = AlexNet()
torchsummary.summary(model, (3,224,224), device='cpu')
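
The output-shape formula from the comment in the `AlexNet` definition can be checked by hand for a 224x224 input. The sketch below applies it with floor division to every convolution and pooling layer in `features`; the layer names in the list are labels added here for readability and are not attributes of the model.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size - kernel_size + 2 * padding) // stride + 1

# (label, kernel_size, stride, padding) for each spatial layer in `features`
layers = [
    ('conv1', 11, 4, 2),
    ('pool1', 3, 2, 0),
    ('conv2', 5, 1, 2),
    ('pool2', 3, 2, 0),
    ('conv3', 3, 1, 1),
    ('conv4', 3, 1, 1),
    ('conv5', 3, 1, 1),
    ('pool3', 3, 2, 0),
]

size = 224
trace = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    trace.append((name, size))
    print(name, size)

# 256 channels of size x size feed the first Linear layer
flattened = 256 * size * size
print(flattened)
```

The final spatial size is 6, so the flattened feature vector has 256 * 6 * 6 = 9216 entries, which matches the input size of the first `nn.Linear` layer.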

## Step 3. Train Our Model with Cross Validation

In [None]:
def test(model, test_loader):
    model.eval() # turn model into evaluation mode
    test_loss = 0 # accumulates the sum of squared errors over the whole set
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(config.device), target.float().to(config.device)/100.
            output = model(data)
            test_loss += F.mse_loss(output.view(-1), target.view(-1), reduction='sum').item()  # sum up batch loss

    test_loss /= len(test_loader.dataset)
    return np.sqrt(test_loss) # RMSE 

In [None]:
# we split the dataset into folds for cross-validation
# here we treat the label "Pawpularity" as categorical data and use the StratifiedKFold class,
# although it is actually numerical data
skf = StratifiedKFold(
    n_splits=config.n_splits, shuffle=True, random_state=config.seed
)
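
Because the label is numerical, one common alternative is to stratify on a binned version of it rather than on the raw score. The sketch below is an assumption about how that could look and is not what this notebook does: the helper `bin_labels`, the choice of 4 bins, and the sample scores are all hypothetical. The resulting bin indices would be passed as the second argument of `skf.split` in place of `df["Pawpularity"]`.

```python
def bin_labels(labels, n_bins, low=1, high=100):
    """Map each numeric label in [low, high] to an equal-width bin index."""
    width = (high - low + 1) / n_bins
    # min(...) guards against a label equal to `high` landing past the last bin
    return [min(int((y - low) / width), n_bins - 1) for y in labels]

labels = [1, 12, 25, 38, 50, 63, 75, 88, 100]  # hypothetical scores
binned = bin_labels(labels, n_bins=4)

print(binned)  # [0, 0, 0, 1, 1, 2, 2, 3, 3]
```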

In [None]:
# we keep record of the training in each fold
train_losses_fold = []
val_losses_fold = []
best_model_fold = []

for fold, (train_idx, val_idx) in enumerate(skf.split(df["Id"], df["Pawpularity"])):
    
    print('================================ CV fold {} ================================'.format(fold))
    
    train_df = df.loc[train_idx].reset_index(drop=True)
    val_df = df.loc[val_idx].reset_index(drop=True)
    
    # we would like to apply some random transformations to our training data so that
    # our model is more robust to different patterns in out-of-sample data
    train_transform = T.Compose([
        T.Resize([224, 224]), # resize the image to 3*224*224 (no cropping)
        T.RandomHorizontalFlip(), # random flip the image horizontally
        T.RandomVerticalFlip(), # random flip the image vertically
        T.RandomAffine(15, translate=(0.1, 0.1), scale=(0.9, 1.1)), # Random affine transformation of the image keeping center invariant.
        T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1), # randomly changes the brightness, saturation, and other properties of an image
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    # for the validation set, we only resize, convert to a float tensor, and normalize
    val_transform = T.Compose([
        T.Resize([224, 224]),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    
    train_set = PetfinderDataset(train_df, transform=train_transform)
    val_set = PetfinderDataset(val_df, transform=val_transform)
    
    # then we define the dataloader for training and validation
    # it tells the machine how to sample from our training/validation set
    # we shuffle the training set every epoch so that SGD sees the batches in a different order
    train_loader = DataLoader(train_set, batch_size=config.train_batchsize, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_set, batch_size=config.val_batchsize, num_workers=4)
    
    model = AlexNet().to(config.device) # use GPU to accelerate the training. Kaggle gives us 30h every week.
    optimizer = optim.SGD(model.parameters(), lr=config.learning_rate)
    # we decay the learning rate by factor gamma=0.1 when we reach each milestone epoch
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=config.milestone, gamma=config.gamma)
    criterion = nn.MSELoss(reduction='sum')
    
    train_losses = []
    val_losses = []
    
    best_val_loss = np.inf
    best_model = None
    
    for epoch in range(config.epoch):
        print('\t=================== Epoch {} ==================='.format(epoch))
        
        model.train() # turn model into training mode
        batch_train_loss = []
        
        # iterate over each batch to update the model
        for batch_idx, (data, target) in tqdm.tqdm(enumerate(train_loader), total=len(train_loader)):
            data, target = data.to(config.device), target.float().to(config.device) / 100. # we transform the label to [0,1]
            
            optimizer.zero_grad() # very important. without this step, grad will accumulate
            output = model(data)
            
            loss = criterion(output.view(-1), target.view(-1)) 
            rmse = torch.sqrt(loss/len(target)) # rmse loss
            rmse.backward()
            optimizer.step() # update the model by the gradient
            
            batch_train_loss.append(loss.item())
        
        if epoch % config.logger_interval == 0:
            train_loss = np.sqrt(np.sum(batch_train_loss)/len(train_loader.dataset)) * 100 # don't forget the final pawpularity range in [1,100]
            val_loss = test(model, val_loader) * 100
            
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            
            print('\t\t train loss: {:.4f}'.format(train_loss))
            print('\t\t val loss: {:.4f} -- best loss: {:.4f}'.format(val_loss, best_val_loss))
            
            # if we get a lower validation loss, then we record the model
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                best_model = copy.deepcopy(model)
                  
        scheduler.step()
    
    train_losses_fold.append(train_losses)
    val_losses_fold.append(val_losses)
    best_model_fold.append(best_model)
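
To see what the `MultiStepLR` scheduler does with the settings in `config`, the learning rate for each epoch can be computed by hand. This is a sketch of the decay rule only, assuming `scheduler.step()` is called once at the end of every epoch as in the loop above; the function name `multistep_lr` is made up for illustration.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate in effect during `epoch` (0-based) under multi-step decay."""
    # the rate is multiplied by gamma once for every milestone already reached
    n_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** n_decays

# same values as config.learning_rate, config.milestone, config.gamma, config.epoch
schedule = [multistep_lr(0.01, [10, 15], 0.1, e) for e in range(20)]

for epoch in (0, 9, 10, 14, 15, 19):
    print(epoch, schedule[epoch])
```

Epochs 0 to 9 use 0.01, epochs 10 to 14 use roughly 0.001, and epochs 15 to 19 use roughly 0.0001 (up to floating-point rounding).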
             
    

### How to save our model in Kaggle

The steps below come from a Kaggle Q&A thread and are written for Keras. In this notebook we use PyTorch, where the corresponding call is `torch.save(model.state_dict(), ...)`, as in the next cell.

1. Save the model with model.save("model_name.h5") or a similar command. (Make sure to use the .h5 extension, which creates a single file for your saved model.) This writes the model to your notebook's working directory.

2. Save your notebook: go to Advanced Settings and select Always save output. Hit Save, then select Quick Save if you want the notebook saved as it is; otherwise Kaggle reruns the whole notebook before saving it (which might take long, depending on your model training phase etc.).

3. Go to the notebook viewer (the saved notebook). Go to the notebook's Output and create a private (or even public) dataset for that model.

4. Then add that dataset to any of your notebooks. You can load the model with model = tf.keras.models.load_model("../input/dataset_name/model_name.h5"). You can also download the model file from the dataset for offline use.

I did not try the method above; it is just for your reference. https://www.kaggle.com/questions-and-answers/92749

In [None]:
## save the models

for fold, model in enumerate(best_model_fold):
    torch.save(model.state_dict(), 'alexnet_fold_{}.h5'.format(fold))

In [None]:
# # my pretrained alexnet: https://www.kaggle.com/zhicongliang/pawpularityalenext
# ## load the models
# best_model_fold = []
# for fold in range(5):
#     model = AlexNet().to(config.device)
#     model.load_state_dict(torch.load('../input/pawpularityalenext/alexnet_fold_{}.h5'.format(fold), map_location=torch.device(config.device)))
#     best_model_fold.append(model)

## Step 4. Visualize the training/validation curve

In [None]:
train_losses = np.array(train_losses_fold)
val_losses = np.array(val_losses_fold)

In [None]:
index = range(0, config.epoch, config.logger_interval)
fig = plt.figure(figsize=(8,6))
plt.plot(index, train_losses.mean(axis=0), label='Training Loss')
plt.plot(index, val_losses.mean(axis=0), label='Validation Loss')
plt.legend(fontsize=15)
plt.xlabel('Epoch', fontsize=15)

## Step 5. Make submission

In [None]:
df_test = pd.read_csv(os.path.join(config.root, 'test.csv'))
test_id = df_test.index
df_test['Id'] = df_test["Id"].apply(lambda x: os.path.join(config.root, "test", x + ".jpg")) # we transform the Id to its image path

test_transform = T.Compose([
    T.Resize([224, 224]),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_set = PetfinderDataset(df_test, transform=test_transform)
test_loader = DataLoader(test_set, batch_size=config.val_batchsize, num_workers=4)

# get the testing prediction
test_pred = np.zeros((df_test.shape[0],1))

for model in best_model_fold:
    model.eval() # make sure dropout is disabled at inference time
    fold_preds = []
    with torch.no_grad(): # no gradients are needed for prediction
        for data in test_loader:
            data = data.to(config.device)
            output = model(data)
            fold_preds.append(output.to('cpu').numpy() * 100) # back to the [1, 100] scale

    test_pred += np.vstack(fold_preds)

# take the average over folds
test_pred = test_pred / len(best_model_fold)

submission = pd.read_csv(os.path.join(config.root, 'test.csv'))[['Id']]
submission['Pawpularity'] = test_pred
submission

In [None]:
submission.to_csv('submission.csv', index=False)
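
The fold averaging used for the submission can be illustrated on its own with plain lists. The helper name `average_folds` and the numbers are hypothetical; in the notebook the same element-wise mean is computed with NumPy arrays.

```python
def average_folds(fold_predictions):
    """Element-wise mean of equally long prediction lists, one list per fold."""
    n_folds = len(fold_predictions)
    n_items = len(fold_predictions[0])
    if any(len(p) != n_items for p in fold_predictions):
        raise ValueError("every fold must predict the same number of items")
    return [sum(p[i] for p in fold_predictions) / n_folds for i in range(n_items)]

# hypothetical predictions from 3 folds for 2 test images
fold_predictions = [
    [30.0, 42.0],
    [36.0, 39.0],
    [33.0, 45.0],
]
ensemble = average_folds(fold_predictions)

print(ensemble)  # [33.0, 42.0]
```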

### Conclusion: 
This notebook achieves a score (RMSE) of 20.80063, which ranks 770/848 on the leaderboard (Oct. 12, 2021).