# Goal
The goal of this notebook is to show my approach to the Kaggle data competition and some tricks I used to achieve a good rank. We used $1500$ images of size $28 \times 28$ to predict the class of $60000$ images (there are $6$ classes). I went through a lot of experimentation, but I will summarize only the most important parts.

In [3]:
# Import all the necessary libraries
import torch
import torchvision.transforms as transforms
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time
import copy
from sklearn.model_selection import train_test_split
import pandas as pd
from PIL import Image
from sklearn.metrics import confusion_matrix
import pytorch_cnn_2

In [4]:
# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Data preparation
I did not spend much time on data preparation, even though it is an important part. I mainly normalized the images and, as a validation technique, simply set $30\%$ of the data aside.

In [5]:
# Load and prepare the train data
train = np.load('data/train.npz')
X_train, X_val, y_train, y_val = train_test_split(train['arr_0'], train['arr_1'], test_size=0.3, random_state=42, stratify=train['arr_1'])
mean = (X_train / 255.).mean()
std = (X_train / 255.).std()

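# Reshape to 28x28 images; uint8 is the dtype PIL expects for 8-bit grayscale
# (assumes pixel values are in 0..255, as the division by 255 above implies)
X_train = X_train.reshape(-1,28,28).astype(np.uint8)
X_val = X_val.reshape(-1,28,28).astype(np.uint8)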

# Model Initialization
We use the functions implemented in pytorch_cnn_2 to initialize the model. I suggest you check it out, as it is quite complete.

In [6]:
# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "resnet"

# Number of classes in the dataset
num_classes = 6

# Batch size for training (change depending on how much memory you have) and val/testing
batch_size = 16
val_test_batch_size = 32

# Number of epochs to train for
num_epochs = 15

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
feature_extract = False
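
The `feature_extract` flag switches between two transfer-learning modes. Since `pytorch_cnn_2.initialize_model` is not shown in this notebook, the following is only a sketch of what such a flag typically does, using a small stand-in network. The names `set_feature_extract`, `count_trainable` and `build` are hypothetical and not part of `pytorch_cnn_2`.

```python
import torch.nn as nn

def set_feature_extract(model, feature_extract):
    """Freeze every existing parameter when feature_extract is True."""
    if feature_extract:
        for param in model.parameters():
            param.requires_grad = False

def count_trainable(model):
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def build(feature_extract, num_classes=6):
    # Stand-in for a pretrained backbone (assumption: the real model is a ResNet)
    model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 4))
    set_feature_extract(model, feature_extract)
    # A freshly created head always has requires_grad=True
    model[2] = nn.Linear(8, num_classes)
    return model

finetune_count = count_trainable(build(feature_extract=False))
extract_count = count_trainable(build(feature_extract=True))
print(finetune_count, extract_count)
```

With fine-tuning every layer is trainable, while with feature extraction only the new head is.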

In [7]:
# Initialize the model for this run
model_ft, input_size = pytorch_cnn_2.initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Send the model to GPU
model_ft = model_ft.to(device)

print("Initializing Datasets and Dataloaders...")

# Gather the parameters to be optimized/updated in this run. If we are
#  finetuning we will be updating all parameters. However, if we are
#  doing feature extract method, we will only update the parameters
#  that we have just initialized, i.e. the parameters with requires_grad
#  is True.

params_to_update = model_ft.parameters()
print("Params to learn:")
if feature_extract:
    params_to_update = []
    for name,param in model_ft.named_parameters():
        if param.requires_grad:
            params_to_update.append(param)
            print("\t",name)
else:
    for name,param in model_ft.named_parameters():
        if param.requires_grad:
            print("\t",name)

Initializing Datasets and Dataloaders...
Params to learn:
	 conv1.weight
	 bn1.weight
	 bn1.bias
	 layer1.0.conv1.weight
	 layer1.0.bn1.weight
	 layer1.0.bn1.bias
	 layer1.0.conv2.weight
	 layer1.0.bn2.weight
	 layer1.0.bn2.bias
	 layer1.1.conv1.weight
	 layer1.1.bn1.weight
	 layer1.1.bn1.bias
	 layer1.1.conv2.weight
	 layer1.1.bn2.weight
	 layer1.1.bn2.bias
	 layer2.0.conv1.weight
	 layer2.0.bn1.weight
	 layer2.0.bn1.bias
	 layer2.0.conv2.weight
	 layer2.0.bn2.weight
	 layer2.0.bn2.bias
	 layer2.0.downsample.0.weight
	 layer2.0.downsample.1.weight
	 layer2.0.downsample.1.bias
	 layer2.1.conv1.weight
	 layer2.1.bn1.weight
	 layer2.1.bn1.bias
	 layer2.1.conv2.weight
	 layer2.1.bn2.weight
	 layer2.1.bn2.bias
	 layer3.0.conv1.weight
	 layer3.0.bn1.weight
	 layer3.0.bn1.bias
	 layer3.0.conv2.weight
	 layer3.0.bn2.weight
	 layer3.0.bn2.bias
	 layer3.0.downsample.0.weight
	 layer3.0.downsample.1.weight
	 layer3.0.downsample.1.bias
	 layer3.1.conv1.weight
	 layer3.1.bn1.weight
	 layer3.1.bn1.

# Choose an optimizer and a loss function
Training and generalization can be strongly affected by the choice of optimizer. After trying SGD, Adam and Adagrad, SGD performed best, so I will keep it. For the loss function, the simplest choice for multiclass classification is the cross-entropy loss, which in PyTorch takes integer class labels directly. I believe some other losses require one-hot encoding, but we will stick with this one.

In [8]:
# Define an optimizer
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)

# Define a scheduler in case you want a dynamic learning rate
#scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer_ft, mode='max', patience=10, factor=0.7)

# Set up the loss function
criterion = nn.CrossEntropyLoss()
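
As a small check of the remark about one-hot encoding: `nn.CrossEntropyLoss` takes raw logits and integer class indices, so no one-hot encoding is needed. The sketch below uses made-up logits (not competition data) and shows that the built-in loss matches the one-hot formulation written out by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 6)            # made-up scores: 4 samples, 6 classes
targets = torch.tensor([0, 2, 5, 1])  # integer class indices, not one-hot

# Built-in loss: applies log-softmax internally, expects class indices
loss_indices = nn.CrossEntropyLoss()(logits, targets)

# Same quantity written with explicit one-hot targets
one_hot = F.one_hot(targets, num_classes=6).float()
loss_one_hot = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(loss_indices.item(), loss_one_hot.item())
```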

# Dataloading and preprocessing
PyTorch offers complete and simple preprocessing through torchvision.transforms, and the simplest way to use it is through the DataLoader class, which takes a Dataset as input. The problem is that our data comes as raw arrays rather than image files, which makes these tools awkward to use. The easiest way would be to convert the arrays to images and save them to disk. A more elegant way I found is to create a custom Dataset class (the idea is not mine, and the link is in the readme file). I will not go into the details of preprocessing, as it is widely documented and easy to search for.

In [9]:
# The preprocessing we will apply
train_transforms = transforms.Compose([
        transforms.Resize(255),
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([mean, mean, mean], [std, std, std])
    ])
    
val_transforms = transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize([mean, mean, mean], [std, std, std])
    ])
    
test_transforms = transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize([mean, mean, mean], [std, std, std])
    ])

In [10]:
# Create a class that inherits from the Dataset class
class DatasetDraws(torch.utils.data.Dataset):
    
    def __init__(self, X, y=None, transform=None):
        self.X = X
        # y may be None for unlabeled (test) data
        self.y = y
        self.transform = transform
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, index):
        # Convert the 2D array to a 3-channel PIL image, since the pretrained models expect RGB
        image = Image.fromarray(self.X[index]).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        # Without labels, return only the image
        if self.y is None:
            return image
        return image, self.y[index]

In [11]:
# torch Dataset
train_dataset = DatasetDraws(X_train, y_train, transform=train_transforms)
val_dataset = DatasetDraws(X_val, y_val, transform=val_transforms)

# torch Dataloaders
train_load = torch.utils.data.DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
val_load = torch.utils.data.DataLoader(val_dataset, batch_size = val_test_batch_size, shuffle = False)  # no need to shuffle for evaluation
dataloaders = {'train': train_load, 'val':val_load}

# Training the model
The training function available in pytorch_cnn_2.py is quite complete. It supports transfer learning (feature extraction and the use of pretrained weights).

In [12]:
# Train and evaluate
model_ft, hist = pytorch_cnn_2.train_model(model_ft, dataloaders, device, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))

Epoch 0/14
----------
train Loss: 1.5831 Acc: 0.3457
val Loss: 1.1242 Acc: 0.5800
Epoch 1/14
----------
train Loss: 1.1277 Acc: 0.5781
val Loss: 0.6870 Acc: 0.7556
Epoch 2/14
----------
train Loss: 0.9463 Acc: 0.6495
val Loss: 0.6613 Acc: 0.7711
Epoch 3/14
----------
train Loss: 0.9498 Acc: 0.6562
val Loss: 0.5852 Acc: 0.8044
Epoch 4/14
----------
train Loss: 0.8045 Acc: 0.7000
val Loss: 0.5566 Acc: 0.8022
Epoch 5/14
----------
train Loss: 0.8127 Acc: 0.7076
val Loss: 0.5620 Acc: 0.8089
Epoch 6/14
----------
train Loss: 0.7411 Acc: 0.7276
val Loss: 0.5602 Acc: 0.7933
Epoch 7/14
----------
train Loss: 0.7545 Acc: 0.7219
val Loss: 0.4767 Acc: 0.8511
Epoch 8/14
----------
train Loss: 0.7321 Acc: 0.7476
val Loss: 0.4700 Acc: 0.8422
Epoch 9/14
----------
train Loss: 0.6762 Acc: 0.7524
val Loss: 0.5030 Acc: 0.8267
Epoch 10/14
----------
train Loss: 0.6832 Acc: 0.7533
val Loss: 0.4317 Acc: 0.8467
Epoch 11/14
----------
train Loss: 0.6196 Acc: 0.7781
val Loss: 0.4154 Acc: 0.8622
Epoch 12/14
--

There is still the problem of the $30\%$ of the data left out for validation. I came up with an idea for it that did not produce a great outcome: after finishing the training, we can use the whole dataset to run one more training epoch with a small learning rate, to prevent the model from overfitting.
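
The extra pass described above could look roughly like this. It is a sketch under assumptions: a tiny linear model and random tensors stand in for `model_ft` and the real images, and `final_lr` is a hypothetical value, not one I tuned.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

torch.manual_seed(0)
# Stand-ins for the real train / validation splits (random data, 6 classes)
train_ds = TensorDataset(torch.randn(70, 8), torch.randint(0, 6, (70,)))
val_ds = TensorDataset(torch.randn(30, 8), torch.randint(0, 6, (30,)))
model = nn.Linear(8, 6)  # stand-in for the already trained model_ft
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Merge both splits so the held-out 30% is also seen once
full_loader = DataLoader(ConcatDataset([train_ds, val_ds]), batch_size=16, shuffle=True)

# Lower the learning rate in place for the final pass
final_lr = 1e-4  # hypothetical value
for group in optimizer.param_groups:
    group['lr'] = final_lr

model.train()
seen = 0
for inputs, labels in full_loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    seen += inputs.size(0)
print(seen)
```

Note that after this pass there is no held-out data left to measure the effect, which makes the trick hard to evaluate offline.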

# Best approach
My best approach so far consisted of ensembling three CNNs. I found the model on the PyTorch forum, and the link is in the description.

In [14]:
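import torch.nn.functional as F  # needed for F.relu in forward

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, modelC, nb_classes=6):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        
        # Size of the concatenated feature vector, read before the heads are removed
        # (assumes each model's fc is an nn.Linear, as in torchvision ResNets)
        nb_features = modelA.fc.in_features + modelB.fc.in_features + modelC.fc.in_features
        
        # Remove last linear layer
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        self.modelC.fc = nn.Identity()
        
        # Create new classifier on top of the concatenated features
        self.classifier = nn.Linear(nb_features, nb_classes)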
        
    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)
        x2 = self.modelB(x.clone())
        x2 = x2.view(x2.size(0), -1)
        x3 = self.modelC(x.clone())
        x3 = x3.view(x3.size(0), -1)
        x = torch.cat((x1, x2, x3), dim=1)
        x = self.classifier(F.relu(x))
        return x

# Initialize three CNNs
modelA, input_size = pytorch_cnn_2.initialize_model('resnet', 6, False, use_pretrained=True)
modelB, input_size = pytorch_cnn_2.initialize_model('resnet', 6, False, use_pretrained=True)
modelC, input_size = pytorch_cnn_2.initialize_model('resnet', 6, False, use_pretrained=True)

# Create ensemble model    
model = MyEnsemble(modelA, modelB, modelC, 6)

You just need to send it to the GPU and train it. I will not run the model here because I believe this is not the best way to build it: an ensemble usually relies on models trained on different subsets of the data, whereas here all three see the same data. I just wanted to show all my approaches.
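
A closer match to that description would be to train each model on its own random subset and average the predicted probabilities. This is a minimal sketch with small stand-in models and random data. `bootstrap_indices` and `ensemble_predict` are hypothetical helpers, not part of `pytorch_cnn_2`, and I did not test this variant in the competition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bootstrap_indices(n_samples, generator):
    """Sample n_samples indices with replacement (one bootstrap subset)."""
    return torch.randint(0, n_samples, (n_samples,), generator=generator)

def ensemble_predict(models, x):
    """Average the softmax probabilities of all models."""
    with torch.no_grad():
        probs = [F.softmax(m(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)

gen = torch.Generator().manual_seed(0)
n_samples, num_classes = 100, 6
# One subset of training indices per model; each could be wrapped with
# torch.utils.data.Subset(train_dataset, idx.tolist()) to build its loader
subsets = [bootstrap_indices(n_samples, gen) for _ in range(3)]

# Stand-ins for three CNNs (each would be trained on its own subset)
models = [nn.Linear(8, num_classes).eval() for _ in range(3)]
avg_probs = ensemble_predict(models, torch.randn(5, 8))
predictions = avg_probs.argmax(dim=1)
print(avg_probs.shape, predictions.shape)
```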

# Ideas for improvement
I had a bunch of ideas I wanted to try but lacked the time and sometimes the skills. A fairly simple one is Mixup, a data augmentation technique that blends pairs of images and their one-hot encoded labels. Another is to use GANs for data augmentation, but I am not sure it would work, as I don't have any experience with GANs. Finally, finding a better validation method would likely lead to better results.
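
For reference, here is a minimal sketch of Mixup on one batch: each sample is blended with a randomly chosen partner using a weight drawn from a Beta distribution, and the one-hot labels are blended the same way. `mixup_batch`, `soft_cross_entropy` and `alpha=0.2` are illustrative assumptions, not something I tested in this competition.

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes, alpha=0.2):
    """Blend each sample with a randomly chosen partner from the same batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    y_one_hot = F.one_hot(y, num_classes).float()
    mixed_x = lam * x + (1 - lam) * x[perm]
    mixed_y = lam * y_one_hot + (1 - lam) * y_one_hot[perm]
    return mixed_x, mixed_y

def soft_cross_entropy(logits, soft_targets):
    """Cross entropy that accepts probability vectors as targets."""
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

torch.manual_seed(0)
images = torch.randn(8, 3, 28, 28)   # made-up batch
labels = torch.randint(0, 6, (8,))
mixed_images, mixed_labels = mixup_batch(images, labels, num_classes=6)
logits = torch.randn(8, 6)           # stand-in for model(mixed_images)
loss = soft_cross_entropy(logits, mixed_labels)
print(mixed_images.shape, mixed_labels.sum(dim=1))
```

Because the targets are no longer single class indices, the loss is written for soft labels here instead of passing class indices to `nn.CrossEntropyLoss`.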