# Product Classification: Baseline CNN Model and Improvements

In this notebook, we develop a model that predicts the Shopee product category of a product image.

The models are implemented in PyTorch. This notebook was run on an M1 MacBook Pro, with dependency versions as documented in `environment.yml`.

## Data Setup

In [1]:
# Add the project root to the import path so that this notebook
# can be run from the `notebooks` folder instead of the root.
import sys
sys.path.insert(0, '../')
image_dir = '../data/selected_images/'

In [2]:
from model import dataset

data = dataset.DataSet(max_num_img=500, crop=0.9, path=image_dir)
data.load_all()

# For all models, we will train with a batch size of 32
batch_size = 32
seed = 42
data_seed = 42

100%|██████████| 500/500 [00:01<00:00, 351.45it/s]
100%|██████████| 500/500 [00:01<00:00, 353.49it/s]
100%|██████████| 500/500 [00:01<00:00, 342.22it/s]
100%|██████████| 500/500 [00:01<00:00, 362.96it/s]
100%|██████████| 500/500 [00:01<00:00, 307.11it/s]
100%|██████████| 500/500 [00:01<00:00, 294.34it/s]
100%|██████████| 500/500 [00:01<00:00, 316.11it/s]
100%|██████████| 500/500 [00:01<00:00, 370.93it/s]
100%|██████████| 500/500 [00:01<00:00, 268.71it/s]


## Baseline Model
Since this is an image classification problem, we build a baseline model that uses stacked 3x3 convolutions for feature extraction. A final fully connected layer takes the output of the convolutional layers and produces one score per class. The exact structure is as follows:

Conv3-16 -> Conv16-16 -> 2x2 maxPool -> Conv16-32 -> Conv32-32 -> 2x2 maxPool -> Conv32-64 -> Conv64-64 -> 2x2 maxPool -> Conv64-128 -> Conv128-128 -> 2x2 maxPool -> 1x1 adaptive average pool -> 128-[num classes] fc

Here ConvA-B denotes a 3x3 convolution with A input channels and B output channels. We apply ReLU and then batch normalisation after each 3x3 convolution.

We found in our experiments that applying batch normalisation after each convolution improves performance. In addition, placing ReLU before batch normalisation performed better than placing batch normalisation before ReLU.
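
The structure described above can be sketched in PyTorch as follows. This is a minimal illustration and not the contents of `baseline_cnn_1`: the class name `BaselineSketch`, the helper `conv_relu_bn`, the `widths` parameter, the use of `padding=1`, and the choice of 9 classes (matching the nine progress bars in the loading output) are all assumptions.

```python
import torch
import torch.nn as nn


def conv_relu_bn(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 convolution, then ReLU, then batch normalisation (the order described above)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # padding=1 is an assumption
        nn.ReLU(),
        nn.BatchNorm2d(out_ch),
    )


class BaselineSketch(nn.Module):
    """Hypothetical re-creation of the described architecture."""

    def __init__(self, num_classes: int, widths=(16, 32, 64, 128)):
        super().__init__()
        layers = []
        in_ch = 3  # RGB input
        for out_ch in widths:
            # One group: ConvA-B -> ConvB-B -> 2x2 max pool
            layers += [
                conv_relu_bn(in_ch, out_ch),
                conv_relu_bn(out_ch, out_ch),
                nn.MaxPool2d(kernel_size=2),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # 1x1 adaptive average pool
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x))
        return self.fc(torch.flatten(x, 1))  # one raw score per class


model = BaselineSketch(num_classes=9)
logits = model(torch.randn(2, 3, 64, 64))  # dummy batch of two 64x64 images
print(logits.shape)
```

The sketch returns raw scores rather than probabilities because `nn.CrossEntropyLoss` applies log-softmax internally. The adaptive average pool makes the classifier independent of the input resolution, so the 64x64 dummy input is only a convenient size for a smoke test.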

In [4]:
import torch
import torch.nn as nn

from model import trainer, baseline_cnn_1

torch.manual_seed(seed)

baseline_cnn_model = baseline_cnn_1.BaselineCNN1(len(data.categories))
# Use weight decay (L2 regularisation) with Adam
optimizer = torch.optim.Adam(baseline_cnn_model.parameters(), lr=4e-4, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

# The trainer has methods to split the dataset into training (0.7), validation (0.1) and test (0.2) sets respectively.
# The seed is set accordingly for a consistent split.
mtrainer = trainer.Trainer(baseline_cnn_model, optimizer, criterion, data, batch_size, seed=data_seed, random_transform=False)
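
The 0.7 / 0.1 / 0.2 split mentioned in the comment above could look roughly like the sketch below. How `Trainer` actually splits the data is not shown in this notebook, so the function `split_indices` and the total of 4500 images (9 categories of 500 images each) are assumptions used for illustration only.

```python
import random


def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=42):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so the global seed is untouched
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test


# Hypothetical dataset size: 9 categories x 500 images each.
train_idx, val_idx, test_idx = split_indices(4500)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under these assumptions the validation set holds 450 images, which agrees with the logged validation accuracies all being multiples of 1/450 (for example 0.344444 = 155/450).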

In [5]:
# Train the model for 30 epochs
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.752771 | Accuracy: 0.378730
[Epoch   0]: Validation loss: 1.782797 | Accuracy: 0.344444 | Within 3: 0.733333
[Epoch   1]: Training loss: 1.571027 | Accuracy: 0.445079
[Epoch   1]: Validation loss: 1.788241 | Accuracy: 0.362222 | Within 3: 0.711111
[Epoch   2]: Training loss: 1.461987 | Accuracy: 0.500952
[Epoch   2]: Validation loss: 1.546405 | Accuracy: 0.482222 | Within 3: 0.766667
[Epoch   3]: Training loss: 1.398670 | Accuracy: 0.522540
[Epoch   3]: Validation loss: 2.105829 | Accuracy: 0.353333 | Within 3: 0.628889
[Epoch   4]: Training loss: 1.333582 | Accuracy: 0.550159
[Epoch   4]: Validation loss: 1.388717 | Accuracy: 0.508889 | Within 3: 0.826667
[Epoch   5]: Training loss: 1.273665 | Accuracy: 0.574286
[Epoch   5]: Validation loss: 1.526939 | Accuracy: 0.484444 | Within 3: 0.831111
[Epoch   6]: Training loss: 1.218312 | Accuracy: 0.602222
[Epoch   6]: Validation loss: 1.427212 | Accuracy: 0.495556 | Within 3: 0.800000
[Epoch   7]: Training loss:
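
The `Within 3` column in the log is read here as top-3 accuracy: the fraction of images whose true category is among the three highest-scoring predictions. That reading is an assumption, since the trainer's code is not shown. A minimal pure-Python sketch of the metric, with a hypothetical function name and toy scores, is:

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for three samples over four classes (illustrative values only).
scores = [
    [0.1, 0.5, 0.2, 0.9],
    [0.8, 0.1, 0.05, 0.05],
    [0.3, 0.2, 0.4, 0.1],
]
labels = [2, 0, 3]
top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(top1, top3)
```

With `k=1` the metric reduces to ordinary accuracy, which is why the `Within 3` value can never be lower than the `Accuracy` value on the same log line.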

In [6]:
_, test_acc, _ = mtrainer.run_test(mtrainer.testloader, 3)
print(f'Accuracy of the best model on the test images: {test_acc*100} %') # Best model within 30 epochs

_, test_acc, _ = mtrainer.run_test(mtrainer.testloader, model=mtrainer.model)
print(f'Accuracy of the fully trained model on the test images: {test_acc*100} %') # Model after 30 epochs

Accuracy of the best model on the test images: 69.22222222222221 %
Accuracy of the fully trained model on the test images: 66.55555555555556 %


We found that the model tends to overfit the data after about 20 epochs. 
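
Because of this overfitting, the checkpoint chosen on validation data can differ from the weights at the final epoch, which is what the two test accuracies above compare. How `Trainer` selects its best model is not shown here; the sketch below assumes the simplest rule, picking the epoch with the highest validation accuracy, and uses a hypothetical helper `best_epoch`.

```python
def best_epoch(val_accuracies):
    """Return (epoch, accuracy) for the highest validation accuracy; earliest epoch wins ties."""
    best_idx = max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
    return best_idx, val_accuracies[best_idx]


# Validation accuracies for epochs 0-6 of the baseline run, copied from the training log.
baseline_val_acc = [0.344444, 0.362222, 0.482222, 0.353333, 0.508889, 0.484444, 0.495556]

epoch, acc = best_epoch(baseline_val_acc)
print(f"Best epoch so far: {epoch} (validation accuracy {acc:.6f})")
```

In a training loop, the same comparison would be made after every epoch, saving a copy of the model weights whenever the validation accuracy improves.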

## Improving the baseline model
We define a ConvBlock[A][B] group as a pair of 3x3 convolutions, ConvA-B -> ConvB-B.
We try to improve the baseline model either by adding an additional ConvBlock[128][256] group, or by stacking additional 3x3 convolution blocks with the same number of channels. The former roughly corresponds to increasing the feature representation ability of the network, while the latter is roughly equivalent to increasing the receptive field of the convolution group.
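
The receptive field claim can be checked with the standard recurrence: each layer adds `(kernel - 1) * jump` to the receptive field, where `jump` is the product of the strides of all earlier layers. The sketch below is illustrative only; the function name and the layer lists are assumptions built from the architecture description, with stride-1 convolutions and 2x2 max pooling of stride 2.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth is scaled by the accumulated stride
        jump *= stride
    return rf


CONV = (3, 1)  # 3x3 convolution, stride 1
POOL = (2, 2)  # 2x2 max pool, stride 2

two_convs = receptive_field([CONV, CONV])
three_convs = receptive_field([CONV, CONV, CONV])
baseline = receptive_field([CONV, CONV, POOL] * 4)
stacked = receptive_field([CONV, CONV, CONV, POOL] * 4)
print(two_convs, three_convs, baseline, stacked)
```

Two stacked 3x3 convolutions see a 5x5 input window and three see 7x7, so adding one convolution per group widens the window at every stage without changing the number of channels.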

We found that adding the additional ConvBlock[128][256] group improves the performance of the model, whereas stacking additional 3x3 convolution blocks with the same number of channels reduces it. (This result still needs to be confirmed.)

We also tried increasing the depth of the classification layer, but found that it does not improve the performance of the model.

### Model with an additional ConvBlock
This variant appends Conv128-256 -> Conv256-256 -> 2x2 maxPool to the model's feature extractor.
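
To get a feel for how much capacity this extra group adds, the convolution weights can be counted by hand. The sketch below assumes each 3x3 convolution has a bias term and ignores the batch normalisation and fully connected parameters; the helper names are hypothetical.

```python
def conv3x3_params(in_ch, out_ch, bias=True):
    """Number of learnable parameters in one 3x3 convolution."""
    return in_ch * out_ch * 3 * 3 + (out_ch if bias else 0)


def conv_block_params(in_ch, out_ch):
    """Parameters in a ConvBlock[A][B] group: ConvA-B followed by ConvB-B."""
    return conv3x3_params(in_ch, out_ch) + conv3x3_params(out_ch, out_ch)


# The four groups of the baseline feature extractor, as (A, B) channel pairs.
baseline_blocks = [(3, 16), (16, 32), (32, 64), (64, 128)]
baseline_total = sum(conv_block_params(a, b) for a, b in baseline_blocks)
extra_block = conv_block_params(128, 256)

print(f"Baseline convolutions:  {baseline_total:,} parameters")
print(f"Extra ConvBlock[128][256]: {extra_block:,} parameters")
```

Under these assumptions the added group alone holds about three times as many convolution parameters as the entire baseline feature extractor, which is worth keeping in mind when comparing how quickly the two models overfit.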

In [8]:
import torch
import torch.nn as nn

from model import trainer, deeper_cnn

torch.manual_seed(seed)

deeper_model = deeper_cnn.DeeperCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(deeper_model.parameters(), lr=4e-4, weight_decay=5e-4)

mtrainer = trainer.Trainer(deeper_model, optimizer, criterion, data, batch_size, seed=data_seed, random_transform=False)

In [9]:
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.682192 | Accuracy: 0.406349
[Epoch   0]: Validation loss: 1.719534 | Accuracy: 0.380000 | Within 3: 0.720000
[Epoch   1]: Training loss: 1.491947 | Accuracy: 0.480635
[Epoch   1]: Validation loss: 1.794454 | Accuracy: 0.408889 | Within 3: 0.726667
[Epoch   2]: Training loss: 1.372921 | Accuracy: 0.533651
[Epoch   2]: Validation loss: 1.379984 | Accuracy: 0.482222 | Within 3: 0.800000
[Epoch   3]: Training loss: 1.229030 | Accuracy: 0.572063
[Epoch   3]: Validation loss: 1.678931 | Accuracy: 0.464444 | Within 3: 0.757778
[Epoch   4]: Training loss: 1.170719 | Accuracy: 0.590159
[Epoch   4]: Validation loss: 1.657043 | Accuracy: 0.471111 | Within 3: 0.742222
[Epoch   5]: Training loss: 1.081296 | Accuracy: 0.629841
[Epoch   5]: Validation loss: 1.324579 | Accuracy: 0.564444 | Within 3: 0.813333
[Epoch   6]: Training loss: 1.029254 | Accuracy: 0.646349
[Epoch   6]: Validation loss: 1.389609 | Accuracy: 0.611111 | Within 3: 0.851111
[Epoch   7]: Training loss:

In [10]:
_, test_acc, _ = mtrainer.run_test(mtrainer.testloader, 3)
print(f'Accuracy of the best model on the test images: {test_acc*100} %') # Best model within 30 epochs

_, test_acc, _ = mtrainer.run_test(mtrainer.testloader, model=mtrainer.model)
print(f'Accuracy of the fully trained model on the test images: {test_acc*100} %') # Model after 30 epochs

Accuracy of the best model on the test images: 69.0 %
Accuracy of the fully trained model on the test images: 67.77777777777779 %


### Model with additional 3x3 convolutional block