# Shopee Product Classification 

Notebook for experimenting with several neural network architectures on the product image dataset obtained from Shopee, and for evaluating the results.

The following models are evaluated as part of this notebook:

Baseline 1: CNN <br>
Baseline 2: CNN with augmented layers <br>
Improvement 1: Adding RNN <br>
Improvement 2: Adding attention

## Imports and Config

In [None]:
!pip install scikit-image
!pip install shopee_crawler
!pip install torchvision
!pip install opencv-python

In [1]:
# System
import importlib

# Visualization
import ipywidgets as widgets
import matplotlib.pyplot as plt

# Custom Modules
from model import trainer, dataset, baseline_cnn_1


import torch
import torch.nn as nn

torch.manual_seed(42)


## Dataset

In [3]:
data = dataset.DataSet(max_num_img=3000, crop=0.75, path='data/selected_images/')

In [4]:
data.load_all()

100%|██████████| 500/500 [00:02<00:00, 244.65it/s]
100%|██████████| 500/500 [00:02<00:00, 219.94it/s]
100%|██████████| 500/500 [00:02<00:00, 246.10it/s]
100%|██████████| 500/500 [00:02<00:00, 240.50it/s]
100%|██████████| 500/500 [00:02<00:00, 213.82it/s]
100%|██████████| 500/500 [00:02<00:00, 225.39it/s]
100%|██████████| 500/500 [00:02<00:00, 245.81it/s]
100%|██████████| 501/501 [00:01<00:00, 260.31it/s]
100%|██████████| 500/500 [00:02<00:00, 182.85it/s]
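`DataSet` is constructed with `crop=0.75`, but its implementation is not shown in this notebook. If, as the name suggests, `crop` is the fraction of each side kept around the image center (an assumption), the operation could be sketched with NumPy as follows. `center_crop` is a hypothetical helper, not part of the `dataset` module.

```python
import numpy as np


def center_crop(img: np.ndarray, crop: float) -> np.ndarray:
    """Keep the central `crop` fraction of height and width.

    Hypothetical semantics for DataSet's `crop` argument; the real module may differ.
    """
    if not 0 < crop <= 1:
        raise ValueError("crop must be in (0, 1]")
    h, w = img.shape[:2]
    new_h, new_w = int(round(h * crop)), int(round(w * crop))
    # Offsets that center the kept window inside the original image.
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]


img = np.zeros((200, 120, 3), dtype=np.uint8)  # H x W x C image
print(center_crop(img, 0.75).shape)  # (150, 90, 3)
```

torchvision's `transforms.CenterCrop` does the same job when the target size is given in pixels rather than as a fraction.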


## CNN Baseline Model

In [20]:
batch_size = 32
num_epoch = 30

baseline_cnn_1_model = baseline_cnn_1.BaselineCNN1(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline_cnn_1_model.parameters(), lr=5e-4)
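`baseline_cnn_1.BaselineCNN1` lives in a separate module that this notebook does not show. As a rough illustration only, a small classifier with the same constructor shape (number of classes in, raw logits out, suitable for `nn.CrossEntropyLoss`) might look like the sketch below. `TinyCNN`, its layer widths, the 64x64 input size and `num_classes=9` are all hypothetical example values.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Hypothetical stand-in for BaselineCNN1: two conv blocks and a linear head."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # halves height and width
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Global average pooling makes the head independent of input resolution.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.features(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself.
        return self.classifier(torch.flatten(x, 1))


model = TinyCNN(num_classes=9)
logits = model(torch.randn(4, 3, 64, 64))  # batch of 4 RGB images
print(logits.shape)  # torch.Size([4, 9])
```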

In [21]:
mtrainer = trainer.Trainer(baseline_cnn_1_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

[Epoch   0]: Training loss: 1.749399 | Accuracy: 0.373651
[Epoch   0]: Validation loss: 1.753244 | Accuracy: 0.368889 | Within 3: 0.697778
[Epoch   1]: Training loss: 1.580246 | Accuracy: 0.446032
[Epoch   1]: Validation loss: 1.652975 | Accuracy: 0.402222 | Within 3: 0.784444
[Epoch   2]: Training loss: 1.458521 | Accuracy: 0.501905
[Epoch   2]: Validation loss: 1.654066 | Accuracy: 0.444444 | Within 3: 0.777778
[Epoch   3]: Training loss: 1.409372 | Accuracy: 0.515556
[Epoch   3]: Validation loss: 1.477753 | Accuracy: 0.471111 | Within 3: 0.797778
[Epoch   4]: Training loss: 1.325175 | Accuracy: 0.555873
[Epoch   4]: Validation loss: 1.599549 | Accuracy: 0.453333 | Within 3: 0.746667
[Epoch   5]: Training loss: 1.271877 | Accuracy: 0.572063
[Epoch   5]: Validation loss: 2.488706 | Accuracy: 0.293333 | Within 3: 0.597778
[Epoch   6]: Training loss: 1.236835 | Accuracy: 0.581587
[Epoch   6]: Validation loss: 1.580380 | Accuracy: 0.453333 | Within 3: 0.773333
[Epoch   7]: Training loss:
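The log reports training and validation loss and accuracy for each epoch. `trainer.Trainer` is not shown here, so the following is only a minimal sketch of what one such epoch might look like in plain PyTorch. `train_one_epoch` and `evaluate` are hypothetical helpers, not the Trainer's API, and the toy data exists only to show the call pattern.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, criterion):
    """One optimization pass over `loader`; returns (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, labels in loader:
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        # Weight by batch size so a smaller final batch is averaged correctly.
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


@torch.no_grad()
def evaluate(model, loader, criterion):
    """Loss and accuracy without gradient tracking (validation or test)."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, labels in loader:
        logits = model(inputs)
        total_loss += criterion(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Toy usage on random data (the same loader is reused for validation for brevity).
torch.manual_seed(0)
toy = TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
loader = DataLoader(toy, batch_size=32, shuffle=True)
model = nn.Linear(10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()
for epoch in range(2):
    train_loss, train_acc = train_one_epoch(model, loader, optimizer, criterion)
    val_loss, val_acc = evaluate(model, loader, criterion)
    print(f"[Epoch {epoch:3d}]: Training loss: {train_loss:.6f} | Accuracy: {train_acc:.6f}")
    print(f"[Epoch {epoch:3d}]: Validation loss: {val_loss:.6f} | Accuracy: {val_acc:.6f}")
```

Calling `model.eval()` in `evaluate` matters for the CNNs in this notebook if they use dropout or batch normalization, since both behave differently at inference time.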

In [22]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')
# print(f'Accuracy within top 3 results: {top_k*100} %')

# from collections import Counter

# counts = Counter(incorrect_stats).most_common(25)
# for k, v in counts:
#     print(f"({data.categories[k[0]]}, {data.categories[k[1]]}): {v}")

Accuracy of the network on the test images: 65.8157602663707 %
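The validation log reports a "Within 3" figure and `run_test` is called with `3`, which suggests a top-3 accuracy; the commented-out code also counts `(true, predicted)` category pairs for misclassified images. Assuming those interpretations, both metrics can be sketched in PyTorch as below. `topk_accuracy` and `confusion_pairs` are hypothetical helpers, not the Trainer's implementation.

```python
from collections import Counter

import torch


def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices               # (N, k) class indices
    hits = (topk == labels.unsqueeze(1)).any(dim=1)    # (N,) True where label is in top k
    return hits.float().mean().item()


def confusion_pairs(logits: torch.Tensor, labels: torch.Tensor) -> Counter:
    """Count (true, predicted) label pairs over the misclassified samples."""
    preds = logits.argmax(dim=1)
    wrong = preds != labels
    return Counter(zip(labels[wrong].tolist(), preds[wrong].tolist()))


logits = torch.tensor([[0.1, 0.7, 0.2, 0.0],
                       [0.5, 0.05, 0.3, 0.15],
                       [0.2, 0.25, 0.05, 0.5]])
labels = torch.tensor([1, 2, 2])
print(topk_accuracy(logits, labels, k=1))  # 1 of 3 samples, about 0.333
print(topk_accuracy(logits, labels, k=3))  # 2 of 3 samples, about 0.667
print(confusion_pairs(logits, labels))     # (2, 0) once and (2, 3) once
```

Mapping each pair through `data.categories`, as the commented-out loop does, turns the indices into readable category names.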


## CNN Model with additional convolutional block

In [23]:
from model import deeper_cnn

deeper_model = deeper_cnn.DeeperCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(deeper_model.parameters(), lr=5e-4)
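`deeper_cnn.DeeperCNN` is likewise not shown. One common way to add a convolutional block is to repeat a Conv, BatchNorm, ReLU and max-pool unit. The sketch below (hypothetical `conv_block`, assumed 64x64 RGB inputs) shows how one extra block doubles the channel count and halves the spatial resolution again; it is not necessarily how DeeperCNN is built.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool; halves height and width."""
    return nn.Sequential(
        # bias=False because BatchNorm adds its own learnable shift.
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


shallow = nn.Sequential(conv_block(3, 16), conv_block(16, 32))
deeper = nn.Sequential(conv_block(3, 16), conv_block(16, 32), conv_block(32, 64))

x = torch.randn(2, 3, 64, 64)
print(shallow(x).shape)  # torch.Size([2, 32, 16, 16])
print(deeper(x).shape)   # torch.Size([2, 64, 8, 8])
```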

In [24]:
batch_size = 32
mtrainer = trainer.Trainer(deeper_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

[Epoch   0]: Training loss: 1.709312 | Accuracy: 0.401270
[Epoch   0]: Validation loss: 1.825302 | Accuracy: 0.413333 | Within 3: 0.717778
[Epoch   1]: Training loss: 1.469937 | Accuracy: 0.487937
[Epoch   1]: Validation loss: 1.711448 | Accuracy: 0.435556 | Within 3: 0.751111
[Epoch   2]: Training loss: 1.384629 | Accuracy: 0.521587
[Epoch   2]: Validation loss: 1.603133 | Accuracy: 0.500000 | Within 3: 0.800000
[Epoch   3]: Training loss: 1.272197 | Accuracy: 0.557778
[Epoch   3]: Validation loss: 1.531367 | Accuracy: 0.448889 | Within 3: 0.775556
[Epoch   4]: Training loss: 1.190783 | Accuracy: 0.587302
[Epoch   4]: Validation loss: 1.737269 | Accuracy: 0.473333 | Within 3: 0.731111
[Epoch   5]: Training loss: 1.131053 | Accuracy: 0.613651
[Epoch   5]: Validation loss: 1.513844 | Accuracy: 0.544444 | Within 3: 0.835556
[Epoch   6]: Training loss: 1.048434 | Accuracy: 0.638730
[Epoch   6]: Validation loss: 1.435403 | Accuracy: 0.540000 | Within 3: 0.837778
[Epoch   7]: Training loss:

In [25]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 70.03329633740289 %


## CNN Model with additional convolutional block and skip connections

In [51]:
from model import baseline_cnn_2

cnn_model_2 = baseline_cnn_2.BaselineCNN2(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model_2.parameters(), lr=5e-4)
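`baseline_cnn_2.BaselineCNN2` is not shown either, so the following is a generic ResNet-style residual block rather than the notebook's own code. The block's input is added back to its output, and a 1x1 convolution projects the input whenever the channel count or stride changes so the two tensors have matching shapes. `ResidualBlock` is a hypothetical name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is summed with a shortcut of the input."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection when shapes differ, identity otherwise.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # the skip connection


block = ResidualBlock(32, 64, stride=2)
print(block(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 64, 8, 8])
```

Because the shortcut gives gradients a direct path around the convolutions, skip connections mainly help as networks get deeper.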

In [52]:
batch_size = 32
mtrainer = trainer.Trainer(cnn_model_2, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

[Epoch   0]: Training loss: 1.794897 | Accuracy: 0.351746
[Epoch   0]: Validation loss: 1.773898 | Accuracy: 0.371111 | Within 3: 0.708889
[Epoch   1]: Training loss: 1.622753 | Accuracy: 0.426667
[Epoch   1]: Validation loss: 1.651874 | Accuracy: 0.402222 | Within 3: 0.733333
[Epoch   2]: Training loss: 1.512305 | Accuracy: 0.476190
[Epoch   2]: Validation loss: 1.674292 | Accuracy: 0.386667 | Within 3: 0.717778
[Epoch   3]: Training loss: 1.434505 | Accuracy: 0.507302
[Epoch   3]: Validation loss: 1.642798 | Accuracy: 0.477778 | Within 3: 0.808889
[Epoch   4]: Training loss: 1.372783 | Accuracy: 0.526349
[Epoch   4]: Validation loss: 1.567974 | Accuracy: 0.477778 | Within 3: 0.782222
[Epoch   5]: Training loss: 1.325570 | Accuracy: 0.538730
[Epoch   5]: Validation loss: 1.582854 | Accuracy: 0.457778 | Within 3: 0.753333
[Epoch   6]: Training loss: 1.265975 | Accuracy: 0.566984
[Epoch   6]: Validation loss: 1.631676 | Accuracy: 0.448889 | Within 3: 0.773333
[Epoch   7]: Training loss:

In [53]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 67.25860155382908 %
