# Shopee Product Classification 

This notebook trains several neural network architectures on a product image dataset collected from Shopee and compares their results.

The following models are evaluated as part of this notebook:

Baseline 1: CNN <br>
Baseline 2: CNN with augmented layers <br>
Improvement 1: Adding RNN <br>
Improvement 2: Adding attention

## Imports and Config

In [None]:
!pip install scikit-image
!pip install shopee_crawler
!pip install torchvision
!pip install opencv-python

In [55]:
# System
import importlib

# PyTorch
import torch
import torch.nn as nn

## Dataset

In [29]:
from model import dataset

data = dataset.DataSet(max_num_img=3000, crop=0.8, path='data/selected_images/')

In [30]:
data.load_all()

100%|██████████| 500/500 [00:01<00:00, 327.40it/s]
100%|██████████| 500/500 [00:01<00:00, 330.89it/s]
100%|██████████| 500/500 [00:01<00:00, 311.64it/s]
100%|██████████| 500/500 [00:01<00:00, 336.41it/s]
100%|██████████| 500/500 [00:01<00:00, 263.49it/s]
100%|██████████| 500/500 [00:01<00:00, 260.43it/s]
100%|██████████| 500/500 [00:01<00:00, 262.13it/s]
100%|██████████| 501/501 [00:01<00:00, 332.92it/s]
100%|██████████| 500/500 [00:01<00:00, 258.68it/s]


## CNN Baseline Model

In [31]:
from model import trainer, baseline_cnn_1

torch.manual_seed(42)

batch_size = 32
num_epoch = 30

baseline_cnn_1_model = baseline_cnn_1.BaselineCNN1(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline_cnn_1_model.parameters(), lr=4e-4)

In [32]:
mtrainer = trainer.Trainer(baseline_cnn_1_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

[Epoch   0]: Training loss: 1.728545 | Accuracy: 0.381587
[Epoch   0]: Validation loss: 1.658206 | Accuracy: 0.375556 | Within 3: 0.728889
[Epoch   1]: Training loss: 1.555988 | Accuracy: 0.463810
[Epoch   1]: Validation loss: 1.557984 | Accuracy: 0.422222 | Within 3: 0.762222
[Epoch   2]: Training loss: 1.430644 | Accuracy: 0.520635
[Epoch   2]: Validation loss: 2.494374 | Accuracy: 0.288889 | Within 3: 0.604444
[Epoch   3]: Training loss: 1.376172 | Accuracy: 0.531746
[Epoch   3]: Validation loss: 1.601417 | Accuracy: 0.433333 | Within 3: 0.771111
[Epoch   4]: Training loss: 1.317127 | Accuracy: 0.549841
[Epoch   4]: Validation loss: 1.724938 | Accuracy: 0.413333 | Within 3: 0.740000
[Epoch   5]: Training loss: 1.252796 | Accuracy: 0.585079
[Epoch   5]: Validation loss: 1.472554 | Accuracy: 0.511111 | Within 3: 0.791111
[Epoch   6]: Training loss: 1.195902 | Accuracy: 0.594603
[Epoch   6]: Validation loss: 1.524227 | Accuracy: 0.482222 | Within 3: 0.757778
[Epoch   7]: Training loss:

In [34]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')
# print(f'Accuracy within top 3 results: {top_k*100} %')

# from collections import Counter

# counts = Counter(incorrect_stats).most_common(25)
# for k, v in counts:
#     print(f"({data.categories[k[0]]}, {data.categories[k[1]]}): {v}")

Accuracy of the network on the test images: 67.48057713651498 %
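
The "Within 3" figure in the validation log reads as a top-3 accuracy: a prediction counts as correct when the true category is among the three highest-scoring classes. The implementation inside `trainer.Trainer` is not shown in this notebook, so the following is only a minimal sketch of how such a metric could be computed. The function name `top_k_accuracy` and the toy scores are hypothetical, and plain Python lists are used so the example runs without PyTorch.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper for illustration; not the notebook's Trainer code.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores)
scores = [
    [0.10, 0.60, 0.25, 0.05],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 2 is ranked third
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
]
labels = [1, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1: {top1:.3f} | top-3: {top3:.3f}")
```

With model outputs held as tensors, `torch.topk` could replace the per-row sort.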


## CNN Model with additional convolutional block

In [53]:
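from model import deeper_cnn

# Reload to pick up edits to model/deeper_cnn.py without restarting the kernel
importlib.reload(deeper_cnn)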

<module 'model.deeper_cnn' from '/Users/naomileow/Documents/school/CS5242/project/model/deeper_cnn.py'>

In [36]:
from model import deeper_cnn
torch.manual_seed(42)

deeper_model = deeper_cnn.DeeperCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(deeper_model.parameters(), lr=4e-4)

In [37]:
batch_size = 32
mtrainer = trainer.Trainer(deeper_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.753233 | Accuracy: 0.380317
[Epoch   0]: Validation loss: 1.846293 | Accuracy: 0.346667 | Within 3: 0.682222
[Epoch   1]: Training loss: 1.511827 | Accuracy: 0.478413
[Epoch   1]: Validation loss: 2.050122 | Accuracy: 0.320000 | Within 3: 0.693333
[Epoch   2]: Training loss: 1.388863 | Accuracy: 0.521270
[Epoch   2]: Validation loss: 1.486442 | Accuracy: 0.504444 | Within 3: 0.793333
[Epoch   3]: Training loss: 1.294602 | Accuracy: 0.548571
[Epoch   3]: Validation loss: 1.602448 | Accuracy: 0.437778 | Within 3: 0.757778
[Epoch   4]: Training loss: 1.185758 | Accuracy: 0.595238
[Epoch   4]: Validation loss: 1.378949 | Accuracy: 0.513333 | Within 3: 0.811111
[Epoch   5]: Training loss: 1.119060 | Accuracy: 0.619365
[Epoch   5]: Validation loss: 1.441261 | Accuracy: 0.535556 | Within 3: 0.828889
[Epoch   6]: Training loss: 1.053957 | Accuracy: 0.644762
[Epoch   6]: Validation loss: 1.388480 | Accuracy: 0.537778 | Within 3: 0.828889
[Epoch   7]: Training loss:

In [38]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')


Accuracy of the network on the test images: 70.14428412874584 %


In [67]:
from model import deeper_cnn
torch.manual_seed(42)

deeper_model = deeper_cnn.DeeperCNN2(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(deeper_model.parameters(), lr=3e-4)

In [68]:
batch_size = 32
mtrainer = trainer.Trainer(deeper_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.851337 | Accuracy: 0.340317
[Epoch   0]: Validation loss: 1.985287 | Accuracy: 0.297778 | Within 3: 0.680000
[Epoch   1]: Training loss: 1.670817 | Accuracy: 0.418413
[Epoch   1]: Validation loss: 1.764408 | Accuracy: 0.393333 | Within 3: 0.717778
[Epoch   2]: Training loss: 1.555972 | Accuracy: 0.462857
[Epoch   2]: Validation loss: 1.629288 | Accuracy: 0.426667 | Within 3: 0.800000
[Epoch   3]: Training loss: 1.460918 | Accuracy: 0.506984
[Epoch   3]: Validation loss: 1.760022 | Accuracy: 0.435556 | Within 3: 0.777778
[Epoch   4]: Training loss: 1.434107 | Accuracy: 0.506349
[Epoch   4]: Validation loss: 1.510408 | Accuracy: 0.446667 | Within 3: 0.795556
[Epoch   5]: Training loss: 1.342594 | Accuracy: 0.549206
[Epoch   5]: Validation loss: 1.399178 | Accuracy: 0.540000 | Within 3: 0.808889
[Epoch   6]: Training loss: 1.275160 | Accuracy: 0.570476
[Epoch   6]: Validation loss: 1.473031 | Accuracy: 0.486667 | Within 3: 0.791111
[Epoch   7]: Training loss:

In [69]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 63.374028856825745 %


## CNN Model with additional convolutional block and skip connections
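
A skip connection adds a block's input back to its output, so the convolutions only have to learn a residual on top of the identity. The code of `baseline_cnn_2.ResidualCNN` is not shown in this notebook, so the block below is only a generic sketch of the idea. The class name `ResidualBlock`, the channel count, and the use of batch normalisation are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the block input.

    Hypothetical block for illustration; not the notebook's ResidualCNN.
    """

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps the spatial size, so input and output shapes match
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the unmodified input before the final activation
        return self.relu(out + x)


block = ResidualBlock(channels=16)
x = torch.randn(4, 16, 32, 32)  # batch of 4 feature maps, 16 channels, 32x32
y = block(x)
print(y.shape)  # same shape as the input
```

Because the addition requires matching shapes, a block that changes the channel count or stride would also need a projection (for example a 1x1 convolution) on the skip path.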

In [60]:
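from model import baseline_cnn_2

# Reload to pick up edits to model/baseline_cnn_2.py without restarting the kernel
importlib.reload(baseline_cnn_2)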

<module 'model.baseline_cnn_2' from '/Users/naomileow/Documents/school/CS5242/project/model/baseline_cnn_2.py'>

In [61]:
from model import baseline_cnn_2, trainer

torch.manual_seed(42)
cnn_model_2 = baseline_cnn_2.ResidualCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model_2.parameters(), lr=3e-4)

In [62]:
batch_size = 32
mtrainer = trainer.Trainer(cnn_model_2, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.727069 | Accuracy: 0.389841
[Epoch   0]: Validation loss: 1.778004 | Accuracy: 0.437778 | Within 3: 0.762222
[Epoch   1]: Training loss: 1.492964 | Accuracy: 0.484444
[Epoch   1]: Validation loss: 1.700370 | Accuracy: 0.420000 | Within 3: 0.762222
[Epoch   2]: Training loss: 1.379663 | Accuracy: 0.529841
[Epoch   2]: Validation loss: 1.576051 | Accuracy: 0.473333 | Within 3: 0.793333
[Epoch   3]: Training loss: 1.309124 | Accuracy: 0.557460
[Epoch   3]: Validation loss: 1.341210 | Accuracy: 0.540000 | Within 3: 0.797778
[Epoch   4]: Training loss: 1.227667 | Accuracy: 0.584444
[Epoch   4]: Validation loss: 1.604701 | Accuracy: 0.471111 | Within 3: 0.762222
[Epoch   5]: Training loss: 1.177446 | Accuracy: 0.594603
[Epoch   5]: Validation loss: 1.829640 | Accuracy: 0.453333 | Within 3: 0.760000
[Epoch   6]: Training loss: 1.109520 | Accuracy: 0.622857
[Epoch   6]: Validation loss: 1.507974 | Accuracy: 0.502222 | Within 3: 0.802222
[Epoch   7]: Training loss:

In [63]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 67.14761376248613 %


In [50]:
from model import baseline_cnn_2, trainer

torch.manual_seed(42)
cnn_model_2 = baseline_cnn_2.DeeperResidualCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model_2.parameters(), lr=4e-4)

In [51]:
batch_size = 32
mtrainer = trainer.Trainer(cnn_model_2, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.738965 | Accuracy: 0.390159
[Epoch   0]: Validation loss: 2.836551 | Accuracy: 0.211111 | Within 3: 0.580000
[Epoch   1]: Training loss: 1.461996 | Accuracy: 0.488571
[Epoch   1]: Validation loss: 1.504579 | Accuracy: 0.473333 | Within 3: 0.755556
[Epoch   2]: Training loss: 1.347466 | Accuracy: 0.531111
[Epoch   2]: Validation loss: 1.666066 | Accuracy: 0.460000 | Within 3: 0.748889
[Epoch   3]: Training loss: 1.249529 | Accuracy: 0.559048
[Epoch   3]: Validation loss: 1.514517 | Accuracy: 0.515556 | Within 3: 0.802222
[Epoch   4]: Training loss: 1.163504 | Accuracy: 0.601587
[Epoch   4]: Validation loss: 1.521192 | Accuracy: 0.508889 | Within 3: 0.822222
[Epoch   5]: Training loss: 1.055560 | Accuracy: 0.639365
[Epoch   5]: Validation loss: 1.983051 | Accuracy: 0.455556 | Within 3: 0.726667
[Epoch   6]: Training loss: 0.969718 | Accuracy: 0.660952
[Epoch   6]: Validation loss: 2.031146 | Accuracy: 0.413333 | Within 3: 0.748889
[Epoch   7]: Training loss:

In [56]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 69.03440621531631 %
