# Shopee Product Classification 

This notebook trains several neural networks on a product image dataset obtained from Shopee and compares their results.

The following models are evaluated as part of this notebook:

Baseline 1: CNN <br>
Baseline 2: CNN with augmented layers <br>
Improvement 1: Adding RNN <br>
Improvement 2: Adding attention

## Imports and Config

In [None]:
!pip install scikit-image
!pip install shopee_crawler
!pip install torchvision
!pip install opencv-python

In [2]:
# System
import importlib

# Visualization
import ipywidgets as widgets
import matplotlib.pyplot as plt

# Custom Modules
from model import trainer, dataset, baseline_cnn_1


import torch
import torch.nn as nn

torch.manual_seed(42)

<torch._C.Generator at 0x120701130>

## Dataset

In [3]:
data = dataset.DataSet(max_num_img=3000, crop=0.75, path='data/selected_images/')

In [4]:
data.load_all()

100%|██████████| 500/500 [00:01<00:00, 325.40it/s]
100%|██████████| 500/500 [00:01<00:00, 332.37it/s]
100%|██████████| 500/500 [00:01<00:00, 333.32it/s]
100%|██████████| 500/500 [00:01<00:00, 337.76it/s]
100%|██████████| 500/500 [00:01<00:00, 294.62it/s]
100%|██████████| 500/500 [00:01<00:00, 282.81it/s]
100%|██████████| 500/500 [00:01<00:00, 321.33it/s]
100%|██████████| 501/501 [00:01<00:00, 370.24it/s]
100%|██████████| 500/500 [00:01<00:00, 273.76it/s]


## CNN Baseline Model

In [20]:
batch_size = 32
num_epoch = 30

baseline_cnn_1_model = baseline_cnn_1.BaselineCNN1(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline_cnn_1_model.parameters(), lr=5e-4)

In [21]:
mtrainer = trainer.Trainer(baseline_cnn_1_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)  # use the epoch count configured above instead of a hard-coded 30

[Epoch   0]: Training loss: 1.749399 | Accuracy: 0.373651
[Epoch   0]: Validation loss: 1.753244 | Accuracy: 0.368889 | Within 3: 0.697778
[Epoch   1]: Training loss: 1.580246 | Accuracy: 0.446032
[Epoch   1]: Validation loss: 1.652975 | Accuracy: 0.402222 | Within 3: 0.784444
[Epoch   2]: Training loss: 1.458521 | Accuracy: 0.501905
[Epoch   2]: Validation loss: 1.654066 | Accuracy: 0.444444 | Within 3: 0.777778
[Epoch   3]: Training loss: 1.409372 | Accuracy: 0.515556
[Epoch   3]: Validation loss: 1.477753 | Accuracy: 0.471111 | Within 3: 0.797778
[Epoch   4]: Training loss: 1.325175 | Accuracy: 0.555873
[Epoch   4]: Validation loss: 1.599549 | Accuracy: 0.453333 | Within 3: 0.746667
[Epoch   5]: Training loss: 1.271877 | Accuracy: 0.572063
[Epoch   5]: Validation loss: 2.488706 | Accuracy: 0.293333 | Within 3: 0.597778
[Epoch   6]: Training loss: 1.236835 | Accuracy: 0.581587
[Epoch   6]: Validation loss: 1.580380 | Accuracy: 0.453333 | Within 3: 0.773333
[Epoch   7]: Training loss:
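The "Within 3" column in the log above reports how often the true category is among the three highest-scoring predictions. The `Trainer` implementation is not shown in this notebook, so the following is only a minimal sketch of how such a top-k accuracy could be computed with `torch.topk`. The function name `top_k_accuracy` and the toy tensors are assumptions made for illustration.

```python
import torch


def top_k_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Hypothetical helper; the notebook's own Trainer may compute this differently.
    """
    # topk returns (values, indices); indices has shape (batch, k)
    _, top_idx = logits.topk(k, dim=1)
    # Compare every one of the k predicted classes against the true label
    hits = (top_idx == labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()


# Toy scores for 3 samples over 4 classes (made-up values, not from the dataset)
logits = torch.tensor([[0.1, 0.5, 0.2, 0.9],
                       [0.8, 0.1, 0.06, 0.04],
                       [0.2, 0.3, 0.4, 0.1]])
labels = torch.tensor([3, 1, 3])

top1 = top_k_accuracy(logits, labels, k=1)
top3 = top_k_accuracy(logits, labels, k=3)
print(f"Accuracy: {top1:.6f} | Within 3: {top3:.6f}")
```

With `k=1` this reduces to plain accuracy, which is why the "Within 3" value can never be lower than the "Accuracy" value on the same line of the log.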

In [44]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')
# print(f'Accuracy within top 3 results: {top_k*100} %')

# Optional: list the most frequently confused category pairs
# from collections import Counter

# counts = Counter(incorrect_stats).most_common(25)
# for k, v in counts:
#     print(f"({data.categories[k[0]]}, {data.categories[k[1]]}): {v}")



Accuracy of the network on the test images: 64.59489456159822 %
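The commented-out lines in the cell above tally which category pairs are confused most often. As a rough, self-contained sketch of that idea, the snippet below counts misclassified pairs with `collections.Counter`. It assumes the statistics are index pairs ordered as (true, predicted); that ordering, the helper name `most_confused_pairs`, and the category names are hypothetical and not taken from the notebook's `Trainer`.

```python
from collections import Counter


def most_confused_pairs(true_labels, predicted_labels, categories, top_n=25):
    """Return (true_name, predicted_name, count) for the most frequent mistakes.

    Hypothetical helper; assumes labels are integer indices into `categories`.
    """
    # Count only the samples where the prediction differs from the true label
    pairs = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return [
        (categories[t], categories[p], count)
        for (t, p), count in pairs.most_common(top_n)
    ]


# Made-up example data for illustration only
categories = ["bags", "shoes", "watches"]
true_labels = [0, 0, 1, 1, 2, 2, 0]
predicted = [1, 1, 1, 0, 2, 1, 0]

confused = most_confused_pairs(true_labels, predicted, categories)
for true_name, pred_name, count in confused:
    print(f"({true_name}, {pred_name}): {count}")
```

A table like this can show whether the errors are concentrated in a few visually similar categories or spread evenly across all of them.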


## CNN Model with additional convolutional block

In [96]:
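from model import deeper_cnn  # import first so the name exists before reloading
importlib.reload(deeper_cnn)  # pick up edits to model/deeper_cnn.py without restarting the kernel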

<module 'model.deeper_cnn' from '/Users/naomileow/Documents/school/CS5242/project/model/deeper_cnn.py'>

In [97]:
from model import deeper_cnn

deeper_model = deeper_cnn.DeeperCNN(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(deeper_model.parameters(), lr=5e-4)

In [98]:
batch_size = 32
mtrainer = trainer.Trainer(deeper_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.749542 | Accuracy: 0.375873
[Epoch   0]: Validation loss: 1.652791 | Accuracy: 0.437778 | Within 3: 0.751111
[Epoch   1]: Training loss: 1.510620 | Accuracy: 0.473016
[Epoch   1]: Validation loss: 1.593568 | Accuracy: 0.444444 | Within 3: 0.762222
[Epoch   2]: Training loss: 1.383407 | Accuracy: 0.522857
[Epoch   2]: Validation loss: 1.529960 | Accuracy: 0.468889 | Within 3: 0.782222
[Epoch   3]: Training loss: 1.302175 | Accuracy: 0.548254
[Epoch   3]: Validation loss: 1.525819 | Accuracy: 0.482222 | Within 3: 0.784444
[Epoch   4]: Training loss: 1.205695 | Accuracy: 0.572063
[Epoch   4]: Validation loss: 1.615896 | Accuracy: 0.431111 | Within 3: 0.757778
[Epoch   5]: Training loss: 1.160441 | Accuracy: 0.594921
[Epoch   5]: Validation loss: 1.337535 | Accuracy: 0.546667 | Within 3: 0.813333
[Epoch   6]: Training loss: 1.090815 | Accuracy: 0.624444
[Epoch   6]: Validation loss: 1.896076 | Accuracy: 0.397778 | Within 3: 0.742222
[Epoch   7]: Training loss:

In [100]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 69.14539400665927 %


## CNN Model with additional convolutional block and skip connections

In [87]:
from model import baseline_cnn_2

cnn_model_2 = baseline_cnn_2.BaselineCNN2(len(data.categories))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model_2.parameters(), lr=5e-4)

In [88]:
batch_size = 32
mtrainer = trainer.Trainer(cnn_model_2, optimizer, criterion, data, batch_size)
mtrainer.run_train(30)

[Epoch   0]: Training loss: 1.764447 | Accuracy: 0.366032
[Epoch   0]: Validation loss: 1.856244 | Accuracy: 0.346667 | Within 3: 0.702222
[Epoch   1]: Training loss: 1.592605 | Accuracy: 0.438730
[Epoch   1]: Validation loss: 1.585183 | Accuracy: 0.426667 | Within 3: 0.777778
[Epoch   2]: Training loss: 1.437768 | Accuracy: 0.502222
[Epoch   2]: Validation loss: 1.569322 | Accuracy: 0.448889 | Within 3: 0.773333
[Epoch   3]: Training loss: 1.375024 | Accuracy: 0.526032
[Epoch   3]: Validation loss: 1.538624 | Accuracy: 0.466667 | Within 3: 0.804444
[Epoch   4]: Training loss: 1.285507 | Accuracy: 0.549841
[Epoch   4]: Validation loss: 1.871614 | Accuracy: 0.364444 | Within 3: 0.733333
[Epoch   5]: Training loss: 1.252900 | Accuracy: 0.563175
[Epoch   5]: Validation loss: 2.208791 | Accuracy: 0.308889 | Within 3: 0.693333
[Epoch   6]: Training loss: 1.186153 | Accuracy: 0.586984
[Epoch   6]: Validation loss: 1.395670 | Accuracy: 0.546667 | Within 3: 0.822222
[Epoch   7]: Training loss:

In [89]:
test_loss, test_acc, top_k = mtrainer.run_test(mtrainer.testloader)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

Accuracy of the network on the test images: 67.03662597114317 %
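The architecture of `BaselineCNN2` is not shown in this notebook. As a generic illustration of the skip connections named in this section's heading, here is a minimal residual block in PyTorch. The class name `ResidualBlock`, the channel count, and the use of batch normalization are all assumptions for the sketch and may differ from the actual model.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added to the block's input.

    Hypothetical example; not the notebook's BaselineCNN2.
    """

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps the spatial size, so input and output can be added
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the unmodified input before the final activation
        return self.relu(out + x)


block = ResidualBlock(channels=16)
x = torch.randn(4, 16, 32, 32)  # (batch, channels, height, width)
y = block(x)
print(y.shape)
```

Because the input is added back unchanged, gradients have a direct path around the two convolutions, which is the usual motivation for adding skip connections when a network is made deeper.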
