# CS5242 Shopee Product Classification: Other Networks

* In this notebook, we experiment with other types of neural network building blocks for image classification.
* These building blocks are added on top of our CNN baseline model and evaluated against it.

The two additional types of network experiments performed in this notebook are as follows:

* Recurrent Neural Networks (RNN)
* Attention Neural Networks (Attention)

## Imports and Config

In [1]:
import torch
import torch.nn as nn

from model import dataset, trainer
from model import baseline_cnn_1, rnn_cnn, attention_cnn

In [2]:
batch_size = 32
num_epoch = 30
seed = 42

## Data Import

* As before, we use our dataset module to import the set of images across categories.
* We use 9 categories, with 500 custom-filtered images selected from each category.

In [3]:
data = dataset.DataSet(max_num_img=500, crop=0.8, path='data/selected_images/')

In [4]:
data.load_all()

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:01<00:00, 258.83it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:02<00:00, 242.56it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:03<00:00, 149.50it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:03<00:00, 151.57it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 113.63it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:04<00:00, 111.54it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████

## Baseline Model

* Before we proceed with these networks, we first evaluate our baseline model so that we have a reference point for comparing performance.

In [5]:
# Seed before building the model so that the weight initialisation is reproducible.
torch.manual_seed(seed)
baseline_cnn_1_model = baseline_cnn_1.BaselineCNN1(len(data.categories))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline_cnn_1_model.parameters(), lr=4e-4)

In [6]:
mtrainer = trainer.Trainer(baseline_cnn_1_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

KeyboardInterrupt: 

In [None]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

## Recurrent Neural Network (RNN)

* In this approach, we add an RNN layer over the baseline CNN model we implemented.
* The RNN layer selected is a Long Short-Term Memory (LSTM) layer from PyTorch's `torch.nn` module.
    * We keep all convolutional blocks the same as in the baseline CNN model.
* The LSTM mechanism is implemented as follows:
    * After passing through the convolutional blocks, the image (now a feature map) is split into smaller patches.
    * These patches are then passed sequentially into the LSTM.
    * The number of hidden states in the LSTM is directly proportional to the number of patches in the image.
* Following the LSTM layer, a final fully connected layer is used.
    * The adaptive average pooling layer is removed in this case.

We experimented with the combined CNN and RNN model because of the findings reported in https://www.matec-conferences.org/articles/matecconf/pdf/2019/26/matecconf_jcmme2018_02001.pdf, which follows a similar approach.
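
The patch-to-LSTM steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `rnn_cnn.CNNWithRNN`: the class name `PatchLSTMHead`, the feature-map shape, the patch size, and the hidden size are all assumptions made for the example, and classifying from the last time step is one possible choice among several.

```python
import torch
import torch.nn as nn


class PatchLSTMHead(nn.Module):
    """Hypothetical head: splits a CNN feature map into patches and feeds them to an LSTM."""

    def __init__(self, channels, patch_size, hidden_size, num_classes, num_layers=1):
        super().__init__()
        # nn.Unfold with stride == kernel_size extracts non-overlapping square patches.
        self.unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
        self.lstm = nn.LSTM(
            input_size=channels * patch_size * patch_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        # Final fully connected layer; no adaptive average pooling is used here.
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, feature_map):
        # feature_map: (batch, channels, height, width) from the convolutional blocks
        patches = self.unfold(feature_map)    # (batch, channels * p * p, num_patches)
        sequence = patches.transpose(1, 2)    # (batch, num_patches, channels * p * p)
        outputs, _ = self.lstm(sequence)      # (batch, num_patches, hidden_size)
        return self.fc(outputs[:, -1, :])     # classify from the last time step


features = torch.randn(4, 64, 8, 8)  # stand-in for the CNN output (assumed shape)
head = PatchLSTMHead(channels=64, patch_size=2, hidden_size=128, num_classes=9)
logits = head(features)
print(logits.shape)  # torch.Size([4, 9])
```

With an 8x8 feature map and 2x2 patches, the LSTM sees a sequence of 16 patches, each flattened to 256 values.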

In [5]:
# Seed before building the model so that the weight initialisation is reproducible.
torch.manual_seed(seed)
rnn_cnn_model = rnn_cnn.CNNWithRNN(len(data.categories))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn_cnn_model.parameters(), lr=4e-4)

In [6]:
mtrainer = trainer.Trainer(rnn_cnn_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

KeyboardInterrupt: 

In [None]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

* The RNN model did not do as well as our baseline model and in fact led to a small reduction in performance.
* To understand this further, we tuned several parameters of the model to see whether they would affect the results. The findings are explained below.

* **Increase in patch size**:
    * Increasing the patch size reduced the performance of the RNN. This makes sense, since a larger patch requires the hidden cells to incorporate more information at each step, which would lead to a higher loss.
* **More stacked layers**:
    * Stacking multiple LSTM layers increases the depth of the model and lets it learn more features. Stacking 2 layers gave a small improvement in the score, but increasing this to 3 layers reduced it, which suggests that stacking too many layers leads to more overfitting.
* **Removing MaxPool after convolution**:
    * We also ran an experiment that removed the MaxPool after the convolution layers, expecting that this would reduce abstraction and provide more data to the RNN. However, this made performance worse as well, so the MaxPool appears to be important before applying the RNN.
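
To make the patch-size and stacking trade-offs above more concrete, the sketch below shows how the patch size changes the sequence length and the per-step input size, and how stacked layers are requested through the `num_layers` argument of `nn.LSTM`. The feature-map shape, the hidden size, and the helper `patch_sequence_shape` are assumptions for illustration only and are not taken from our model code.

```python
import torch
import torch.nn as nn

feature_map = torch.randn(1, 64, 8, 8)  # assumed CNN output: (batch, channels, H, W)


def patch_sequence_shape(feature_map, patch_size):
    """Return (sequence length, features per step) for non-overlapping patches."""
    patches = nn.Unfold(kernel_size=patch_size, stride=patch_size)(feature_map)
    return patches.shape[2], patches.shape[1]


# Larger patches give a shorter sequence but far more values per LSTM step.
for patch_size in (1, 2, 4):
    seq_len, step_features = patch_sequence_shape(feature_map, patch_size)
    print(patch_size, seq_len, step_features)

# Stacking is controlled by num_layers; h_n holds one final hidden state per layer.
stacked_lstm = nn.LSTM(input_size=256, hidden_size=128, num_layers=2, batch_first=True)
outputs, (h_n, c_n) = stacked_lstm(torch.randn(1, 16, 256))
print(outputs.shape, h_n.shape)
```

Under these assumed shapes, moving from 2x2 to 4x4 patches quadruples the input size of each step while cutting the sequence to a quarter of its length.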

## Attention Neural Network (Attention)

* In this approach, attention blocks are added after the convolution layers of the baseline model.
* A custom attention layer is built which incorporates the following steps:
    * An intermediate pooling result and the final pooled result are each passed through convolutional layers.
    * Following this, another convolutional layer is applied to reduce the number of channels to 1.
    * A softmax is applied, and the result is multiplied with the intermediate pooling result to get the attention elements.

We experimented with the Attention with CNN model because of the findings in https://blog.paperspace.com/image-classification-with-attention/.
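
The attention steps listed above can be sketched as follows. This is a minimal sketch and not the layer in `attention_cnn.CNNWithAttention`: the class name `AttentionBlock`, the channel counts, the bilinear upsampling of the final pooled result, and the ReLU before the scoring convolution are assumptions made so that the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionBlock(nn.Module):
    """Hypothetical attention layer over an intermediate and a final feature map."""

    def __init__(self, local_channels, global_channels, attn_channels):
        super().__init__()
        # 1x1 convolutions bring both inputs to a shared number of channels.
        self.project_local = nn.Conv2d(local_channels, attn_channels, kernel_size=1, bias=False)
        self.project_global = nn.Conv2d(global_channels, attn_channels, kernel_size=1, bias=False)
        # A further convolution reduces the number of channels to 1.
        self.score = nn.Conv2d(attn_channels, 1, kernel_size=1)

    def forward(self, local_feat, global_feat):
        batch, _, height, width = local_feat.shape
        local_proj = self.project_local(local_feat)
        # Assumption: the final (smaller) map is upsampled to the intermediate size.
        global_proj = F.interpolate(
            self.project_global(global_feat),
            size=(height, width), mode="bilinear", align_corners=False,
        )
        scores = self.score(F.relu(local_proj + global_proj))  # (batch, 1, H, W)
        # Softmax over all spatial positions, so the weights sum to 1 per image.
        weights = F.softmax(scores.view(batch, 1, -1), dim=2).view(batch, 1, height, width)
        # Weight the intermediate features and sum over space.
        attended = (weights * local_feat).sum(dim=(2, 3))      # (batch, local_channels)
        return weights, attended


local_feat = torch.randn(4, 64, 16, 16)   # assumed intermediate pooling result
global_feat = torch.randn(4, 128, 4, 4)   # assumed final pooled result
block = AttentionBlock(local_channels=64, global_channels=128, attn_channels=32)
weights, attended = block(local_feat, global_feat)
print(weights.shape, attended.shape)
```

The returned `weights` map can also be plotted over the input image to inspect which regions the model attends to.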

In [None]:
# Seed before building the model so that the weight initialisation is reproducible.
torch.manual_seed(seed)
attention_cnn_model = attention_cnn.CNNWithAttention(len(data.categories))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(attention_cnn_model.parameters(), lr=4e-4)

In [None]:
mtrainer = trainer.Trainer(attention_cnn_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

In [None]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')

* Given the potential for improved performance, we also run our attention layers together with the residual CNN model, which incorporates skip connections.
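
For readers unfamiliar with skip connections, a residual block can be sketched as below. This is a generic illustration under assumed channel counts and kernel sizes; the class name `ResidualBlock` is hypothetical, and the block is not the one used inside `attention_cnn.ResidualCNNWithAttention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Hypothetical residual block: two 3x3 convolutions plus an identity skip connection."""

    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps the spatial size, so the input can be added back directly.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: add the block input back before the final activation.
        return F.relu(out + x)


res_block = ResidualBlock(channels=64)
x = torch.randn(4, 64, 16, 16)
y = res_block(x)
print(y.shape)  # torch.Size([4, 64, 16, 16])
```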

In [None]:
# Seed before building the model so that the weight initialisation is reproducible.
torch.manual_seed(seed)
attention_res_cnn_model = attention_cnn.ResidualCNNWithAttention(len(data.categories))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(attention_res_cnn_model.parameters(), lr=4e-4)

In [None]:
mtrainer = trainer.Trainer(attention_res_cnn_model, optimizer, criterion, data, batch_size)
mtrainer.run_train(num_epoch)

In [None]:
test_loss, test_acc, top_k, incorrect_stats = mtrainer.run_test(mtrainer.testloader, 3, True)
print(f'Accuracy of the network on the test images: {test_acc*100} %')