<a href="https://colab.research.google.com/github/hkbu-kennycheng/comp3057/blob/main/lab4_CNN_and_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image classification with CNN

We are going to train a CNN model for image classification using `PyTorch`. Let's import `torch` first.

In [None]:
import torch

## [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset

With the `CIFAR10` dataset, we can train a model to classify images of airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships and trucks. It is widely used as a benchmark when developing machine learning models.

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly93d3cuY3MudG9yb250by5lZHUvfmtyaXovY2lmYXIuaHRtbA==)

### Loading CIFAR10 dataset with torchvision

`CIFAR10` can be downloaded directly through `torchvision.datasets`. We also need `torchvision.transforms` to convert each image into a [`Tensor`](https://pytorch.org/docs/stable/tensors.html). A tensor is a multi-dimensional container whose elements all share a single data type.
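
To make the conversion concrete, here is a minimal sketch of what `transforms.ToTensor()` does to an image: it moves the channel axis first and rescales byte values to the range `[0, 1]`. The tiny 2x2 image and the variable names are illustrative assumptions, and the conversion is reproduced by hand with plain `torch` operations rather than by calling `torchvision`.

```python
import torch

# A hypothetical 2x2 RGB image stored as height x width x channel bytes (0-255),
# the layout that image libraries typically use.
hwc_image = torch.tensor(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=torch.uint8,
)

# Reproduce the effect of ToTensor by hand: channels first, values scaled to [0, 1].
chw_image = hwc_image.permute(2, 0, 1).float() / 255.0

print(hwc_image.shape)  # torch.Size([2, 2, 3])
print(chw_image.shape)  # torch.Size([3, 2, 2])
print(chw_image.dtype)  # torch.float32
```

The channel-first layout is what `nn.Conv2d` expects, which is why the transform is applied while loading the data.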


In [None]:
from torchvision import datasets
from torchvision import transforms

train_set = datasets.CIFAR10("./data", download=True, transform=transforms.Compose([transforms.ToTensor()]))
test_set = datasets.CIFAR10("./data", download=True, train=False, transform=transforms.Compose([transforms.ToTensor()]))

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz




Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


### Wrap train and test data with `DataLoader`

We then wrap each dataset with `DataLoader`, which shuffles and loads the data in each iteration of the training process.

In [None]:
import numpy as np
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, shuffle=True)
test_loader = DataLoader(test_set, shuffle=True)
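
As a rough illustration of what `DataLoader` yields, the sketch below wraps a small random dataset and inspects the batches. The toy data, the names `toy_set` and `toy_loader`, and `batch_size=4` are assumptions for demonstration only; the lab's loaders above keep the default `batch_size=1`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR10: 10 random "images" with integer labels.
images = torch.rand(10, 3, 32, 32)
targets = torch.arange(10)
toy_set = TensorDataset(images, targets)

# batch_size defaults to 1; 4 is an illustrative choice, not the lab's setting.
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=True)

batch_shapes = [tuple(x.shape) for x, _ in toy_loader]
print(len(toy_loader))  # 3 batches: 4 + 4 + 2 samples
print(batch_shapes)
```

Shuffling changes which samples land in each batch, not the batch sizes, so the last batch simply holds the two leftover samples.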

The labels are represented as integers `0` to `9`; each integer is the index of the class name in the tuple below.

In [None]:
labels = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

## Building a CNN model with `Sequential` API

Let's take a look at the architecture of the CNN model that we are going to build.

![](https://pytorch.org/tutorials/_images/mnist.png)

`PyTorch` provides `torch.nn.Conv2d` for applying 2D convolution.

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly9weXRvcmNoLm9yZy9kb2NzL3N0YWJsZS9nZW5lcmF0ZWQvdG9yY2gubm4uQ29udjJkLmh0bWw=)


In [None]:
from torch import nn

model = nn.Sequential(
  nn.Sequential(
    nn.Conv2d(3,6,5),     # Convolutions between INPUT and C1
    nn.ReLU(),            # Activation function of Convolutions, which outputs C1
    nn.MaxPool2d((2,2)),  # 2D max pooling subsampling between C1 and S2
    nn.Conv2d(6, 16, 5),  # Convolutions between S2 and C3
    nn.ReLU(),            # Activation function of Convolutions, which outputs C3
    nn.MaxPool2d(2),      # 2D max pooling subsampling between C3 and S4
  ),
  nn.Flatten(),          # Flatten reshapes each sample's 16x5x5 feature maps into a 400-element vector
  nn.Sequential(
    nn.Linear(16 * 5 * 5, 120), # F5
    nn.ReLU(),            # activation of F5
    nn.Linear(120, 84),   # F6
    nn.ReLU(),            # activation of F6
    nn.Linear(84, 10)     # OUTPUT
  ),
)
print(model)

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (1): Flatten(start_dim=1, end_dim=-1)
  (2): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): ReLU()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)
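
The `in_features=400` of the first linear layer follows from the image size shrinking through the convolution and pooling layers. The arithmetic can be sketched as below; the helper `conv_out` is a hypothetical name, and it assumes no dilation, which matches the layers printed above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)             # Conv2d(3, 6, 5)  -> 28
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2)     -> 14
size = conv_out(size, kernel=5)             # Conv2d(6, 16, 5) -> 10
size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2)     -> 5

flat_features = 16 * size * size            # 16 channels of 5x5 feature maps
print(size, flat_features)  # 5 400
```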


## Loss function

The loss function measures how far the model's predictions are from the true labels during training. Training minimizes this function to "steer" the model in the right direction.

We use [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) in our example CNN model. It takes the raw class scores (logits) produced by the last linear layer together with the integer class labels.

In [None]:
loss_function = nn.CrossEntropyLoss()
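
For intuition, the cross-entropy of a single sample can be sketched in plain Python as the negative log of the softmax probability assigned to the correct class. The function name and the example scores are illustrative assumptions; this is not how `nn.CrossEntropyLoss` is implemented internally.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    top = max(logits)
    shifted = [z - top for z in logits]          # subtract the max for stability
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[target]

logits = [2.0, 0.5, -1.0]                        # hypothetical scores for three classes
loss_right = cross_entropy(logits, target=0)     # label matches the top score
loss_wrong = cross_entropy(logits, target=2)     # label has the lowest score
print(round(loss_right, 4), round(loss_wrong, 4))
```

The loss is small when the correct class has the highest score and grows as the score of the correct class falls behind the others.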

## Optimizer

The optimizer defines how the model's parameters are updated from the gradients of the loss function.

[Adam](https://arxiv.org/abs/1412.6980) optimization is a stochastic gradient descent method that is based on adaptive estimation of the first-order and second-order moments of the gradients.

We set the learning rate to `0.001` in the following configuration.

In [None]:
from torch import optim

optimizer = optim.Adam(model.parameters(), lr=1e-3)
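
The update rule behind Adam can be sketched for a single scalar parameter as follows. This is a simplified illustration based on the update equations in the Adam paper, not PyTorch's implementation; the function name `adam_minimize`, the toy objective and the larger learning rate of `0.1` are assumptions chosen so the example converges quickly. The defaults for `beta1`, `beta2` and `eps` match those of `optim.Adam`.

```python
import math

def adam_minimize(grad_fn, x, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function with Adam, given its gradient function."""
    m, v = 0.0, 0.0                                # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g            # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g        # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps) # step scaled per parameter
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_final = adam_minimize(lambda x: 2 * (x - 3.0), x=0.0, lr=0.1, steps=500)
print(x_final)
```

Dividing by the square root of the second moment keeps the step size close to `lr` regardless of the raw gradient scale, which is what "adaptive" refers to.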

## The training loop

We first need to define the number of epochs in the training process. An epoch means one complete pass of the training dataset through the algorithm.

In [None]:
from tqdm import tqdm # for making progress bar

NUM_EPOCHS = 1 # 1 epoch takes a few minutes using the Colab CPU standard runtime

for epoch in range(NUM_EPOCHS):
  loop = tqdm(train_loader, position=0, leave=True)

  model.train() # put model in training mode
  for (input, label) in loop:
    optimizer.zero_grad()               # clear gradients from the previous step
    output = model(input)               # forward pass
    loss = loss_function(output, label) # compare predictions with labels
    loss.backward()                     # backpropagate the loss
    optimizer.step()                    # update the weights

    loop.set_description(f"Epoch [{epoch}/{NUM_EPOCHS}]")

Epoch [0/1]: 100%|██████████| 50000/50000 [07:41<00:00, 108.23it/s]


## Test trained model

In order to use the model for prediction, we need to put it in evaluation mode first.

In [None]:
correct = 0
total = 0

loop = tqdm(test_loader, position=0, leave=True)
model.eval() # put model in evaluation mode
with torch.no_grad(): # gradients are not needed for evaluation
  for (input, label) in loop:
    output = model(input)
    _, predicted = torch.max(output, 1) # index of the highest score is the predicted class
    total += label.size(0)
    correct += (predicted == label).sum().item()
    loop.set_postfix(acc=(100*correct/total))

100%|██████████| 10000/10000 [00:52<00:00, 189.51it/s, acc=45.9]


# Text classification with RNN built from scratch

This is an NLP task: given a movie review from IMDB, classify it as a positive review or a negative review.

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly9haS5zdGFuZm9yZC5lZHUvfmFtYWFzL2RhdGEvc2VudGltZW50Lw==)

In [None]:
!pip install torchtext



## Download the dataset

`torchtext` provides a convenient API for downloading the dataset and splitting it into a training set and a testing set.

In [None]:
from torchtext.datasets import IMDB

train_set, test_set = IMDB(split=('train', 'test'))

train_label, train_data = zip(*train_set)
test_label, test_data = zip(*test_set)

aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:02<00:00, 40.5MB/s]


## Explore the data

Let's take a look at the data. Each review is written in English and paired with its class, `neg` or `pos`.

In [None]:
print(train_label[0])
print(train_data[0])

neg
I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far between,

## Preprocess text data

NLP tasks usually require several preprocessing steps to transform the raw text into appropriate input for the model. They are **standardization**, **tokenization** and **vectorization**.


### Standardization

You may notice that the text in `train_data` contains `<br />` tags and capital letters. To standardize it, we replace each `<br />` with a space, remove punctuation by replacing it with an empty string, and finally transform all characters into lower case.

For example, `<br />I rented I AM CURIOUS-YELLOW` becomes `i rented i am curiousyellow` (preceded by the space that replaces the `<br />` tag), because the hyphen is removed rather than turned into a space.
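
The three standardization rules can be sketched with Python's standard `re` and `string` modules. The helper name `standardize` and the sample sentence are illustrative assumptions; the lab itself applies the same rules later through `custom_replace` from `torchtext`.

```python
import re
import string

def standardize(text):
    """Replace <br /> with a space, drop punctuation, then lower-case."""
    text = text.replace("<br />", " ")
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    return text.lower()

sample = "A GREAT film!<br />Loved it."
print(standardize(sample))  # a great film loved it
```

The `<br />` tags are handled first because the punctuation rule would otherwise strip the `<`, `/` and `>` characters and leave a stray `br` in the text.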

### Tokenization

After that, we split the input data into tokens by spaces. For example, `i rented i am curiousyellow` becomes the list of words `["i", "rented", "i", "am", "curiousyellow"]`.

### Numericalization and Vectorization

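Numericalization maps each token to its integer index in a vocabulary. Vectorization then computes a vector for each index; in this lab that is done by the embedding layer inside the model.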
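
Tokenization and numericalization can be sketched in plain Python as follows. The vocabulary here is a hypothetical one built with `collections.Counter`, giving the most frequent token the smallest index; it only illustrates the idea and is not the `torchtext` vocabulary used in this lab.

```python
from collections import Counter

text = "i rented i am curious"
tokens = text.split()                     # tokenization: split on whitespace

# Hypothetical vocabulary: the most frequent token gets the smallest index.
counts = Counter(tokens)
vocab_index = {tok: i for i, (tok, _) in enumerate(counts.most_common())}

ids = [vocab_index[tok] for tok in tokens]  # numericalization: token -> integer
print(tokens)
print(ids)  # [0, 1, 0, 2, 3]
```

Each integer in `ids` later selects one row of the embedding matrix, which is the vector for that token.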


### Mapping text label to tensor

In [None]:
# label_dict = {'neg': 0, 'pos': 1}
label_dict = {'neg': torch.tensor([1.0, 0.0], dtype=torch.float), 'pos': torch.tensor([0.0, 1.0], dtype=torch.float)}
train_label = [label_dict[label] for label in train_label]
test_label = [label_dict[label] for label in test_label]

### Standardization and Tokenization

We can use `custom_replace` from `torchtext.data.functional` together with regular expressions to replace `<br />` with a space, remove punctuation, and convert upper-case letters to lower case.

In [None]:
from torchtext.data.functional import custom_replace
import re
import string

rules = [
  (r'<br />', ' '),
  (r'[%s]' % re.escape(string.punctuation), '')
] + [(r'[%s]' % c, c.lower()) for c in string.ascii_uppercase]

custom_replace_transform = custom_replace(rules)

### Build a dictionary from text

In [None]:
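from torchtext.vocab import build_vocab_from_iterator

# Each item yielded here is a raw review string, so iterating over it yields single
# characters: the vocabulary below is character-level, hence its small size.
vocab = build_vocab_from_iterator(train_data + test_data)
print(len(vocab))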

206


### Vectorize tokens

In [None]:
from torchtext.data.functional import numericalize_tokens_from_iterator

# standardize each review, then map every item to its index in the vocabulary
train_data = numericalize_tokens_from_iterator(vocab, custom_replace_transform(train_data))
test_data = numericalize_tokens_from_iterator(vocab, custom_replace_transform(test_data))

# materialize the lazy iterators as 1-D LongTensors, one per review
tensorized_train_data = [torch.tensor(list(d), dtype=torch.long) for d in train_data]
tensorized_test_data = [torch.tensor(list(d), dtype=torch.long) for d in test_data]

In [None]:
print(len(tensorized_train_data[0]), tensorized_train_data[0])
print(len(tensorized_test_data[0]), tensorized_test_data[0])
print(train_label[:10])
print(test_label[:10])

1576 tensor([ 5,  0,  8,  ..., 10,  4,  2])
1321 tensor([ 5,  0, 10,  ...,  3,  5,  7])
[tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.])]
[tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.]), tensor([1., 0.])]


### Wrap with DataLoader

In [None]:
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler

tensorized_train_set = list(zip(tensorized_train_data, train_label))
tensorized_test_set = list(zip(tensorized_test_data, test_label))

# we only take 10% of the samples from each dataset, otherwise training would take a very long time
train_loader = DataLoader(tensorized_train_set, sampler=SubsetRandomSampler(range(0, len(tensorized_train_set), 10)))
test_loader = DataLoader(tensorized_test_set, sampler=SubsetRandomSampler(range(0, len(tensorized_test_set), 10)))
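
The index arithmetic behind the 10% subset can be checked in plain Python. The sketch assumes a split of 25,000 reviews (the size of each IMDB split) and uses `random.shuffle` only to mimic the random visiting order that `SubsetRandomSampler` provides.

```python
import random

dataset_size = 25000                                # each IMDB split holds 25,000 reviews
subset_indices = list(range(0, dataset_size, 10))   # every 10th index

random.seed(0)
random.shuffle(subset_indices)                      # random order, no index repeated

print(len(subset_indices))  # 2500
```

Taking every 10th index, rather than the first 10%, matters here because the reviews are stored grouped by class, so a strided subset still contains both `neg` and `pos` samples.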

## Build the model


### Overview

```
/--------\          /--------\         /--------\         /--------\
| output |          | output |         | output |         | output |
\--------/          \--------/         \--------/         \--------/
     |                  |                  |                  |
 +-------+          +-------+          +-------+          +-------+
 |  RNN  |    =>    |  RNN  |    =>    |  RNN  |    =>    |  RNN  |    =>    ...
 +-------+  hidden  +-------+  hidden  +-------+  hidden  +-------+  hidden
     |      state       |      state       |      state       |      state
/--------\          /--------\         /--------\         /--------\
|  input |          | input  |         |  input |         | input  |
\--------/          \--------/         \--------/         \--------/
    ^                   ^                  ^                  ^
    i                 rented               i                  am
 (Vector)            (Vector)           (Vector)           (Vector)
```

### Recurrent Neural Network

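Here is an overview of our RNN. It is a simple recurrent network: at every step the embedded input is concatenated with the previous hidden state, and the combined vector feeds two linear layers, one producing the output and one (followed by a sigmoid) producing the next hidden state.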

```
  +-------+
  | input |
  +-------+
       |
+-----------+    +--------+
| embedding |    | hidden | <---O
+-----------+    +--------+     |
         \       /              |
       +----------+             |
       | combined |             |
       +----------+             |
      /            \            |
      |        +---------+      |
      |        | Sigmoid |      |
      |        +---------+      |
      |             |           |
+---------+    +--------+       |
| output  |    | hidden | ------O
+---------+    +--------+


```
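
A single step of the diagram above can be sketched as follows. The layer sizes are small illustrative values, and the random `word_vector` stands in for one embedded token; the names mirror the boxes in the diagram but the snippet is only a demonstration of one step, not the full model.

```python
import torch
from torch import nn

torch.manual_seed(0)

embed_dim, hidden_size, output_size = 4, 3, 2   # illustrative sizes only

i2h = nn.Linear(embed_dim + hidden_size, hidden_size)  # combined -> next hidden
i2o = nn.Linear(embed_dim + hidden_size, output_size)  # combined -> output

word_vector = torch.randn(1, embed_dim)   # stands in for one embedded token
hidden = torch.zeros(1, hidden_size)      # initial hidden state

combined = torch.cat((word_vector, hidden), dim=1)  # the "combined" box
new_hidden = torch.sigmoid(i2h(combined))           # fed back into the next step
output = i2o(combined)                              # output of this step

print(combined.shape, new_hidden.shape, output.shape)
```

Processing a whole review repeats this step once per token, passing `new_hidden` forward each time and keeping only the output of the last step.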


In [None]:
from torch import nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_embeddings):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(num_embeddings, input_size)
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.acti = nn.Sigmoid()

    def forward(self, input):
        hidden = self.initHidden()
        for word_vector in self.embedding(input):
          combined = torch.cat((word_vector.reshape(1, len(word_vector)), hidden), 1)
          hidden = self.i2h(combined)
          hidden = self.acti(hidden)
          output = self.i2o(combined)
        return output

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

In [None]:
n_hidden = 128      # size of hidden state
embedding_dim = 512  # number of dimension for embedding vector

model = RNN(embedding_dim, n_hidden, 2, len(vocab))

## Loss function

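The loss function measures how far the model's predictions are from the true labels during training. Training minimizes this function to "steer" the model in the right direction.

We use [`nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) in our example RNN model. It applies a sigmoid to the raw outputs and then computes the binary cross-entropy against the two-element label tensors defined earlier.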

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly9weXRvcmNoLm9yZy9kb2NzL3N0YWJsZS9nZW5lcmF0ZWQvdG9yY2gubm4uQkNFV2l0aExvZ2l0c0xvc3MuaHRtbA==)

In [None]:
loss_function = torch.nn.BCEWithLogitsLoss(pos_weight=torch.ones([2]))
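
The quantity this loss computes can be sketched in plain Python. The function below is an illustrative re-derivation, assuming the default mean reduction and a `pos_weight` of all ones as configured above; the name `bce_with_logits` and the example values are assumptions.

```python
import math

def bce_with_logits(logits, targets):
    """Mean of -[y*log(s(x)) + (1-y)*log(1-s(x))] over elements, s = sigmoid."""
    total = 0.0
    for x, y in zip(logits, targets):
        # max(x, 0) - x*y + log(1 + exp(-|x|)) is a numerically stable form
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

pos_label = [0.0, 1.0]                               # the 'pos' label from label_dict
loss_good = bce_with_logits([-2.0, 2.0], pos_label)  # scores agree with the label
loss_bad = bce_with_logits([2.0, -2.0], pos_label)   # scores contradict the label
print(round(loss_good, 4), round(loss_bad, 4))
```

Working on logits directly, instead of applying a sigmoid first and then taking logarithms, avoids overflow for large positive or negative scores.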

## Optimization

`SGD` is a stochastic gradient descent optimizer. We use it with momentum set to `0.9` and a learning rate of `0.001`.

In [None]:
from torch import optim

# Only SGD, SparseAdam and Adagrad support sparse gradients
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
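
The effect of momentum can be sketched for a single scalar parameter. The update below follows the convention described in the PyTorch documentation for `optim.SGD` (velocity accumulates gradients, and the learning rate scales the velocity); the function name, the toy objective and the learning rate of `0.1` are illustrative assumptions.

```python
def sgd_momentum_step(param, grad, velocity, lr=1e-3, momentum=0.9):
    """One SGD step with momentum for a scalar parameter."""
    velocity = momentum * velocity + grad   # accumulate past gradients
    param = param - lr * velocity           # move along the accumulated direction
    return param, velocity

param, velocity = 1.0, 0.0
history = []
for _ in range(3):
    grad = 2 * param                        # gradient of the toy objective f(p) = p^2
    param, velocity = sgd_momentum_step(param, grad, velocity, lr=0.1)
    history.append(param)

print(history)
```

Because the velocity keeps a share of earlier gradients, consecutive steps in the same direction get larger, which speeds up progress along consistent slopes.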

## Train the model

In [None]:
from tqdm import tqdm

NUM_EPOCHS = 1 # 1 epoch takes about 15 minutes using the Colab CPU standard runtime
               # 2 epochs give us 0.68 accuracy in evaluation

model.train() # put model in training mode

for epoch in range(NUM_EPOCHS):
  loop = tqdm(train_loader, position=0, leave=True)
  
  for (batch, labels) in loop:
    for input in batch:

      optimizer.zero_grad()
      output = model.forward(input)

      loss = loss_function(output, labels)
      loss.backward()

      optimizer.step()

100%|██████████| 2500/2500 [15:47<00:00,  2.64it/s]


## Evaluate the model

In [None]:
model.eval() # put model in evaluation mode

prediction_model = nn.Sequential(
  model,
  nn.LogSoftmax(dim=1),
)
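
Appending `nn.LogSoftmax` turns the raw scores into log-probabilities without changing which class scores highest. A plain-Python sketch of that property, with a hypothetical helper `log_softmax` and made-up scores:

```python
import math

def log_softmax(logits):
    """Log of the softmax probabilities, computed in a numerically stable way."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_sum_exp for z in logits]

logits = [0.3, 1.2]                        # hypothetical scores for (neg, pos)
log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)
print(sum(probs))                          # probabilities sum to 1 (up to rounding)
print(logits.index(max(logits)), log_probs.index(max(log_probs)))
```

Since every score is shifted by the same constant, taking the `argmax` of the log-probabilities gives the same prediction as taking it on the raw outputs.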

### Evaluation loop

In [None]:
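correct = 0
loop = tqdm(test_loader, position=0, leave=True)

with torch.no_grad(): # gradients are not needed for evaluation
  for (batch, labels) in loop:
    for input in batch:
      output = prediction_model(input)
      truth = labels.argmax(1)[0] # position of the 1 in the one-hot label (labels has shape [1, 2])
      guess = output.argmax(1)[0] # class with the highest log-probability
      if truth == guess:
        correct += 1
print(f'Acc = {correct/len(loop)}')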

100%|██████████| 2500/2500 [04:21<00:00,  9.56it/s]

Acc = 0.5788



