<div class="alert alert-block alert-info" style="margin-top: 20px">

      
| Name | Description | Date
| :- |-------------: | :-:
|Reza Hashemi| Multi Layer Perceptrons - CNN-RNN network 10th  | Finalized on 23rd of August 2019 | width="750" align="center"></a></p>
</div>

# CNN-RNN network
- CNNs and RNNs both have their own strengths and drawbacks. Hence, it is sometimes recommended to combine the two to model highly complicated data
- Here, we combine the two to classify fashion image dataset (Fashion-MNIST)

In [1]:
!pip3 install torch torchvision



In [2]:
import numpy as np
import pandas as pd
import torch, torchvision
import torch.nn as nn
import torch.nn.functional as F
torch.__version__

'1.1.0'

## 1. Import & process dataset
- Fashion MNIST dataset from torchvision 
- [Original dataset source](https://github.com/zalandoresearch/fashion-mnist), [paper](https://arxiv.org/abs/1708.07747)

![](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

In [3]:
from torchvision import datasets
import torchvision.transforms as transforms

train_dataset = datasets.FashionMNIST(root = "/", train = True, download = True, transform = transforms.ToTensor())
test_dataset = datasets.FashionMNIST(root = "/", train = False, download = True, transform = transforms.ToTensor())

0it [00:00, ?it/s]

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to /FashionMNIST/raw/train-images-idx3-ubyte.gz


26427392it [00:02, 12266940.50it/s]                              


Extracting /FashionMNIST/raw/train-images-idx3-ubyte.gz


0it [00:00, ?it/s]

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to /FashionMNIST/raw/train-labels-idx1-ubyte.gz


32768it [00:00, 95447.30it/s]                            
0it [00:00, ?it/s]

Extracting /FashionMNIST/raw/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to /FashionMNIST/raw/t10k-images-idx3-ubyte.gz


4423680it [00:01, 4019779.87it/s]                             
0it [00:00, ?it/s]

Extracting /FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to /FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


8192it [00:00, 30913.95it/s]            

Extracting /FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Processing...
Done!





In [0]:
# create data loaders 
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 128, shuffle = True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 128, shuffle = False)

## 2. Creating CNN-RNN model and training

- Create and train  CNN-RNN model for fashion MNIST image classification.


![](https://www.researchgate.net/profile/Maulik_Kamdar/publication/322167103/figure/fig5/AS:631611880124465@1527599415555/CRNN-Architecture-Overview-Combining-CNN-and-RNN-to-predict-the-methylation-state-from.png)

In [0]:
# create CNN with one convolution/pooling layer
class net(nn.Module):
  def __init__(self, input_dim, num_filters, conv_kernel_size, pool_kernel_size, stride, padding, hidden_size, num_classes, device):
    super(net, self).__init__()
    self.input_dim = input_dim
    self.device = device
    self.num_filters = num_filters
    self.hidden_size = hidden_size
    conv_output_size = int((input_dim - conv_kernel_size + 2 * padding)/stride) + 1   # conv layer output size
    self.pool_output_size = int((conv_output_size - pool_kernel_size)/stride) + 1          # pooling layer output size
   
    self.conv = nn.Conv2d(1, num_filters, kernel_size = conv_kernel_size, stride = stride, padding = padding)     
    self.pool = nn.MaxPool2d(kernel_size = pool_kernel_size, stride = stride)
    self.rnn = nn.GRU(input_size = self.pool_output_size * self.pool_output_size, hidden_size = hidden_size)  # GRU layer that takes into CNN output
    self.relu = nn.ReLU()
    self.dense = nn.Linear(hidden_size, num_classes)     
    
  def forward(self, x):
    x = self.conv(x)
    x = self.relu(x)
    x = self.pool(x)

    x = x.view(self.num_filters, x.size(0), self.pool_output_size * self.pool_output_size)   # resize to fit into GRU layer
    
    h0 = torch.from_numpy(np.zeros((1, x.size(1), self.hidden_size))).float().to(self.device)
    x, _ = self.rnn(x, h0)
    x = x[-1, :, :]                 # take only the last sequence output
    x = self.dense(x)
    return x

In [0]:
# hyperparameters
DEVICE = torch.device('cuda')
INPUT_DIM = 28
NUM_FILTERS = 64
HIDDEN_SIZE = 30
CONV_KERNEL_SIZE = 3
POOL_KERNEL_SIZE = 2
STRIDE = 1
PADDING = 1
HIDDEN_SIZE = 10
NUM_CLASSES = 10
LEARNING_RATE = 1e-1
NUM_EPOCHS = 10

In [0]:
model = net(INPUT_DIM, NUM_FILTERS, CONV_KERNEL_SIZE, POOL_KERNEL_SIZE, STRIDE, PADDING, HIDDEN_SIZE, NUM_CLASSES, DEVICE).to(DEVICE)
criterion = nn.CrossEntropyLoss()   # do not need softmax layer when using CEloss criterion
optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

In [8]:
# training for NUM_EPOCHS
for i in range(NUM_EPOCHS):
  temp_loss = []
  for (x, y) in train_loader:
    x, y = x.float().to(DEVICE), y.to(DEVICE)  # beware that input to embedding should be type 'long'
    outputs = model(x)
    loss = criterion(outputs, y)
    temp_loss.append(loss.item())
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
  print("Loss at {}th epoch: {}".format(i, np.mean(temp_loss)))

Loss at 0th epoch: 2.3441601541759107
Loss at 1th epoch: 2.339597595780135
Loss at 2th epoch: 2.340265729025737
Loss at 3th epoch: 2.3395771812528436
Loss at 4th epoch: 2.337725659169114
Loss at 5th epoch: 2.3428904186687998
Loss at 6th epoch: 2.3376515308168653
Loss at 7th epoch: 2.3386383000721556
Loss at 8th epoch: 2.334579667557023
Loss at 9th epoch: 2.3355735863195553


## 3. Evaluation
- Evaluate the trained CNN-RNN model with accuracy score 
  - Store probability of each instance to a list and compare it with true y label

In [9]:
y_pred, y_true = [], []
with torch.no_grad():
  for x, y in test_loader:
    x, y = x.float().to(DEVICE), y.to(DEVICE)       # beware that input to embedding should be type 'long'
    outputs = F.softmax(model(x)).max(1)[-1]       # predicted label
    y_true += list(y.cpu().numpy())                # true label
    y_pred += list(outputs.cpu().numpy())   

  """


In [10]:
# evaluation result
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)

0.099