# RNN Example

- **Instructor**: Jongwoo Lim / Jiun Bae
- **Email**: [jlim@hanyang.ac.kr](mailto:jlim@hanyang.ac.kr) / [jiunbae.623@gmail.com](mailto:jiunbae.623@gmail.com)

## Image classification with RNN

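An RNN can also be used for image classification: flatten the image pixels into a 1-D array and feed that array to the network as its input. In this example the whole flattened image (28×28 = 784 values) is fed as a single time step.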

## Code

### Import packages

First, import the packages needed to use PyTorch.

- torch.nn: a **Network** in PyTorch is built by subclassing nn.Module.
- torch.nn.functional: for **Functions** such as *log_softmax* (used in this example)
- torch.optim: for **Optimizers**
- torchvision: handling **Datasets** and transforms

NumPy, the basic scientific computing package, is customarily imported as well.

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

import matplotlib.pyplot as plt

## Dataset

torchvision ships the MNIST dataset and can download it while the code runs!

In [None]:
DATASET_DIR = '../data' # path to download mnist dataset

TRAIN_DATASET = datasets.MNIST(DATASET_DIR,   # Dataset root path
                               train=True,    # Train data
                               download=True) # Download if not exist

TEST_DATASET = datasets.MNIST(DATASET_DIR,    # Dataset root path
                              train=False)    # Test data

### Flatten image

In [None]:
from PIL import Image
from IPython.display import display

def show(ary):
    display(Image.fromarray(ary))

In [None]:
image, label = TRAIN_DATASET[0]
image = np.array(image)

In [None]:
show(image)

In [None]:
show(image.reshape(1, -1))

In [None]:
input_size = 28*28
hidden_size = 128

## Run!


### Reproducibility (**Important**)

**Reproducibility** is very **very** ***very*** important in experiments. No conclusions can be drawn from an experiment that cannot be reproduced, so fix the random seed before anything else.
In **PyTorch**, call `torch.manual_seed` to fix the random seed. It sets the seed of the random number generator, so random results become **reproducible**.
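
As a minimal sketch of what the seed does (the tensor names `first` and `second` are only illustrative), reseeding with the same value makes the generator produce the same numbers again:

```python
import torch

torch.manual_seed(42)
first = torch.rand(3)          # three random numbers after seeding

torch.manual_seed(42)          # reset the generator to the same state
second = torch.rand(3)         # the same three numbers are drawn again

same = torch.equal(first, second)
print(same)                    # True
```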


### Hyperparameters

Unfortunately, machine learning does not learn every variable. Parameters that must be set before training are called **hyperparameters**.
In this example, the *learning rate*, the batch size, and the number of epochs are set before training.


### DataLoader

Loading files from disk is an expensive operation, especially in machine learning, where a lot of training data is needed, and even more so when each sample is an image.
So many frameworks provide a *data loader* that loads data efficiently, for example by using multiple workers and caching. In **PyTorch**, `DataLoader` supports shuffling, batching, and many other functions, and the dataset applies the transform to each sample.
This example just uses `batch_size` and `shuffle`.

PyTorch models only process `torch.Tensor`, so the data (PIL images here, or NumPy arrays) must be converted to tensors with a transform before training (or testing).
*`transforms.ToTensor()` converts each sample to a tensor automatically when the loader fetches it.*
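
To see what `batch_size` and `shuffle` do without downloading anything, here is a minimal sketch. It assumes random tensors wrapped in a `TensorDataset` as a stand-in for MNIST; the names `fake_images` and `fake_labels` are hypothetical and not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 fake "images" shaped like MNIST samples, with fake labels
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))

loader = DataLoader(TensorDataset(fake_images, fake_labels),
                    batch_size=4,    # samples per batch
                    shuffle=True)    # new random order on every pass

batch_shapes = [tuple(images.shape) for images, _ in loader]
print(batch_shapes)  # [(4, 1, 28, 28), (4, 1, 28, 28), (2, 1, 28, 28)]
```

Note that the last batch is smaller whenever the dataset size is not a multiple of `batch_size`, which matters for any code that assumes a fixed batch size.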


### GPU or CPU

The `device` variable selects **CUDA** if it is available. The CPU is fast enough here because the dataset is small and the network is simple, but as the data grows and the network becomes more complicated, it is time to get help from the GPU. With `device` set this way, the same code runs on either, so you do not have to worry.
Later, `.to(device)` moves a model or tensor to the device we specified.
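
A small sketch of the pattern, assuming nothing beyond `torch` itself (the tensor `x` is only illustrative):

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.zeros(2, 3).to(device)   # returns a copy of the tensor on that device
print(x.device.type)               # 'cuda' or 'cpu', depending on the machine
```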


### Build Network

`model = RNN(input_size, hidden_size)` creates the network from the `RNN` class defined in the code. In this example, we use the Adam optimizer (`optim.Adam`), an adaptive variant of [SGD (Stochastic gradient descent)](https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
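
The code in this notebook pairs `nn.NLLLoss` with `optim.Adam`. A single optimization step with that pairing might look like the sketch below, which assumes a hypothetical tiny linear model (`tiny_model`) and random data rather than the notebook's RNN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

tiny_model = nn.Linear(4, 3)                 # hypothetical stand-in network
optimizer = optim.Adam(tiny_model.parameters(), lr=0.1)
criterion = nn.NLLLoss()                     # expects log-probabilities

x = torch.rand(8, 4)                         # 8 random samples, 4 features each
y = torch.randint(0, 3, (8,))                # 8 random class labels

before = tiny_model.weight.detach().clone()

optimizer.zero_grad()                        # clear old gradients
loss = criterion(F.log_softmax(tiny_model(x), dim=-1), y)
loss.backward()                              # compute gradients
optimizer.step()                             # update the parameters

changed = not torch.equal(before, tiny_model.weight)
print(changed)                               # True: the weights were updated
```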

In [None]:
torch.manual_seed(42) # 42, THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING

batch = 16            # batch size
lr = .1              # learning rate
epochs = 16


TRAIN_DATASET.transform = transforms.ToTensor()
train_loader = torch.utils.data.DataLoader(TRAIN_DATASET,
                                           batch_size=batch,
                                           shuffle=True)

TEST_DATASET.transform = transforms.ToTensor()
test_loader = torch.utils.data.DataLoader(TEST_DATASET,
                                          batch_size=batch,
                                          shuffle=True)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.rnn = nn.RNN(input_size, hidden_size, 1, bias=True, batch_first=True, 
                          nonlinearity='tanh', dropout=0)
        self.fc = nn.Linear(hidden_size, 10)

    def forward(self, inputs, states):
        out, states = self.rnn(inputs, states)
        out = self.fc(out)
        out = F.log_softmax(out, dim=-1)
        return out, states

    def state(self, _batch):
        # return initialized hidden state
        return torch.zeros(1, _batch, self.hidden_size).to(device)
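
As a quick sanity check of the tensor shapes used by the `RNN` class above, the following sketch runs a bare `nn.RNN` layer on random data. The batch of `fake_images` and the batch size of 4 are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

input_size = 28 * 28
hidden_size = 128

rnn = nn.RNN(input_size, hidden_size, batch_first=True)

fake_images = torch.rand(4, 1, 28, 28)           # stand-in for a batch of MNIST images
sequence = fake_images.view(-1, 1, input_size)   # (batch, seq_len=1, features)
h0 = torch.zeros(1, 4, hidden_size)              # (num_layers, batch, hidden)

out, hn = rnn(sequence, h0)
print(sequence.shape)   # torch.Size([4, 1, 784])
print(out.shape)        # torch.Size([4, 1, 128])
print(hn.shape)         # torch.Size([1, 4, 128])
```

With `batch_first=True` the batch dimension comes first in the input and output, while the hidden state keeps the layer dimension first.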

In [None]:
model = RNN(input_size, hidden_size).to(device)

In [None]:
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

In [None]:
TRAIN_DATASET.transform = transforms.ToTensor()
train_loader = torch.utils.data.DataLoader(TRAIN_DATASET,
                                           batch_size=batch,
                                           shuffle=True)

TEST_DATASET.transform = transforms.ToTensor()
test_loader = torch.utils.data.DataLoader(TEST_DATASET,
                                          batch_size=batch,
                                          shuffle=True)

In [None]:
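for epoch in range(epochs):
    model.train()
    total_loss = 0.0

    for image, label in train_loader:
        image = image.view(-1, 1, input_size).to(device)
        label = label.to(device)

        optimizer.zero_grad()                    # clear gradients from the previous step
        out, state = model(image, model.state(image.size(0)))
        loss = criterion(out.squeeze(1), label)  # out: (batch, 1, 10) -> (batch, 10)
        loss.backward()
        optimizer.step()                         # update once per batch
        total_loss += loss.item()

    if not (epoch % 2):
        model.eval()
        correct = 0
        with torch.no_grad():                    # no gradients needed for evaluation
            for image, label in test_loader:
                image = image.view(-1, 1, input_size).to(device)
                label = label.to(device)
                out, state = model(image, model.state(image.size(0)))
                correct += (out.squeeze(1).argmax(dim=-1) == label).sum().item()
        print(f'Loss: {total_loss / len(train_loader)}, '
              f'accuracy: {correct / len(TEST_DATASET)}')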

In [None]:
model.eval()

with torch.no_grad():  # inference only, so skip gradient tracking
    for _, (image, label) in zip(range(5), TEST_DATASET):
        image = image.view(-1, 1, input_size).to(device)
        out, _ = model(image, model.state(1))

        show((image.cpu().numpy().reshape(28, 28) * 255).astype(np.uint8))
        print(f'Label: {label}, prediction: {out.argmax().item()}')