# RNN Example

- **Instructor**: Jongwoo Lim / Jiun Bae
- **Email**: [jlim@hanyang.ac.kr](mailto:jlim@hanyang.ac.kr) / [jiunbae.623@gmail.com](mailto:jiunbae.623@gmail.com)

## Sequential data prediction

If you use a neural network to predict a sequence, such as a sentence or time series data, and the value you want to predict depends on data from further and further in the past, the size of the vector representing the sequence has to be increased accordingly.

## RNN (Recurrent Neural Network)

An RNN (Recurrent Neural Network) is a neural network structure that can handle long sequences by storing the state of its neurons and using it as input at the next step. Here, we look at the basic structure of an RNN and how to use the RNN implementation provided by the PyTorch Python package.


In a general feedforward neural network, the output vector $y$ is obtained by applying the activation function $\sigma$ to the product of the weight matrix $U$ and the input vector $x$:

$$y = \sigma(Ux)$$

In the case of MLP (Multi-Layer Perceptron) having one hidden layer, it can be expressed as follows:

$$h = \sigma(Ux)$$
$$o = \sigma(Vh)$$

In this equation, $h$ is the hidden layer vector, $o$ is the output vector, $U$ is the weight matrix from the input to the hidden layer, and $V$ is the weight matrix from the hidden layer to the output.
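
The two equations above can be sketched in NumPy as follows. This is only an illustration: the layer sizes, the random weights, and the choice of the logistic function for $\sigma$ are assumptions, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, used here as one possible choice for sigma
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration
n_in, n_hidden, n_out = 4, 3, 2

U = rng.normal(size=(n_hidden, n_in))   # input -> hidden weights
V = rng.normal(size=(n_out, n_hidden))  # hidden -> output weights
x = rng.normal(size=n_in)               # a single input vector

h = sigmoid(U @ x)  # hidden layer vector, h = sigma(Ux)
o = sigmoid(V @ h)  # output vector, o = sigma(Vh)

print(h.shape, o.shape)
```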


An RNN outputs a state vector $s$ in addition to the output vector $o$. The state vector is similar to a hidden layer vector, but it depends on the previous state vector as well as on the current input vector. The output vector depends on the value of the state vector.

$$s_t = \sigma(Ux_t + Ws_{t-1})$$
$$o_t = \sigma(Vs_t)$$
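
A minimal NumPy sketch of this recurrence is given below. The helper name `rnn_forward`, the sizes, the random weights, and the use of $\tanh$ for $\sigma$ are assumptions made for illustration.

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Apply s_t = tanh(U x_t + W s_{t-1}) and o_t = tanh(V s_t) along a sequence."""
    s = s0
    states, outputs = [], []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)  # new state from current input and previous state
        o = np.tanh(V @ s)            # output depends only on the current state
        states.append(s)
        outputs.append(o)
    return np.stack(states), np.stack(outputs)

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 time steps, 4 input features, 3 state units, 2 outputs
xs = rng.normal(size=(6, 4))
U = rng.normal(size=(3, 4))
W = rng.normal(size=(3, 3))
V = rng.normal(size=(2, 3))

states, outputs = rnn_forward(xs, U, W, V, s0=np.zeros(3))
print(states.shape, outputs.shape)
```

The same weight matrices $U$, $W$, and $V$ are reused at every time step; only the state changes.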

When an RNN is unfolded along its time steps, it resembles an MLP with one hidden layer per time step, all sharing the same weights. For an arbitrarily long sequence, this corresponds to an arbitrarily deep network. The figure is as follows.

![rnn](../assets/rnn.png)

The difference is that an RNN can process time series data because its state carries over from the previous inputs. The final state for an input sequence therefore depends on the order of the input vectors.
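
As a small check of this order dependence, the sketch below feeds a short sequence and its reverse through the same `nn.RNN` layer and compares the final states. The sequence values and layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny layer: 1 input feature, 4 state units
rnn = nn.RNN(input_size=1, hidden_size=4, batch_first=True)

seq = torch.tensor([[[0.1], [0.5], [0.9]]])  # shape (batch=1, steps=3, features=1)
rev = torch.flip(seq, dims=[1])              # same values, reversed order

with torch.no_grad():
    _, h_fwd = rnn(seq)  # final state after the original order
    _, h_rev = rnn(rev)  # final state after the reversed order

same = torch.allclose(h_fwd, h_rev)
print(same)
```

With randomly initialized weights the two final states are expected to differ, even though both sequences contain exactly the same values.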

RNN architectures can be categorized by the type of input and output as follows.

![sequential](../assets/sequential.png)

# Code

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torchvision import datasets, transforms

import matplotlib.pyplot as plt

## Dataset

Sine function prediction

$$y = \sin(x)$$

In [None]:
size = 1000

In [None]:
x_ = np.arange(size)
y_ = np.sin(np.pi * .125 * x_)

In [None]:
plt.plot(y_)
plt.xlim(0, 100)
plt.ylim(-1, 1)
plt.show()

In [None]:
input_size = 5
hidden_size = 30

batch_size = 32

In [None]:
train_x = np.empty((size - input_size, input_size))
train_y = np.empty((size - input_size, 1))

In [None]:
for x in x_[:-input_size]:
    train_x[x] = y_[x:x + input_size]
    train_y[x] = y_[x + input_size]
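
The same windows can also be built without an explicit loop. The sketch below is one possible alternative and assumes NumPy 1.20 or later, where `sliding_window_view` is available; it is not the notebook's own method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

size, input_size = 1000, 5
y_ = np.sin(np.pi * .125 * np.arange(size))

# Every run of input_size consecutive samples; shape (size - input_size + 1, input_size)
windows = sliding_window_view(y_, input_size)

# Drop the last window because it has no following sample to use as a label
train_x = windows[:-1].copy()
train_y = y_[input_size:, None]  # the sample right after each window, shape (N, 1)

print(train_x.shape, train_y.shape)
```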

### First train data and label

In [None]:
plt.plot(train_x[0], c='r')
plt.scatter(np.arange(input_size), train_x[0], c='b')
plt.scatter((input_size,), train_y[0], c='g')
plt.xlim(0, 10)
plt.ylim(-1, 1)
plt.show()

## Simple RNN Model

The network passes the input through an RNN layer followed by a fully connected (fc) layer.

![sequential-figure](../assets/sequential-figure.png)

In [None]:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.rnn = nn.RNN(input_size, hidden_size, 1, bias=True, batch_first=True, 
                          nonlinearity='tanh', dropout=0)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, inputs, states):
        out, states = self.rnn(inputs, states)
        out = self.fc(out)
        
        return out, states

    def state(self):
        # return initialized hidden state
        return torch.zeros(1, 1, self.hidden_size).to(device)
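
To make the tensor shapes in this model concrete, the following sketch runs a single window through a standalone `nn.RNN` and `nn.Linear` pair with the same sizes as above (`input_size = 5`, `hidden_size = 30`). The all-zero input is just a placeholder.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(5, 30, 1, batch_first=True)
fc = nn.Linear(30, 1)

x = torch.zeros(1, 1, 5)    # (batch, sequence length, input_size)
h0 = torch.zeros(1, 1, 30)  # (num_layers, batch, hidden_size)

out, h1 = rnn(x, h0)  # out holds the state at every step, h1 the last state
pred = fc(out)        # one scalar prediction per step

print(out.shape, h1.shape, pred.shape)
```

Note that each window of 5 values is treated here as one time step with 5 features, so the sequence length seen by the RNN is 1.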

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
model = RNN(input_size, hidden_size).to(device)

### Before training

Predictions of the untrained model

In [None]:
state = model.state()
preds = [
    model(torch.Tensor(train_x[x]).view(1, 1, input_size).to(device), state)[0].item()
    for x in x_[:-input_size]
]

In [None]:
plt.plot(preds, c='b')
plt.plot(train_y, c='r')
plt.xlim(0, 100)
plt.ylim(-1, 1)
plt.show()

## Train

In [None]:
torch.manual_seed(42) # 42, THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING

batch = 64            # batch size
lr = .01              # learning rate
epochs = 33

MSE loss and Adam optimizer

In [None]:
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

In [None]:
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    state = model.state()
    loss = 0
    
    for train, label in zip(train_x, train_y):
        train = torch.Tensor(train).view(1, 1, input_size).to(device)
        label = torch.Tensor(label).to(device)
        out, state = model(train, state)
        
        loss += criterion(out.squeeze(), label.squeeze())
    
    loss.backward()
    optimizer.step()
    
    if not (epoch % 8):
        model.eval()
        print(f'Loss: {loss.item()}')
        state = model.state()
        preds = [
            model(torch.Tensor(train_x[x]).view(1, 1, input_size).to(device), state)[0].item()
            for x in x_[:-input_size]
        ]
        
        plt.plot(preds, c='b')
        plt.plot(train_y, c='r')
        plt.xlim(0, 100)
        plt.ylim(-1, 1)
        plt.show()
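
The loop above performs one optimizer step per epoch over the whole sequence, and the `batch` and `batch_size` values defined earlier are not used. As a possible variant, the sketch below trains on shuffled mini-batches with `DataLoader`. This is a different scheme from the notebook's: each window starts from a fresh zero state instead of carrying the state across windows, and the number of epochs is an arbitrary assumption.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)
size, input_size, hidden_size, batch_size = 1000, 5, 30, 32

# Rebuild the sine windows so that the sketch is self-contained
y_ = np.sin(np.pi * .125 * np.arange(size))
xs = np.stack([y_[i:i + input_size] for i in range(size - input_size)])
ys = y_[input_size:]

dataset = TensorDataset(
    torch.tensor(xs, dtype=torch.float32).unsqueeze(1),  # (N, 1, input_size)
    torch.tensor(ys, dtype=torch.float32),               # (N,)
)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, 1)
params = list(rnn.parameters()) + list(fc.parameters())
optimizer = optim.Adam(params, lr=0.01)
criterion = nn.MSELoss()

losses = []
for epoch in range(5):
    total = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        out, _ = rnn(xb)             # no state passed: it defaults to zeros
        pred = fc(out).reshape(-1)   # (B, 1, 1) -> (B,)
        loss = criterion(pred, yb)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(xb)
    losses.append(total / len(dataset))  # mean loss over the epoch

print(losses)
```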

## Q1. Change Vanilla RNN to LSTM

In PyTorch, LSTM can be used like this:

```
nn.LSTM(
    input_size,
    hidden_size,
)
```
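
Unlike `nn.RNN`, `nn.LSTM` takes and returns its state as a tuple of a hidden state and a cell state. The sketch below checks the shapes involved, using the same sizes as the RNN model (`input_size = 5`, `hidden_size = 30`) and an all-zero placeholder input.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(5, 30, batch_first=True)

x = torch.zeros(1, 1, 5)    # (batch, sequence length, input_size)
h0 = torch.zeros(1, 1, 30)  # hidden state: (num_layers, batch, hidden_size)
c0 = torch.zeros(1, 1, 30)  # cell state:   (num_layers, batch, hidden_size)

# The state goes in and comes out as the pair (hidden, cell)
out, (h1, c1) = lstm(x, (h0, c0))

print(out.shape, h1.shape, c1.shape)
```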

In [None]:
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTM, self).__init__()

        self.hidden_size = hidden_size

        # num_layers was undefined here; use a single layer, as in the RNN model
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True, dropout=0)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, inputs, states):
        out, states = self.lstm(inputs, states)
        out = self.fc(out)
        
        return out, states
    
    def state(self):
        return torch.zeros(1, 1, self.hidden_size).to(device), torch.zeros(1, 1, self.hidden_size).to(device)

In [None]:
model = LSTM(input_size, hidden_size).to(device)

## Q2. Change pattern length

What happens with a much longer pattern, such as $y = \sin(\frac{1}{10} x)$?

In [None]:
size = 1000

In [None]:
x_ = np.arange(size)
y_ = np.sin(1/10 * np.pi * .125 * x_)

In [None]:
plt.plot(y_)
plt.xlim(0, 100)
plt.ylim(-1, 1)
plt.show()
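
To quantify how much longer the new pattern is, the period in samples can be computed from the angular frequency used in the code, since $\sin(\omega x)$ repeats every $2\pi / \omega$ samples. The helper name `period_in_samples` below is hypothetical.

```python
import numpy as np

def period_in_samples(omega):
    # sin(omega * x) repeats every 2 * pi / omega samples
    return 2 * np.pi / omega

short = period_in_samples(np.pi * .125)          # original pattern
long = period_in_samples(1/10 * np.pi * .125)    # Q2 pattern

input_size = 5
print(round(short), round(long))
print(input_size / short, input_size / long)     # fraction of one period in a window
```

With `input_size = 5`, a window covers a much smaller fraction of one period of the longer pattern, and the plot range of 0 to 100 shows less than one full period.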

## Q3. Change input_size

input_size represents the number of consecutive values received as input. If the pattern length increases, you can adjust the input_size to increase the performance of the model.
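
Because the input width of `nn.RNN` is fixed when the layer is created, a model built for the old `input_size` cannot consume the wider windows; a new model has to be created and trained again. The sketch below illustrates this with standalone layers, assuming the old size 5 and the new size 32.

```python
import torch
import torch.nn as nn

old_rnn = nn.RNN(5, 30, batch_first=True)  # built for input_size = 5
new_window = torch.zeros(1, 1, 32)         # a window with input_size = 32

try:
    old_rnn(new_window)
    mismatch_detected = False
except RuntimeError:
    # PyTorch rejects an input whose last dimension differs from input_size
    mismatch_detected = True

new_rnn = nn.RNN(32, 30, batch_first=True)  # rebuilt for the new input_size
out, _ = new_rnn(new_window)

print(mismatch_detected, out.shape)
```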

In [None]:
input_size = 32

In [None]:
train_x = np.empty((size - input_size, input_size))
train_y = np.empty((size - input_size, 1))

In [None]:
for x in x_[:-input_size]:
    train_x[x] = y_[x:x + input_size]
    train_y[x] = y_[x + input_size]

In [None]:
plt.plot(train_x[0], c='r')
plt.scatter(np.arange(input_size), train_x[0], c='b')
plt.scatter((input_size,), train_y[0], c='g')
plt.xlim(0, 48)
plt.ylim(-1, 1)
plt.show()