# 8. Recurrent Neural Network with PyTorch

## 1. About Recurrent Neural Network

### 1.1 Feedforward Neural Networks Transition to Recurrent Neural Networks
- **RNN is essentially an FNN**

<img src = './images/10-01.png' width=90%>

<img src = './images/10-04.png' width=80%>

<img src = './images/10-05.png' width=80%>

<img src = './images/10-08.png' width=80%>

<img src = './images/10-11.png' width=80%>

<img src = './images/10-14.png' width=90%>

## 2. Building a Recurrent Neural Network with PyTorch

## Model A: 1 Hidden Layer (ReLU)

- Unroll 28 time steps
    - Each step input size: 28 x 1
    - Total per unroll: 28 x 28
        - Feedforward Neural Network input size: 28 x 28
    - 1 Hidden layer
    - ReLU Activation Function

<img src = './images/10-17.png' width=65%>
- ... 28 times
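
To make the unrolling concrete, here is a minimal sketch of how a batch of 28 x 28 images can be viewed as sequences of 28 time steps with 28 features each. The random tensor `images` is a stand-in for a real MNIST batch, and the names `seq_dim` and `input_dim` are assumptions chosen for illustration.

```python
import torch

# Stand-in for one MNIST batch: (batch, channels, height, width).
images = torch.randn(100, 1, 28, 28)

seq_dim = 28    # number of time steps (image rows)
input_dim = 28  # features per time step (pixels in one row)

# Drop the channel axis so each image row becomes one time step.
sequences = images.view(-1, seq_dim, input_dim)
print(sequences.shape)  # torch.Size([100, 28, 28])

# The input fed to the RNN at the first time step: one row per image.
step_0 = sequences[:, 0, :]
print(step_0.shape)  # torch.Size([100, 28])
```

Each image row is treated as one step of the sequence, so the RNN reads an image top to bottom over 28 steps.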

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train Dataset
#### Images from 0 to 9

In [2]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [3]:
train_dataset = dsets.MNIST(root = './data',
                            train = True,
                            transform = transforms.ToTensor(),
                            download = True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform = transforms.ToTensor())

In [5]:
print(train_dataset.data.size())  # `train_data` is deprecated in newer torchvision

torch.Size([60000, 28, 28])


In [6]:
print(train_dataset.targets.size())  # `train_labels` is deprecated in newer torchvision

torch.Size([60000])


In [7]:
print(test_dataset.data.size())  # `test_data` is deprecated in newer torchvision

torch.Size([10000, 28, 28])


In [8]:
print(test_dataset.targets.size())  # `test_labels` is deprecated in newer torchvision

torch.Size([10000])


### Step 2: Make Dataset Iterable

In [9]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
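
The epoch arithmetic above can be checked by hand: 60,000 training images at 100 per batch give 600 iterations per epoch, so 3,000 iterations correspond to 5 epochs. The sketch below repeats that calculation and then iterates a loader once. To avoid downloading MNIST it uses a small all-zeros `TensorDataset`; `demo_dataset` and `demo_loader` are hypothetical names, not part of the original notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 100
n_iters = 3000
train_size = 60000  # number of MNIST training images

iters_per_epoch = train_size // batch_size  # 600
num_epochs = n_iters // iters_per_epoch     # 5
print(iters_per_epoch, num_epochs)

# Small synthetic stand-in for MNIST: 1000 blank images with label 0.
demo_dataset = TensorDataset(torch.zeros(1000, 28, 28),
                             torch.zeros(1000, dtype=torch.long))
demo_loader = DataLoader(demo_dataset, batch_size=batch_size, shuffle=True)

# Each iteration yields one (images, labels) batch.
images, labels = next(iter(demo_loader))
print(images.shape, labels.shape, len(demo_loader))
```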

### Step 3: Create Model Class

<img src="./images/10-20.png" width=70%>

In [16]:
class RNNModel(nn.Module):

    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()

        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.layer_dim = layer_dim

        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')

        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    # forward must be a method of the class, not a function nested in __init__
    def forward(self, x):

        # Initialize hidden state with zeros
        # (layer_dim, batch_size, hidden_dim)
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

        out, hn = self.rnn(x, h0)

        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> just want last time step hidden states!
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out
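
The shape comments in the model can be verified with `nn.RNN` on its own. The following sketch assumes a random input batch `x` of 100 sequences, each with 28 steps of 28 features, and uses the same sizes as this section (hidden size 100, one layer).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=28, hidden_size=100, num_layers=1,
             batch_first=True, nonlinearity='relu')

x = torch.randn(100, 28, 28)    # (batch_dim, seq_dim, feature_dim)
h0 = torch.zeros(1, 100, 100)   # (layer_dim, batch_size, hidden_dim)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([100, 28, 100]): hidden state at every time step
print(hn.shape)   # torch.Size([1, 100, 100]): final hidden state per layer

# For a single-layer, unidirectional RNN, the last time step of `out`
# is the same as the final hidden state.
last_step = out[:, -1, :]
print(torch.allclose(last_step, hn[-1]))
```

This is why the readout layer only needs `out[:, -1, :]`: it holds the hidden state after the whole image has been read.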
            

### Step 4: Instantiate Model Class

- 28 time steps
    - Each time step: input dimension = 28
- 1 hidden layer
- MNIST 0-9 digits -> output dimension = 10

In [14]:
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10

In [17]:
model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

### Step 5: Instantiate Loss Class
- Recurrent Neural Network: **Cross Entropy Loss**
    - *Convolutional Neural Network*: **Cross Entropy Loss**
    - *Feedforward Neural Network*: **Cross Entropy Loss**
    - *Logistic Regression*: **Cross Entropy Loss**
    - *Linear Regression*: **MSE**

In [18]:
criterion = nn.CrossEntropyLoss()
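
As a small illustration, `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels, and applies log-softmax internally. The sketch below uses made-up logits and labels for a batch of 4 samples and compares the result with the manual log-softmax plus negative log-likelihood computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)          # made-up model outputs: 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])  # made-up integer class labels

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Equivalent manual computation: log-softmax followed by NLL loss.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss.item(), manual.item())
```

Because the softmax is built into the loss, the model's readout layer returns logits directly and needs no softmax of its own.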

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_{\theta}$
        - $\theta$: parameters (our variables)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_{\theta}$: parameters' gradients
        
    - Even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - **At every iteration, we update our model's parameters**

In [20]:
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
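
The update rule can be checked on a toy problem. This sketch assumes a two-element parameter vector `theta` and the loss `sum(theta ** 2)`, both invented for illustration, and compares one `optimizer.step()` with the hand-computed update.

```python
import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1
optimizer = torch.optim.SGD([theta], lr=learning_rate)

# Toy loss whose gradient is 2 * theta = [2.0, -4.0].
loss = (theta ** 2).sum()
loss.backward()

# parameters = parameters - learning_rate * parameters_gradients
expected = theta.detach() - learning_rate * theta.grad

optimizer.step()
print(theta.detach())  # tensor([ 0.8000, -1.6000])
print(expected)        # tensor([ 0.8000, -1.6000])
```

In a training loop, `optimizer.zero_grad()` is normally called before each backward pass so that gradients from earlier iterations do not accumulate.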

### Parameters In-Depth

In [21]:
len(list(model.parameters()))

6

#### Parameters

- Input to Hidden Layer Linear Function
    - A1, B1
- Hidden Layer to Output Linear Function
    - A2, B2
- Hidden Layer to Hidden Layer Linear Function
    - A3, B3

<img src = './images/10-08.png' width=80%>
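
To see where the six parameter tensors come from, the sketch below builds the same two layers outside the model class and lists their shapes. The `fc.` prefix and the `shapes` dictionary are only for display; the mapping to A1/B1, A2/B2 and A3/B3 in the comments is an interpretation of the list above.

```python
import torch.nn as nn

rnn = nn.RNN(28, 100, 1, batch_first=True, nonlinearity='relu')
fc = nn.Linear(100, 10)

shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
shapes.update({'fc.' + name: tuple(p.shape) for name, p in fc.named_parameters()})

for name, shape in shapes.items():
    print(name, shape)

# weight_ih_l0 (100, 28)   input to hidden weights   (A1)
# weight_hh_l0 (100, 100)  hidden to hidden weights  (A3)
# bias_ih_l0   (100,)      input to hidden bias      (B1)
# bias_hh_l0   (100,)      hidden to hidden bias     (B3)
# fc.weight    (10, 100)   hidden to output weights  (A2)
# fc.bias      (10,)       hidden to output bias     (B2)
```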