# 8. Recurrent Neural Network with PyTorch

## 1. About Recurrent Neural Network

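### 1.1 From Feedforward Neural Networks to Recurrent Neural Networks
- **An RNN is essentially an FNN** applied once per time step, with an extra hidden-to-hidden connection that passes the hidden state from one step to the next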

<img src = './images/10-01.png' width=90%>

<img src = './images/10-04.png' width=80%>

<img src = './images/10-05.png' width=80%>

<img src = './images/10-08.png' width=80%>

<img src = './images/10-11.png' width=80%>

<img src = './images/10-14.png' width=90%>
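The idea that an RNN is an FNN reused at every time step can be made concrete by unrolling a single-layer ReLU RNN by hand and comparing it with `nn.RNN`. This minimal sketch assumes random inputs (4 sequences of 28 steps of size 28) rather than MNIST. It borrows the A1/B1 (input to hidden) and A3/B3 (hidden to hidden) labels used in the parameter section of this notebook; the variable names are illustrative only. The recurrence follows the one documented for `nn.RNN`, whose default nonlinearity is tanh, so ReLU is requested explicitly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_dim, hidden_dim, seq_len, batch = 28, 100, 28, 4

# Reference implementation: PyTorch's RNN with ReLU instead of the default tanh
rnn = nn.RNN(input_dim, hidden_dim, num_layers=1, batch_first=True, nonlinearity='relu')

x = torch.randn(batch, seq_len, input_dim)   # random stand-in for image rows
h0 = torch.zeros(1, batch, hidden_dim)
out_ref, hn_ref = rnn(x, h0)

# Manual unroll: the same feedforward layer is applied at every time step,
# plus a hidden-to-hidden term carrying state from step t-1 to step t
A1, B1 = rnn.weight_ih_l0, rnn.bias_ih_l0    # input -> hidden
A3, B3 = rnn.weight_hh_l0, rnn.bias_hh_l0    # hidden -> hidden
h = h0[0]
steps = []
with torch.no_grad():
    for t in range(seq_len):
        h = torch.relu(x[:, t, :] @ A1.T + B1 + h @ A3.T + B3)
        steps.append(h)
out_manual = torch.stack(steps, dim=1)       # (batch, seq_len, hidden_dim)

print(torch.allclose(out_manual, out_ref, atol=1e-4))
```

The only difference from a plain feedforward hidden layer is the `h @ A3.T + B3` term that feeds the previous hidden state back in.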

## 2. Building a Recurrent Neural Network with PyTorch

## Model A: 1 Hidden Layer (ReLU)

- Unroll 28 time steps
    - Each step input size: 28 x 1 (one row of 28 pixels)
    - Total per unroll: 28 x 28
        - Feedforward Neural Network input size: 28 x 28 (the same 784 pixels, fed in all at once)
- 1 hidden layer
- ReLU activation function

<img src = './images/10-17.png' width=65%>
- ... repeated for all 28 time steps
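A minimal sketch of how a 28 x 28 image becomes a sequence of 28 time steps, assuming a random tensor in place of a real MNIST batch: each row of pixels is the input at one time step, which is what `images.view(-1, seq_dim, input_dim)` does in the training loop of this notebook.

```python
import torch

# Hypothetical batch of 2 MNIST-shaped images: (batch, channels, height, width)
images = torch.rand(2, 1, 28, 28)

seq_dim, input_dim = 28, 28
# Treat each of the 28 rows as one time step of 28 pixels
sequences = images.view(-1, seq_dim, input_dim)  # (batch, seq_dim, input_dim)

first_step = sequences[:, 0, :]   # row 0 of every image: input at t = 0
last_step = sequences[:, -1, :]   # row 27 of every image: input at t = 27
print(sequences.shape, first_step.shape)
```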

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train and Test Datasets
#### Images from 0 to 9

```python
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable  # legacy no-op wrapper since PyTorch 0.4; plain tensors work directly
```

```python
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())
```

```python
print(train_dataset.data.size())
```

```
torch.Size([60000, 28, 28])
```

```python
print(train_dataset.targets.size())
```

```
torch.Size([60000])
```

```python
print(test_dataset.data.size())
```

```
torch.Size([10000, 28, 28])
```

```python
print(test_dataset.targets.size())
```

```
torch.Size([10000])
```


### Step 2: Make Dataset Iterable

```python
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
```
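The epoch count follows from simple arithmetic: 60,000 training images in batches of 100 give 600 iterations per epoch, so 3,000 iterations is 5 epochs. The sketch below checks that arithmetic and shows the batch shapes a `DataLoader` yields, using a hypothetical random `TensorDataset` of 1,000 MNIST-shaped samples instead of the real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Epoch arithmetic for MNIST as configured above
batch_size, n_iters, n_train = 100, 3000, 60000
iters_per_epoch = n_train // batch_size   # 600 batches per epoch
num_epochs = n_iters // iters_per_epoch   # 5 epochs
print(iters_per_epoch, num_epochs)        # 600 5

# Hypothetical stand-in dataset with MNIST-shaped tensors (not real MNIST)
fake = TensorDataset(torch.rand(1000, 1, 28, 28), torch.randint(0, 10, (1000,)))
loader = DataLoader(fake, batch_size=batch_size, shuffle=True)

images, labels = next(iter(loader))
print(len(loader), images.shape, labels.shape)
```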

### Step 3: Create Model Class

<img src="./images/10-20.png" width=70%>

```python
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, input_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Initialize hidden state with zeros
        # (layer_dim, batch_size, hidden_dim); batch_first does not apply to hidden states
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

        out, hn = self.rnn(x, h0)

        # out.size() --> 100, 28, 100    (batch_size, seq_dim, hidden_dim)
        # One forward pass returns the hidden state for all 28 time steps;
        # out[:, -1, :] keeps only the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        # out.size() --> 100, 10
        return out
```
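A quick way to check the shape comments in `forward` is to call `nn.RNN` directly on a random batch. This sketch assumes the same sizes as Model A (batch 100, 28 steps, 28 inputs, 100 hidden units, 1 layer) and random inputs. Note that `batch_first=True` only affects `out`; the hidden state `hn` keeps the shape `(layer_dim, batch, hidden_dim)`.

```python
import torch
import torch.nn as nn

batch_size, seq_dim, input_dim, hidden_dim, layer_dim = 100, 28, 28, 100, 1

rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
x = torch.randn(batch_size, seq_dim, input_dim)   # random stand-in for a batch of images
h0 = torch.zeros(layer_dim, batch_size, hidden_dim)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([100, 28, 100]): hidden state at every time step
print(hn.shape)   # torch.Size([1, 100, 100]): final hidden state per layer

# For a single-layer, unidirectional RNN the last time step of `out`
# matches the final hidden state stored in `hn`
last = out[:, -1, :]
print(torch.allclose(last, hn[0]))  # True
```

For any unidirectional RNN, `out[:, -1, :]` equals `hn[-1]` (the top layer's final state), so either can feed the readout layer.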

### Step 4: Instantiate Model Class

- 28 time steps
    - Each time step: input dimension = 28
- 1 hidden layer
- MNIST 0-9 digits -> output dimension = 10

```python
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10
```

```python
model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)
```

### Step 5: Instantiate Loss Class
- Recurrent Neural Network: **Cross Entropy Loss**
    - *Convolutional Neural Network*: **Cross Entropy Loss**
    - *Feedforward Neural Network*: **Cross Entropy Loss**
    - *Logistic Regression*: **Cross Entropy Loss**
    - *Linear Regression*: **MSE**

```python
criterion = nn.CrossEntropyLoss()
```
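`nn.CrossEntropyLoss` applies log-softmax internally, which is why `RNNModel` returns raw logits from its `fc` layer with no softmax. The following sketch, with made-up logits and labels, checks that equivalence by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical outputs for 4 images, 10 classes
labels = torch.tensor([3, 0, 9, 1])    # hypothetical ground-truth digits

loss = nn.CrossEntropyLoss()(logits, labels)

# Same quantity by hand: log-softmax, pick the true class, mean negative log-likelihood
manual = -F.log_softmax(logits, dim=1)[torch.arange(4), labels].mean()
print(torch.allclose(loss, manual))  # True
```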

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_{\theta} L$
        - $\theta$: parameters (our variables)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_{\theta} L$: gradients of the loss $L$ with respect to the parameters

    - Even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - **At every iteration, we update our model's parameters**

```python
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
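The update rule above is what `torch.optim.SGD` performs when no momentum or weight decay is set. A minimal sketch with a hypothetical 3-element parameter vector and an arbitrary scalar loss:

```python
import torch

torch.manual_seed(0)
lr = 0.1
w = torch.randn(3, requires_grad=True)        # hypothetical parameters (theta)
x = torch.tensor([1.0, 2.0, 3.0])

loss = (w * x).sum() ** 2                     # any differentiable scalar loss
loss.backward()                               # fills w.grad
expected = w.detach() - lr * w.grad           # theta - eta * grad (new tensor)

optimizer = torch.optim.SGD([w], lr=lr)
optimizer.step()                              # in-place update of w
print(torch.allclose(w.detach(), expected))   # True
```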

### Parameters In-Depth

```python
len(list(model.parameters()))
```

```
6
```

#### Parameters

- Input to Hidden Layer Linear Function
    - A1, B1
- Hidden Layer to Output Linear Function
    - A2, B2
- Hidden Layer to Hidden Layer Linear Function
    - A3, B3

<img src = './images/10-08.png' width=80%>

```python
# Input --> Hidden (A1)
list(model.parameters())[0].size()
```

```
torch.Size([100, 28])
```

```python
# Input --> Hidden BIAS (B1)
list(model.parameters())[2].size()
```

```
torch.Size([100])
```

```python
# Hidden --> Hidden (A3)
list(model.parameters())[1].size()
```

```
torch.Size([100, 100])
```

```python
# Hidden --> Hidden BIAS (B3)
list(model.parameters())[3].size()
```

```
torch.Size([100])
```

```python
# Hidden --> Output (A2)
list(model.parameters())[4].size()
```

```
torch.Size([10, 100])
```

```python
# Hidden --> Output BIAS (B2)
list(model.parameters())[5].size()
```

```
torch.Size([10])
```
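Indexing `model.parameters()` by position works, but `named_parameters()` shows which tensor is which. This sketch rebuilds layers with Model A's sizes, wrapped in an `nn.ModuleDict` purely for illustration, and prints the names PyTorch assigns. In `nn.RNN`, `weight_ih_l0`/`bias_ih_l0` correspond to A1/B1 and `weight_hh_l0`/`bias_hh_l0` to A3/B3.

```python
import torch.nn as nn

# Same layer sizes as Model A: RNN (28 -> 100) followed by a linear readout (100 -> 10)
rnn = nn.RNN(28, 100, 1, batch_first=True, nonlinearity='relu')
fc = nn.Linear(100, 10)
model = nn.ModuleDict({'rnn': rnn, 'fc': fc})  # container only, never called

for name, p in model.named_parameters():
    print(name, tuple(p.shape))

total = sum(p.numel() for p in model.parameters())
print(total)  # 14010
```

The total is 28 x 100 + 100 x 100 + 100 + 100 for the RNN plus 100 x 10 + 10 for the readout layer.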

### Step 7: Train Model

- Process
    1. **Convert inputs/labels to tensors of the right shape**
        - RNN input: (28, 28) per image, i.e. 28 time steps of size (1, 28)
        - CNN input: (1, 28, 28)
        - FNN input: (1, 28\*28)
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT
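Before running the full loop on MNIST, steps 1 to 6 can be exercised once on a random batch. This is a sketch under stated assumptions: `TinyRNN` is a hypothetical compact stand-in for `RNNModel` (it relies on `nn.RNN` initializing `h0` to zeros when it is omitted), and the images and labels are random tensors rather than real digits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_dim, input_dim, hidden_dim, output_dim = 28, 28, 100, 10

class TinyRNN(nn.Module):
    # Hypothetical stand-in mirroring Model A: 1-layer ReLU RNN + linear readout
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, 1, batch_first=True, nonlinearity='relu')
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.rnn(x)           # h0 defaults to zeros when omitted
        return self.fc(out[:, -1, :])  # last time step -> 10 logits

model = TinyRNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random batch standing in for MNIST (not real data)
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

before = model.fc.weight.detach().clone()
inputs = images.view(-1, seq_dim, input_dim)  # 1. reshape to (batch, seq, input)
optimizer.zero_grad()                          # 2. clear gradient buffers
outputs = model(inputs)                        # 3. forward pass -> logits
loss = criterion(outputs, labels)              # 4. loss
loss.backward()                                # 5. gradients w.r.t. parameters
optimizer.step()                               # 6. parameter update
print(outputs.shape, loss.item())
```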

```python
# Number of steps to unroll
seq_dim = 28  

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape images to (batch_size, seq_dim, input_dim)
        images = images.view(-1, seq_dim, input_dim)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # outputs.size() --> 100, 10
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # No gradients are needed for evaluation
            with torch.no_grad():
                # Iterate through test dataset
                for images, labels in test_loader:
                    images = images.view(-1, seq_dim, input_dim)
                    
                    # Forward pass only to get logits/output
                    outputs = model(images)
                    
                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)
                    
                    # Total number of labels
                    total += labels.size(0)
                    
                    # Total correct predictions (.item() converts to a Python int)
                    correct += (predicted == labels).sum().item()
                
            accuracy = 100 * correct / total
            
            # Print Loss
            print(f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')
```

```
Iteration: 500, Loss: 1.051323413848877, Accuracy: 61
Iteration: 1000, Loss: 1.3841475248336792, Accuracy: 27
Iteration: 1500, Loss: 0.734230637550354, Accuracy: 72
Iteration: 2000, Loss: 0.7189213037490845, Accuracy: 75
Iteration: 2500, Loss: 0.2827105224132538, Accuracy: 89
```
