# Recurrent Neural Network

## 1. About Recurrent Neural Network

### Feedforward Neural Networks Transition to Recurrent Neural Networks
* RNN is essentially an FNN
<img src="https://user-images.githubusercontent.com/60699771/86014019-c6010500-ba5a-11ea-8181-f31ccd004004.png" align=left>

<img src="https://user-images.githubusercontent.com/60699771/86013571-43784580-ba5a-11ea-9d3a-876703450c4f.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86013586-46733600-ba5a-11ea-920d-137c0114739c.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86014529-6a834700-ba5b-11ea-8759-965f3740217c.png" align=left>

<img src="https://user-images.githubusercontent.com/60699771/86014683-9dc5d600-ba5b-11ea-8094-55f85e5a7252.png" align=left>

<img src="https://user-images.githubusercontent.com/60699771/86015182-352b2900-ba5c-11ea-9f24-755add53c243.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86015190-378d8300-ba5c-11ea-80cb-ce67c828ec78.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86015202-39574680-ba5c-11ea-8d2e-d9cefde48c3b.png" align=left>

<img src="https://user-images.githubusercontent.com/60699771/86015220-3eb49100-ba5c-11ea-90ed-a36a9734c12b.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86015921-23965100-ba5d-11ea-9667-f4b7cbca5012.png" align=left>

## 2. Building a Recurrent Neural Network

### 2.1 Model A. Hidden Layer (ReLU)
* Unroll 28 time steps
    * Each step input size: 28 * 1
    * Total per unroll: 28 * 28
        * Feedforward Neural Network input size: 28 * 28
* 1 Hidden layer
* ReLU Activation Function

<img src="https://user-images.githubusercontent.com/60699771/86013571-43784580-ba5a-11ea-9d3a-876703450c4f.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86013586-46733600-ba5a-11ea-920d-137c0114739c.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86014529-6a834700-ba5b-11ea-8759-965f3740217c.png" align=left>

### Steps
* Step 1 : Load Dataset
* Step 2 : Make Dataset Iterable
* Step 3 : Create Model Class
* Step 4 : Instantiate Model Class
* Step 5 : Instantiate Loss Class
* Step 6 : Instantiate Optimizer Class
* Step 7 : Train Model

### Step 1.  Loading MNIST Train Dataset
* images from 0 to 9

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
train_dataset = dsets.MNIST(root='./data',
                           train= True,
                           transform= transforms.ToTensor(),
                           download=True)

test_dataset = dsets.MNIST(root='./data',
                          train=False,
                          transform = transforms.ToTensor())

### Step 2. Make Dataset Iterable

In [3]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=False)

### Step 3. Create Model Class

In [22]:
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='tanh')
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros
        # (layer_dim, batch_size, hidden_dim
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim)) # x.size(0) = atch_dim
        
        # One time step
        out, hn = self.rnn(x, h0)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        out = self.fc(out[:, -1, :]) 
        # out.size() --> 100, 10
        return out
        

### Step 4. Instantiate Model Class
* 28 time steps
    * Each time step: input dimension = 28
* 1 hidden layer
* MNIST 1-9 digits -> output dimension = 10

In [23]:
input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10

In [24]:
model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

### Step 5. Instantiate Loss Class
* Recurrent Neural Network: Cross Entropy Loss
    * cf : Feedforward Nerual Network : Cross Entropy Loss
    * cf : Linear Regression: MSE
    * cf : Logistic Regression : Cross Entropy Loss
    * cf : Convolutional Neural Network: Cross Entropy Loss

In [25]:
criterion = nn.CrossEntropyLoss()

### Step 6. instantiate Optimizer Class
<img src="https://user-images.githubusercontent.com/60699771/85815174-b8215a80-b7a2-11ea-917f-de5c35eb9ea9.png" align=left>

In [26]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

#### Parameteres in Depth

In [27]:
len(list(model.parameters()))

6

* Input to Hidden Linear Function
    * A1, B1
* Hidden Layer to Output Linear Function
    * A2, B2
* Hidden Layer to Hidden Linear Function
    *A3, B3

In [28]:
# Input -> Hidden (A1)
print(list(model.parameters())[0].size())

# Input -> Hidden Bias (B1)
print(list(model.parameters())[1].size())

# Hidden -> Hidden (A3)
print(list(model.parameters())[2].size())

# Hidden -> Hidden Bias (B3)
print(list(model.parameters())[3].size())

# Hidden -> Output (A2)
print(list(model.parameters())[4].size())

# Hidden -> Output Bias (B2)
print(list(model.parameters())[5].size())

torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])


### Step7: Train a Model
* process
    1. Convert inputs/labels to variables
        * RNN Input: (1, 28)
        * CNN Input: (1, 28, 28)
        * Feedforward NN Input: (1, 28*28)
    2. Clear gradient buffets
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t parameters
    6. Update parameters using gradients
        * parameters = parameters - lr * parameters_gradients
    7. Repeat

In [30]:
# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load Images as Variable
        images = Variable(images.view(-1, seq_dim, input_dim)) # (batch_size, seq_dim, input_dim)
        labels = Variable(labels)
        
        # Clear gradients w.r.t parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # outputs.size() -> 100, 10
        outputs = model.forward(images)
        
        # Calculate Loss: softmax -> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting Gradients w.r.t parameters
        loss.backward()
        
        #Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct /total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))



Iteration: 500. Loss: 0.39562496542930603. Accuracy: 87
Iteration: 1000. Loss: 0.3929261863231659. Accuracy: 89
Iteration: 1500. Loss: 0.25924012064933777. Accuracy: 92
Iteration: 2000. Loss: 0.19637300074100494. Accuracy: 92
Iteration: 2500. Loss: 0.244120791554451. Accuracy: 93
Iteration: 3000. Loss: 0.23553629219532013. Accuracy: 89


### 2.2 Model B. Hidden Layer (ReLU)
* Unroll 28 time steps
    * Each step input size: 28 * 1
    * Total per unroll: 28 * 28
        * Feedforward Neural Network input size: 28 * 28
* **2 Hidden layer**
* ReLU Activation Function
<img src="https://user-images.githubusercontent.com/60699771/86015220-3eb49100-ba5c-11ea-90ed-a36a9734c12b.png" align=left>
<img src="https://user-images.githubusercontent.com/60699771/86015921-23965100-ba5d-11ea-9667-f4b7cbca5012.png" align=left>

### Steps
* Step 1 : Load Dataset
* Step 2 : Make Dataset Iterable
* Step 3 : Create Model Class
* **Step 4 : Instantiate Model Class**
* Step 5 : Instantiate Loss Class
* Step 6 : Instantiate Optimizer Class
* Step 7 : Train Model

In [32]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))
            
        # One time step
        out, hn = self.rnn(x, h0)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        out = self.fc(out[:, -1, :]) 
        # out.size() --> 100, 10
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28
hidden_dim = 100
layer_dim = 2  # ONLY CHANGE IS HERE FROM ONE LAYER TO TWO LAYER
output_dim = 10

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

print(model)
print(len(list(model.parameters())))
for i in range(len(list(model.parameters()))):
    print(list(model.parameters())[i].size())
    
'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  

'''
STEP 7: TRAIN THE MODEL
'''

# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load Images as Variable
        images = Variable(images.view(-1, seq_dim, input_dim)) # (batch_size, seq_dim, input_dim)
        labels = Variable(labels)
        
        # Clear gradients w.r.t parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # outputs.size() -> 100, 10
        outputs = model.forward(images)
        
        # Calculate Loss: softmax -> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting Gradients w.r.t parameters
        loss.backward()
        
        #Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct /total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

RNNModel(
  (rnn): RNN(28, 100, num_layers=2, batch_first=True)
  (fc): Linear(in_features=100, out_features=10, bias=True)
)
10
torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
Iteration: 500. Loss: 1.0093106031417847. Accuracy: 63
Iteration: 1000. Loss: 0.8149239420890808. Accuracy: 76
Iteration: 1500. Loss: 0.360173761844635. Accuracy: 88
Iteration: 2000. Loss: 0.4796923100948334. Accuracy: 88
Iteration: 2500. Loss: 0.09811796247959137. Accuracy: 94
Iteration: 3000. Loss: 0.14625950157642365. Accuracy: 94


### 10 sets of parameters
* First hidden Layer
    * A1 = [100, 28]
    * A3 = [100, 100]
    * B1 = [100]
    * B3 = [100]
* Second hidden layer
    * A2 = [100, 100]
    * A5 = [100,100]
    * B2 = [100]
    * B5 = [100]
* Readout layer
    * A5 = [10, 100]
    * B5 = [10]

### Steps
* Step 1 : Load Dataset
* Step 2 : Make Dataset Iterable
* Step 3 : Create Model Class
* **Step 4 : Instantiate Model Class**
* Step 5 : Instantiate Loss Class
* Step 6 : Instantiate Optimizer Class
* Step 7 : Train Model

### 2.2 Model B. Hidden Layer (ReLU)
* Unroll 28 time steps
    * Each step input size: 28 * 1
    * Total per unroll: 28 * 28
        * Feedforward Neural Network input size: 28 * 28
* 2 Hidden layer
* **Tahh Activation Function**

In [33]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your RNN
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='tanh')
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = Variable(torch.zeros(self.layer_dim, x.size(0), self.hidden_dim))
            
        # One time step
        out, hn = self.rnn(x, h0)
        
        # Index hidden state of last time step
        # out.size() --> 100, 28, 100
        # out[:, -1, :] --> 100, 100 --> just want last time step hidden states! 
        out = self.fc(out[:, -1, :]) 
        # out.size() --> 100, 10
        return out

'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28
hidden_dim = 100
layer_dim = 2  # ONLY CHANGE IS HERE FROM ONE LAYER TO TWO LAYER
output_dim = 10

model = RNNModel(input_dim, hidden_dim, layer_dim, output_dim)

print(model)
print(len(list(model.parameters())))
for i in range(len(list(model.parameters()))):
    print(list(model.parameters())[i].size())
    
'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  

'''
STEP 7: TRAIN THE MODEL
'''

# Number of steps to unroll
seq_dim = 28

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load Images as Variable
        images = Variable(images.view(-1, seq_dim, input_dim)) # (batch_size, seq_dim, input_dim)
        labels = Variable(labels)
        
        # Clear gradients w.r.t parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        # outputs.size() -> 100, 10
        outputs = model.forward(images)
        
        # Calculate Loss: softmax -> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting Gradients w.r.t parameters
        loss.backward()
        
        #Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, seq_dim, input_dim))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct /total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

RNNModel(
  (rnn): RNN(28, 100, num_layers=2, batch_first=True)
  (fc): Linear(in_features=100, out_features=10, bias=True)
)
10
torch.Size([100, 28])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([100, 100])
torch.Size([100, 100])
torch.Size([100])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
Iteration: 500. Loss: 0.5973575711250305. Accuracy: 80
Iteration: 1000. Loss: 0.4522297978401184. Accuracy: 90
Iteration: 1500. Loss: 0.18536020815372467. Accuracy: 93
Iteration: 2000. Loss: 0.3356154263019562. Accuracy: 84
Iteration: 2500. Loss: 0.1598687618970871. Accuracy: 95
Iteration: 3000. Loss: 0.06450636684894562. Accuracy: 95


## Results : Model A < Model B < Model C

### Deep Learning
* 2 Ways to expand a nerual network
    * More non-linear activation units(neurons)
    * More hidden Layers
* Cons
    * Need a larger dataset
        * CUrse of dimensionality
    * Does not neccessarily higher accuarvy