# RNN
- <font size=3> Handle variable-length sequences</font>
- <font size=3> Track long-term dependencies</font>
- <font size=3> Maintain information about order</font>
- <font size=3> Share parameters across the sequence</font>

<p align=left> 
<img src = "https://cs231n.github.io/assets/rnn/rnn_blackbox.png" height="720" width="120" align="left">
    <img src = "https://cs231n.github.io/assets/rnn/unrolledRNN.png" height="720" width="440" align="center">
    <em>Simplified RNN </em>&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; <em>Unrolled RNN</em>
</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

## RNN Equations

- Input vector:&emsp;&emsp;&emsp;&emsp;&emsp;$\large X_t $
- Update hidden state: &ensp;&nbsp;$\large H_t = \tanh(W_{hh} H_{t-1} + W_{xh} X_t) $
- Output vector:&emsp;&emsp;&emsp;&emsp;$\large \hat y_t = W_{hy} H_t$
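The three equations can be sketched directly for a single (unbatched) sequence. This is a minimal illustration, assuming column vectors as in the formulas above and no bias terms (the equations have none); the sizes and the helper name `rnn_unrolled` are hypothetical. The same three weight matrices are reused at every time step, which is what "share parameters across the sequence" means, and the loop works for any sequence length.

```python
import torch

torch.manual_seed(0)

input_size, hidden_size, output_size = 3, 4, 2

# Weights in the column-vector convention of the equations (assumed sizes):
# W_xh maps X_t to the hidden space, W_hh maps H_{t-1}, W_hy maps H_t to the output.
W_xh = torch.randn(hidden_size, input_size)
W_hh = torch.randn(hidden_size, hidden_size)
W_hy = torch.randn(output_size, hidden_size)

def rnn_unrolled(xs, h0):
    """Hypothetical helper: H_t = tanh(W_hh H_{t-1} + W_xh X_t), y_t = W_hy H_t.

    xs: list of column vectors of shape (input_size, 1); h0: (hidden_size, 1).
    """
    h = h0
    ys = []
    for x_t in xs:  # the same weights are used at every step
        h = torch.tanh(W_hh @ h + W_xh @ x_t)
        ys.append(W_hy @ h)
    return ys, h

xs = [torch.randn(input_size, 1) for _ in range(5)]  # a sequence of length 5
ys, h_last = rnn_unrolled(xs, torch.zeros(hidden_size, 1))
print(len(ys), ys[0].shape, h_last.shape)
# 5 torch.Size([2, 1]) torch.Size([4, 1])
```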

<img src = "https://cs231n.github.io/assets/rnn/vanilla_rnn_mformula_1.png">

<img src= "https://cs231n.github.io/assets/rnn/vanilla_rnn_mformula_2.png">

### Separate multiplications vs. one multiplication of concatenated weights and concatenated (input, hidden)

The term $W_{hh} H_{t-1} + W_{xh} X_t$ in the hidden-state update is equivalent to a single matrix multiplication of the concatenation of $X_t$ and $H_{t-1}$ with the concatenation of $W_{xh}$ and $W_{hh}$.<br>
* This can be proven mathematically; here we use a short code snippet to demonstrate it numerically.
* The code uses the row-vector (batch-first) convention, so it computes $X W_{xh} + H W_{hh}$, where each row of $X$ and $H$ is one example.
* First, we define matrices X, W_xh, H, and W_hh with shapes (3, 1), (1, 4), (3, 4), and (4, 4), respectively.
* Multiplying X by W_xh and H by W_hh, then adding the two products, gives a matrix of shape (3, 4).
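For completeness, the identity follows from block-matrix multiplication. Written in the row-vector convention of the code, with $X$ of shape $(n, d)$, $H$ of shape $(n, h)$, $W_{xh}$ of shape $(d, h)$ and $W_{hh}$ of shape $(h, h)$ (here $n = 3$, $d = 1$, $h = 4$):

$$
\begin{bmatrix} X & H \end{bmatrix}
\begin{bmatrix} W_{xh} \\ W_{hh} \end{bmatrix}
= X W_{xh} + H W_{hh}
$$

Entrywise, the first $d$ columns of $[X \; H]$ are $X$ and the first $d$ rows of the stacked matrix are $W_{xh}$, so the sum over the shared dimension splits into two parts:

$$
\left( \begin{bmatrix} X & H \end{bmatrix} \begin{bmatrix} W_{xh} \\ W_{hh} \end{bmatrix} \right)_{ij}
= \sum_{k=1}^{d} X_{ik} (W_{xh})_{kj} + \sum_{k=1}^{h} H_{ik} (W_{hh})_{kj}
$$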

```python
X, W_xh = torch.randn(3, 1), torch.randn(1, 4)
H, W_hh = torch.randn(3, 4), torch.randn(4, 4)
```

```python
torch.matmul(X, W_xh) + torch.matmul(H, W_hh)
```

```
tensor([[ 0.1852,  3.4312,  1.3261, -1.7277],
        [ 0.0846,  2.3784, -3.5482, -4.2472],
        [-0.9283,  0.8541, -1.3491,  1.1663]])
```

* Now we concatenate the matrices $X$ and $H$ along columns (axis 1), and the matrices $W_{xh}$ and $W_{hh}$ along rows (axis 0).
* These two concatenations result in matrices of shape (3, 5) and of shape (5, 4), respectively.
* Multiplying these two concatenated matrices, we obtain the same output matrix of shape (3, 4) as above.

The weight matrices $W_{xh}$ and $W_{hh}$ are often concatenated vertically into a single weight matrix $W$ of shape (input_size + hidden_size) × hidden_size.
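Because $W$ is just the two blocks stacked, the original matrices can be recovered by splitting $W$ along its rows: the first `input_size` rows are $W_{xh}$ and the remaining `hidden_size` rows are $W_{hh}$. A minimal sketch with fresh random tensors, assuming the same sizes as in this section:

```python
import torch

input_size, hidden_size = 1, 4
W_xh = torch.randn(input_size, hidden_size)
W_hh = torch.randn(hidden_size, hidden_size)

# Stack vertically: shape (input_size + hidden_size, hidden_size)
W = torch.cat((W_xh, W_hh), dim=0)
print(W.shape)  # torch.Size([5, 4])

# Undo the concatenation by splitting the rows back into the two blocks
W_xh_back, W_hh_back = torch.split(W, [input_size, hidden_size], dim=0)
print(torch.equal(W_xh_back, W_xh), torch.equal(W_hh_back, W_hh))  # True True
```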

```python
input_combined = torch.cat((X, H), 1)  # (3, 1) and (3, 4) -> (3, 5)
W = torch.cat((W_xh, W_hh), 0)         # (1, 4) and (4, 4) -> (5, 4)
```

```python
input_combined.shape, W.shape
```

```
(torch.Size([3, 5]), torch.Size([5, 4]))
```

```python
torch.matmul(input_combined, W)
```

```
tensor([[ 0.1852,  3.4312,  1.3261, -1.7277],
        [ 0.0846,  2.3784, -3.5482, -4.2472],
        [-0.9283,  0.8541, -1.3491,  1.1663]])
```

#### Therefore, both ways give the same result
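Comparing printed tensors by eye only checks the displayed digits, and the two computations may differ in the last floating-point bits because the additions can happen in a different order. A more robust check, sketched here with new random tensors of the same shapes, is `torch.allclose`:

```python
import torch

X, W_xh = torch.randn(3, 1), torch.randn(1, 4)
H, W_hh = torch.randn(3, 4), torch.randn(4, 4)

separate = X @ W_xh + H @ W_hh                                        # two products, then a sum
combined = torch.cat((X, H), dim=1) @ torch.cat((W_xh, W_hh), dim=0)  # one product

print(separate.shape, torch.allclose(separate, combined, atol=1e-5))
# torch.Size([3, 4]) True
```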

References: [lecture video by Alexander Amini](https://www.youtube.com/watch?v=qjrad0V0uJE&ab_channel=AlexanderAmini)<br> [Dive into Deep Learning](https://d2l.djl.ai/chapter_recurrent-neural-networks/rnn.html#recurrent-neural-networks-with-hidden-states)<br>
[image reference](https://cs231n.github.io/rnn/)<br>

## Concatenated multiplication vs. linear layer

```python
input_size = 1
hidden_size = 4
```

```python
linear = nn.Linear(input_size+hidden_size, hidden_size, bias=False)
```

```python
linear.weight.shape
```

```
torch.Size([4, 5])
```
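`nn.Linear` stores its weight with shape `(out_features, in_features)`, here `(4, 5)`, and computes `x @ weight.T + bias`. That is why the concatenated `W` of shape `(5, 4)` has to be transposed before it is copied into the layer. A small sketch checking this against `torch.nn.functional.linear` on an assumed random batch of 3 rows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_size, hidden_size = 1, 4
linear = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
x = torch.randn(3, input_size + hidden_size)

# weight is (out_features, in_features) = (4, 5), so the layer applies x @ weight.T
print(linear.weight.shape)  # torch.Size([4, 5])
manual = x @ linear.weight.T
print(torch.allclose(linear(x), manual), torch.allclose(linear(x), F.linear(x, linear.weight)))
# True True
```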

```python
# Use the same weights to verify that both compute the same result
# (W is transposed to match nn.Linear's (out_features, in_features) layout)
with torch.no_grad():
    linear.weight.copy_(W.T)
```

```python
linear(torch.cat((X, H), 1))
```

```
tensor([[ 0.1852,  3.4312,  1.3261, -1.7277],
        [ 0.0846,  2.3784, -3.5482, -4.2472],
        [-0.9283,  0.8541, -1.3491,  1.1663]], grad_fn=<MmBackward0>)
```

#### Thus both give the same result when they use the same weights

## Model without concatenation of weights and [input, hidden]

```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.w_xh = nn.Parameter(torch.empty(input_size, hidden_size))
        self.w_hh = nn.Parameter(torch.empty(hidden_size, hidden_size))

        self.w_hy = nn.Parameter(torch.empty(hidden_size, output_size))

        # Xavier (Glorot) normal initialization for the input, hidden and output weights
        nn.init.xavier_normal_(self.w_xh, gain=1.0)
        nn.init.xavier_normal_(self.w_hh, gain=1.0)
        nn.init.xavier_normal_(self.w_hy, gain=1.0)

    def init_hidden(self):
        # Initial hidden state: zeros of shape (1, hidden_size)
        return torch.zeros((1, self.hidden_size), dtype=torch.float32)

    def forward(self, x, hidden):
        # x: (batch, input_size), hidden: (batch, hidden_size)
        linear_comb = torch.matmul(x, self.w_xh) + torch.matmul(hidden, self.w_hh)
        hidden = torch.tanh(linear_comb)       # update the hidden state
        out = torch.matmul(hidden, self.w_hy)  # compute the output
        return out, hidden
```
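PyTorch's built-in `nn.RNNCell` computes the same update, $H_t = \tanh(X_t W_{ih}^\top + b_{ih} + H_{t-1} W_{hh}^\top + b_{hh})$, but stores its weights transposed relative to `w_xh` and `w_hh` above. A sketch, assuming `bias=False` so the cell matches the bias-free equations of this notebook, that copies separate weights into the cell and compares one step (sizes are illustrative):

```python
import torch
import torch.nn as nn

input_size, hidden_size, batch = 3, 4, 2
w_xh = torch.randn(input_size, hidden_size)   # same layout as self.w_xh above
w_hh = torch.randn(hidden_size, hidden_size)  # same layout as self.w_hh above

cell = nn.RNNCell(input_size, hidden_size, bias=False, nonlinearity="tanh")
with torch.no_grad():
    # RNNCell keeps weight_ih as (hidden_size, input_size), so the layouts are transposed
    cell.weight_ih.copy_(w_xh.T)
    cell.weight_hh.copy_(w_hh.T)

x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)

manual = torch.tanh(x @ w_xh + h @ w_hh)
print(torch.allclose(cell(x, h), manual, atol=1e-5))  # True
```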
    

## Model with concatenation of weights and [input,hidden]

```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        w_xh = torch.empty(input_size, hidden_size)
        w_hh = torch.empty(hidden_size, hidden_size)
        # Concatenate the weights vertically: shape (input_size + hidden_size, hidden_size)
        self.W = nn.Parameter(torch.cat((w_xh, w_hh), 0))

        self.w_hy = nn.Parameter(torch.empty(hidden_size, output_size))

        # Xavier normal initialization for the [input, hidden] weight and the output weight.
        # Fan-in/fan-out come from the stacked W's shape, so the scale differs slightly
        # from initializing w_xh and w_hh separately.
        nn.init.xavier_normal_(self.W, gain=1.0)
        nn.init.xavier_normal_(self.w_hy, gain=1.0)

    def init_hidden(self):
        # Initial hidden state: zeros of shape (1, hidden_size)
        return torch.zeros((1, self.hidden_size), dtype=torch.float32)

    def forward(self, x, hidden):
        combined = torch.cat((x, hidden), 1)   # (batch, input_size + hidden_size)
        linear_comb = torch.matmul(combined, self.W)
        hidden = torch.tanh(linear_comb)       # update the hidden state
        out = torch.matmul(hidden, self.w_hy)  # compute the output
        return out, hidden
```
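The models above process one time step per call; a sequence is handled by calling the same cell in a loop, so a single concatenated `W` is shared by every step. The sketch below, assuming a time-major input of shape `(seq_len, batch, input_size)` and no biases, unrolls that loop by hand and compares it with `nn.RNN` after copying the two blocks of `W` (transposed) into `weight_ih_l0` and `weight_hh_l0`. The helper `run_sequence` is hypothetical.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 6, 2, 3, 4
W = torch.randn(input_size + hidden_size, hidden_size)  # concatenated [W_xh; W_hh]

def run_sequence(xs, W, h):
    """Hypothetical helper: apply H_t = tanh([X_t, H_{t-1}] @ W) at every step."""
    outputs = []
    for x_t in xs:  # iterate over the time dimension; x_t: (batch, input_size)
        h = torch.tanh(torch.cat((x_t, h), dim=1) @ W)
        outputs.append(h)
    return torch.stack(outputs), h  # (seq_len, batch, hidden_size), (batch, hidden_size)

xs = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(batch, hidden_size)
manual_out, manual_h = run_sequence(xs, W, h0)

rnn = nn.RNN(input_size, hidden_size, bias=False)  # tanh nonlinearity by default
with torch.no_grad():
    rnn.weight_ih_l0.copy_(W[:input_size].T)       # (hidden_size, input_size)
    rnn.weight_hh_l0.copy_(W[input_size:].T)       # (hidden_size, hidden_size)

out, h_n = rnn(xs, h0.unsqueeze(0))                # h_n: (num_layers, batch, hidden_size)
print(torch.allclose(out, manual_out, atol=1e-5), torch.allclose(h_n[0], manual_h, atol=1e-5))
# True True
```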

## Model with linear layers

```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # nn.Linear includes a bias by default, unlike the bias-free equations above
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def init_hidden(self):
        # Initial hidden state: zeros of shape (1, hidden_size)
        return torch.zeros((1, self.hidden_size), dtype=torch.float32)

    def forward(self, x, hidden):
        combined_input = torch.cat((x, hidden), 1)  # (batch, input_size + hidden_size)
        linear_comb = self.i2h(combined_input)
        hidden = torch.tanh(linear_comb)            # update the hidden state
        out = self.h2o(hidden)                      # compute the output
        return out, hidden
```
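Unlike the two earlier models, `nn.Linear` adds a bias by default, so this version computes $H_t = \tanh(W_{xh} X_t + W_{hh} H_{t-1} + b_h)$ and $\hat y_t = W_{hy} H_t + b_y$. As a sketch of how `i2h` relates to PyTorch's `nn.RNNCell`: the first `input_size` columns of `i2h.weight` play the role of `weight_ih` and the rest are `weight_hh`. Because the cell has two bias vectors while `i2h` has one, we assume `bias_hh` is set to zero (sizes are illustrative):

```python
import torch
import torch.nn as nn

input_size, hidden_size, batch = 3, 4, 2
i2h = nn.Linear(input_size + hidden_size, hidden_size)  # bias=True by default

cell = nn.RNNCell(input_size, hidden_size)               # has both bias_ih and bias_hh
with torch.no_grad():
    # i2h.weight is (hidden_size, input_size + hidden_size): split it by columns
    cell.weight_ih.copy_(i2h.weight[:, :input_size])
    cell.weight_hh.copy_(i2h.weight[:, input_size:])
    cell.bias_ih.copy_(i2h.bias)
    cell.bias_hh.zero_()                                 # the single i2h bias already covers it

x = torch.randn(batch, input_size)
h = torch.randn(batch, hidden_size)

manual = torch.tanh(i2h(torch.cat((x, h), dim=1)))
print(torch.allclose(cell(x, h), manual, atol=1e-5))  # True
```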