# RNN in PyTorch

### Introduction

In this lesson, we'll work through building the hypothesis function for a recurrent neural network in Python.  In doing so, we'll explore how an RNN generates and uses hidden state for each word in a document.  Let's get started.

### Building Initial Functions

Let's take another look at our diagram for a recurrent neural network.

<img src="Rnn-diagram-no-pred.png"  width="60%">

As we can see, we begin with an index that represents each word, and from that index we look up the related embedding.  We then produce a hidden state from two products: the embedding multiplied by its weights, and the previous hidden state multiplied by its own weights.  We combine the two products with addition.  Ok, let's start implementing.

We'll begin by loading some libraries and initializing a word vector with the random number generator.

In [55]:
import torch
import torch.nn as nn

In [56]:
torch.manual_seed(15)
e_1 = torch.randn(4, 1)
e_1
# dog

tensor([[-0.7056],
        [ 0.6741],
        [-0.5454],
        [ 0.9107]])
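
In a full model, the word vector would come from an embedding lookup rather than directly from `torch.randn`. The sketch below shows one way that index-to-embedding step could look with `nn.Embedding`. The `vocab` dictionary and its indices are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(15)

# Hypothetical five-word vocabulary; the indices are arbitrary.
vocab = {"the": 0, "dog": 1, "jumped": 2, "over": 3, "a": 4}

# One trainable 4-dimensional vector per word in the vocabulary.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

idx = torch.tensor([vocab["dog"]])  # shape (1,)
e_dog = embedding(idx)              # shape (1, 4): one row per index
e_dog_col = e_dog.T                 # shape (4, 1), the same layout as e_1

print(e_dog.shape, e_dog_col.shape)
```

The lookup returns a row vector, so the transpose is only needed to match the column layout used for `e_1` in this lesson.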

And we can also initialize an initial hidden state $h_0$.

In [62]:
torch.manual_seed(13)
h_0 = torch.randn(1, 1)
h_0

tensor([[-0.1117]])

Now as we know, we'll produce the next hidden state with the function $h_t = e_t \cdot w_e + h_{t-1} \cdot w_h$.  So we'll need weights for both the embedding and the hidden state.

In [84]:
torch.manual_seed(12)
w_h = torch.randn(1, 1)
w_e = torch.randn(4, 1)
w_h, w_e 

(tensor([[-0.2138]]),
 tensor([[-1.3780],
         [-0.0546],
         [ 0.4515],
         [ 0.7858]]))

Then we can get to the next hidden state by multiplying the embedding and the hidden state by their respective weights.

In [86]:
torch.mm(h_0, w_h) + torch.mm(e_1.T, w_e)
#        1x1, 1x1            1x4, 4x1  -> 1x1

tensor([[1.4288]])
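
As a sanity check, the same number can be recovered with plain Python arithmetic. The sketch below copies the rounded values printed in the cells above, so it should agree with the tensor result only up to rounding.

```python
# Values copied (rounded to four decimals) from the tensors printed above.
h_0 = -0.1117
w_h = -0.2138
e_1 = [-0.7056, 0.6741, -0.5454, 0.9107]
w_e = [-1.3780, -0.0546, 0.4515, 0.7858]

hidden_term = h_0 * w_h                             # previous hidden state times its weight
embed_term = sum(e * w for e, w in zip(e_1, w_e))   # dot product of embedding and weights
h_1 = hidden_term + embed_term

print(round(hidden_term, 4), round(embed_term, 4), round(h_1, 4))
```

Almost all of the value comes from the embedding term here, because `h_0` and `w_h` are both small.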

Now let's move the operation above into a function, so we can reuse it.

In [70]:
def h_t(e_t, w_e, h_t_prev, w_h):
    return torch.mm(h_t_prev, w_h) + torch.mm(e_t.T, w_e)

In [71]:
h_t(e_1, w_e, h_0, w_h)

tensor([[1.4288]])

Ok, so we now have our formula to calculate the next hidden state.  Now let's take a sequence of word embeddings, and compute the next hidden state for each word embedding, $e_t$.  Each column of the matrix below is the embedding for one word.

In [78]:
# the dog jumped over a
E = torch.randn(4, 5)
E

tensor([[ 1.3029,  0.2045, -0.9202, -0.8919,  0.2516],
        [ 0.9675, -0.6870,  0.9042,  0.3286, -0.0742],
        [ 0.1414, -1.2538, -0.3456, -0.2211, -0.7043],
        [ 0.3368,  0.0064,  0.2326,  0.9527, -0.4139]])

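Before looping over every word, let's check the embedding product for the first word on its own.

In [107]:
e_t = E[:, 0] # embedding of the first word, shape (4,)
torch.mm(e_t.unsqueeze(0), w_e)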

tensor([[-1.5197]])

In [81]:
hidden_states = [h_0]

In [108]:
for e_t in E.T:
    e_t = e_t.unsqueeze(0) # shape (4,) -> 1x4
    
    prev_hidden = hidden_states[-1]
    hidden_prod = torch.mm(prev_hidden, w_h)
    embed_prod = torch.mm(e_t, w_e)
    new_h = hidden_prod + embed_prod
    hidden_states.append(new_h)

In [109]:
hidden_states

[tensor([[-0.1117]]),
 tensor([[-1.4958]]),
 tensor([[-0.4856]]),
 tensor([[1.3492]]),
 tensor([[1.5714]]),
 tensor([[-1.3220]])]

So we can see that there are six total hidden states.  There's one for each of the five words, $e_1 \ldots e_5$, plus the initial hidden state $h_0$.
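
The embedding products do not depend on the hidden state, so they can be computed for all five words in a single matrix multiplication, leaving only the recurrence inside the loop. The sketch below assumes the rounded tensor values printed above, so its results should match the listed hidden states to about three decimals.

```python
import torch

# Rounded copies of the tensors printed above.
E = torch.tensor([[ 1.3029,  0.2045, -0.9202, -0.8919,  0.2516],
                  [ 0.9675, -0.6870,  0.9042,  0.3286, -0.0742],
                  [ 0.1414, -1.2538, -0.3456, -0.2211, -0.7043],
                  [ 0.3368,  0.0064,  0.2326,  0.9527, -0.4139]])
w_e = torch.tensor([[-1.3780], [-0.0546], [0.4515], [0.7858]])
w_h = torch.tensor([[-0.2138]])
h_0 = torch.tensor([[-0.1117]])

# (5 x 4) @ (4 x 1) -> (5 x 1): one embedding product per word.
embed_prods = torch.mm(E.T, w_e)

hidden_states = [h_0]
for embed_prod in embed_prods:           # each row has shape (1,)
    prev_hidden = hidden_states[-1]      # shape (1, 1)
    new_h = torch.mm(prev_hidden, w_h) + embed_prod  # broadcasts to (1, 1)
    hidden_states.append(new_h)

print(torch.cat(hidden_states).squeeze(1))
```

The loop itself cannot be removed this way, since each hidden state still needs the one before it.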

### Object Oriented RNN

Next, let's wrap the same computation in an `nn.Module`.  We'll also pass the sum through `tanh`, which keeps every hidden state between -1 and 1.

In [129]:
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(RNN, self).__init__()
        self.hidden_states = [torch.randn(1, n_neurons)] # initial hidden state, 1 X 1
        self.We = torch.randn(n_inputs, n_neurons) # 4 X 1
        self.Wh = torch.randn(n_neurons, n_neurons) # 1 X 1
    
    def forward(self, e): # e is 1 X 4
        embed_mult = torch.mm(e, self.We) # 1 X 4, 4 X 1 -> 1 X 1
        prev_hidden = self.hidden_states[-1]
        hidden_mult = torch.mm(prev_hidden, self.Wh) # 1 X 1, 1 X 1 -> 1 X 1
        hidden_state = torch.tanh(hidden_mult + embed_mult)
        self.hidden_states.append(hidden_state)
        return hidden_state

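The weights above are plain tensors, so PyTorch does not register them as trainable parameters.  Replacing them with `nn.Linear` layers registers the weights, and also adds a bias term to each product.

In [141]:
import torch.nn as nn
class RNN(nn.Module):
    def __init__(self, n_inputs, n_neurons):
        super(RNN, self).__init__()
        self.hidden_states = [torch.randn(1, n_neurons)] # initial hidden state
        self.We = nn.Linear(n_inputs, n_neurons) # maps 1 X n_inputs -> 1 X n_neurons
        self.Wh = nn.Linear(n_neurons, n_neurons) # maps 1 X n_neurons -> 1 X n_neurons
    
    def forward(self, e): # e is 1 X n_inputs
        prev_hidden = self.hidden_states[-1]
        hidden_mult = self.Wh(prev_hidden) # 1 X n_neurons
        embed_mult = self.We(e) # 1 X n_neurons
        hidden_state = torch.tanh(hidden_mult + embed_mult)
        self.hidden_states.append(hidden_state)
        return hidden_state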

In [145]:
rnn = RNN(4, 1)
rnn

RNN(
  (We): Linear(in_features=4, out_features=1, bias=True)
  (Wh): Linear(in_features=1, out_features=1, bias=True)
)

In [146]:
for e_t in E.T:
    rnn(e_t.unsqueeze(0))

In [147]:
rnn.hidden_states

[tensor([[-1.1709]]),
 tensor([[-0.0148]], grad_fn=<TanhBackward>),
 tensor([[0.0774]], grad_fn=<TanhBackward>),
 tensor([[0.1840]], grad_fn=<TanhBackward>),
 tensor([[0.2288]], grad_fn=<TanhBackward>),
 tensor([[0.3159]], grad_fn=<TanhBackward>)]
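
Because `hidden_states` lives on the module, running the loop on a second document would keep appending to the same list. One possible way to handle this is sketched below. The class name `ResettableRNN`, the `reset` method, and the all-zero starting state are assumptions made for illustration, not part of the class above.

```python
import torch
import torch.nn as nn

class ResettableRNN(nn.Module):
    """Same computation as the RNN class above, plus a hypothetical reset()."""

    def __init__(self, n_inputs, n_neurons):
        super().__init__()
        self.n_neurons = n_neurons
        self.We = nn.Linear(n_inputs, n_neurons)
        self.Wh = nn.Linear(n_neurons, n_neurons)
        self.reset()

    def reset(self):
        # Start a new document from an all-zero hidden state (an assumption;
        # the class above starts from a random one).
        self.hidden_states = [torch.zeros(1, self.n_neurons)]

    def forward(self, e):
        prev_hidden = self.hidden_states[-1]
        hidden_state = torch.tanh(self.Wh(prev_hidden) + self.We(e))
        self.hidden_states.append(hidden_state)
        return hidden_state

torch.manual_seed(0)
E = torch.randn(4, 5)   # stand-in embeddings: five words, four dimensions each
rnn = ResettableRNN(4, 1)

for _ in range(2):      # process the same "document" twice
    rnn.reset()
    for e_t in E.T:
        rnn(e_t.unsqueeze(0))
    print(len(rnn.hidden_states))
```

With the reset in place, each pass ends with six hidden states instead of the list growing to eleven on the second pass.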

Currently, our RNN is set up so that the hidden state is a single number.  But we can increase the number of neurons in `Wh` (and `We`), which increases the number of columns in our hidden state.

In [148]:
rnn = RNN(4, 2)
rnn

RNN(
  (We): Linear(in_features=4, out_features=2, bias=True)
  (Wh): Linear(in_features=2, out_features=2, bias=True)
)

In [149]:
for e_t in E.T:
    rnn(e_t.unsqueeze(0))

In [150]:
rnn.hidden_states

[tensor([[-0.8746, -1.1934]]),
 tensor([[0.7477, 0.7169]], grad_fn=<TanhBackward>),
 tensor([[0.7183, 0.0297]], grad_fn=<TanhBackward>),
 tensor([[-0.0921, -0.0526]], grad_fn=<TanhBackward>),
 tensor([[0.3248, 0.5440]], grad_fn=<TanhBackward>),
 tensor([[0.6136, 0.2024]], grad_fn=<TanhBackward>)]
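
PyTorch's built-in `nn.RNNCell` performs the same kind of update, $\tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})$. As a rough check, the sketch below copies the cell's weights into two `nn.Linear` layers and compares the hand-written update against the cell. The zero initial state and the random stand-in embeddings are assumptions for the sake of the comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNNCell(input_size=4, hidden_size=2)
We = nn.Linear(4, 2)
Wh = nn.Linear(2, 2)

# Copy the cell's parameters so both versions use identical weights.
with torch.no_grad():
    We.weight.copy_(cell.weight_ih)
    We.bias.copy_(cell.bias_ih)
    Wh.weight.copy_(cell.weight_hh)
    Wh.bias.copy_(cell.bias_hh)

E = torch.randn(4, 5)          # stand-in embeddings: five words, four dimensions
h_manual = torch.zeros(1, 2)   # assumed zero initial hidden state
h_cell = torch.zeros(1, 2)

for e_t in E.T:
    x = e_t.unsqueeze(0)                            # 1 X 4
    h_manual = torch.tanh(Wh(h_manual) + We(x))     # hand-written update
    h_cell = cell(x, h_cell)                        # built-in update

print(torch.allclose(h_manual, h_cell, atol=1e-6))
```

Unlike the class in this lesson, `nn.RNNCell` does not store past hidden states; the caller passes the previous state in and receives the next one back.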

### Basic RNN

### Resources

[RNN Pytorch](https://medium.com/dair-ai/building-rnns-is-fun-with-pytorch-and-google-colab-3903ea9a3a79)

[NN from Scratch Pytorch](https://medium.com/dair-ai/a-simple-neural-network-from-scratch-with-pytorch-and-google-colab-c7f3830618e0)