# RNNs in PyTorch

### Introduction

### Loading our Data

Let's get started by loading our data. We begin by setting a random seed. Because we will be training on a GPU with CUDA, we also set `cudnn.deterministic = True`, so that cuDNN uses deterministic algorithms and our results are reproducible.

In [7]:
import torch
from torchtext import data

SEED = 12

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
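Here is a small check of what seeding buys us. Re-seeding the generator reproduces exactly the same random draws. The helper `set_seed` is hypothetical and not part of the original notebook. It also seeds Python's `random` module and sets `cudnn.benchmark = False`, which PyTorch's reproducibility notes recommend alongside `deterministic = True`.

```python
import random

import torch


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed every RNG this notebook touches."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
    torch.backends.cudnn.deterministic = True  # use deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False  # don't autotune to a possibly nondeterministic kernel


set_seed(12)
first = torch.randn(3)

set_seed(12)
second = torch.randn(3)

print(torch.equal(first, second))  # True: same seed, same draws
```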

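Then we initialize our `TEXT` and `LABEL` fields, which tell torchtext how to process each review and its sentiment label. Reviews are tokenized with spaCy. Labels are stored as floats, which our loss function will need later. These are the legacy torchtext APIs: in torchtext 0.9 through 0.11 they live under `torchtext.legacy`, and later releases removed them.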

In [None]:
TEXT = data.Field(tokenize = 'spacy')
LABEL = data.LabelField(dtype = torch.float)

Next up, let's download our data.

In [3]:
from torchtext import datasets

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:47<00:00, 1.76MB/s]


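We'll build the vocabulary and then download the associated GloVe embeddings. We keep the 25,000 most frequent words. With the special `<unk>` and `<pad>` tokens added, the vocabulary has 25,002 entries. Words without a pretrained GloVe vector are initialized from a normal distribution via `unk_init = torch.Tensor.normal_`.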

In [28]:
TEXT.build_vocab(train_data, max_size = 25000, 
                 vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)
LABEL.build_vocab(train_data)
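`build_vocab` hides a few steps. The plain-Python sketch below is a simplified illustration of them, not torchtext's code; `build_vocab` and `numericalize` here are hypothetical functions. The sketch counts tokens, keeps the `max_size` most frequent ones, and prepends the `<unk>` and `<pad>` specials. That is why a `max_size` of 25,000 yields 25,002 entries. It then maps tokens to indices, with unknown words falling back to `<unk>`.

```python
from collections import Counter


def build_vocab(docs, max_size):
    """Hypothetical sketch of a capped vocabulary with <unk> and <pad> specials."""
    counts = Counter(tok for doc in docs for tok in doc)
    # Most frequent words first; ties broken alphabetically.
    words = sorted(counts, key=lambda w: (-counts[w], w))[:max_size]
    itos = ["<unk>", "<pad>"] + words  # index -> string
    stoi = {w: i for i, w in enumerate(itos)}  # string -> index
    return itos, stoi


def numericalize(doc, stoi):
    # Words outside the vocabulary map to the <unk> index (0).
    return [stoi.get(tok, stoi["<unk>"]) for tok in doc]


docs = [["a", "great", "movie"], ["a", "bad", "movie"], ["a", "movie"]]
itos, stoi = build_vocab(docs, max_size=2)
print(len(itos))  # 4: two specials + two kept words
print(itos)       # ['<unk>', '<pad>', 'a', 'movie']
print(numericalize(["a", "great", "movie"], stoi))  # [2, 0, 3]
```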

.vector_cache/glove.6B.zip: 862MB [15:24, 932kB/s]                                
100%|█████████▉| 399999/400000 [00:24<00:00, 16651.01it/s]


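Finally, we use `BucketIterator` to split our data into batches. It groups reviews of similar length into the same batch, which keeps the amount of padding down. It also places each batch on the chosen `device` (the GPU, if one is available).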

In [29]:
BATCH_SIZE = 80

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data), 
    batch_size = BATCH_SIZE,
    device = device)
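The core idea of bucketing can be sketched in a few lines. Sort sequences by length, cut them into batches, and pad each batch only up to its own longest sequence. `bucket_batches` is a hypothetical helper, not torchtext's implementation; the real iterator also shuffles. `pad_idx=1` mirrors torchtext's default `<pad>` index, but treat it as an assumption. The output uses the same `[sent len, batch size]` layout as `batch.text`.

```python
import torch


def bucket_batches(seqs, batch_size, pad_idx=1):
    """Hypothetical sketch of length bucketing (no shuffling)."""
    # Sort indices by sequence length so each batch holds similar lengths.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        chunk = [seqs[i] for i in order[start:start + batch_size]]
        max_len = max(len(s) for s in chunk)
        # Pad at the end, then transpose to [sent len, batch size].
        padded = [s + [pad_idx] * (max_len - len(s)) for s in chunk]
        batches.append(torch.tensor(padded).t())
    return batches


seqs = [[5, 6, 7, 8], [9], [4, 4], [2, 3, 4]]
batches = bucket_batches(seqs, batch_size=2)
for b in batches:
    print(b.shape)
# torch.Size([2, 2])
# torch.Size([4, 2])
```

Because the short reviews end up together, the first batch needs only one pad token instead of the five it would need if `[9]` were batched with `[5, 6, 7, 8]`.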

In [30]:
for batch in train_iterator:
    text_batch = batch.text
    break

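`text_batch` has shape `[sent len, batch size]`, so each column is one of the 80 documents in the batch. Its tokens have been numericalized, meaning each token is replaced by its index in the vocabulary.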

In [27]:
first_doc = text_batch[:, 0]
first_doc[:10]

tensor([  11,  524,   16,   24,   62,   90,    4, 1705, 3659,   99])

### Defining our RNN

At this point, we're ready to define our RNN.  As we know, the structure of our RNN is the following. 

* $x_t$
* $e_t = Ex_t$
* $H_t = F(e_tW_e, H_{t - 1}W_h)$

To accomplish this, we first define an embedding layer. Then, to generate the hidden states, we use the `nn.RNN` module. It takes two arguments: the number of features per input vector (`input_size`, here the embedding dimension) and the number of neurons in the hidden state (`hidden_size`).
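To connect the formula to `nn.RNN`: by default, PyTorch's $F$ adds the two terms, plus two bias vectors, and applies `tanh`. $W_e$ and $W_h$ correspond, up to a transpose, to the layer's `weight_ih_l0` and `weight_hh_l0`. The sketch below uses made-up small sizes. It unrolls the recurrence by hand with the layer's own weights and checks that the result matches `nn.RNN`'s output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, emb_dim, hidden_dim = 5, 3, 4, 2
rnn = nn.RNN(emb_dim, hidden_dim)  # default nonlinearity is tanh
x = torch.randn(seq_len, batch_size, emb_dim)  # [sent len, batch size, emb dim]

with torch.no_grad():
    # Unroll H_t = tanh(e_t W_ih^T + b_ih + H_{t-1} W_hh^T + b_hh) by hand.
    h = torch.zeros(batch_size, hidden_dim)  # H_0 defaults to zeros
    steps = []
    for t in range(seq_len):
        h = torch.tanh(x[t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h)
    manual_outputs = torch.stack(steps)  # [sent len, batch size, hidden dim]

    output, hidden = rnn(x)

print(torch.allclose(manual_outputs, output, atol=1e-6))  # True
```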

In [50]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, num_embeddings, embedding_dim, num_neurons):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, num_neurons)
        
    def forward(self, text):
        #text = [sent len, batch size]
        embedded = self.embedding(text)
        #embedded = [sent len, batch size, emb dim]
        output, hidden = self.rnn(embedded)
        return output, hidden

In [35]:
rnn = RNN(25002, 100, 10)
rnn

RNN(
  (embedding): Embedding(25002, 100)
  (rnn): RNN(100, 10)
)

In [38]:
text_batch.shape

torch.Size([909, 80])

In [37]:
rnn.embedding(text_batch).shape

torch.Size([909, 80, 100])

Passing the batch through the embedding layer alone, we can see that each token in the 909 × 80 batch is now represented by an embedding vector of length 100.
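An embedding layer is just a lookup table: each token index selects one row of the layer's weight matrix. The toy sketch below uses a made-up 10-word vocabulary and 4-dimensional vectors to show that calling the layer gives the same result as indexing `embedding.weight` directly.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([[3, 7], [1, 1]])  # [sent len=2, batch size=2]

vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([2, 2, 4]): one 4-d vector per token
# The layer is a lookup: each id selects a row of the weight matrix.
print(torch.equal(vectors, embedding.weight[token_ids]))  # True
```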

Now let's re-initialize our RNN and pass the batch through the full model. This returns the tuple `output, hidden` produced by the `nn.RNN` layer.

In [41]:
rnn = RNN(25002, 100, 10)
rnn

RNN(
  (embedding): Embedding(25002, 100)
  (rnn): RNN(100, 10)
)

In [44]:
output, hidden = rnn(text_batch)

In [48]:
hidden.shape

torch.Size([1, 80, 10])

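For each of the 80 observations, we get a hidden state vector of length 10. The leading dimension of 1 is the number of layers times the number of directions. `hidden` is the hidden state after the RNN has processed the last token of each document. Every document in the batch is padded to 909 tokens, so for shorter reviews that last token is `<pad>`. `output`, by contrast, holds the hidden state at every time step for each document.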
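For a single-layer, one-direction RNN like ours, `hidden` is simply the last time step of `output`. The sketch below checks this on random inputs shaped like our batch, standing in for the real embeddings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=100, hidden_size=10)  # one layer, one direction
embedded = torch.randn(909, 80, 100)  # [sent len, batch size, emb dim]

with torch.no_grad():
    output, hidden = rnn(embedded)

print(output.shape)  # torch.Size([909, 80, 10])
print(hidden.shape)  # torch.Size([1, 80, 10])
print(torch.allclose(output[-1], hidden[0]))  # True
```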

In [49]:
output.shape

torch.Size([909, 80, 10])

So for each of the 909 token positions in each of the 80 documents, there is a hidden state of length 10. The final hidden state is what we'll use: we can pass it into a linear layer to predict sentiment. The idea is that training will teach the network to encode the whole sequence of words into that final hidden state. Let's build a sentiment classifier by adding a final linear layer.

### Adding a Linear Layer

We do so by passing the final hidden state of each observation, `hidden`, into a linear layer.

In [53]:
hidden.shape

torch.Size([1, 80, 10])

The shape of `hidden` is currently `[1, 80, 10]`. We need to reshape it so that we have 80 rows (one for each observation) and 10 columns (one for each element of the hidden state). In other words, we need to get rid of that first dimension.

In [55]:
reshaped_hidden = hidden.squeeze(0)
reshaped_hidden.shape

torch.Size([80, 10])
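`squeeze(0)` works here only because our RNN has a single layer and one direction, so the first dimension has size 1. An alternative, which is a suggestion rather than what this notebook uses, is `hidden[-1]`. It selects the last layer's state and keeps working if you stack more layers; for a bidirectional RNN it would select only the backward direction. The sketch below uses random tensors in place of real hidden states.

```python
import torch

hidden = torch.randn(1, 80, 10)  # [num layers * directions, batch size, hidden dim]

squeezed = hidden.squeeze(0)  # drops dim 0 only because it has size 1
last_layer = hidden[-1]       # selects the final layer's state
print(squeezed.shape, last_layer.shape)  # torch.Size([80, 10]) torch.Size([80, 10])
print(torch.equal(squeezed, last_layer))  # True

# With a 2-layer RNN, squeeze(0) silently does nothing, while hidden[-1] still works.
hidden_2 = torch.randn(2, 80, 10)
print(hidden_2.squeeze(0).shape)  # torch.Size([2, 80, 10])
print(hidden_2[-1].shape)         # torch.Size([80, 10])
```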

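Then we can pass this through a linear layer. We will train with `BCEWithLogitsLoss`, which expects one logit per review, so the layer has a single output.

In [58]:
linear = nn.Linear(10, 1)

rnn_outputs = linear(reshaped_hidden)
rnn_outputs.shape

torch.Size([80, 1])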

OK, let's move this logic into our `RNN` class.

In [59]:
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, num_embeddings, embedding_dim, num_neurons, output_len):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, num_neurons)
        self.linear = nn.Linear(num_neurons, output_len)
        
    def forward(self, text):
        #text = [sent len, batch size]
        embedded = self.embedding(text)
        #embedded = [sent len, batch size, emb dim]
        output, hidden = self.rnn(embedded)
        #hidden = [1, batch size, num neurons]; drop dim 0 before the linear layer
        return self.linear(hidden.squeeze(0))

In [63]:
rnn_with_linear = RNN(25002, 100, 10, 1)
rnn_with_linear

RNN(
  (embedding): Embedding(25002, 100)
  (rnn): RNN(100, 10)
  (linear): Linear(in_features=10, out_features=1, bias=True)
)

### Training our RNN

Now let's move on to training our neural network. The first step is to copy the pretrained GloVe embeddings into the network's embedding layer.

In [81]:
vocab_vectors = TEXT.vocab.vectors

In [85]:
rnn_with_linear.embedding.weight.data.copy_(vocab_vectors)

tensor([[-0.1320, -0.1254,  0.3443,  ...,  1.6690, -0.1693,  0.2577],
        [ 1.1629, -0.1765,  0.8580,  ..., -0.7312, -0.3655,  0.9063],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 0.9627, -0.0722,  0.2127,  ..., -0.0745, -0.4771,  0.0478],
        [-1.4992,  0.0301, -1.3628,  ..., -0.1628, -0.2913,  0.0398],
        [-0.5560, -0.4405, -0.1017,  ..., -1.8256,  0.4664,  0.5415]])
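An alternative to `copy_` is `nn.Embedding.from_pretrained`; the notebook does not use it. Passing `freeze=False` keeps the vectors trainable, which matches the `copy_` approach. The default, `freeze=True`, would keep them fixed. In the sketch below, the random tensor is a stand-in for `TEXT.vocab.vectors`.

```python
import torch
import torch.nn as nn

# Stand-in for TEXT.vocab.vectors: 25,002 rows of 100-d vectors (random here).
vocab_vectors = torch.randn(25002, 100)

embedding = nn.Embedding.from_pretrained(vocab_vectors, freeze=False)
print(embedding)  # Embedding(25002, 100)
print(torch.equal(embedding.weight, vocab_vectors))  # True
print(embedding.weight.requires_grad)  # True, so the vectors are fine-tuned
```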

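Then we zero out the vectors for the `<unk>` and `<pad>` tokens. Neither token has a GloVe vector, so `build_vocab` initialized both randomly with `unk_init`.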

In [93]:
UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]
rnn_with_linear.embedding.weight.data[UNK_IDX] = torch.zeros(100)
rnn_with_linear.embedding.weight.data[PAD_IDX] = torch.zeros(100)
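Zeroing these rows only sets their starting values; training can still update them afterward. To keep the `<pad>` row fixed at zero, `nn.Embedding` accepts a `padding_idx`: that row is initialized to zeros and receives no gradient. The notebook's model doesn't pass `padding_idx`, so treat this as an optional tweak. The toy sketch below demonstrates it. `PAD_IDX = 1` mirrors torchtext's default `<pad>` index but is an assumption here.

```python
import torch
import torch.nn as nn

PAD_IDX = 1  # assumed <pad> index, matching torchtext's default
embedding = nn.Embedding(10, 4, padding_idx=PAD_IDX)
print(embedding.weight[PAD_IDX])  # initialized to zeros

tokens = torch.tensor([[2], [PAD_IDX]])  # one real token followed by padding
embedding(tokens).sum().backward()
print(embedding.weight.grad[PAD_IDX])  # all zeros: the pad row never updates
print(embedding.weight.grad[2])        # all ones: the real token's row does
```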

Now it's time to initialize our optimizer and our loss function.

In [77]:
import torch.optim as optim

optimizer = optim.Adam(rnn_with_linear.parameters(), lr=.0005)

In [78]:
bce_loss = nn.BCEWithLogitsLoss()
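`BCEWithLogitsLoss` combines a sigmoid and binary cross-entropy in one numerically stable step. The model should therefore output raw scores (logits), not probabilities. The sketch below checks this equivalence on made-up logits. It also shows that the input and target must have the same shape, which is why the model needs exactly one output per review.

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5])  # one raw score per review
labels = torch.tensor([1.0, 0.0, 0.0])   # float targets, like LabelField(dtype=torch.float)

with_logits = nn.BCEWithLogitsLoss()(logits, labels)
manual = nn.BCELoss()(torch.sigmoid(logits), labels)
print(torch.allclose(with_logits, manual))  # True

# Two outputs per review against one label per review is rejected.
raised = False
try:
    nn.BCEWithLogitsLoss()(torch.randn(4, 2), torch.ones(4))
except ValueError as err:
    raised = True
    print("shape mismatch:", err)
```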

Next, we move the model and the loss function to `device`, so their computations run on the GPU when one is available.

In [79]:
rnn_with_linear = rnn_with_linear.to(device)
bce_loss = bce_loss.to(device)

And then begin the training loop.

In [80]:
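for epoch in range(7):
    rnn_with_linear.train()  # put the model in training mode
    for batch in train_iterator:
        # .to(device) instead of .cuda(), so the loop also runs on CPU
        preds = rnn_with_linear(batch.text.to(device))  # [batch size, 1]
        loss = bce_loss(preds.squeeze(1), batch.label.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'epoch {epoch}: last batch loss {loss.item():.3f}')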
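To see the loop in isolation, here's a runnable sketch on a tiny random batch. The class `TinyRNN` and the toy data are hypothetical stand-ins for the notebook's model and IMDB batches. The sketch checks that the loss goes down when the model overfits a single batch, a common sanity check before a full training run.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


class TinyRNN(nn.Module):
    """Same structure as the notebook's model: embedding -> RNN -> linear (1 logit)."""

    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim)
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, text):  # text: [sent len, batch size]
        _, hidden = self.rnn(self.embedding(text))
        return self.linear(hidden.squeeze(0))  # [batch size, 1]


# Toy batch: 8 random "reviews" of 12 tokens with random 0/1 labels.
text = torch.randint(0, 20, (12, 8))
labels = torch.randint(0, 2, (8,)).float()

model = TinyRNN(vocab_size=20, emb_dim=8, hidden_dim=16)
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()

model.train()
losses = []
for _ in range(100):
    preds = model(text).squeeze(1)  # [batch size]
    loss = criterion(preds, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

If the loss refuses to drop even on one memorizable batch, the bug is usually a shape or data-plumbing problem rather than the model's capacity.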

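Then we can see how we did on the test set. First, we define a helper that turns the logits into 0/1 predictions, by applying a sigmoid and rounding, and computes the fraction that match the labels.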

In [1]:
def binary_accuracy(preds, y):
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc
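Here is a quick check of `binary_accuracy` on made-up logits, along with the shape pitfall it is vulnerable to. If the model's `[batch size, 1]` output isn't squeezed, comparing it with `[batch size]` labels broadcasts to a `[batch size, batch size]` matrix. The accuracy is then silently wrong.

```python
import torch


def binary_accuracy(preds, y):
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)


logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])
print(binary_accuracy(logits, labels))  # tensor(0.7500): the 0.5 logit rounds to 1

# Pitfall: a [batch, 1] output broadcasts against [batch] labels into [batch, batch].
print((logits.unsqueeze(1) == labels).shape)  # torch.Size([4, 4])
```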

In [2]:
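accuracies = []
batch_lengths = []
rnn_with_linear.eval()  # evaluation mode (no dropout here, but good practice)
with torch.no_grad():
    for batch in test_iterator:
        # squeeze [batch size, 1] -> [batch size] so predictions line up with labels
        outputs = rnn_with_linear(batch.text).squeeze(1)
        labels = batch.label
        accuracy = binary_accuracy(outputs, labels)
        accuracies.append(accuracy.item())
        batch_lengths.append(len(labels))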

In [None]:
sum([batch_length*accuracy for accuracy, batch_length in zip(accuracies, batch_lengths)])/sum(batch_lengths)
# 40 percent
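Weighting each batch's accuracy by its size matters because the final batch is usually smaller than `BATCH_SIZE`. The 25,000 test reviews split into 312 full batches of 80 plus a last batch of 40. The numbers in the sketch below are made up to show how the weighted and unweighted averages differ.

```python
accuracies = [0.90, 0.50]   # per-batch accuracies (made-up numbers)
batch_lengths = [80, 20]    # the last batch is usually smaller

weighted = sum(a * n for a, n in zip(accuracies, batch_lengths)) / sum(batch_lengths)
unweighted = sum(accuracies) / len(accuracies)
print(round(weighted, 2), round(unweighted, 2))  # 0.82 0.7
```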

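Admittedly, this is a poor score. The IMDB test set is split evenly between positive and negative reviews, so guessing would get about 50 percent. But don't worry: RNNs like this one have a serious flaw, which we'll explore, along with a remedy, in the next lesson.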

### Summary