
# Recurrent Neural Networks
### Let's try <code>Natural Language Processing</code> using Recurrent Neural Networks

2022 DS Elective 4 <br>
University of Science and Technology of the Philippines <br>

**Romen Samuel Wabina, MSc** <br>
Instructor <br>
PhD Data Science and Artificial Intelligence

Objectives:
- To examine the use of RNN in Natural Language Processing

#### RECALL: An RNN is a <code>for</code> loop that reuses quantities computed during the previous iteration of the loop, nothing more.

The hidden state $h_t$ of a recurrent layer at time step $t$, for a single instance, is computed from the current input $x_t$ and the previous hidden state $h_{t-1}$:
$$
    h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})
$$
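The recurrent layer itself only produces hidden states. In the model built below, the per-step output is obtained by applying a fully connected layer to each $h_t$, and a softmax turns that output into next-character probabilities at prediction time. Using notation introduced here for illustration ($W_{ho}$ and $b_o$ stand for the weight and bias of the `nn.Linear(hidden_dim, output_dim)` layer):
$$
    o_t = W_{ho} h_t + b_{o}, \qquad \hat{y}_t = \text{softmax}(o_t)
$$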


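An RNN is characterized by its step function. In NumPy, the step above can be written as follows, where `W` corresponds to $W_{ih}$, `U` to $W_{hh}$, `state_t` is the previous hidden state $h_{t-1}$, and `b` combines the two biases:
- <code>h_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)</code>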
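To make the "for loop" view concrete, here is a minimal NumPy sketch of a full forward pass over one sequence. The sizes, the random weights, and names such as `successive_outputs` are illustrative assumptions; they are not the values used later in this notebook.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only
timesteps, input_features, output_features = 12, 16, 10

rng = np.random.default_rng(0)
inputs = rng.random((timesteps, input_features))   # one sequence x_1 ... x_T
state_t = np.zeros((output_features,))              # initial state h_0 = 0

# Randomly initialized weights (a trained RNN would learn these)
W = rng.random((output_features, input_features))   # plays the role of W_ih
U = rng.random((output_features, output_features))  # plays the role of W_hh
b = rng.random((output_features,))                  # combines b_ih and b_hh

successive_outputs = []
for input_t in inputs:                              # the "for loop" over time steps
    h_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(h_t)
    state_t = h_t                                   # reuse this step's result in the next iteration

final_output_sequence = np.stack(successive_outputs, axis=0)
print(final_output_sequence.shape)  # (12, 10)
```

The only thing that makes this "recurrent" is the line `state_t = h_t`: each iteration feeds its result into the next one, exactly as `nn.RNN` does internally.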

### Case study: Predicting the next characters

Given some initial word (e.g., good), let's create a model that predicts the next characters until a specified length is reached (e.g., good I am fine). To link this with the RNN, you can imagine each input $x_t$ as one character, i.e., 'g', 'o', 'o', 'd', each represented by an integer.

In [2]:
import torch
from torch import nn
import numpy as np
import sys

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cpu


#### 1. Defining text input
First, we'll define the sentences

In [3]:
text = ['hey whats up','good day','they are cool']

Since neural networks operate on numbers rather than characters, we build a mapping between integers and characters, which we will later use to create one-hot encodings.

In [4]:
chars = set(''.join(text))

In [5]:
int2char = dict(enumerate(chars))
print(int2char)

{0: 'e', 1: 'u', 2: 'r', 3: 't', 4: 'a', 5: ' ', 6: 'h', 7: 'd', 8: 'c', 9: 'p', 10: 'o', 11: 'g', 12: 'y', 13: 'l', 14: 's', 15: 'w'}


In [6]:
char2int = {char: ind for ind, char in int2char.items()}
print(char2int)

{'e': 0, 'u': 1, 'r': 2, 't': 3, 'a': 4, ' ': 5, 'h': 6, 'd': 7, 'c': 8, 'p': 9, 'o': 10, 'g': 11, 'y': 12, 'l': 13, 's': 14, 'w': 15}
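Iterating over a Python `set` of strings is not guaranteed to give the same order in every session (string hashing is randomized by default), so the integer assigned to each character above may differ on your machine. A minimal sketch of a reproducible alternative sorts the characters first; the `_sorted` names are hypothetical and chosen so they do not overwrite the mappings used in the rest of this notebook.

```python
sentences = ['hey whats up', 'good day', 'they are cool']

# sorted() gives a fixed, reproducible order (space first, then a-z)
vocab_sorted = sorted(set(''.join(sentences)))
int2char_sorted = dict(enumerate(vocab_sorted))
char2int_sorted = {char: ind for ind, char in int2char_sorted.items()}

print(int2char_sorted)
```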


#### 2. Padding
We pad our input sentences so that all of them have the same length. Although RNNs can in principle take variable-length inputs, we usually want to feed training data in batches to speed up training, and batching requires every sequence in a batch to have the same length.

In most cases, padding is done by filling sequences that are too short with 0 values and trimming sequences that are too long. In our case, we find the length of the longest sentence and pad the remaining sentences with blank spaces to match that length.

In [7]:
max(text, key=len)

'they are cool'

In [8]:
maxlen = len(max(text, key=len)) 
print("The longest string has {} characters".format(maxlen))

The longest string has 13 characters


In [9]:
# Padding
for i in range(len(text)):  #loop each of the sentence
    while len(text[i]) < maxlen:  #if that sentence length is shorter than max len, keep adding white space
        text[i] += ' '
text

['hey whats up ', 'good day     ', 'they are cool']
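The `while` loop above can also be written with the built-in `str.ljust`, and slicing covers the "trim sequences that are too long" case mentioned earlier. A minimal sketch, where `pad_or_trim` is a hypothetical helper and not part of the original notebook:

```python
def pad_or_trim(sentences, length, fill=' '):
    """Cut long strings and right-pad short ones with `fill` to exactly `length` characters."""
    return [s[:length].ljust(length, fill) for s in sentences]

sentences = ['hey whats up', 'good day', 'they are cool']
print(pad_or_trim(sentences, 13))  # ['hey whats up ', 'good day     ', 'they are cool']
print(pad_or_trim(sentences, 8))   # ['hey what', 'good day', 'they are']
```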

#### 3. Defining target sequences
As we're going to predict the next character in the sequence at each time step, we'll have to divide each sentence into:

- **Input data**: the last character is excluded, since there is no next character for the model to predict after it.
- **Target/ground-truth label**: the sentence shifted one time step ahead of the input data, which serves as the "correct answer" for the model at each corresponding time step.

In [10]:
input_seq = []
target_seq = []

for i in range(len(text)):
    # Remove last character for input sequence
    input_seq.append(text[i][:-1])
    
    # Remove first character for target sequence
    target_seq.append(text[i][1:])
    print("Input Sequence: {}\nTarget Sequence: {}".format(input_seq[i], target_seq[i]))

Input Sequence: hey whats up
Target Sequence: ey whats up 
Input Sequence: good day    
Target Sequence: ood day     
Input Sequence: they are coo
Target Sequence: hey are cool
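To see what the model is actually trained on, a small sketch can pair each input character with its target for one padded sentence. The variable names here are illustrative only.

```python
sentence = 'good day     '          # one padded sentence from above
input_part, target_part = sentence[:-1], sentence[1:]

# At time step t the model reads input_part[t] and should predict target_part[t]
pairs = list(zip(input_part, target_part))
print(pairs[:4])  # [('g', 'o'), ('o', 'o'), ('o', 'd'), ('d', ' ')]

# The target is the input shifted left by one step
assert all(target_part[t] == input_part[t + 1] for t in range(len(input_part) - 1))
```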


Now we can convert our input and target sequences to sequences of integers instead of characters by mapping them using the dictionaries we created above. This will allow us to one-hot-encode our input sequence subsequently.

In [11]:
for i in range(len(text)):
    input_seq[i] = [char2int[character] for character in input_seq[i]]
    target_seq[i] = [char2int[character] for character in target_seq[i]]
    
input_seq

[[6, 0, 12, 5, 15, 6, 4, 3, 14, 5, 1, 9],
 [11, 10, 10, 7, 5, 7, 4, 12, 5, 5, 5, 5],
 [3, 6, 0, 12, 5, 4, 2, 0, 5, 8, 10, 10]]

#### 4. One-hot encoding
We are now ready to convert our input sequences into one-hot encoded arrays of shape (batch_size, seq_len, vocab_size). This is the input layout `nn.RNN` expects when `batch_first=True`, which we use below.

In [12]:
batch_size = len(text)  #batch is the number of sentences
batch_size

3

In [13]:
seq_len = maxlen - 1  #we minus 1 because we remove the last character
seq_len

12

In [14]:
vocab_size = len(char2int)  #number of unique characters; this is also the number of input features per time step
vocab_size

16

In [15]:
def one_hot_encode(sequence, vocab_size, seq_len, batch_size):    
    input_seq_encoded = np.zeros((batch_size, seq_len, vocab_size), dtype=np.float32)
    
    for i in range(batch_size):
        for u in range(seq_len):
            input_seq_encoded[i, u, sequence[i][u]] = 1
    return input_seq_encoded

The helper function above creates a vector of zeros for each character and sets the position given by that character's index to 1.
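The nested loops can also be replaced by NumPy fancy indexing: row `k` of an identity matrix is exactly the one-hot vector for index `k`. A minimal sketch, where `one_hot_encode_vectorized` is a hypothetical alternative to the notebook's `one_hot_encode`:

```python
import numpy as np

def one_hot_encode_vectorized(sequence, vocab_size):
    """Index rows of an identity matrix.

    `sequence` is a (batch_size, seq_len) array-like of integer indices; the result
    has shape (batch_size, seq_len, vocab_size).
    """
    indices = np.asarray(sequence, dtype=np.int64)
    return np.eye(vocab_size, dtype=np.float32)[indices]

# The integer-encoded input_seq printed earlier
example = [[6, 0, 12, 5, 15, 6, 4, 3, 14, 5, 1, 9],
           [11, 10, 10, 7, 5, 7, 4, 12, 5, 5, 5, 5],
           [3, 6, 0, 12, 5, 4, 2, 0, 5, 8, 10, 10]]
encoded = one_hot_encode_vectorized(example, vocab_size=16)
print(encoded.shape)  # (3, 12, 16)
```

Both versions require every sentence to have the same length, which is why padding came first.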

In [16]:
input_seq_encoded = one_hot_encode(input_seq, vocab_size, seq_len, batch_size)
print("Input shape: {} --> (Batch Size, Sequence Length, One-Hot Encoding Size)".format(input_seq_encoded.shape))

Input shape: (3, 12, 16) --> (Batch Size, Sequence Length, One-Hot Encoding Size)


With the data preprocessing done, we can now convert the data from NumPy arrays to PyTorch's own data structure: tensors.

In [17]:
input_seq_tensor = torch.from_numpy(input_seq_encoded)  #from numpy
target_seq_tensor = torch.Tensor(target_seq) #from list; torch.Tensor creates a float tensor, so we cast to long before computing the loss

#### 5. Implementing the model

In [18]:
#defining hyperparameters for model
input_dim = vocab_size
output_dim = vocab_size
hidden_dim = 10 #size of the hidden state, analogous to the hidden size of a fully connected layer; chosen arbitrarily
num_layers = 1

In [19]:
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = nn.RNN(input_dim, hidden_dim, num_layers, batch_first=True)   
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        #x = [batch size, seq len, vocab size]
        batch_size = x.size(0)
        h0 = torch.zeros(num_layers, batch_size, hidden_dim).to(device)

        #out = [batch size, seq len, hidden dim]
        #hn = [num layers, batch size, hidden dim]
        out, hn = self.rnn(x, h0)
        
        out = out.reshape(-1, hidden_dim)
        #out = [batch size*seq len, hidden dim]

        out = self.fc(out)
        #out = [batch size*seq len, output dim]

        return out, hn 

In [20]:
# Let's walk through the basic RNN step by step

# defining input
input_test = torch.rand_like(input_seq_tensor)
h0_test = torch.zeros(num_layers, batch_size, hidden_dim)

# defining rnn
rnn_test = nn.RNN(input_dim, hidden_dim, num_layers, batch_first=True)   

# 1. run rnn
out, hn = rnn_test(input_test, h0_test)  # pass the initial hidden state defined above

print("RNN output: ", out.shape)  #batch, seq_len, hidden_dim

print("Hn output: ", hn.shape)  #num_layer, batch, hidden_dim

# 2. reshape
out = out.reshape(-1, hidden_dim)

print("Reshape output: ", out.shape)  #batch*seq_len, hidden_dim

# 3. linear
linear_layer = nn.Linear(hidden_dim, output_dim)

out = linear_layer(out)

print("After linear output: ", out.shape)  #batch*seq_len, output_dim

# 4. loss

print("Target seq shape: ", target_seq_tensor.shape) #must match the first dimension of out, so we flatten batch and seq_len together

print("Target seq new shape: ", target_seq_tensor.view(-1).shape)  #view is similar to reshape, but always returns a tensor that shares memory with the original

#now you can use CrossEntropyLoss comparing out and target

RNN output:  torch.Size([3, 12, 10])
Hn output:  torch.Size([1, 3, 10])
Reshape output:  torch.Size([36, 10])
After linear output:  torch.Size([36, 16])
Target seq shape:  torch.Size([3, 12])
Target seq new shape:  torch.Size([36])
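As a sketch of the final step, the flattened shapes above are exactly what `nn.CrossEntropyLoss` expects: logits of shape `[N, C]` and integer class indices of shape `[N]`. The random tensors below are stand-ins for the real model output and targets, and the manual computation is included only to show what the loss computes.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_batch, n_steps, n_classes = 3, 12, 16   # same sizes as the walkthrough above

# Random stand-ins for the flattened model output and the targets (illustration only)
logits = torch.randn(n_batch * n_steps, n_classes)            # [36, 16] raw scores
targets = torch.randint(0, n_classes, (n_batch, n_steps))     # [3, 12] int64 class indices

criterion = nn.CrossEntropyLoss()            # expects raw logits, applies log-softmax itself
loss = criterion(logits, targets.view(-1))   # targets flattened to [36]

# Same value computed by hand: mean negative log-probability of the correct class
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(n_batch * n_steps), targets.view(-1)].mean()
print(loss.item(), manual.item())
```

This is also why the model returns raw `fc` outputs rather than probabilities: the softmax is already built into the loss.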


#### 6. Training

In [21]:
# Define model
model = RNN().to(device)

# Define hyperparameters for learning
num_epochs = 100
lr = 0.01

# Define Loss, Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

In [22]:
input_seq_tensor = input_seq_tensor.to(device)
target_seq_tensor = target_seq_tensor.to(device)

for epoch in range(1, num_epochs + 1):
    
    #1. predict
    output, hidden = model(input_seq_tensor)

    #2. calculate loss
    loss = criterion(output, target_seq_tensor.view(-1).long())  #.view(-1) flattens the targets into one dimension; .long() casts them to the integer class indices CrossEntropyLoss expects

    #3. backprop
    optimizer.zero_grad() 
    loss.backward() 
    optimizer.step() 
    
    if epoch%10 == 0:
        sys.stdout.write('\rEpoch: {}/{}.............Loss: {:.4f}'.format(epoch, num_epochs,loss.item()))

Epoch: 100/100.............Loss: 0.1919

Let's test our model now and see what kind of output we get. Before that, let's define some helper functions to convert our model output back to text.

In [23]:
def _predict(model, character):
    # One-hot encoding our input to fit into the model
    character = np.array([[char2int[c] for c in character]])
    character = one_hot_encode(character, vocab_size, character.shape[1], 1)
    character = torch.from_numpy(character)
    character = character.to(device)
    
    out, hidden = model(character)

    prob = nn.functional.softmax(out[-1], dim=0).data  #out[-1] holds the logits for the last time step, i.e., the prediction for the next character
    
    char_ind = torch.max(prob, dim=0)[1].item()

    return int2char[char_ind], hidden
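# Side note on the greedy choice in _predict (a standalone sketch with random,
# hypothetical logits rather than real model output): taking the max of the
# softmax probabilities picks the same index as taking the max of the raw logits.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
step_logits = torch.randn(16)   # hypothetical logits for one time step (vocab_size = 16)

probs = F.softmax(step_logits, dim=0)
greedy_from_probs = torch.max(probs, dim=0)[1].item()   # what _predict does
greedy_from_logits = torch.argmax(step_logits).item()   # skipping softmax

# exp() is increasing and every entry shares the same denominator,
# so softmax keeps the order of the logits and the winning index is unchanged
print(greedy_from_probs == greedy_from_logits)

# Greedy decoding always picks the single most likely character, which can make
# the generated text repetitive; sampling with torch.multinomial(probs, num_samples=1)
# is a common alternative that this notebook does not use.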

In [24]:
def predict(model, out_len, input_text):
    model.eval() # eval mode
    input_text = input_text.lower()
    # Split the starting text into a list of characters
    chars = [ch for ch in input_text]
    size = out_len - len(chars)
    # Now pass in the previous characters and get a new one
    for ii in range(size):
        char, _ = _predict(model, chars)  #the hidden state is not needed, so it is discarded
        chars.append(char)

    return ''.join(chars)

In [25]:
predict(model, 15, 'hey')

'hey whats up  y'