## Recurrent Neural Network

This project is a simple example of a "many to one" recurrent neural network that builds on knowledge from the simple neural network. This project will perform basic Sentiment Analysis and the tutorial for it can be found at:
https://victorzhou.com/blog/intro-to-rnns/

RNNs can take variable-length sequences as both inputs and outputs - an advantage over vanilla neural networks and CNNs, which must have pre-determined sizes of inputs/outputs. RNNs work by iteratively updating a hidden state, which is a vector that can also have arbitrary dimension. At any given step t, the next hidden state is calculated using the previous hidden state and the next input x. It takes the same weights for each step (hence "recurrent").

A vanilla RNN typically uses three sets of weights: input(t)->hidden layer, hidden layer(t-1)->hidden layer(t), and hidden layer(t)->output(t).

We will use two biases: b(h) when calculating h(t) and b(y) when calculating y(t).

* Each input will be a vector representing a word from the text.
* Our chosen activation function will be the hyperbolic tangent (tanh) for calculating the hidden layer.
* Our output will be a vector containing two numbers, one representing positive sentiments and the other negative. We will use Softmax to turn those into probabilities in order to ultimately decide the sentiment of the comment. Awesome reminder of how Softmax works: https://victorzhou.com/blog/softmax/

In [8]:
# Let's get our data
# Note: we are utilizing a small amount of data, so we will copy and paste our respective dictionaries.
# If the dataset was larger, we would bring it in as a csv or other type of file.

train_data = {
  'good': True,
  'bad': False,
  'happy': True,
  'sad': False,
  'not good': False,
  'not bad': True,
  'not happy': False,
  'not sad': True,
  'very good': True,
  'very bad': False,
  'very happy': True,
  'very sad': False,
  'i am happy': True,
  'this is good': True,
  'i am bad': False,
  'this is bad': False,
  'i am sad': False,
  'this is sad': False,
  'i am not happy': False,
  'this is not good': False,
  'i am not bad': True,
  'this is not sad': True,
  'i am very happy': True,
  'this is very good': True,
  'i am very bad': False,
  'this is very sad': False,
  'this is very happy': True,
  'i am good not bad': True,
  'this is good not bad': True,
  'i am bad not good': False,
  'i am good and happy': True,
  'this is not good and not happy': False,
  'i am not at all good': False,
  'i am not at all bad': True,
  'i am not at all happy': False,
  'this is not at all sad': True,
  'this is not at all happy': False,
  'i am good right now': True,
  'i am bad right now': False,
  'this is bad right now': False,
  'i am sad right now': False,
  'i was good earlier': True,
  'i was happy earlier': True,
  'i was bad earlier': False,
  'i was sad earlier': False,
  'i am very bad right now': False,
  'this is very good right now': True,
  'this is very sad right now': False,
  'this was bad earlier': False,
  'this was very good earlier': True,
  'this was very bad earlier': False,
  'this was very happy earlier': True,
  'this was very sad earlier': False,
  'i was good and not bad earlier': True,
  'i was not good and not happy earlier': False,
  'i am not at all bad or sad right now': True,
  'i am not at all good or happy right now': False,
  'this was not happy and not good earlier': False,
}

test_data = {
  'this is happy': True,
  'i am good': True,
  'this is not happy': False,
  'i am not good': False,
  'this is not bad': True,
  'i am not sad': True,
  'i am very good': True,
  'this is very bad': False,
  'i am very sad': False,
  'this is bad not good': False,
  'this is good and happy': True,
  'i am not good and not happy': False,
  'i am not at all sad': True,
  'this is not at all good': False,
  'this is not at all bad': True,
  'this is good right now': True,
  'this is sad right now': False,
  'this is very bad right now': False,
  'this was good earlier': True,
  'i was not happy and not good earlier': False,
}

In [9]:
# Create the vocabulary
vocab = list(set([w for text in train_data.keys() for w in text.split(' ')]))
vocab_size = len(vocab)
print('%d unique words found' % vocab_size)

18 unique words found


In [11]:
# Now let's assign an integer index to represent each word in our vocab.
word_to_idx = { w: i for i, w in enumerate(vocab)}
idx_to_word = { i: w for i, w in enumerate(vocab)}
print(word_to_idx['good'])
print(idx_to_word[0])

16
bad


We will use one-hot vectors, which contain all zeros except for a single one, which represents the corresponding integer index. Thus, each x will be an 18-dimensional one-hot vector.

In [16]:
import numpy as np

def createInputs(text):
    """
    Returns an array of one-hot vectors representing the words in the input text string.
    - text is a string
    - Each one-hot vector has shape (vocab_size, 1)
    """
    inputs = []
    for w in text.split(' '):
        v = np.zeros((vocab_size, 1))
        v[word_to_idx[w]] = 1
        inputs.append(v)
    return inputs

Now let's start implementing our RNN. We'll start by initializing the three weights and two biases it needs.

In [31]:
from numpy.random import randn

class RNN:
    # A vanilla recurrent neural network
    
    def __init__(self, input_size, output_size, hidden_size=64):
        # Weights
        self.Whh = randn(hidden_size, hidden_size) / 1000 # We are dividing by 1000 to reduce initial variance of our weights. This is not the best way to initialize weights; we are using it for simplicity.
        self.Wxh = randn(hidden_size, input_size) / 1000
        self.Why = randn(output_size, hidden_size) / 1000
        
        # Biases
        self.bh = np.zeros((hidden_size, 1))
        self.by = np.zeros((output_size, 1))
    
    def forward(self, inputs):
        """
        Perform a forward pass of the RNN using the given inputs
        Returns the final output and hidden state.
        - inputs is an array of one-hot vectors with shape (input_size, 1)
        """
        h = np.zeros((self.Whh.shape[0], 1))
        
        self.last_inputs = inputs # In tutorial, this line was inserted after creating the inputs
        self.last_hs = { 0:h } # In tutorial, this line was inserted after creating the inputs
        
        # Perform each step of the RNN
        for i, x in enumerate(inputs):
            h = np.tanh(self.Wxh @ x + self.Whh @ h + self.bh)
            self.last_hs[i + 1] = h # In tutorial, this line was inserted after creating the inputs
            
        # Compute the output
        y = self.Why @ h + self.by
        
        return y, h
    
    def backprop(self, d_y, learn_rate=2e-2): # In tutorial, this definition was inserted after creating the inputs
        """
        Perform a backward pass of the RNN.
        - d_y (dL/dy) has shape (output_size, 1).
        - learn_rate is a float.
        """
        n = len(self.last_inputs)
        
        d_Why = d_y @ self.last_hs[n].T # In tutorial, this line was inserted after calculating derivatives
        d_by = d_y # In tutorial, this line was inserted after calculating derivatives
        
        # Initialize dL/dWhh, dL/Wxh, and dL/dbh to zero.
        d_Whh = np.zeros(self.Whh.shape) # In tutorial, this line was inserted after calculating derivatives
        d_Wxh = np.zeros(self.Wxh.shape) # In tutorial, this line was inserted after calculating derivatives
        d_bh = np.zeros(self.bh.shape) # In tutorial, this line was inserted after calculating derivatives
        
        # Calculate dL/dh for the last h.
        d_h = self.Why.T @ d_y # In tutorial, this line was inserted after creating the inputs
       
# Now we need the gradients for W(hh), W(xh), and b(h). 
# To calculate the gradient for W(xh), we'll need to backpropagate through all timesteps 
# (Backpropagation Through Time - BPTT). We will add lines to `backprop` to do this.
    
        # Backpropagate through time
        for t in reversed(range(n)): # In tutorial, this line was inserted after calculating derivatives
            # An intermediate value: dL/dh * (1 - h^2)
            temp = ((1 - self.last_hs[t + 1] ** 2) * d_h) # In tutorial, this line was inserted after calculating derivatives
            
            # dL/db = dL/dh * (1 - h^2)
            d_bh += temp # In tutorial, this line was inserted after calculating derivatives
            
            # dL/dWhh = dL/dh * (1 - h^2) * h_{t-1}
            d_Whh += temp @ self.last_hs[t].T # In tutorial, this line was inserted after calculating derivatives
            
            # dL/dWxh = dL/dh * (1 - h^2) * x
            d_Wxh += temp @ self.last_inputs[t].T # In tutorial, this line was inserted after calculating derivatives
            
            # Next dL/dh = dL/dh * (1 - h^2) * Whh
            d_h = self.Whh @ temp # In tutorial, this line was inserted after calculating derivatives
            
            # Clip to prevent exploding gradients. This is when gradients become very large due to having lots of multiplied terms.
            for d in [d_Wxh, d_Whh, d_Why, d_bh, d_by]: # In tutorial, this line was inserted after calculating derivatives
                np.clip(d, -1, 1, out=d) # In tutorial, this line was inserted after calculating derivatives
            
            # Update weights and biases using gradient descent.
            self.Whh -= learn_rate * d_Whh # In tutorial, this line was inserted after calculating derivatives
            self.Wxh -= learn_rate * d_Wxh # In tutorial, this line was inserted after calculating derivatives
            self.Why -= learn_rate * d_Why # In tutorial, this line was inserted after calculating derivatives
            self.bh -= learn_rate * d_bh # In tutorial, this line was inserted after calculating derivatives
            self.by -= learn_rate * d_by # In tutorial, this line was inserted after calculating derivatives
            

    
def softmax(xs):
    # Applies the Softmax Function to the input array
    return np.exp(xs) / sum(np.exp(xs))
    
# Initialize our RNN!
rnn = RNN(vocab_size, 2)

inputs = createInputs('i am very good')
out, h = rnn.forward(inputs)
probs = softmax(out)
print(probs)

[[0.50000273]
 [0.49999727]]


Now we will begin working on the backward phase. To train our RNN, we will need a loss function. This example will use cross-entropy loss, which is often paired with Softmax and enables us to quantify how sure we are RNN has predicted correctly. Reminder tutorial here: https://victorzhou.com/blog/intro-to-cnns-part-1/#52-cross-entropy-loss

In [24]:
# Loop over each training example
for x, y in train_data.items():
    inputs = createInputs(x)
    target = int(y)
    
    # Forward
    out, _ = rnn.forward(inputs)
    probs = softmax(out)
    
    # Build dL/dy
    d_L_d_y = probs
    d_L_d_y[target] -= 1
    
    # Backward
    rnn.backprop(d_L_d_y)

Let's build a helper function to process data with our RNN:

In [29]:
import random

def processData(data, backprop=True):
    """
    Returns the RNN's loss and accuracy for the given data.
    - data is a dictionary mapping text to True or False.
    - backprop determines if the backward phase should be run.
    """
    items = list(data.items())
    random.shuffle(items)
    
    loss = 0
    num_correct = 0
    
    for x, y in items:
        inputs = createInputs(x)
        target = int(y)
        
        # Forward
        out, _ = rnn.forward(inputs)
        probs = softmax(out)
        
        # Calculate loss / accuracy
        loss -= np.log(probs[target])
        num_correct += int(np.argmax(probs) == target)
        
        if backprop:
            # Build dL/dy
            d_L_d_y = probs
            d_L_d_y[target] -= 1
            
            # Backward
            rnn.backprop(d_L_d_y)
    return loss / len(data), num_correct / len(data)

Woohoo! Now it's time to write our training loop

In [33]:
# Training loop
# *Note: If we were using a large dataset that uses a lot of computing power,
# we would want to write a callback bit here to stop the training once a threshold of improvement had been reached.
for epoch in range(1000):
    train_loss, train_acc = processData(train_data)
    
    if epoch % 100 == 99:
        print('--- Epoch %d' % (epoch + 1))
        print('Train:\tLoss %.3f | Accuracy: %.3f' % (train_loss, train_acc))
        
        test_loss, test_acc = processData(test_data, backprop=False)
        print('Test:\tLoss %.3f | Accuracy: %.3f' % (test_loss, test_acc))

--- Epoch 100
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 200
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 300
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 400
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 500
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 600
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 700
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 800
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 900
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
--- Epoch 1000
Train:	Loss 0.000 | Accuracy: 1.000
Test:	Loss 0.000 | Accuracy: 1.000
