# Quick overview of RNNs

A Recurrent Neural Network (RNN) is a tool for modeling *dynamical systems* (for example, temporal data).

A dynamical system can be modeled with:

$h_t = f(h_{t-1}, x_t)$ 

where $x_t$ is the input and $h_t$ the state at step $t$, which means $h_t = g(x_0,...,x_t)$.
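
Unrolling the recurrence for a few steps shows why $h_t$ can be written as a function of all inputs so far. This is only a sketch, and it assumes an initial state $h_{-1}$ (not named in the text above; a zero vector is a common choice):

$h_0 = f(h_{-1}, x_0)$

$h_1 = f(h_0, x_1) = f(f(h_{-1}, x_0), x_1)$

$h_2 = f(h_1, x_2) = f(f(f(h_{-1}, x_0), x_1), x_2)$

Each new state nests the previous one, so $h_t$ ends up depending on $x_0,...,x_t$.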

## What is a recurrent neural network?

| ![A simple (vanilla) RNN](http://d3kbpzbmcynnmx.cloudfront.net/wp-content/uploads/2015/09/rnn.jpg) |
|:--:|
| A simple (vanilla) RNN |

The model is:

$h_t = \mathrm{activation}(U x_t + W h_{t-1})$ where $U$ holds the input-to-hidden weights, $W$ the hidden-to-hidden weights, and activation is usually a *tanh* or *ReLU* function.
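
To make the update rule concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass, written for Python 3 and using the indexing of the dynamical-system equation ($h_t$ computed from $h_{t-1}$ and $x_t$). The function name `rnn_forward`, the sizes, and the random weights are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def rnn_forward(xs, U, W, h_init):
    """Apply h_t = tanh(U x_t + W h_{t-1}) over a sequence and return all hidden states."""
    h = h_init
    states = []
    for x in xs:                      # one time step per input vector
        h = np.tanh(U @ x + W @ h)    # the same U and W are reused at every step
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, timesteps = 3, 4, 5   # illustrative sizes only
U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
xs = rng.normal(size=(timesteps, input_size))

states = rnn_forward(xs, U, W, h_init=np.zeros(hidden_size))
print(states.shape)  # (5, 4): one hidden state per time step
```

Because the weights are shared across steps, the same function works for a sequence of any length.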

## Different modes of usage
| ![Modes](http://karpathy.github.io/assets/rnn/diags.jpeg) |
|:--:|
| (Reused from Karpathy's blog) - Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state (more on this soon). From left to right: (1) Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification). (2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words). (3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). (4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French). (5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video). Notice that in every case there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like. |


# Let's start :-)

## Basic imports

In [57]:
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim

## Read the data

In [53]:
def read_data(data_path="./data/penn/"):
    train_data = None
    val_data = None
    test_data = None
    with open(data_path+"train.txt") as fileHandle:
        train_data = list(fileHandle.read())
#     with open(data_path+"test.txt") as fileHandle:
#         test_data = list(fileHandle.read())
#    with open(data_path+"valid.txt") as fileHandle:
#        val_data = list(fileHandle.read())
    
#    unique_letters = list(set(train_data + test_data + val_data))
    unique_letters = list(set(train_data))
    nb_tokens = len(unique_letters)
    return train_data, test_data, val_data, unique_letters, nb_tokens

train_data, test_data, val_data, unique_letters, nb_tokens = read_data()
print "# of tokens: ", nb_tokens, "\n"
print "Tokens: ", unique_letters, "\n"

print "First 100 characters in the training set:\n", "".join(train_data[:100])
print "Len of train data: ", len(train_data)
#print "Len of test data: ", len(test_data)
#print "Len of validation data: ", len(val_data)

# of tokens:  48 

Tokens:  ['\n', ' ', '#', '$', "'", '&', '*', '-', '/', '.', '1', '0', '3', '2', '5', '4', '7', '6', '9', '8', 'N', '\\', 'a', 'c', 'b', 'e', 'd', 'g', 'f', 'i', 'h', 'k', 'j', 'm', 'l', 'o', 'n', 'q', 'p', 's', 'r', 'u', 't', 'w', 'v', 'y', 'x', 'z'] 

First 100 characters in the training set:
 pierre N years old will join the board as a nonexecutive director nov. N 
 mr. is chairman of n.v. 
Len of train data:  4831321


In [39]:
# Convert letters to numbers
train_data_int, test_data_int, val_data_int = [], [], []
for letter in train_data:
    train_data_int.append(unique_letters.index(letter))

# for letter in test_data:
#     test_data_int.append(unique_letters.index(letter))

# for letter in val_data:
#    val_data_int.append(unique_letters.index(letter))

train_data_int = np.array(train_data_int).reshape(-1, 1)[0:50000]
# test_data_int = np.array(test_data_int).reshape(-1, 1)
# val_data_int = np.array(val_data_int).reshape(-1, 1)
print train_data_int[0:10]

[[ 1]
 [38]
 [29]
 [25]
 [40]
 [40]
 [25]
 [ 1]
 [20]
 [ 1]]
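
`unique_letters.index(letter)` scans the token list once per character, which adds up over millions of characters. A dictionary lookup is a common alternative; the sketch below (Python 3) uses a short stand-in string instead of `train_data`, and the names `vocab`, `char_to_idx` and `idx_to_char` are hypothetical. Sorting the vocabulary is an extra assumption that keeps the ids stable, since the ordering of `list(set(...))` can differ between runs.

```python
text = list("hello world")                 # stand-in for train_data
vocab = sorted(set(text))                  # sorted so the ids are reproducible
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

encoded = [char_to_idx[ch] for ch in text]            # letters -> integers
decoded = "".join(idx_to_char[i] for i in encoded)    # integers -> letters
print(encoded)
print(decoded)
```

The reverse mapping is what turns predicted token ids back into text.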


In [43]:
# Convert our data into one-hot encoding
from sklearn.preprocessing import OneHotEncoder
import pandas
#train_data_real = pandas.get_dummies(train_data)  # These two matrices are important in order to retrieve the actual letters and writers
train_data_onehot = OneHotEncoder(n_values=nb_tokens).fit_transform(train_data_int).toarray()
#test_data_onehot = OneHotEncoder(n_values=nb_tokens).fit_transform(test_data_int).toarray()
#val_data_onehot = OneHotEncoder(n_values=nb_tokens).fit_transform(val_data_int).toarray()
print train_data_onehot.shape
print train_data_int[0, :]
print train_data_onehot[0, :]

(50000, 48)
[1]
[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
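
The same encoding can be produced with NumPy alone by indexing an identity matrix, which may be handy because the `n_values` argument may not be accepted by recent scikit-learn releases. A minimal sketch in Python 3, with a made-up vocabulary size and made-up indices:

```python
import numpy as np

nb_tokens = 5                                   # illustrative vocabulary size
indices = np.array([[1], [3], [0], [1]])        # same (n, 1) layout as train_data_int

# Row k of the identity matrix is the one-hot vector for token k.
onehot = np.eye(nb_tokens, dtype=np.float32)[indices.ravel()]
print(onehot.shape)   # (4, 5)
print(onehot[0])
```

Taking `onehot.argmax(axis=1)` recovers the original integer ids.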


In [81]:
# Build sequences
def sequence_builder(data, seq_len=100, target_len=1, step=1):
    sequence = []
    target = []
    data_len = data.shape[0]
    for i in range(0, data_len, step):
        if i+seq_len+1 < data_len:
            sequence.append(data[i:i+seq_len, :])
            target.append(data[i+1:i+seq_len+1, :])
    sequence = np.array(sequence)#.reshape(data_len, seq_len)
    target = np.array(target)#.reshape(data_len, target_len)
    return sequence, target
train_data_onehot_seq, train_data_onehot_target = sequence_builder(train_data_onehot, seq_len=100, target_len=1)
print train_data_onehot_seq.shape
print train_data_onehot_target.shape

(49899, 100, 48)
(49899, 100, 48)
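
Both arrays have the same shape because each target is its input window shifted by one step, so the network is asked to predict the next character at every position (the `target_len` argument is not used in the function body). A toy version in Python 3 makes the shift visible; note that this loop keeps every valid window, so it yields one more window than the condition in `sequence_builder` would.

```python
import numpy as np

data = np.arange(6).reshape(-1, 1)   # toy data: one integer per time step
seq_len = 3

inputs, targets = [], []
for i in range(len(data) - seq_len):             # every window that still has a next step
    inputs.append(data[i:i + seq_len])
    targets.append(data[i + 1:i + seq_len + 1])  # same window shifted by one step
inputs, targets = np.array(inputs), np.array(targets)

print(inputs[0].ravel(), targets[0].ravel())
print(inputs.shape)
```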


In [82]:
# Wrapping numpy into tensors, then variables
train_data_onehot_seq = Variable(torch.from_numpy(train_data_onehot_seq).float(), requires_grad=False)
train_data_onehot_target = Variable(torch.from_numpy(train_data_onehot_target).float(), requires_grad=False)

In [92]:
class rnn_network(nn.Module):
    def __init__(self, nb_tokens=3, hidden_size=2):
        super(rnn_network, self).__init__()
        self.nb_tokens = nb_tokens
        self.hidden_size = hidden_size
        
        self.rnn1 = nn.GRUCell(input_size=nb_tokens, hidden_size=hidden_size)
        self.rnn2 = nn.GRUCell(input_size=hidden_size, hidden_size=hidden_size)
        
        self.output_layer = nn.Linear(in_features=hidden_size, out_features=nb_tokens)
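        # Note: nn.NLLLoss (used in the training loop) expects log-probabilities,
        # so nn.LogSoftmax would be the matching activation here.
        self.softmax = nn.Softmax()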
    def forward(self, data_min_batch):
        """
        Input mini-batch shape: (mini_batch_size, number of time steps, nb_tokens).
        Returns one output distribution per time step, with the same shape.
        """
        hidden_state_layer_0 = Variable(torch.zeros(data_min_batch.size(0), self.hidden_size), requires_grad=False)
        hidden_state_layer_1 = Variable(torch.zeros(data_min_batch.size(0), self.hidden_size), requires_grad=False)
        
        timesteps = data_min_batch.size(1)
        rnn_output = []
        
        for timestep in range(timesteps):
            hidden_state_layer_0 = self.rnn1(data_min_batch[:, timestep, :], hidden_state_layer_0)
            hidden_state_layer_1 = self.rnn2(hidden_state_layer_0, hidden_state_layer_1)
            # print "hidden_state_layer_1: ", hidden_state_layer_1
            rnn_output.append(self.softmax(self.output_layer(hidden_state_layer_1)))
        
        rnn_output = torch.stack(rnn_output, 1)
#         final_output = self.softmax(self.output_layer(rnn_output))
        
        return rnn_output

In [93]:
learning_model = rnn_network(nb_tokens=nb_tokens, hidden_size=10)
output = learning_model.forward(train_data_onehot_seq[0:2, :, :])
print "output size: ", output.size()
# print torch.sum(output, dim=2)

output size:  torch.Size([2, 100, 48])


In [94]:
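# Helper that slices one mini-batch out of several aligned arrays
def getbatch(*args, **kwargs):
    """
    Slice every positional argument to the same window.
    Keyword arguments: i (start index) and batch_size (maximum batch length).
    Returns a list with one slice per positional argument.
    """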
    i = kwargs['i']
    batch_size = kwargs['batch_size']
    assert len(args) > 0
    output_list = []
    # min_len = min(batch_size, len(args[0]) - 1 - i)
    min_len = min(batch_size, len(args[0]) - i)
    for argument in args:
        output_list.append(argument[i:i + min_len])
    return output_list
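
A quick way to see what `getbatch` returns is to call it on plain lists. The sketch below (Python 3) repeats the function's logic in compact form so it runs on its own; the lists `inputs` and `targets` are made-up stand-ins for the sequence and target tensors.

```python
def getbatch(*args, **kwargs):
    """Same logic as the notebook's getbatch, repeated so this snippet runs alone."""
    i, batch_size = kwargs['i'], kwargs['batch_size']
    min_len = min(batch_size, len(args[0]) - i)
    return [argument[i:i + min_len] for argument in args]

inputs = list(range(10))            # stand-ins for the sequence and target tensors
targets = [v + 1 for v in inputs]

for i in range(0, len(inputs), 4):
    x, y = getbatch(inputs, targets, i=i, batch_size=4)
    print(i, x, y)
```

The last batch is shorter than `batch_size` when the data length is not a multiple of it.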

In [95]:
batch_size = 64
loss_fn = nn.NLLLoss()
optimizer = optim.Adam(learning_model.parameters(), lr=0.001)

nb_epochs = 100
for epoch in range(nb_epochs):
    train_loss_temp = []
    learning_model.train()
    for batch, i in enumerate(range(0, train_data_onehot_seq.size(0) - 1, batch_size)):
        data_in_batch, data_out_batch = \
        getbatch(train_data_onehot_seq, train_data_onehot_target, i=i, batch_size=batch_size)
        
        output = learning_model.forward(data_in_batch)
        optimizer.zero_grad()
        _, data_out_batch = torch.max(data_out_batch, dim=2, keepdim=False)
        # Computing the loss manually, one time step at a time, makes it explicit how it is accumulated
        loss = 0
        for timestep in range(data_out_batch.size(1)):
            loss += loss_fn(output[:, timestep, :].contiguous(), data_out_batch[:,
                                                                 timestep].contiguous())
        loss.backward()
        optimizer.step()
        train_loss_temp.append(loss.data[0])
    print  "Ground truth - Epoch " + str(epoch) + " -- train loss = " + str(np.mean(train_loss_temp))

Ground truth - Epoch 0 -- train loss = -15.0442208831


KeyboardInterrupt: 
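
The negative training loss reported above is worth a closer look. `nn.NLLLoss` averages `-input[i, target[i]]` and expects `input` to hold log-probabilities, whereas the network's last layer is `nn.Softmax`, which outputs probabilities. Each per-step term then lies between -1 and 0, and summing 100 of them is consistent with a value around -15. The NumPy sketch below (Python 3) illustrates the effect with made-up scores; the helper `nll` is hypothetical and only mimics that averaging.

```python
import numpy as np

def nll(scores, targets):
    """Mean of -scores[i, targets[i]]: the quantity a negative log-likelihood loss averages."""
    return -scores[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])      # made-up scores for 2 samples, 3 classes
targets = np.array([0, 2])

shifted = logits - logits.max(axis=1, keepdims=True)          # for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(nll(probs, targets))          # negative: probabilities fed where log-probabilities belong
print(nll(np.log(probs), targets))  # non-negative: the usual cross-entropy
```

Swapping `nn.Softmax` for `nn.LogSoftmax`, or applying `nn.CrossEntropyLoss` to the raw output of `output_layer`, would likely give the usual non-negative loss; this has not been run against the notebook.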