<a href="https://colab.research.google.com/github/mancinimassimiliano/DeepLearningLab/blob/master/Lab4/char_rnn_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial on Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are models that are useful whenever we want to model sequences of data (e.g. video, text). In this tutorial (adapted from [here](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)), we will see how to predict the language of a name using an RNN that takes the characters of a single word as input.

Specifically, we will train the network on a list of surnames from 18 languages of origin, and predict which language a name is from based on the spelling:

```
$ python predict.py Hinton
(0.63) Scottish
(0.22) English
(0.02) Irish

$ python predict.py Schmidhuber
(0.83) German
(0.08) Czech
(0.07) Dutch
```

# Preparing the Data

The [link](https://download.pytorch.org/tutorial/data.zip) to download the required data is provided in the official PyTorch tutorial. The data must be downloaded and extracted on your virtual machine. We can do this with:

In [0]:
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip

The extracted `data/names` directory contains 18 text files named "[Language].txt". Each file contains a list of names, one name per line. In the following, we take care of data preprocessing by:

* Extracting all the names and the number of categories from the files.
* Converting each name from Unicode to ASCII.
* Instantiating a dictionary that maps each language (key) to the list of its names (value).

In [0]:
import glob
import unicodedata
import string

all_filenames = glob.glob('data/names/*.txt')
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427
def unicode_to_ascii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print(unicode_to_ascii('Ślusàrski'))

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

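# Read a file and split it into lines, converting each name to ASCII.
# The encoding is given explicitly because the names contain non-ASCII characters.
def readLines(filename):
    with open(filename, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    return [unicode_to_ascii(line) for line in lines]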

for filename in all_filenames:
    category = filename.split('/')[-1].split('.')[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)
print('n_categories =', n_categories)

# Turning Names into Tensors

A crucial point in this problem is how to define the input to the network. Since the network operates on numbers and not on plain text, we must convert text to a numerical representation. To this end, we represent each letter as a one-hot vector of size `<1 x n_letters>`. A one-hot vector is filled with 0s except for a 1 at the index of the current letter, e.g. `"b" = <0 1 0 0 0 ...>`.

To make a word we join a bunch of those into a 3D tensor of shape `<line_length x 1 x n_letters>`.

That extra 1 dimension is because PyTorch assumes everything is in batches - we're just using a batch size of 1 here.

In [0]:
import torch
  
# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letter_to_tensor(letter):
    tensor = torch.zeros(1, n_letters)
    letter_index = all_letters.find(letter)
    tensor[0][letter_index] = 1
    return tensor


# Turn a line into a <line_length x n_letters>,
# (or <line_length x 1 x n_letters> if the batch dimension is added)
# of one-hot letter vectors
def line_to_tensor(line, add_batch_dimension=True):
    tensor = torch.zeros(len(line), n_letters)
    for li, letter in enumerate(line):
        letter_index = all_letters.find(letter)
        tensor[li][letter_index] = 1
    if add_batch_dimension:
        return tensor.unsqueeze(1)
    else:
        return tensor
  
  
# Create a batch of samples given a list of lines
def create_batch(lines):
    tensors = []
    for l in lines:
      tensors.append(line_to_tensor(l,add_batch_dimension=False))
      
    padded_tensor = # TODO
    return padded_tensor
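
One possible way to fill in the padding step is sketched below. It is a minimal sketch, not the reference solution: it assumes that zero-padding every name to the length of the longest name in the batch is acceptable, and it uses `torch.nn.utils.rnn.pad_sequence` for that. The names `line_to_tensor_2d` and `create_batch_sketch` are hypothetical and only used here to keep the example self-contained.

```python
import string

import torch
from torch.nn.utils.rnn import pad_sequence

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)


def line_to_tensor_2d(line):
    # One-hot encode a name as a <line_length x n_letters> tensor (no batch dimension)
    tensor = torch.zeros(len(line), n_letters)
    for li, letter in enumerate(line):
        tensor[li][all_letters.find(letter)] = 1
    return tensor


def create_batch_sketch(lines):
    # Assumption: shorter names are padded with all-zero rows at the end
    tensors = [line_to_tensor_2d(l) for l in lines]
    # pad_sequence stacks along a new batch dimension: <max_length x batch_size x n_letters>
    return pad_sequence(tensors)


batch = create_batch_sketch(["Hinton", "Li"])
print(batch.shape)
```

With this choice, the last time steps of shorter names are all-zero padding, so a network that reads the output at the final step also processes those padded steps. Packing the batch with `torch.nn.utils.rnn.pack_padded_sequence` is one alternative if that matters.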

# Creating the Network

Instantiate a simple recurrent neural network. The network should have a recurrent layer followed by a fully connected layer that maps the features of the recurrent unit to the output space (i.e. the number of categories).



To run a step of this network we need to pass an input (in our case, the Tensor for the current sequence or batch of sequences) and a previous hidden state (which we initialize to zeros at first). We'll get back the logits (i.e. the network activations before the softmax) for each language.


In [0]:
import torch.nn as nn

# Create a simple recurrent network
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Create the RNN layer. Be aware that the activation is included in most of them.
        self.i2h = #TODO
        
        # Create the classifier 
        self.i2o = #TODO
        
    # Forward the whole sequence at once
    def forward(self, input, hidden=None):
        if hidden is None:
            hidden = self.init_hidden(input.shape[1])
          
        output, _ = self.i2h # TODO: Be aware that the output changes with the chosen recurrent layer.
        output = self.i2o(output[-1])
        
        
        return output

    # Instantiate the hidden state for the first element of the sequence
    # (dim: 1 x batch_size x hidden_size)
    def init_hidden(self, shape=1):
        return torch.zeros(1, shape, self.hidden_size)
      
      
class SimpleRNNwithCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNNwithCell, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Create the RNN unit/cell. Be aware that the activation is included in most of them.
        self.i2h = #TODO
        
        # Create the classifier 
        self.i2o = #TODO
    
    def forward(self, input, hidden=None):
        
        if hidden is None:
            hidden = self.init_hidden(input.shape[1])
        
        # Loop over the sequence 
        for i in range(input.shape[0]):
          hidden = #TODO
          output = #TODO
          
        return output

    def init_hidden(self,shape=1):
        return torch.zeros(shape, self.hidden_size)
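
The two templates above can be completed in several ways. The sketch below is one possibility, assuming a plain `nn.RNN` layer for the first variant and an `nn.RNNCell` for the second, both with their default tanh activation. The class names `SketchRNN` and `SketchRNNWithCell` are hypothetical. Unlike the template, the cell variant applies the classifier once after the loop rather than at every step, which gives the same final logits.

```python
import torch
import torch.nn as nn


class SketchRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Recurrent layer: processes the whole sequence, tanh activation included
        self.i2h = nn.RNN(input_size, hidden_size)
        # Classifier: maps the recurrent features to one logit per category
        self.i2o = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden=None):
        # input: <seq_len x batch_size x input_size>
        if hidden is None:
            hidden = torch.zeros(1, input.shape[1], self.hidden_size)
        # output: <seq_len x batch_size x hidden_size>, one feature vector per step
        output, _ = self.i2h(input, hidden)
        # Classify from the features of the last time step
        return self.i2o(output[-1])


class SketchRNNWithCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # Recurrent cell: processes a single time step per call
        self.i2h = nn.RNNCell(input_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden=None):
        if hidden is None:
            hidden = torch.zeros(input.shape[1], self.hidden_size)
        # Loop over the sequence, feeding the hidden state back in at each step
        for i in range(input.shape[0]):
            hidden = self.i2h(input[i], hidden)
        return self.i2o(hidden)


# Random stand-in input: sequence length 5, batch size 2, 57 letters, 18 categories
x = torch.rand(5, 2, 57)
logits_layer = SketchRNN(57, 128, 18)(x)
logits_cell = SketchRNNWithCell(57, 128, 18)(x)
print(logits_layer.shape, logits_cell.shape)
```

Swapping `nn.RNN` for `nn.LSTM` changes the second return value to a tuple of hidden and cell states, which is the reason the template warns that the output depends on the chosen recurrent layer.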


# Preparing for Training

Before going into training we should make a few helper functions. The first interprets the output of the network, which we know to be the logits for each category. We can use `Tensor.topk` to get the index of the greatest value:

In [0]:
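def category_from_output(output):
    # output: <batch_size x n_categories>; only the first sample is inspected
    top_n, top_i = output.data.topk(1)
    # Convert the 0-dim tensor to a plain Python int
    category_i = top_i[0][0].item()
    return all_categories[category_i], category_i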
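
The helper above reads only the first row of the output. When working with batches, a variant that returns one prediction per sample may be convenient. The following is a small sketch under that assumption: the function name `categories_from_output` and the three-language list are hypothetical, and `argmax` is used in place of `topk(1)`.

```python
import torch

# Hypothetical subset of categories, only for this illustration
all_categories = ["Czech", "German", "Scottish"]


def categories_from_output(output):
    # output: <batch_size x n_categories>; take the best index in every row
    top_i = output.argmax(dim=1)
    return [(all_categories[i], i) for i in top_i.tolist()]


logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3]])
predicted = categories_from_output(logits)
print(predicted)  # [('German', 1), ('Czech', 0)]
```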

We will also want a quick way to get a training example (a name and its language):

In [0]:
import random

def random_training_pair(bs=1):
    lines = []
    categories = []
    for b in range(bs):
        category = random.choice(all_categories)
        line = random.choice(category_lines[category])

        lines.append(line)
        categories.append(category)
      
      
    categories_tensor = torch.LongTensor([all_categories.index(c) for c in categories])
    lines_tensor = create_batch(lines)
    
    return categories_tensor, lines_tensor


# Training the Network

Now all it takes to train this network is to show it a bunch of examples, have it make guesses, and tell it if it's wrong.

Since the outputs of the network are logits and the task is classification, we can use a standard cross-entropy loss.

In [0]:
criterion = nn.CrossEntropyLoss()
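
To make explicit what this criterion expects, the short example below feeds it raw logits and integer class indices, then recomputes the same value by hand as the mean negative log-softmax probability of the correct class. The logits and targets are made-up numbers used only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for a batch of 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
# Target class indices (not one-hot vectors)
targets = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Same quantity by hand: log-softmax, pick the target entry, negate, average
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual_loss.item())
```

Because the softmax is applied inside the loss, the network itself should output logits and not probabilities.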

Now we instantiate a standard training loop where we will:

*   Forward the input to the network
*   Compute the loss
*   Backpropagate it
*   Do a step of the optimizer
*   Reset the gradients of the network's parameters


In [0]:
def train(rnn, optimizer, categories_tensor, lines_tensor):
    # TODO
    # - reset the optimizer 
    # - forward pass (output = ...)
    # - compute loss  (loss = ...)
    # - backward pass
    # - update the parameters
    
    return output, loss.item()
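
A possible completion of the training step is sketched below. It is not the reference solution. It assumes the loss is passed in explicitly as `criterion`, and it uses a hypothetical stand-in model (`TinyModel`, which simply averages over time before a linear layer) and random data so that the example runs on its own. The name `train_step` is likewise hypothetical.

```python
import torch
import torch.nn as nn


def train_step(model, optimizer, criterion, categories_tensor, lines_tensor):
    # Reset the gradients accumulated in the previous step
    optimizer.zero_grad()
    # Forward pass: logits of shape <batch_size x n_categories>
    output = model(lines_tensor)
    # Compute the loss against the target class indices
    loss = criterion(output, categories_tensor)
    # Backward pass: compute the gradients of the loss
    loss.backward()
    # Update the parameters
    optimizer.step()
    return output, loss.item()


class TinyModel(nn.Module):
    """Hypothetical stand-in for the RNN, used only to exercise train_step."""

    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        # x: <seq_len x batch_size x input_size>; average over the time dimension
        return self.fc(x.mean(dim=0))


torch.manual_seed(0)
model = TinyModel(57, 18)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
criterion = nn.CrossEntropyLoss()

lines_tensor = torch.rand(6, 2, 57)        # random stand-in for a batch of names
categories_tensor = torch.tensor([3, 7])   # random stand-in for the target languages

weights_before = model.fc.weight.detach().clone()
output, loss = train_step(model, optimizer, criterion, categories_tensor, lines_tensor)
print(output.shape, loss)
```

Resetting the gradients at the start or at the end of the step is equivalent, as long as it happens exactly once per iteration.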

Now we just have to:
*   Instantiate the network
*   Instantiate the optimizer
*   Run the training steps for a given number of iterations



In [0]:
# Initialize the network:
n_hidden = 128
rnn = # TODO

# Initialize the optimizer
learning_rate = 0.005 # Example: different LR could work better
optimizer = # TODO

# Initialize the training loop
batch_size = 2
n_iterations = 100000
print_every = 5000

# Keep track of losses
current_loss = 0

for iter in range(1, n_iterations + 1):
    # Get a random training input and target
    category_tensor, line_tensor = random_training_pair(bs=batch_size)
    
    # Process it through the train function
    output, loss = train(rnn, optimizer, category_tensor, line_tensor)
    
    # Accumulate loss for printing
    current_loss += loss
    
    # Print iteration number and loss
    if iter % print_every == 0:
        print('%d %d%% %.4f ' % (iter, iter / n_iterations * 100, current_loss/print_every))
        current_loss = 0


# Running on User Input

Finally, following the original tutorial [in the Practical PyTorch repo](https://github.com/spro/practical-pytorch/tree/master/char-rnn-classification), we define a prediction function and test it on some user-defined inputs.

In [0]:
normalizer = torch.nn.Softmax(dim=-1)

def predict(input_line, n_predictions=3):
    print('\n> %s' % input_line)
    output = rnn(line_to_tensor(input_line))
    output = normalizer(output)
    # Get top N categories
    topv, topi = output.data.topk(n_predictions, 1, True)
    predictions = []

    for i in range(n_predictions):
        value = topv[0][i]
        category_index = topi[0][i]
        print('(%.2f) %s' % (value, all_categories[category_index]))
        predictions.append([value, all_categories[category_index]])

predict('Dovesky')
predict('Jackson')
predict('Satoshi')
