# Introduction to LSTM in PyTorch

## Building a simple vocabulary from a collection of sentences

In PyTorch, an LSTM expects its input as a 3D tensor. Suppose our input is a collection of sentences. Let's start from the very beginning and show how to go from a collection of strings to a vocabulary organized as a Python dictionary. As a simplification, all the sentences contain the same number of words.

In [1]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from functools import reduce


sentences = ['I had breakfast this morning'.split(' '),
             'then I took a walk'.split(' '),
             'but the weather was bad'.split(' ')]
print(sentences)

[['I', 'had', 'breakfast', 'this', 'morning'], ['then', 'I', 'took', 'a', 'walk'], ['but', 'the', 'weather', 'was', 'bad']]


We can build a vocabulary by turning each sentence into a set and then iteratively taking the union of the sets. This gives us a single set containing the unique words.

In [2]:
vocabulary = reduce(set.union, [set(sentence) for sentence in sentences])
print(vocabulary)

{'had', 'but', 'weather', 'this', 'was', 'a', 'I', 'breakfast', 'then', 'the', 'took', 'walk', 'bad', 'morning'}


We can easily turn this set into a dictionary that maps each word to an integer index.

In [3]:
vocabulary = {word: ix for ix, word in enumerate(list(vocabulary))}
print(vocabulary)

{'had': 0, 'but': 1, 'weather': 2, 'this': 3, 'was': 4, 'a': 5, 'I': 6, 'breakfast': 7, 'then': 8, 'the': 9, 'took': 10, 'walk': 11, 'bad': 12, 'morning': 13}
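
A Python set has no guaranteed iteration order, so the indices printed above may differ from one run to the next. As a minimal sketch (the use of `sorted` and the names `sorted_vocabulary` and `index_to_word` are assumptions, not part of the original notebook), sorting the words before enumerating them makes the mapping reproducible, and inverting it lets us go back from indices to words:

```python
from functools import reduce

sentences = ['I had breakfast this morning'.split(' '),
             'then I took a walk'.split(' '),
             'but the weather was bad'.split(' ')]

# Same set construction as in the cell above.
unique_words = reduce(set.union, [set(sentence) for sentence in sentences])

# Sorting fixes the order, so every run assigns the same index to each word.
sorted_vocabulary = {word: ix for ix, word in enumerate(sorted(unique_words))}

# Hypothetical helper mapping: index -> word.
index_to_word = {ix: word for word, ix in sorted_vocabulary.items()}

print(len(sorted_vocabulary))
print(index_to_word[sorted_vocabulary['walk']])
```

The rest of the walkthrough works with either mapping, since only the position of the 1 in each one-hot vector changes.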


Now we need to turn the sentences into NumPy arrays that can be passed to the LSTM model. We start by pre-allocating a NumPy array of zeros with shape (number of sentences, maximum length of a sentence, number of words in the vocabulary).

In [4]:
sentence_max_length = max([len(sentence) for sentence in sentences])
inputs = np.zeros((len(sentences), sentence_max_length, len(vocabulary)), dtype='float32')
print(inputs.shape)

(3, 5, 14)


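We can then populate this array with a one-hot encoding of each word. (This representation would not be suitable if we wanted to learn word embeddings, because an embedding layer takes integer word indices as input rather than one-hot vectors.)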

In [5]:
for ix_sentence, sentence in enumerate(sentences):
    for ix_word, word in enumerate(sentence):
        # set the position of this word's vocabulary index to 1
        inputs[ix_sentence, ix_word, vocabulary[word]] = 1

print(inputs)
print(inputs.shape)

[[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

 [[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]

 [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
  [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]]
(3, 5, 14)
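
The same encoding can also be obtained by first mapping each word to its integer index and then expanding the indices with `torch.nn.functional.one_hot`. The sketch below is an alternative, not the notebook's method. It rebuilds a small sorted vocabulary so that it runs on its own, and the names `word_to_ix`, `indices` and `expected` are assumptions. The intermediate index tensor is also the kind of input that an `nn.Embedding` layer takes.

```python
import numpy as np
import torch
import torch.nn.functional as F

sentences = ['I had breakfast this morning'.split(' '),
             'then I took a walk'.split(' '),
             'but the weather was bad'.split(' ')]

# Sorted vocabulary so that the indices are the same on every run.
words = sorted({word for sentence in sentences for word in sentence})
word_to_ix = {word: ix for ix, word in enumerate(words)}

# Integer index of every word: shape (number of sentences, sentence length).
indices = torch.tensor([[word_to_ix[word] for word in sentence]
                        for sentence in sentences])

# Expand each index into a one-hot vector of length len(word_to_ix).
one_hot = F.one_hot(indices, num_classes=len(word_to_ix)).float()
print(one_hot.shape)

# Cross-check against an explicit loop like the one used above.
expected = np.zeros((len(sentences), 5, len(word_to_ix)), dtype='float32')
for ix_sentence, sentence in enumerate(sentences):
    for ix_word, word in enumerate(sentence):
        expected[ix_sentence, ix_word, word_to_ix[word]] = 1
print(np.array_equal(one_hot.numpy(), expected))
```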


### Using the correct shape

Our inputs consist of three sequences, each one-hot encoded into a NumPy array of size 5 x 14. We can now introduce an LSTM module that will read these sequences and store the state in a hidden layer of size 4.
From the [documentation of the LSTM layer](http://pytorch.org/docs/master/nn.html#torch.nn.LSTM) we see that the inputs are supposed to be of shape (seq_len, batch, input_size). Here we are providing batches of size 1, which means that we will need to reshape each input to (5, 1, 14).

In [6]:
# add a batch axis of size 1, so that each sentence has shape (seq_len=5, batch=1, input_size=14)
inputs = inputs.reshape(3, 5, 1, 14)

We can now instantiate an LSTM layer with 4 hidden units.

In [7]:
lstm = nn.LSTM(input_size=14, hidden_size=4)

### Initialization of the hidden and cell states

From the [documentation of the LSTM layer](http://pytorch.org/docs/master/nn.html#torch.nn.LSTM), the LSTM returns a tuple containing the outputs and a tuple with the hidden state and the cell state. As input it receives the input sequences and, optionally, a tuple of the same form. If this tuple is not provided, both the hidden and the cell states default to zeros.

In [8]:
input_var = autograd.Variable(torch.from_numpy(inputs))
print(input_var.size())

torch.Size([3, 5, 1, 14])


We can see that the LSTM layer is actually working by applying `lstm` to the first sequence and storing its output. Here, even though it is not necessary, we provide a manually zero-initialized tuple of hidden and cell states to show the full syntax.

In [9]:
# initial (h_0, c_0), each of shape (num_layers=1, batch=1, hidden_size=4)
hidden = (autograd.Variable(torch.zeros(1, 1, 4)), autograd.Variable(torch.zeros(1, 1, 4)))
out, hidden = lstm(input_var[0], hidden)

The output has shape (seq_len, batch, hidden_size), while the hidden and cell states each have shape (num_layers * num_directions, batch, hidden_size).

In [10]:
print('Output size: {}'.format(out.size()))
print('Hidden sizes: {0} and {1}'.format(hidden[0].size(), hidden[1].size()))

Output size: torch.Size([5, 1, 4])
Hidden sizes: torch.Size([1, 1, 4]) and torch.Size([1, 1, 4])


For each of the 5 words in the sentence, the output holds the 4 values of the hidden state at that time step.
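
As a quick check of the behaviour described in this section, the sketch below uses random stand-in data rather than the one-hot inputs, with a fixed seed; the names `sequence`, `zero_state` and `batch` are assumptions. It verifies that omitting the state tuple gives the same result as passing explicit zeros, and it shows the shapes obtained when three sequences are fed as a single batch of shape (5, 3, 14).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=14, hidden_size=4)

# One sequence: (seq_len=5, batch=1, input_size=14).
sequence = torch.randn(5, 1, 14)
zero_state = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))

out_explicit, _ = lstm(sequence, zero_state)
out_default, _ = lstm(sequence)  # states default to zeros when omitted
print(torch.allclose(out_explicit, out_default))

# Three sequences at once: (seq_len=5, batch=3, input_size=14).
batch = torch.randn(5, 3, 14)
out, (h_n, c_n) = lstm(batch)
print(out.size())
print(h_n.size(), c_n.size())

# For a single-layer, unidirectional LSTM the last output step equals h_n.
print(torch.allclose(out[-1], h_n[0]))
```

To batch the one-hot array in this way, its original (3, 5, 14) layout would first need the first two axes swapped, for example with `np.transpose`, or the layer could be created with `batch_first=True`.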