# LSTM Language Model

In this notebook, we build a language model using LSTMs. This is the "old-school" approach: since the introduction of the Transformer architecture, Transformer-based language models generally achieve better overall quality than LSTM-based ones.

In [2]:
%load_ext autoreload
%autoreload 2
from practicalnlp import settings
from practicalnlp.models import *
from practicalnlp.training import *
from practicalnlp.data import *
import torch

# Loading data

Here we load all the data with `batch_size = 20`. Note that the data is subdivided by two parameters: `nctx` and `batch_size`. `nctx` is the number of words processed in a single pass of the training phase. For example, the figure below illustrates each *step* of the training phase with `nctx = 3` for a single sequence in the batch (the sentence shown in the figure).


<img src="training_step_lm.svg" width="800" />

Each arrow indicates that the word at its origin is used to predict the next word in the `nctx` window. When the last word of the window has been processed, the window shifts by `nctx` words, and the process repeats until the entire batch has been read. The `nctx` parameter is also known as `bptt` (*backpropagation through time*), which is the name used in the official PyTorch tutorial on language modeling.
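The windowing described above can be sketched in plain Python. This is only an illustration, not the code used by `practicalnlp`: the helper name `nctx_windows` and the example sentence are hypothetical, and the sketch assumes the convention of the PyTorch tutorial, where the targets are simply the inputs shifted by one word.

```python
def nctx_windows(tokens, nctx):
    """Yield (inputs, targets) pairs for consecutive, non-overlapping windows."""
    # The last token has no next word to predict, so stop one token early.
    for start in range(0, len(tokens) - 1, nctx):
        end = min(start + nctx, len(tokens) - 1)
        inputs = tokens[start:end]
        targets = tokens[start + 1:end + 1]  # same window shifted right by one
        yield inputs, targets


# Hypothetical sentence, used only to make the steps visible.
sentence = "the quick brown fox jumps over the lazy dog".split()
for inputs, targets in nctx_windows(sentence, nctx=3):
    print(inputs, "->", targets)
```

Each printed pair corresponds to one training step: every word in `inputs` is asked to predict the word at the same position in `targets`. The final window may be shorter than `nctx` when the sequence length is not a multiple of it.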

Although the figure above shows the process for a single sequence, in practice we run it for all sequences in the batch at the same time. This is done with a 2-dimensional tensor: one dimension for the batch size and the other for the sequence length. In the code below, we do it using PyTorch.

In [8]:
batch_size = 20
nctx = 35
TRAIN = settings.WIKI_TRAIN_DATA
VALID = settings.WIKI_VALID_DATA
reader = WordDatasetReader(nctx)
reader.build_vocab((TRAIN,))

train_set = reader.load(TRAIN, batch_size)
valid_set = reader.load(VALID, batch_size)

In [13]:
train_set.shape

torch.Size([20, 104431])
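The shape above is consistent with a batch-first layout: 20 rows (one per batch element), each holding a long contiguous run of word ids. As a rough sketch of how such a tensor might be built and then traversed in `nctx`-wide steps, consider the toy example below. Whether `WordDatasetReader.load` works exactly this way is an assumption; `batchify` and `iter_windows` are hypothetical helper names, and `torch.arange` stands in for real word ids.

```python
import torch


def batchify(token_ids, batch_size):
    """Reshape a flat 1-D tensor of token ids into (batch_size, tokens_per_row)."""
    tokens_per_row = token_ids.size(0) // batch_size
    # Drop the tail that does not fill a complete row.
    trimmed = token_ids[: tokens_per_row * batch_size]
    return trimmed.view(batch_size, tokens_per_row)


def iter_windows(data, nctx):
    """Yield (inputs, targets) slices up to nctx wide along the sequence axis."""
    seq_len = data.size(1)
    # The last column has no next word to predict, so stop one column early.
    for start in range(0, seq_len - 1, nctx):
        end = min(start + nctx, seq_len - 1)
        yield data[:, start:end], data[:, start + 1:end + 1]


token_ids = torch.arange(26)  # stand-in for a corpus of 26 word ids
data = batchify(token_ids, batch_size=4)
print(data.shape)  # torch.Size([4, 6])
for inputs, targets in iter_windows(data, nctx=3):
    print(inputs.shape, targets.shape)
```

Every step processes the same window position in all rows at once, which is what lets a single forward pass cover the whole batch.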

In [21]:
model = LSTMLanguageModel(len(reader.vocab), 512, 512)
model.to('cuda:0')

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Model has {num_params} parameters") 


learnable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(learnable_params, lr=0.001)
fit_lm(model, optimizer, 1, batch_size, nctx, train_set, valid_set)

Model has 21274623 parameters
EPOCH 1
Training Results
average_train_loss 7.093464 (7.688572)
average_train_loss 6.886086 (7.213753)
average_train_loss 6.399582 (6.992481)
average_train_loss 6.709525 (6.850131)
average_train_loss 6.531233 (6.745780)
average_train_loss 6.296781 (6.665744)
average_train_loss 6.202570 (6.608568)
average_train_loss 6.164864 (6.558448)
average_train_loss 6.103103 (6.514183)
average_train_loss 5.831127 (6.476745)
average_train_loss 6.082990 (6.445060)
average_train_loss 5.877787 (6.417976)
average_train_loss 6.173513 (6.395627)
average_train_loss 6.158544 (6.369488)
average_train_loss 6.182152 (6.351835)
average_train_loss 6.053542 (6.336367)
average_train_loss 6.132944 (6.315442)
average_train_loss 5.806489 (6.295173)
average_train_loss 5.750424 (6.279107)
average_train_loss 5.949520 (6.262082)
average_train_loss 5.634626 (6.244754)
average_train_loss 5.706845 (6.225194)
average_train_loss 5.873329 (6.207461)
average_train_loss 5.798521 (6.195122)
average_t

In [79]:
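def sample(model, index2word, start_word='the', maxlen=20):
    """Generate `maxlen` words by sampling from the model, starting at `start_word`."""
    model.eval()
    words = [start_word]
    # Note: relies on the global `reader` for the word-to-index lookup.
    x = torch.tensor(reader.vocab.get(start_word)).long().reshape(1, 1).to('cuda:0')
    hidden = model.init_hidden(1)

    with torch.no_grad():
        for _ in range(maxlen):
            output, hidden = model(x, hidden)
            # exp() maps the model's output scores (presumably log-probabilities)
            # to non-negative weights; torch.multinomial normalizes them.
            word_softmax = output.squeeze().exp().cpu()
            selected = torch.multinomial(word_softmax, 1)[0]
            # Feed the sampled word back in as the next input.
            x.fill_(selected)
            word = index2word[selected.item()]
            words.append(word)
    words.append('...')
    return words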

index2word = {i: w for w, i in reader.vocab.items()}
words = sample(model, index2word)
print(' '.join(words))

the game . Right also record a obey Archive in nine @-@ books , with around 1 yards pounder . Larry ...
