## N-Gram Language Modelling

Just as we defined a unique index for each word when making one-hot vectors, we also need to define an index for each word when using embeddings. These indices are keys into a lookup table. That is, embeddings are stored as a $|V| \times D$ matrix, where $|V|$ is the vocabulary size and $D$ is the dimensionality of the embeddings, such that the word assigned index $i$ has its embedding stored in the $i$-th row of the matrix. In all of my code, the mapping from words to indices is a dictionary named word_to_ix.

The module that allows you to use embeddings is torch.nn.Embedding, which takes two arguments: the vocabulary size, and the dimensionality of the embeddings.
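
As a minimal sketch of the lookup described above, the snippet below builds a two-word vocabulary and retrieves one embedding row. The words "hello" and "world", the name toy_word_to_ix, and the 5-dimensional size are illustrative assumptions and do not come from the code later in this notebook. Indices must be passed as an integer (long) tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Hypothetical toy vocabulary, for illustration only.
toy_word_to_ix = {"hello": 0, "world": 1}

# |V| = 2 words, D = 5 dimensions; the weights are initialised randomly.
toy_embeds = nn.Embedding(len(toy_word_to_ix), 5)

# Look up the embedding of "hello" by its integer index.
lookup = torch.tensor([toy_word_to_ix["hello"]], dtype=torch.long)
hello_embed = toy_embeds(lookup)  # row 0 of the 2 x 5 weight matrix

print(hello_embed.shape)
```

The result has one row per index passed in, so a context of two words would give a tensor with two rows of D values each.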


Recall that in an n-gram language model, given a sequence of words $w$, we want to compute

$$P(w_i \mid w_{i-1}, w_{i-2}, \dots, w_{i-n+1})$$

where $w_i$ is the $i$-th word of the sequence.
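
To make the conditional probability concrete, here is a small count-based sketch for n = 3. It is not the neural approach used in the rest of this section. It simply estimates the probability of a word given the two words before it as a relative frequency, and the toy sentence and the helper name ngram_prob are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
n = 3

# counts[context][word] = how often `word` follows the (n-1)-word `context`.
counts = defaultdict(Counter)
for i in range(len(tokens) - n + 1):
    context = tuple(tokens[i:i + n - 1])
    target = tokens[i + n - 1]
    counts[context][target] += 1


def ngram_prob(context, word):
    """Relative-frequency estimate of P(word | context)."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


p_sat = ngram_prob(["the", "cat"], "sat")
print(p_sat)  # "the cat" occurs twice, followed once by "sat" -> 0.5
```

The neural model below replaces these raw counts with a learned function of the context embeddings, which lets it share information between contexts that contain similar words.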

    

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.autograd as autograd

torch.manual_seed(1)

context_size = 2
embedding_size = 10
test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()

trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]

vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}
print(trigrams[:3])
print(word_to_ix)

[(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]
{'all': 0, 'brow,': 1, 'being': 2, 'couldst': 3, 'treasure': 4, 'Proving': 5, 'to': 6, 'field,': 7, 'worth': 8, 'his': 9, 'thine!': 10, 'lies,': 11, 'Where': 12, 'dig': 13, 'succession': 14, 'small': 15, 'praise.': 16, 'where': 17, 'old': 18, 'fair': 19, 'see': 20, 'Thy': 21, "deserv'd": 22, 'sum': 23, 'shall': 24, 'forty': 25, 'new': 26, 'be': 27, 'asked,': 28, 'days;': 29, 'This': 30, 'Were': 31, 'say,': 32, "feel'st": 45, 'on': 34, 'thou': 35, 'of': 36, 'thine': 37, 'own': 38, 'gazed': 39, 'within': 40, 'When': 41, 'art': 42, 'now,': 43, 'trenches': 44, 'by': 33, 'much': 46, 'more': 47, 'held:': 48, 'count,': 49, 'mine': 67, 'warm': 51, 'deep': 68, 'child': 53, 'livery': 69, "youth's": 55, 'And': 56, 'made': 57, 'To': 70, "'This": 59, 'praise': 60, 'were': 61, 'eyes,': 62, 'my': 63, 'old,': 64, 'and': 65, 'use,': 66, 'it': 50, "beauty's": 52, 'an': 54, 'How': 58, 'shame,': 71, 'in': 
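
Because vocab is a Python set, its iteration order, and therefore the index assigned to each word, can differ from one run to the next. If reproducible indices matter, one option is to sort the vocabulary before enumerating it. This is a suggestion rather than part of the code above, and the toy sentence and toy_ names are illustrative assumptions.

```python
# Hypothetical toy sentence, for illustration only.
toy_sentence = "thy beauty lies where all thy beauty lies".split()

# Sorting gives every word the same index on every run.
toy_vocab = sorted(set(toy_sentence))
toy_word_to_ix = {word: i for i, word in enumerate(toy_vocab)}

print(toy_word_to_ix)
# {'all': 0, 'beauty': 1, 'lies': 2, 'thy': 3, 'where': 4}
```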

## Defining a model

In [15]:
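class NGramModel(nn.Module):
    def __init__(self, vocab_size, context_size, embedding_size):
        super(NGramModel, self).__init__()
        # Embedding table: randomly initialised, then updated during training
        self.embedd = nn.Embedding(vocab_size, embedding_size)
        # The context embeddings are concatenated, hence context_size * embedding_size inputs
        self.linear1 = nn.Linear(context_size * embedding_size, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # Look up the context embeddings and flatten them into a single row
        embeds = self.embedd(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        # Log-probabilities over the vocabulary, shape (1, vocab_size)
        return F.log_softmax(out, dim=1)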
    
model = NGramModel(len(vocab), context_size, embedding_size)
print(model)

NGramModel(
  (embedd): Embedding(97, 10)
  (linear1): Linear(in_features=20, out_features=128)
  (linear2): Linear(in_features=128, out_features=97)
)
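
The following sketch traces tensor shapes through the same sequence of layers, using the sizes from the printout above (97 words, a context of 2, 10-dimensional embeddings). The layers are rebuilt standalone with fresh random weights, and the two context indices are arbitrary, so the values are meaningless and only the shapes are of interest.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, context_size, embedding_size = 97, 2, 10

# Standalone copies of the three layers, with untrained random weights.
embed = nn.Embedding(vocab_size, embedding_size)
linear1 = nn.Linear(context_size * embedding_size, 128)
linear2 = nn.Linear(128, vocab_size)

context = torch.tensor([3, 41], dtype=torch.long)  # two arbitrary word indices

embeds = embed(context)                             # (2, 10): one row per context word
flat = embeds.view((1, -1))                         # (1, 20): rows concatenated
hidden = F.relu(linear1(flat))                      # (1, 128)
log_probs = F.log_softmax(linear2(hidden), dim=1)   # (1, 97): one score per vocabulary word

print(embeds.shape, flat.shape, hidden.shape, log_probs.shape)

# Exponentiating log-probabilities gives a distribution that sums to about 1.
prob_sum = log_probs.exp().sum().item()
print(prob_sum)
```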


## Training the model

In [30]:
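# The model already returns log-probabilities (log_softmax), so NLLLoss is the
# matching loss; CrossEntropyLoss would apply log_softmax a second time.
loss_fn = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
total_loss = 0  # never reset, so it is a running total across all epochs
losses = []

for epoch in range(10):
    for context, target in trigrams:

        # Map the context words and the target word to integer indices
        context_idxs = [word_to_ix[w] for w in context]
        context_var = torch.tensor(context_idxs, dtype=torch.long)
        target_var = torch.tensor([word_to_ix[target]], dtype=torch.long)

        # Gradients accumulate by default, so clear them before each step
        model.zero_grad()

        log_probs = model(context_var)
        loss = loss_fn(log_probs, target_var)
        loss.backward()
        optimizer.step()

        # .item() returns a Python float; loss.data[0] fails on recent PyTorch versions
        total_loss += loss.item()
    losses.append(total_loss)  # cumulative loss up to and including this epoch
print(losses)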


[323.0873278081417, 643.6426765918732, 961.6597440838814, 1277.132811397314, 1590.0542487502098, 1900.417294204235, 2208.2167975008488, 2513.4463525414467, 2816.101941421628, 3116.17862893641]
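
The values printed above grow from epoch to epoch because total_loss is never reset, so each entry is the running total over all epochs so far. A short sketch for recovering the loss of each individual epoch from that list follows. The numbers are copied from the output above, and the name per_epoch is an assumption.

```python
# Cumulative totals copied from the output above.
losses = [323.0873278081417, 643.6426765918732, 961.6597440838814,
          1277.132811397314, 1590.0542487502098, 1900.417294204235,
          2208.2167975008488, 2513.4463525414467, 2816.101941421628,
          3116.17862893641]

# The first entry is already a single-epoch loss; every later epoch is the
# difference between consecutive running totals.
per_epoch = [losses[0]] + [curr - prev for prev, curr in zip(losses, losses[1:])]

print([round(x, 2) for x in per_epoch])
```

The per-epoch loss falls steadily from about 323 in the first epoch to about 300 in the tenth, so the model is learning, although slowly at this learning rate. Resetting total_loss to 0 at the start of each epoch would record these per-epoch values directly.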
