# Word embedding
As with one-hot vectors, when using embeddings we need to define an index for each word. Embeddings are stored as a $|V|\times D$ matrix, where $D$ is the dimensionality of the embedding and $|V|$ is the size of the vocabulary: row $i$ of the matrix is the embedding of the word with index $i$.
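A minimal sketch of the word-to-index mapping and the embedding lookup in PyTorch follows. It assumes a hypothetical two-word vocabulary (`word_to_ix`) and $D = 5$. `nn.Embedding` holds the $|V|\times D$ matrix, and indexing it with a `LongTensor` of word indices returns the corresponding rows.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary: each word gets an integer index
word_to_ix = {"hello": 0, "world": 1}

# Embedding table of shape |V| x D, here |V| = 2 and D = 5
embeds = nn.Embedding(len(word_to_ix), 5)

# Look up the embedding of "hello" through its index
lookup = torch.tensor([word_to_ix["hello"]], dtype=torch.long)
hello_embed = embeds(lookup)
print(hello_embed.shape)  # torch.Size([1, 5])
```

The lookup is plain row selection: `hello_embed[0]` is the same vector as `embeds.weight[0]`, and the table's weights are trained along with the rest of the model.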

```python
import torch.nn as nn
import torch.nn.functional as F
```

## N-Gram Language Modeling
Given a sequence of words $w$, we want to compute $\mathbb{P}(w_i \mid w_{i-1}, \dots, w_{i-n+1})$, where $w_i$ is the $i$-th word of the sequence. In other words, we model the probability of each word given the $n-1$ words that precede it.
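To make the conditioning concrete, the sketch below builds training pairs for a trigram model ($n = 3$, so each context has $n - 1 = 2$ words) from a hypothetical toy sentence. Context words are listed most recent first, matching the order $w_{i-1}, \dots, w_{i-n+1}$ above. Each word is also mapped to an integer index so it can be fed to an embedding layer. `CONTEXT_SIZE`, `ngrams` and `word_to_ix` are illustrative names, not part of the original notebook.

```python
# Hypothetical toy corpus, already split into words
sentence = "the cat sat on the mat".split()

CONTEXT_SIZE = 2  # n - 1 preceding words, i.e. a trigram model (n = 3)

# Each pair is ([w_{i-1}, ..., w_{i-n+1}], w_i): context (most recent first), target
ngrams = [
    ([sentence[i - j - 1] for j in range(CONTEXT_SIZE)], sentence[i])
    for i in range(CONTEXT_SIZE, len(sentence))
]
print(ngrams[0])  # (['cat', 'the'], 'sat')

# Assign every distinct word an index, as required by the embedding table
vocab = sorted(set(sentence))
word_to_ix = {w: i for i, w in enumerate(vocab)}

# The context of the first pair, as indices ready for nn.Embedding
context_ixs = [word_to_ix[w] for w in ngrams[0][0]]
```

A sentence of $T$ words yields $T - (n - 1)$ pairs, because the first $n - 1$ words have no complete context.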

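```python
class NGramLanguageModeler(nn.Module):  # inherits from nn.Module
    """Predicts the next word from the `context_size` preceding words."""

    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        # |V| x D lookup table of word embeddings
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        # Hidden layer over the concatenated context embeddings
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        # Output layer: one score per vocabulary word
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # inputs: LongTensor of context_size word indices; their embeddings
        # are concatenated into a single row (so the batch size is 1)
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        # Log-probabilities over the vocabulary for the next word
        log_probs = F.log_softmax(out, dim=1)
        return log_probs
```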
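Two details of `forward` can be checked in isolation. The sketch below uses arbitrary toy sizes, hypothetical word indices, and random scores standing in for the output of `linear2`. First, the flattened context has `context_size * embedding_dim` entries, which is why `linear1` takes that input size. Second, because the model returns `log_softmax` outputs, `nn.NLLLoss` on those log-probabilities gives the same value as `F.cross_entropy` on the raw scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible random values

vocab_size, embedding_dim, context_size = 5, 10, 2  # arbitrary toy sizes

# Flattening: context_size embeddings of size embedding_dim become one row,
# which is why linear1 expects context_size * embedding_dim inputs
emb = nn.Embedding(vocab_size, embedding_dim)
context = torch.tensor([0, 4], dtype=torch.long)  # hypothetical word indices
flat = emb(context).view((1, -1))
print(flat.shape)  # torch.Size([1, 20])

# Stand-in for the output of linear2: unnormalized scores for one context
scores = torch.randn(1, vocab_size)
log_probs = F.log_softmax(scores, dim=1)

# Exponentiated log-probabilities form a distribution over the vocabulary
print(log_probs.exp().sum().item())  # approximately 1.0

# NLLLoss on log-probabilities equals cross-entropy on the raw scores
target = torch.tensor([3])  # hypothetical index of the true next word
nll = nn.NLLLoss()(log_probs, target)
ce = F.cross_entropy(scores, target)
print(torch.allclose(nll, ce))  # True
```

Returning log-probabilities and training with `nn.NLLLoss` is therefore equivalent to returning raw scores and training with `nn.CrossEntropyLoss`. The former keeps $\log \mathbb{P}(w_i \mid w_{i-1}, \dots, w_{i-n+1})$ directly available from `forward`.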