<a href="https://colab.research.google.com/github/pdevineni/NLP-Portfolio/blob/main/NNLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Neural Probabilistic Language Model**

This work is based on the 2003 paper by Yoshua Bengio et al., **A Neural Probabilistic Language Model**.

**Task:** Develop a neural probabilistic language model that generalizes to word sequences not seen during training.

**Literature Survey:** Earlier statistical language models are mostly n-gram models, which predict the next word from only the previous n-1 words and estimate probabilities from counts in the training data. A word sequence that never occurs in the training data has a count of zero, so these models must rely on smoothing or on backing off to shorter contexts. They also treat every word as an unrelated symbol, so evidence about one word does not transfer to similar words.
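
The zero-count problem described above can be seen in a minimal sketch of an unsmoothed bigram model. The function name `bigram_mle` is hypothetical, and the toy corpus is the same three sentences used in the code cells of this notebook; real n-gram systems add smoothing on top of this.

```python
from collections import Counter

def bigram_mle(sentences):
    """Return p(next_word | prev_word) estimated from raw counts (no smoothing)."""
    pair_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1

    def prob(nxt, prev):
        # An unseen context has no estimate at all; report 0.0 here.
        if context_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, nxt)] / context_counts[prev]

    return prob

corpus = ["i like dog", "i love coffee", "i hate milk"]
p = bigram_mle(corpus)
print(p("dog", "like"))     # seen pair: 1.0
print(p("coffee", "like"))  # unseen pair: 0.0
```

"like coffee" is a perfectly plausible continuation, but the count-based estimate gives it zero probability because that exact pair never appeared.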

**Approach:** The proposed approach uses a neural network to learn a probability distribution over the next word given a fixed window of the previous n-1 words. Each word is mapped to a learned feature vector (an embedding), and the embeddings and the probability function are trained jointly on a text corpus. Because words used in similar contexts end up with similar vectors, the model can assign reasonable probability to word combinations that never appeared in training.

**Architecture:**
The architecture in the paper is a feed-forward neural network with a single hidden layer, not a recurrent network. It sees a fixed window of n-1 context words (`n_step` in the code below), so it cannot use words that fall outside that window.

- The input layer looks up an embedding for each of the n-1 context words in a shared table `C` and concatenates them into one vector of size (n-1)·m.
- The hidden layer applies a weight matrix `H` and a bias `d` to that vector, followed by a tanh nonlinearity.
- The output layer produces one score per vocabulary word from the hidden activations (`U`), optional direct connections from the input (`W`), and a bias `b`; a softmax turns these scores into next-word probabilities.
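
The three layers can be written compactly as follows. This uses the same symbols as the `NNLM` class in the code cells (`C`, `H`, `d`, `U`, `W`, `b`); the index notation for the context words is an assumption chosen here for illustration.

$$
x = \big[\,C(w_{t-n+1});\ \dots;\ C(w_{t-1})\,\big] \in \mathbb{R}^{(n-1)m}
$$

$$
y = b + W x + U \tanh(d + H x)
$$

$$
P(w_t = i \mid w_{t-n+1}, \dots, w_{t-1}) = \frac{e^{y_i}}{\sum_{j} e^{y_j}}
$$

Here $x$ is the concatenation of the context embeddings, $y$ holds one unnormalized score per vocabulary word, and the last line is the softmax that converts scores into probabilities.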

**Novelty:** The novelty of this approach is that it learns a distributed representation for each word jointly with the probability function over word sequences, instead of storing a separate count-based estimate for every n-gram. Sequences built from words with similar representations receive similar probabilities, which lets the model generalize to combinations of words that never occurred in the training corpus.

**Results:** On the two corpora evaluated in the paper (Brown and AP News), the neural probabilistic language model achieves lower test perplexity than the smoothed n-gram baselines it is compared against. The paper also reports that the model can take advantage of longer contexts than those n-gram models.

This paper has made significant contributions to the field of language modeling. Neural language models have since become the dominant approach to language modeling and are used in applications such as speech recognition, machine translation, and text generation.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
def make_batch():
    input_batch = []
    target_batch = []

    for sen in sentences:
        word = sen.split()  # whitespace tokenizer
        context = [word_dict[n] for n in word[:-1]]  # ids of words 1..n-1 are the input
        target = word_dict[word[-1]]  # id of word n is the target; this setup is usually called a 'causal language model'

        input_batch.append(context)
        target_batch.append(target)

    return input_batch, target_batch
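
To make the output of `make_batch` concrete, here is a small self-contained variant that takes its inputs as arguments instead of reading globals. The name `make_batch_from` and the fixed vocabulary `demo_dict` are hypothetical and exist only so the ids are reproducible.

```python
def make_batch_from(sentences, word_dict):
    """Split each sentence into context word ids (all but last) and a target id (last)."""
    input_batch, target_batch = [], []
    for sen in sentences:
        words = sen.split()
        input_batch.append([word_dict[w] for w in words[:-1]])
        target_batch.append(word_dict[words[-1]])
    return input_batch, target_batch

# Hypothetical fixed vocabulary, not the one built later in this notebook.
demo_dict = {"i": 0, "like": 1, "dog": 2, "love": 3, "coffee": 4}
inputs, targets = make_batch_from(["i like dog", "i love coffee"], demo_dict)
print(inputs)   # [[0, 1], [0, 3]]
print(targets)  # [2, 4]
```

Each row of `inputs` has exactly `n_step` ids, which is why every training sentence in this notebook has three words.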

In [None]:
# Model
class NNLM(nn.Module):
    def __init__(self):
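        super().__init__()
        # C: embedding table mapping each word id to an m-dimensional feature vector
        self.C = nn.Embedding(n_class, m)
        # H, d: weights and bias of the tanh hidden layer
        self.H = nn.Linear(n_step * m, n_hidden, bias=False)
        self.d = nn.Parameter(torch.ones(n_hidden))
        # U: hidden-to-output weights
        self.U = nn.Linear(n_hidden, n_class, bias=False)
        # W: direct input-to-output connections; b: output bias
        self.W = nn.Linear(n_step * m, n_class, bias=False)
        self.b = nn.Parameter(torch.ones(n_class))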

    def forward(self, X):
        X = self.C(X) # X : [batch_size, n_step, m]
        X = X.view(-1, n_step * m) # [batch_size, n_step * m]
        tanh = torch.tanh(self.d + self.H(X)) # [batch_size, n_hidden]
        output = self.b + self.W(X) + self.U(tanh) # [batch_size, n_class]
        return output
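
As a check on the layer shapes in the `NNLM` class, the number of trainable parameters can be counted by hand. The sketch below uses plain Python with hypothetical helper names (`nnlm_param_count`, `closed_form`); the closed form is derived here from the shapes above, with `V` the vocabulary size and `n - 1` the context length. The example values match the training cell in this notebook (7 words, `n_step = 2`, `m = 2`, `n_hidden = 2`).

```python
def nnlm_param_count(n_class, n_step, m, n_hidden):
    """Count parameters per layer for the NNLM class, from the layer shapes alone."""
    counts = {
        "C": n_class * m,            # embedding table
        "H": n_step * m * n_hidden,  # input -> hidden weights
        "d": n_hidden,               # hidden bias
        "U": n_hidden * n_class,     # hidden -> output weights
        "W": n_step * m * n_class,   # direct input -> output weights
        "b": n_class,                # output bias
    }
    return counts, sum(counts.values())

def closed_form(V, n, m, h):
    """Same total, grouped by terms that scale with the vocabulary size V."""
    return V * (1 + n * m + h) + h * (1 + (n - 1) * m)

counts, total = nnlm_param_count(n_class=7, n_step=2, m=2, n_hidden=2)
print(counts)
print(total)                               # 73
print(closed_form(V=7, n=3, m=2, h=2))     # 73
```

The total grows linearly with the vocabulary size and with the context length, whereas a full table of n-gram probabilities grows exponentially with n.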

In [None]:
if __name__ == '__main__':
    n_step = 2 # number of context words, n-1 in paper
    n_hidden = 2 # hidden layer size, h in paper
    m = 2 # embedding size, m in paper

    sentences = ["i like dog", "i love coffee", "i hate milk"]

    word_list = " ".join(sentences).split()
    word_list = list(set(word_list))
    word_dict = {w: i for i, w in enumerate(word_list)}
    number_dict = {i: w for i, w in enumerate(word_list)}
    n_class = len(word_dict)  # vocabulary size

    model = NNLM()

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    input_batch, target_batch = make_batch()
    input_batch = torch.LongTensor(input_batch)
    target_batch = torch.LongTensor(target_batch)

    # Training
    for epoch in range(5000):
        optimizer.zero_grad()
        output = model(input_batch)

        # output : [batch_size, n_class], target_batch : [batch_size]
        loss = criterion(output, target_batch)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss.item()))

        loss.backward()
        optimizer.step()

    # Predict: take the highest-scoring word id for each context
    predict = model(input_batch).argmax(dim=1)

    # Test
    print([sen.split()[:2] for sen in sentences], '->', [number_dict[n.item()] for n in predict])


Epoch: 1000 cost = 0.049638
Epoch: 2000 cost = 0.010600
Epoch: 3000 cost = 0.004060
Epoch: 4000 cost = 0.001900
Epoch: 5000 cost = 0.000976
[['i', 'like'], ['i', 'love'], ['i', 'hate']] -> ['dog', 'coffee', 'milk']
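
The model's `forward` method returns unnormalized scores rather than probabilities; `nn.CrossEntropyLoss` applies the log-softmax internally during training. To read the scores as a next-word distribution, a softmax is needed. The sketch below shows that step in plain Python; the `logits` values are hypothetical and are not output from the trained model.

```python
import math

def softmax(scores):
    """Convert unnormalized scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # hypothetical scores for a 3-word vocabulary
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs)
print(best)  # 0, the index with the largest score
```

Softmax preserves the ordering of the scores, so taking the argmax of the raw scores, as the prediction step does, picks the same word as taking the argmax of the probabilities.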


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
sentences = ["i like dog", "i love coffee", "i hate milk"]
word_list = " ".join(sentences).split()
word_list = list(set(word_list))
word_list

['milk', 'coffee', 'i', 'love', 'like', 'hate', 'dog']

In [None]:
word_dict = {w: i for i, w in enumerate(word_list)}
word_dict

{'milk': 0, 'coffee': 1, 'i': 2, 'love': 3, 'like': 4, 'hate': 5, 'dog': 6}

In [None]:
number_dict = {i: w for i, w in enumerate(word_list)}
number_dict

{0: 'milk', 1: 'coffee', 2: 'i', 3: 'love', 4: 'like', 5: 'hate', 6: 'dog'}

In [None]:
n_class = len(word_dict)
n_class

7
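
One caveat about the vocabulary cells above: `list(set(word_list))` does not guarantee any particular order, so the word-to-id mapping can differ between Python sessions. A minimal sketch of a reproducible alternative, assuming first-occurrence order is acceptable, is shown below. The names `ordered_words`, `ordered_dict`, and `ordered_numbers` are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
sentences = ["i like dog", "i love coffee", "i hate milk"]

# dict.fromkeys drops duplicates while keeping first-occurrence order.
ordered_words = list(dict.fromkeys(" ".join(sentences).split()))
ordered_dict = {w: i for i, w in enumerate(ordered_words)}
ordered_numbers = {i: w for w, i in ordered_dict.items()}

print(ordered_words)    # ['i', 'like', 'dog', 'love', 'coffee', 'hate', 'milk']
print(ordered_dict["i"])  # 0
```

A stable mapping matters as soon as a trained model is saved and reloaded, because the embedding rows are tied to specific ids.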