
## Introduction
In this assignment, you will learn how to build a language model from scratch and use the model to generate new text.
You will also see how training a language model helps you learn word representations.

Note:
- Plagiarism will result in a mark of 0.
- The following template shows what your code should look like. You are free to add more functions and change the parameters. You are not allowed to use existing implementations.


In [14]:
import torch
import torch.nn as nn
import torch.autograd as ag
from matplotlib import pyplot as plt

## Recurrent Neural Network (5 points)
To begin, you have to implement the vanilla RNN in PyTorch.

Recall the formula for the vanilla RNN:
        \begin{eqnarray}
        h_t & = & \sigma(W_h h_{t-1} + W_x x_t + b_1) \\
        y_t & = & \phi(W_y h_t + b_2)
        \end{eqnarray}
where $\sigma$ is usually the sigmoid activation function and $\phi$ is usually the softmax function.

Hints:
For an RNNLM, the input is a sequence of word_ids, e.g. [10, 8, 5, 2, 101, 23]. You have to convert each word_id to an embedding vector. To implement this, you can use the `torch.nn.Embedding` class.
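
As a small illustration of the hint, the lookup can be sketched as follows. The vocabulary size and embedding dimension are assumed values chosen only so that the example ids fit; they are not prescribed by the assignment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes for illustration: 200 word ids, 4-dimensional embeddings.
vocab_size, embedding_dim = 200, 4
embedding = nn.Embedding(vocab_size, embedding_dim)

# The example sequence from the hint, as a tensor of integer ids.
word_ids = torch.tensor([10, 8, 5, 2, 101, 23])

# Each id selects one row of embedding.weight, giving one vector per word.
vectors = embedding(word_ids)
print(vectors.shape)
```

Each row of `vectors` is the learnable vector for the corresponding word_id, so the sequence becomes a matrix with one row per word.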

In [16]:
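class VanilaRNNLM(nn.Module):
    def __init__(self, n_inputs, n_hiddens, n_outputs, vocab, sigma='sigmoid', phi='softmax'):
        """
        Construct a vanilla RNN language model.

        Params:
        n_inputs: number of input neurons (the embedding dimension)
        n_hiddens: number of hidden neurons
        n_outputs: number of output neurons (the vocabulary size)
        vocab: a dictionary of the form {word: word_id}
        sigma: activation function for hidden layer ('sigmoid' or 'tanh')
        phi: output function ('softmax' or 'log_softmax')
        """
        super().__init__()
        self.n_inputs = n_inputs
        self.n_hiddens = n_hiddens
        self.n_outputs = n_outputs
        self.vocab = vocab
        self.sigma = sigma
        self.phi = phi

        # torch.tensor(a, b) does not allocate an (a, b) matrix, so draw small random weights instead.
        self.w_h = nn.Parameter(torch.randn(n_hiddens, n_hiddens) * 0.1)
        self.w_x = nn.Parameter(torch.randn(n_hiddens, n_inputs) * 0.1)
        self.w_b1 = nn.Parameter(torch.zeros(1, n_hiddens))
        self.w_y = nn.Parameter(torch.randn(n_outputs, n_hiddens) * 0.1)
        self.w_b2 = nn.Parameter(torch.zeros(1, n_outputs))
        # One embedding row per word id; max_norm takes a float, not a bool.
        self.embedding = nn.Embedding(n_outputs, n_inputs, max_norm=1.0)

    def forward(self, xs, h0=None):
        """
        Params:
        xs: the input sequence [x_1, x_2, ..., x_n]. x_i is the id of the i-th word in the sequence.
            For example, xs = [1, 3, 11, 6, 8, 2]
        h0: the initial hidden state of shape (1, n_hiddens); zeros if not given

        Returns: (ys, hs) where
        ys = [y_1, y_2, ..., y_n] and
        hs = [h_1, h_2, ..., h_n]
        """
        sigma = torch.tanh if self.sigma == 'tanh' else torch.sigmoid
        phi = torch.log_softmax if self.phi == 'log_softmax' else torch.softmax
        if h0 is None:
            h0 = torch.zeros(1, self.n_hiddens)
        embeds = self.embedding(torch.as_tensor(xs, dtype=torch.long))  # shape (n, n_inputs)
        h_t, ys, hs = h0, [], []
        for x_t in embeds:
            # h_t = sigma(W_h h_{t-1} + W_x x_t + b_1), written with row vectors
            h_t = sigma(h_t @ self.w_h.T + x_t.unsqueeze(0) @ self.w_x.T + self.w_b1)
            # y_t = phi(W_y h_t + b_2)
            y_t = phi(h_t @ self.w_y.T + self.w_b2, dim=-1)
            ys.append(y_t)
            hs.append(h_t)
        return ys, hs

# Quick smoke test; the sizes and the toy vocabulary below are illustrative assumptions.
embedding_dim = 10
vocab = {'<unk>': 0, 'a': 1, 'b': 2, 'c': 3}
xs = [1, 2, 3]
rnn = VanilaRNNLM(n_inputs=embedding_dim, n_hiddens=16, n_outputs=len(vocab), vocab=vocab)
ys, hs = rnn(xs)
print(len(ys), ys[0].shape, hs[0].shape)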

A vanilla RNN suffers from vanishing and exploding gradients. Your next task is to implement a more sophisticated RNN that is more robust to this problem.
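
To make the vanishing case concrete, here is a minimal sketch that measures how much the final hidden state depends on the initial one. It assumes a sigmoid recurrence with small random weights and drops the input term $W_x x_t$ for brevity; the helper name `grad_norm_wrt_h0` and all sizes are hypothetical.

```python
import torch

def grad_norm_wrt_h0(n_steps, n_hiddens=8, scale=0.1):
    """Norm of d(sum of h_T) / d(h_0) for h_t = sigmoid(h_{t-1} W_h^T)."""
    gen = torch.Generator().manual_seed(0)  # same weights for every call
    w_h = torch.randn(n_hiddens, n_hiddens, generator=gen) * scale
    h0 = torch.zeros(1, n_hiddens, requires_grad=True)
    h = h0
    for _ in range(n_steps):
        h = torch.sigmoid(h @ w_h.T)
    # Backpropagate through all n_steps applications of the recurrence.
    (grad,) = torch.autograd.grad(h.sum(), h0)
    return grad.norm().item()

norm_short = grad_norm_wrt_h0(3)
norm_long = grad_norm_wrt_h0(30)
print(norm_short, norm_long)
```

Each step multiplies the gradient by the sigmoid derivative (at most 0.25) times $W_h$, so with small weights the gradient shrinks roughly geometrically with sequence length. With large weights the same product can grow instead, which is the exploding case.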

In [6]:
class FancyRNNLM(nn.Module):
    def __init__(self, n_inputs, n_hiddens, n_outputs, vocab, sigma='sigmoid', phi='softmax'):
        """
        Construct a fancy RNN; this could be an LSTM, a GRU, or your own invention.
        
        Params:
        n_inputs: number of input neurons
        n_hiddens: number of hidden neurons
        n_outputs: number of output neurons
        vocab: a dictionary {word: word_id}
        sigma: activation function for hidden layer
        phi: output function
        """
        super().__init__()
        self.n_inputs = n_inputs
        self.n_hiddens = n_hiddens
        self.n_outputs = n_outputs
        self.vocab = vocab
        self.sigma = sigma
        self.phi = phi

    
    def forward(self, xs, h0):
        pass
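
As one possible choice (not the only valid one), the GRU update can be written in the same style as the vanilla RNN formula. The matrix names $W_\cdot$, $U_\cdot$ and biases $b_\cdot$ are notation assumed here, $\sigma$ is the sigmoid and $\odot$ is the element-wise product:
        \begin{eqnarray}
        z_t & = & \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
        r_t & = & \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
        \tilde{h}_t & = & \tanh(W_n x_t + r_t \odot (U_n h_{t-1}) + b_n) \\
        h_t & = & (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}
        \end{eqnarray}
When the update gate $z_t$ is close to 1, $h_t$ is close to a copy of $h_{t-1}$, so the gradient can pass through that step without being squashed by an activation function.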

## Language Modeling with RNN (4 points)
The next step is to use our RNNs in a real-world task. One of the most common applications of RNNs is language modeling.

### Data
For this assignment, we will use text data from Wikipedia. To start, download the data from this website:

https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip

Some information about this dataset can be found here:

https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/
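
Before training, the text has to be turned into word ids. The following is a minimal sketch that assumes whitespace tokenization and two special tokens, `<unk>` and `<eos>`; the helper names `build_vocab` and `encode` are hypothetical, and the two toy lines stand in for lines read from the extracted training file.

```python
from collections import Counter

def build_vocab(lines, min_count=1, specials=("<unk>", "<eos>")):
    """Map each sufficiently frequent token to an integer id, most frequent first."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, count in counts.most_common():
        if count >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(line, vocab):
    """Convert one line to word ids, mapping unknown words to <unk> and appending <eos>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in line.split()] + [vocab["<eos>"]]

# Toy stand-in for the corpus.
lines = ["the cat sat", "the dog sat"]
vocab = build_vocab(lines)
ids = encode("the bird sat", vocab)
print(vocab)
print(ids)
```

Raising `min_count` shrinks the vocabulary, which keeps the embedding and output layers small at the cost of more `<unk>` tokens.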



### Training a LM with RNN
Write the code to train RNNLMs with the VanilaRNNLM and FancyRNNLM classes above. Train one instance of VanilaRNNLM and one instance of FancyRNNLM.
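
A single training step for next-word prediction can be sketched as below. To keep the example self-contained it uses a stand-in `BigramLM` model that is not an RNN; `BigramLM`, `train_step`, the toy id sequence and the hyperparameters are all assumptions made for illustration, and the RNNLM classes above would take the model's place.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class BigramLM(nn.Module):
    """Stand-in model (NOT an RNN): predicts the next word from the current word only."""
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, xs):
        # Log-probabilities over the vocabulary for each position.
        return F.log_softmax(self.out(self.embedding(xs)), dim=-1)

def train_step(model, optimizer, ids, clip=1.0):
    """One gradient step: predict ids[1:] from ids[:-1]."""
    inputs, targets = ids[:-1], ids[1:]
    optimizer.zero_grad()
    loss = F.nll_loss(model(inputs), targets)  # mean negative log-likelihood
    loss.backward()
    # Clipping guards against exploding gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()
    return loss.item()

ids = torch.tensor([2, 3, 4, 2, 3, 5, 2, 3, 4])  # toy sequence of word ids
model = BigramLM(vocab_size=6, dim=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
losses = [train_step(model, optimizer, ids) for _ in range(50)]
print(losses[0], losses[-1])
```

The targets are the inputs shifted by one position, so the model is always asked for the word that follows the one it has just read.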

In [None]:
def train_rnnlm(corpus, rnnlm, **train_params):
    """
    Params:
    corpus: the text corpus
    rnnlm: the RNN
    train_params: other parameters, e.g. learning rate, batch size, number of GPUs, ...
    """
    # Your code here
    return rnnlm

In [None]:
# vanila_rnnlm = train_rnnlm(corpus=wiki_corpus_train, rnnlm=vanila_rnn)
# fancy_rnnlm = train_rnnlm(corpus=wiki_corpus_train, rnnlm=fancy_rnn)

### Generating new text with RNNLM
Write the code to generate new text segments from the RNNLM. Produce several outputs from both VanilaRNNLM and FancyRNNLM to compare the quality of the two models.
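
The core of generation is choosing the next word from the model's output distribution. Here is a minimal sketch of temperature sampling, assuming the model returns a 1-D vector of probabilities for the next word; `sample_next` is a hypothetical helper, not part of the template.

```python
import torch

def sample_next(probs, temperature=1.0, generator=None):
    """Sample a word id from a 1-D probability vector after temperature rescaling."""
    # temperature < 1 sharpens the distribution, temperature > 1 flattens it.
    scaled = probs.pow(1.0 / temperature)
    scaled = scaled / scaled.sum()
    return torch.multinomial(scaled, num_samples=1, generator=generator).item()

gen = torch.Generator().manual_seed(0)
peaked = torch.tensor([0.0, 1.0, 0.0])  # all probability mass on word id 1
next_id = sample_next(peaked, temperature=0.5, generator=gen)
print(next_id)
```

A generation loop would encode `seed_text` to ids, run the model on them, call `sample_next` on the last output, append the sampled id, and repeat until `length` words have been produced.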

In [None]:
def generate_text(rnnlm, seed_text, length, **params):
    """
    Params:
    rnnlm: the language model
    seed_text: a string of initial text
    length: the length of the generated text
    params: other params
    """
    pass

In [None]:
# seed_text = input("Enter your initial text")
# output_text = generate_text(rnnlm=rnnlm, seed_text=seed_text, length=100)
# print(output_text)

### Perplexity (+2 bonus points)
Compute the perplexity of the models. The lower the perplexity, the higher your score.
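
For a test sequence of $N$ words, perplexity is $\exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_1, \dots, w_{i-1})\right)$, the exponential of the average negative log-likelihood per word. The sketch below assumes the per-word natural-log probabilities have already been collected from the model; `perplexity_from_log_probs` is a hypothetical helper.

```python
import math

def perplexity_from_log_probs(log_probs):
    """Perplexity from a list of per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Sanity check: a model that spreads probability uniformly over 4 words.
uniform = [math.log(1 / 4)] * 10
ppl = perplexity_from_log_probs(uniform)
print(ppl)
```

A uniform model over $V$ words has perplexity $V$, which gives a useful upper reference point: a trained model should score well below the vocabulary size.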

In [None]:
def perplexity(rnnlm, corpus):
    pass

In [None]:
# perp = perplexity(rnnlm=rnnlm, corpus=wiki_corpus_test)
# print(perp)

## Word Embedding (1 point + 1 bonus point)

Now that you have trained your RNNLM, the `torch.nn.Embedding` layer in your model stores the embeddings of the words in the dictionary. You can use dimensionality reduction algorithms such as PCA and t-SNE to visualize the word embeddings.
Produce a 2D plot of 100 to 1000 words and write a short analysis of the plot (e.g. the clusters of words with similar meaning, arithmetic operations you can apply on these words).
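
One way to get 2D coordinates is PCA via the singular value decomposition. The sketch below uses a random matrix as a stand-in for the trained embedding weights, and `pca_2d` is a hypothetical helper.

```python
import torch

def pca_2d(weights):
    """Project the rows of a (n_words, dim) matrix onto their first two principal components."""
    centered = weights - weights.mean(dim=0, keepdim=True)
    # Rows of vh are the principal directions, ordered by decreasing singular value.
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return centered @ vh[:2].T

torch.manual_seed(0)
weights = torch.randn(100, 10)  # stand-in for rnnlm.embedding.weight.detach()
coords = pca_2d(weights)
print(coords.shape)
```

The result can be drawn with `plt.scatter(coords[:, 0], coords[:, 1])`, with `plt.annotate` adding each word next to its point.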