# **Maestría en Inteligencia Artificial Aplicada**
## **Advanced Machine Learning Methods (Group 10)**
### Tecnológico de Monterrey
### Professor Ph.D. José Antonio Cantoral Ceballos

## **Activity 4**

### Text Generator

## **Team 33**:

### Humberto Lozano Cedillo A01363184
### Julio Cesar Lynn Jimenez A01793660
### Sarah Mendoza Medina A01215352
### David Mireles Samaniego A01302935

## TC 5033
### Text Generation

<br>

#### Activity 4: Building a Simple LSTM Text Generator using WikiText-2
<br>

- Objective:
    - Gain a fundamental understanding of Long Short-Term Memory (LSTM) networks.
    - Develop hands-on experience with sequence data processing and text generation in PyTorch. Given the simplicity of the model, the amount of data, and the available computing resources, the text you generate will not replace ChatGPT, and the results most likely will not make much sense. The purpose is purely academic: to understand text generation using RNNs.
    - Enhance code comprehension and documentation skills by commenting on provided starter code.
    
<br>

- Instructions:
    - Code Understanding: Begin by thoroughly reading and understanding the code. Comment each section/block of the provided code to demonstrate your understanding. To this end, you are encouraged to add cells with experiments that improve your understanding.

    - Model Overview: The starter code includes an LSTM model setup for sequence data processing. Familiarize yourself with the model architecture and its components. Once you are familiar with the provided model, feel free to change the model to experiment.

    - Training Function: Implement a function to train the LSTM model on the WikiText-2 dataset. This function should feed the training data into the model and perform backpropagation. 

    - Text Generation Function: Create a function that accepts starting text (seed text) and a specified total number of words to generate. The function should use the trained model to generate a continuation of the input text.

    - Code Commenting: Ensure that all the provided starter code is well-commented. Explain the purpose and functionality of each section, indicating your understanding.

    - Submission: Submit your Jupyter Notebook with all sections completed and commented. Include a markdown cell with the full names of all contributing team members at the beginning of the notebook.
    
<br>

- Evaluation Criteria:
    - Code Commenting (60%): The clarity, accuracy, and thoroughness of comments explaining the provided code. You are suggested to use markdown cells for your explanations.

    - Training Function Implementation (20%): The correct implementation of the training function, which should effectively train the model.

    - Text Generation Functionality (10%): A working function is provided in comments. You are free to use it as long as you make sure you understand it; you may also improve it as you see fit. The minimum expected is to provide comments for the given function.

    - Conclusions (10%): Provide some final remarks specifying the differences you notice between this model and the one used for classification tasks. Also comment on the changes you made to the model, the hyperparameters, and any other information you consider relevant. Finally, please provide 3 examples of generated texts.



In [106]:
import numpy as np
#PyTorch libraries
import torch
import torchtext
from torchtext.datasets import WikiText2
# Dataloader library
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.dataset import random_split
# Libraries to prepare the data
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.functional import to_map_style_dataset
# neural layers
from torch import nn
from torch.nn import functional as F
import torch.optim as optim
from tqdm import tqdm

import random
import time

In [107]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [108]:
train_dataset, val_dataset, test_dataset = WikiText2()

In [109]:
tokeniser = get_tokenizer('basic_english')
def yield_tokens(data):
    for text in data:
        yield tokeniser(text)

In [110]:
# Build the vocabulary
vocab = build_vocab_from_iterator(yield_tokens(train_dataset), specials=["<unk>", "<pad>", "<bos>", "<eos>"])
# Map out-of-vocabulary tokens to the index of <unk>
vocab.set_default_index(vocab["<unk>"])

In [111]:
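seq_length = 50
def data_process(raw_text_iter, seq_length = 50):
    # Tokenise each line and map its tokens to vocabulary indices
    data = [torch.tensor(vocab(tokeniser(item)), dtype=torch.long) for item in raw_text_iter]
    data = torch.cat(tuple(filter(lambda t: t.numel() > 0, data))) # remove empty tensors
    # Keep only full sequences, leaving at least one token for the last target.
    # This also works when the token count is an exact multiple of seq_length (or one more).
    n = ((data.size(0) - 1) // seq_length) * seq_length
    # Targets are the inputs shifted one token to the right
    return (data[:n].view(-1, seq_length),
            data[1:n + 1].view(-1, seq_length))

# Create tensors for the training, validation and test sets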
x_train, y_train = data_process(train_dataset, seq_length)
x_val, y_val = data_process(val_dataset, seq_length)
x_test, y_test = data_process(test_dataset, seq_length)
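
To make the input/target relationship concrete, here is a minimal pure-Python sketch of the same windowing idea on a toy token list. The helper name `make_windows` and the toy data are illustrative assumptions, not part of the notebook.

```python
def make_windows(tokens, seq_length):
    """Split a token stream into (input, target) rows of length seq_length.

    Each target row is its input row shifted one position to the right,
    so targets[i][j] is the token that follows inputs[i][j] in the stream.
    """
    # Largest number of full rows that still leaves one token for the final target
    n_rows = (len(tokens) - 1) // seq_length
    usable = n_rows * seq_length
    inputs = [tokens[i:i + seq_length] for i in range(0, usable, seq_length)]
    targets = [tokens[i + 1:i + 1 + seq_length] for i in range(0, usable, seq_length)]
    return inputs, targets


toy_tokens = list(range(10))  # stand-in for vocabulary indices
x_rows, y_rows = make_windows(toy_tokens, seq_length=3)
print(x_rows)
print(y_rows)
```

Every position in a row therefore contributes one next-word prediction, which is why the loss later flattens predictions and targets to `batch_size * seq_length` entries.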

In [112]:
train_dataset = TensorDataset(x_train, y_train)
val_dataset = TensorDataset(x_val, y_val)
test_dataset = TensorDataset(x_test, y_test)

In [118]:
batch_size = 128  # choose a batch size that fits your computation resources
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

In [119]:
# Define the LSTM model
# Feel free to experiment
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(LSTMModel, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_size)
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, text, hidden):
        embeddings = self.embeddings(text)
        output, hidden = self.lstm(embeddings, hidden)
        decoded = self.fc(output)
        return decoded, hidden

    def init_hidden(self, batch_size):

        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))



vocab_size = len(vocab) # vocabulary size
emb_size = 300 # embedding size
neurons = 128 # hidden size of each LSTM layer, i.e. number of hidden units
num_layers = 2 # the number of nn.LSTM layers
model = LSTMModel(vocab_size, emb_size, neurons, num_layers)
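
As a rough sanity check on model size, the sketch below counts the parameters of an embedding + LSTM + linear stack like the one above. The function names are hypothetical, and the vocabulary size of 28,000 is an assumed placeholder; in the notebook the real value is `len(vocab)`.

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    """Parameter count of a unidirectional LSTM with biases (4 gates per layer)."""
    total = 0
    for layer in range(num_layers):
        in_features = input_size if layer == 0 else hidden_size
        # input-to-hidden weights + hidden-to-hidden weights + two bias vectors,
        # each stacked over the 4 gates
        total += 4 * (in_features * hidden_size + hidden_size * hidden_size + 2 * hidden_size)
    return total


def model_param_count(vocab_size, embed_size, hidden_size, num_layers):
    embedding = vocab_size * embed_size
    lstm = lstm_param_count(embed_size, hidden_size, num_layers)
    linear = hidden_size * vocab_size + vocab_size  # weights + bias
    return {"embedding": embedding, "lstm": lstm, "linear": linear,
            "total": embedding + lstm + linear}


# 28_000 is a hypothetical vocabulary size, used only for illustration
counts = model_param_count(vocab_size=28_000, embed_size=300, hidden_size=128, num_layers=2)
print(counts)
```

Under these assumptions the embedding table and the output layer account for most of the parameters, while the LSTM itself is comparatively small.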


In [115]:
def accuracy(model, loader):
    model.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for i, (data, targets) in enumerate(loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
            
            xi = data.to(device=device, dtype = torch.int32)
            yi = targets.to(device=device, dtype = torch.long) 
            yi_flatten = yi.view(-1)

            y_pred, hidden = model(xi, hidden)
            y_pred_max_prob, y_pred_max_idx = y_pred.max(dim=2)

            total_acc += (y_pred_max_idx.view(-1) == yi_flatten).sum().item()
            total_count += yi_flatten.size(0)
            
    return total_acc / total_count

In [116]:
from torcheval.metrics.text import Perplexity

def perplexity(model, loader):
    model.eval()
    perplexity_metric = Perplexity(device=device)
    
    with torch.no_grad():
        for i, (data, targets) in enumerate(loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
            
            xi = data.to(device=device, dtype = torch.int32)
            yi = targets.to(device=device, dtype = torch.long) 

            y_pred, hidden = model(xi, hidden)

            perplexity_metric.update(y_pred, yi)
            
    return perplexity_metric.compute()
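
Perplexity is commonly defined as the exponential of the mean negative log-likelihood of the correct tokens, i.e. the exponential of the average cross-entropy. The sketch below illustrates that definition in plain Python; `perplexity_from_probs` and the probabilities are hypothetical and are not taken from the notebook's results.

```python
import math


def perplexity_from_probs(target_probs):
    """Perplexity = exp(mean negative log-likelihood of the correct tokens)."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical probabilities the model assigned to the correct next token
uniform_guess = perplexity_from_probs([0.25, 0.25, 0.25, 0.25])
confident = perplexity_from_probs([0.9, 0.8, 0.95, 0.85])
print(uniform_guess, confident)
```

A model that guesses uniformly among `V` words has perplexity `V`, so a value in the hundreds can be read as the model being about as uncertain as a uniform choice among that many words.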

In [117]:
def train(model, epochs, optimiser, loss_function, train_loader, val_loader):
    '''
    The following are possible instructions you may want to consider for this function.
    This is only a guide, and you may change, add, or remove whatever you consider appropriate
    as long as you train your model correctly.
        - loop through specified epochs
        - loop through dataloader
        - don't forget to zero grad!
        - place data (both input and target) in device
        - init hidden states e.g. hidden = model.init_hidden(batch_size)
        - run the model
        - compute the cost or loss
        - backpropagation
        - update parameters
        - print all the information you consider helpful
    
    '''
    
    model = model.to(device=device)
    
    for epoch in range(epochs):

        total_acc, total_count = 0, 0
        log_interval = int(len(train_loader) / 5)
        start_time = time.time()
        model.train()

        for i, (data, targets) in enumerate(train_loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
            
            # data_words = []
            # for i in range(data.size(0)):
            #     for j in range(data.size(1)):
            #         idx = data[i][j]
            #         data_words.append(vocab.lookup_token(idx))
            #     data_words.append('|'*4)
            # data_str = ' '.join(data_words)
            # print(data_str)
            

            # target_words = []
            # for i in range(targets.size(0)):
            #     for j in range(targets.size(1)):
            #         idx = targets[i][j]
            #         target_words.append(vocab.lookup_token(idx))
            #     target_words.append('|'*4)
            # target_str = ' '.join(target_words)
            # print('-'*40)
            # print(target_str)
        
            xi = data.to(device=device, dtype = torch.int32)
            yi = targets.to(device=device, dtype = torch.long) # Needs to be long for the loss_function to treat the target as class indices
            yi_flatten = yi.view(-1)
            optimiser.zero_grad()

            y_pred, hidden = model(xi, hidden)
            y_pred_max_prob, y_pred_max_idx = y_pred.max(dim=2)

            # input needs to match the yi flattened size in one dimension and the vocab_size (our classes) on another
            cost = loss_function(input=y_pred.view(batch_size*seq_length, vocab_size), target=yi_flatten)
            cost.backward()

            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
            optimiser.step()

            total_acc += (y_pred_max_idx.view(-1) == yi_flatten).sum().item()
            total_count += yi_flatten.size(0)

            perplexity_metric = Perplexity(device=device)
            perplexity_metric.update(y_pred, yi)
            perplex_test = perplexity_metric.compute()
            
            if i % log_interval == 0 and i > 0:
                print(
                    "| epoch {:3d} | {:5d}/{:5d} batches "
                    "| accuracy {:8.3f} | perplexity {:8.3f}".format(
                        epoch, i, len(train_loader), total_acc / total_count, perplex_test
                    )
                )
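            # The counters are reset after every batch, so the logged accuracy
            # (like the logged perplexity) reflects only the current batch
            total_acc, total_count = 0, 0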

        accu_val = accuracy(model, val_loader)
        perplex_val = perplexity(model, val_loader)

        print("-" * 59)
        print(
            "| end of epoch {:3d} | time: {:5.2f}s | "
            "valid accuracy {:8.3f} | valid perplexity {:8.3f} ".format(
                epoch, time.time() - start_time, accu_val, perplex_val
            )
        )
        print("-" * 59)
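
The training loop calls `torch.nn.utils.clip_grad_norm_` with a maximum norm of 0.1. As an illustration of what clipping by global norm does, here is a minimal pure-Python sketch; `clip_by_global_norm` and the gradient values are hypothetical stand-ins, and real gradients are tensors rather than flat lists.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one factor so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The factor is capped at 1 so small gradients are left untouched
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


grads = [3.0, 4.0]  # hypothetical flattened gradient with L2 norm 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=0.1)
print(norm_before, clipped)
```

Because every component is scaled by the same factor, the direction of the update is preserved and only its length is limited.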
                    

In [121]:
# Call the train function
loss_function = nn.CrossEntropyLoss()
lr = 0.0005
epochs = 3
optimiser = optim.Adam(model.parameters(), lr=lr)
train(model, epochs, optimiser, loss_function, train_loader, val_loader)

| epoch   0 |    64/  320 batches | accuracy    0.065 | perplexity 1277.424
| epoch   0 |   128/  320 batches | accuracy    0.065 | perplexity 1095.540
| epoch   0 |   192/  320 batches | accuracy    0.065 | perplexity 1077.024
| epoch   0 |   256/  320 batches | accuracy    0.064 | perplexity 1088.238
-----------------------------------------------------------
| end of epoch   0 | time: 58.78s | valid accuracy    0.069 | valid perplexity  806.268 
-----------------------------------------------------------
| epoch   1 |    64/  320 batches | accuracy    0.066 | perplexity  977.097
| epoch   1 |   128/  320 batches | accuracy    0.068 | perplexity  931.063
| epoch   1 |   192/  320 batches | accuracy    0.083 | perplexity  908.169
| epoch   1 |   256/  320 batches | accuracy    0.121 | perplexity  849.101
-----------------------------------------------------------
| end of epoch   1 | time: 58.47s | valid accuracy    0.118 | valid perplexity  698.133 
----------------------------------

In [None]:
def generate_text(model, start_text, num_words, temperature=1.0):
    '''
    Generate num_words words that continue start_text.
    Lower temperature values make sampling more conservative; higher values make it more random.
    '''
    model.eval()
    words = tokeniser(start_text)
    hidden = model.init_hidden(1)
    # The first step feeds the whole seed; later steps feed only the newest word,
    # because the hidden state already summarises everything seen so far
    x = torch.tensor([[vocab[word] for word in words]], dtype=torch.long, device=device)
    with torch.no_grad():
        for _ in range(num_words):
            y_pred, hidden = model(x, hidden)
            # Logits for the word that follows the last input position
            last_word_logits = y_pred[0][-1]
            # Temperature-scaled softmax turns the logits into sampling probabilities
            p = F.softmax(last_word_logits / temperature, dim=0).cpu().numpy()
            word_index = np.random.choice(len(last_word_logits), p=p)
            words.append(vocab.lookup_token(word_index))
            x = torch.tensor([[word_index]], dtype=torch.long, device=device)

    return ' '.join(words)
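
The effect of the `temperature` argument can be seen on a toy example. The following is a minimal sketch in plain Python; the helper names and the logits are hypothetical, and it uses the standard library's `random` instead of NumPy.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate words
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Temperatures below 1 concentrate probability on the highest-scoring word, while temperatures above 1 flatten the distribution and make unlikely words more probable.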

In [None]:
# Generate some text
print(generate_text(model, start_text="I like", num_words=100, temperature=0.9))

In [None]:
print(generate_text(model, start_text="Mexico (Spanish: México), officially the United Mexican States, is a  ", num_words=100, temperature=0.9))

mexico ( spanish méxico ) , officially the united mexican states , is a of people for many <unk> and shifted gold for the <unk> of the <unk> , he who was money . building canaan oscar to the single , the churchyard saw the intersection as a track and the tuition , and it to get the right into support and inventory . , they ' s are captain , such as future <unk> , was expressed operating in the ninth river in the port , and the historic @-@ u education and that it said that they were a real what wished to love . it was played one of the episode


In [None]:
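def weights_init(m):
    # nn.LSTM has no single .weight attribute; its matrices are named weight_ih_l{k} and weight_hh_l{k}
    if isinstance(m, nn.LSTM):
        for name, param in m.named_parameters():
            if 'weight' in name:
                torch.nn.init.kaiming_uniform_(param)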

In [122]:
# Define the LSTM model
# Feel free to experiment
class LSTMModel2(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(LSTMModel2, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_size)
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(embed_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True, dropout=0.5)
        self.dropout = nn.Dropout(0.5)

        self.fc1 = nn.Linear(hidden_size, 500)
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(500, vocab_size)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, text, hidden):
        embeddings = self.embeddings(text)
        output_lstm1, hidden = self.lstm(embeddings, hidden)
        out_drop1 = self.dropout(output_lstm1)
        out_fc1 = self.fc1(out_drop1)
        decoded = self.fc2(out_fc1)
        return decoded, hidden

    def init_hidden(self, batch_size):

        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))



vocab_size = len(vocab) # vocabulary size
emb_size = 300 # embedding size
neurons = 256 # hidden size of each LSTM layer, i.e. number of hidden units
num_layers = 5 # the number of nn.LSTM layers
model2 = LSTMModel2(vocab_size, emb_size, neurons, num_layers)

In [124]:
loss_function = nn.CrossEntropyLoss()
lr = 0.0005
epochs = 3
optimiser = optim.Adam(model2.parameters(), lr=lr)
train(model2, epochs, optimiser, loss_function, train_loader, val_loader)

| epoch   0 |    64/  320 batches | accuracy    0.062 | perplexity 1170.618
| epoch   0 |   128/  320 batches | accuracy    0.097 | perplexity  929.465
| epoch   0 |   192/  320 batches | accuracy    0.108 | perplexity  808.244
| epoch   0 |   256/  320 batches | accuracy    0.120 | perplexity  724.486
-----------------------------------------------------------
| end of epoch   0 | time: 111.23s | valid accuracy    0.141 | valid perplexity  492.439 
-----------------------------------------------------------
| epoch   1 |    64/  320 batches | accuracy    0.134 | perplexity  634.047
| epoch   1 |   128/  320 batches | accuracy    0.139 | perplexity  522.719
| epoch   1 |   192/  320 batches | accuracy    0.137 | perplexity  576.984
| epoch   1 |   256/  320 batches | accuracy    0.139 | perplexity  568.515
-----------------------------------------------------------
| end of epoch   1 | time: 116.27s | valid accuracy    0.163 | valid perplexity  375.683 
--------------------------------

In [None]:
accuracy(model2, test_loader)

0.23424166666666665

In [None]:
perplexity(model2, test_loader)

tensor(147.2586, device='cuda:0', dtype=torch.float64)

In [None]:
# Execute this if cuda runs out of memory
import gc
gc.collect()

torch.cuda.empty_cache()

In [None]:
print(generate_text(model2, start_text="I like music", num_words=100, temperature=0.9))


i like music , and smoking that the whole hits of dylan ' s stage performance . the song has been released after their release , following the release . the same day is the original @-@ up website , calling the <unk> and the eye score in the game . in the episode , shot and a female , inspired that us @-@ owner through a 10th @-@ century @-@ legal @-@ pop mystery lesson , i can like more important stories . but i ' re going in everything to be originally used following , thus singing a label . after


In [None]:
print(generate_text(model2, start_text="What I like the most is ", num_words=5, temperature=0.9))

what i like the most is destroyed , ( also infected


In [None]:
print(generate_text(model2, start_text="Mexico (Spanish: México), officially the United Mexican States, is a  ", num_words=10, temperature=1))

mexico ( spanish méxico ) , officially the united mexican states , is a indonesian movement , a series of special intellectual cities ,


# Using GloVe embeddings

In [13]:
from torchtext.vocab import GloVe

global_vectors = GloVe(name='6B', dim=300)

In [14]:
train_dataset, val_dataset, test_dataset = WikiText2()

In [15]:
train_dataset, val_dataset, test_dataset = to_map_style_dataset(train_dataset), to_map_style_dataset(val_dataset), to_map_style_dataset(test_dataset)

In [21]:
# Build the vocabulary
vocab = build_vocab_from_iterator(yield_tokens(train_dataset), specials=["<unk>", "<pad>", "<bos>", "<eos>"])
# Map out-of-vocabulary tokens to the index of <unk>
vocab.set_default_index(vocab["<unk>"])

In [54]:
embed_len = 300
seq_length = 50
seq_length_extra = seq_length + 1  # one extra token so the targets can be shifted by one
def embed_vector_batch(batch):
    X_embed = torch.zeros(len(batch), seq_length, embed_len)
    Y = torch.zeros(len(batch), seq_length, dtype=torch.long)

    for i, batch_item in enumerate(batch):
        tokens = tokeniser(batch_item)
        # Drop empty tokens
        clean_tokens = [token for token in tokens if token != '']

        # Lines shorter than seq_length + 1 tokens are skipped. Their rows stay all-zero
        # in X_embed and Y (index 0 is <unk>) and are still counted by the loss and metrics.
        if len(clean_tokens) < seq_length_extra:
            continue

        vector = global_vectors.get_vecs_by_tokens(clean_tokens[:seq_length_extra])
        # Inputs are the GloVe vectors of the first seq_length tokens
        X_embed[i] = vector[0:-1]
        # Shift all tokens to make Y the next word prediction equivalent of X
        Y[i] = torch.tensor(vocab(clean_tokens[1:seq_length_extra]), dtype=torch.long)

    return X_embed, Y
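
Rows that `embed_vector_batch` skips keep all-zero inputs and all-zero targets, so they are scored like any other row. The sketch below uses made-up numbers to illustrate how such filler positions can shift an accuracy figure, and how a mask could exclude them; `masked_accuracy` and the data are hypothetical, not measurements from this notebook.

```python
def masked_accuracy(preds, targets, mask=None):
    """Fraction of positions where preds == targets, optionally restricted by mask."""
    if mask is None:
        mask = [True] * len(targets)
    kept = [(p, t) for p, t, m in zip(preds, targets, mask) if m]
    return sum(p == t for p, t in kept) / len(kept)


# Hypothetical flattened batch: 4 real positions followed by 6 filler positions (target 0)
targets = [17, 42, 8, 99, 0, 0, 0, 0, 0, 0]
preds = [17, 5, 8, 3, 0, 0, 0, 0, 0, 0]
is_real = [True] * 4 + [False] * 6

acc_all = masked_accuracy(preds, targets)
acc_real = masked_accuracy(preds, targets, is_real)
print(acc_all, acc_real)
```

The mask should be built from which rows were skipped rather than from `target == 0`, since index 0 (`<unk>`) can also be a legitimate target in a real sentence.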

In [55]:
batch_size = 128  # choose a batch size that fits your computation resources
train_glove_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, collate_fn=embed_vector_batch)
val_glove_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True, drop_last=True, collate_fn=embed_vector_batch)
test_glove_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, drop_last=True, collate_fn=embed_vector_batch)

In [58]:
# Define the LSTM model
# Feel free to experiment
class LSTMModel_GloVe(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(LSTMModel_GloVe, self).__init__()
        
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(embed_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
        self.dropout = nn.Dropout(0.5)

        self.fc = nn.Linear(hidden_size, vocab_size)
        nn.init.kaiming_normal_(self.fc.weight)


    def forward(self, X, hidden):
        output, hidden = self.lstm(X, hidden)
        output = self.dropout(output)
        output = self.fc(output)
        return output, hidden

    def init_hidden(self, batch_size):
        return (torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device))



vocab_size = len(vocab) # vocabulary size
emb_size = 300 # embedding size
neurons = emb_size # hidden size of each LSTM layer, here set equal to the embedding size
num_layers = 2 # the number of nn.LSTM layers
model_glove = LSTMModel_GloVe(vocab_size, emb_size, neurons, num_layers)

In [63]:
def accuracy_glove(model, loader):
    model.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for i, (data, targets) in enumerate(loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
            
            xi = data.to(device=device, dtype = torch.float32)
            yi = targets.to(device=device, dtype = torch.long) 
            yi_flatten = yi.view(-1)

            y_pred, hidden = model(xi, hidden)
            y_pred_max_prob, y_pred_max_idx = y_pred.max(dim=2)

            total_acc += (y_pred_max_idx.view(-1) == yi_flatten).sum().item()
            total_count += yi_flatten.size(0)
            
    return total_acc / total_count

In [64]:
from torcheval.metrics.text import Perplexity

def perplexity_glove(model, loader):
    model.eval()
    perplexity_metric = Perplexity(device=device)
    
    with torch.no_grad():
        for i, (data, targets) in enumerate(loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
            
            xi = data.to(device=device, dtype = torch.float32)
            yi = targets.to(device=device, dtype = torch.long) 

            y_pred, hidden = model(xi, hidden)

            perplexity_metric.update(y_pred, yi)
            
    return perplexity_metric.compute()

In [95]:
def train_glove(model, epochs, optimiser, loss_function, train_loader, val_loader):
    '''
    The following are possible instructions you may want to consider for this function.
    This is only a guide, and you may change, add, or remove whatever you consider appropriate
    as long as you train your model correctly.
        - loop through specified epochs
        - loop through dataloader
        - don't forget to zero grad!
        - place data (both input and target) in device
        - init hidden states e.g. hidden = model.init_hidden(batch_size)
        - run the model
        - compute the cost or loss
        - backpropagation
        - update parameters
        - print all the information you consider helpful
    
    '''
    
    model = model.to(device=device)
    
    for epoch in range(epochs):

        total_acc, total_count = 0, 0
        log_interval = int(len(train_loader) / 5)
        start_time = time.time()
        model.train()

        for i, (data, targets) in enumerate(train_loader):
            # Start every batch from a fresh zero hidden state (nothing is carried over between batches)
            hidden = model.init_hidden(batch_size)
        
            xi = data.to(device=device, dtype = torch.float32)
            yi = targets.to(device=device, dtype = torch.long)
            yi_flatten = yi.view(-1)
            optimiser.zero_grad()

            y_pred, hidden = model(xi, hidden)
            y_pred_max_prob, y_pred_max_idx = y_pred.max(dim=2)

            # input needs to match the yi flattened size in one dimension and the vocab_size (our classes) on another
            cost = loss_function(input=y_pred.view(batch_size*seq_length, vocab_size), target=yi_flatten)
            cost.backward()

            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
            optimiser.step()

            total_acc += (y_pred_max_idx.view(-1) == yi_flatten).sum().item()
            total_count += yi_flatten.size(0)

            perplexity_metric = Perplexity(device=device)
            perplexity_metric.update(y_pred, yi)
            perplex_test = perplexity_metric.compute()
            
            if i % log_interval == 0 and i > 0:
                print(
                    "| epoch {:3d} | {:5d}/{:5d} batches "
                    "| accuracy {:8.3f} | perplexity {:8.3f}".format(
                        epoch, i, len(train_loader), total_acc / total_count, perplex_test
                    )
                )
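            # The counters are reset after every batch, so the logged accuracy
            # (like the logged perplexity) reflects only the current batch
            total_acc, total_count = 0, 0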

        accu_val = accuracy_glove(model, val_loader)
        perplex_val = perplexity_glove(model, val_loader)

        print("-" * 59)
        print(
            "| end of epoch {:3d} | time: {:5.2f}s | "
            "valid accuracy {:8.3f} | valid perplexity {:8.3f} ".format(
                epoch, time.time() - start_time, accu_val, perplex_val
            )
        )
        print("-" * 59)
                    

In [96]:
loss_function = nn.CrossEntropyLoss()
lr = 0.0005
epochs = 3
optimiser = optim.Adam(model_glove.parameters(), lr=lr)
train_glove(model_glove, epochs, optimiser, loss_function, train_glove_loader, val_glove_loader)

| epoch   0 |    57/  286 batches | accuracy    0.710 | perplexity    7.947
| epoch   0 |   114/  286 batches | accuracy    0.708 | perplexity    8.256
| epoch   0 |   171/  286 batches | accuracy    0.755 | perplexity    5.804
| epoch   0 |   228/  286 batches | accuracy    0.699 | perplexity    8.991
| epoch   0 |   285/  286 batches | accuracy    0.757 | perplexity    5.840
-----------------------------------------------------------
| end of epoch   0 | time: 86.57s | valid accuracy    0.695 | valid perplexity    8.402 
-----------------------------------------------------------
| epoch   1 |    57/  286 batches | accuracy    0.708 | perplexity    7.845
| epoch   1 |   114/  286 batches | accuracy    0.656 | perplexity   11.528
| epoch   1 |   171/  286 batches | accuracy    0.673 | perplexity   10.393
| epoch   1 |   228/  286 batches | accuracy    0.747 | perplexity    5.865
| epoch   1 |   285/  286 batches | accuracy    0.791 | perplexity    4.423
-------------------------------

In [97]:
torch.save(model_glove, "lstm_glove_model.h5")

In [98]:
accuracy_glove(model_glove, test_glove_loader)

0.7042647058823529

In [99]:
perplexity_glove(model_glove, test_glove_loader)

tensor(7.6612, device='cuda:0', dtype=torch.float64)

In [87]:
def generate_text_glove(model, start_text, num_words, temperature=1.0):
    '''
    Generate num_words words that continue start_text, feeding GloVe vectors to the model.
    Lower temperature values make sampling more conservative; higher values make it more random.
    '''
    model.eval()
    words = tokeniser(start_text)
    hidden = model.init_hidden(1)
    # The first step feeds the whole seed; later steps feed only the newest word,
    # because the hidden state already summarises everything seen so far
    new_words = list(words)
    with torch.no_grad():
        for _ in range(num_words):
            # Look up the GloVe vectors of the words that have not been fed yet
            x = global_vectors.get_vecs_by_tokens(new_words)
            x = x.to(device=device, dtype=torch.float32)
            y_pred, hidden = model(x.reshape([1, len(new_words), emb_size]), hidden)
            # Logits for the word that follows the last input position
            last_word_logits = y_pred[0][-1]
            # Temperature-scaled softmax turns the logits into sampling probabilities
            p = F.softmax(last_word_logits / temperature, dim=0).cpu().numpy()
            word_index = np.random.choice(len(last_word_logits), p=p)
            new_words = [vocab.lookup_token(word_index)]
            words.append(new_words[0])

    return ' '.join(words)

In [100]:
# Generate some text
print(generate_text_glove(model_glove, start_text="I like", num_words=100, temperature=0.9))

i like various critics . in the explained in britain , the oldest <unk> also also screened by the <unk> , as a major male and stem jackrabbit ( prince and 73 metres ) . it is discovered on the commercial road of khouw and entrance in the northern order . inscriptions praised that a small face of the <unk> ( long between the 8th ) has been established under the domestic bc . branch of the naval church , croatia , are delicate shaped @-@ , marina , <unk> and rape , which in the turning of caribbean areas area .


In [101]:
print(generate_text_glove(model_glove, start_text="I like music", num_words=100, temperature=0.9))

i like music ' s <unk> and the british ' s name books . in the world , musical guitar , lisa hero and robert <unk> , as these window , based to in each of his inventor applied except by president ' s car , but he towed police armor . it was statehood in london and 1924 in henry powers ( kings of the italian <unk> in america ) . a result of halmahera is took yet racial from the <unk> as one of the feast of <unk> ( <unk> ) in zhou . night , causing turkey , while in


In [105]:
print(generate_text_glove(model_glove, start_text="Mexico (Spanish: México), officially the United Mexican States, is a  ", num_words=10, temperature=1))

mexico ( spanish méxico ) , officially the united mexican states , is a dog , hawaii <unk> in sagebrush . it was from
