# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [2]:
# Project Steps Overview and Estimated Duration
# Below you will find each of the components of the project, and estimated times to complete each portion. 
# These are estimates and not exact timings to help you expect the amount of time necessary to put aside to work on your project.

# Prepare data (~2 hours)
# Build your vocabulary from a corpus of language data. The Vocabulary object is described in Lesson Six: Seq2Seq.

# Build Model (~4 hours)
# Build your Encoder, Decoder, and larger Sequence to Sequence pattern in PyTorch. This pattern is described in Lesson Six: Seq2Seq.

# Train Model (~3 hours)
# Write your training procedure and divide your dataset into train/test/validation splits. Then, train your network and plot your evaluation metrics. Save your model after it reaches a satisfactory level of accuracy.

# Evaluate & Interact w/ Model (~1 hour)
# Write a script to interact with your network at the command line.

In [3]:
# Instructions Summary
# The LSTM Chatbot will help you show off your skills as a deep learning practitioner. You will develop the chatbot using a new architecture called a Seq2Seq. 
# Additionally, you can use pre-trained word embeddings to improve the performance of your model. Let's get started by following the steps below:

# Step 1: Build your Vocabulary & create the Word Embeddings
# The most important part of this step is to create your Vocabulary object using a corpus of data drawn from TorchText.

# (Extra Credit)
# Use Gensim to extract the word embeddings from one of its corpus'.
# Use NLTK and Gensim to create a function to clean your text and look up the index of a word's embeddings.

# Step 2: Create the Encoder
# A Seq2Seq architecture consists of an encoder and a decoder unit. You will use Pytorch to build a full Seq2Seq model.
# The first step of the architecture is to create an encoder with an LSTM unit.

# (Extra Credit)
# Load your pretrained embeddings into the LSTM unit.

# Step 3: Create the Decoder
# The second step of the architecture is to create a decoder using a second LSTM unit.

# Step 4: Combine them into a Seq2Seq Architecture
# To finalize your model, you will combine the encoder and decoder units into a working model.
# The Seq2Seq2 model must be able to instantiate the encoder and decoder. Then, it will accept the inputs for these units and manage their interaction to get an output using the forward pass function.

# Step 5: Train & evaluate your model
# Finally you will train and evaluate your model using a Pytorch training loop.

# Step 6: Interact with the Chatbot
# Demonstrate your chatbot by converting the outputs of the model to text and displaying it's responses at the command line.

In [1]:
# Pre-requisites: 
# - PyTorch 2.00 kernel
# - ml.g5.xlarge instance

# Install requirements
# !pip install gensim==4.3.1 nltk==3.8.1 torchtext torchdata portalocker | grep -v "already satisfied"

# !pip install gensim==4.3.1 nltk==3.8.1  torchtext==0.15.1 portalocker>=2.0.0 | grep -v "already satisfied"
!pip install gensim==4.3.1 nltk==3.8.1  torchtext==0.6.0   | grep -v "already satisfied"

#  torchtext==0.12.0 --> torch==1.11.0
#  torchtext==0.13.0 --> torch==1.12.0
#  torchtext==0.14.0 --> torch==1.12.0
# torchtext==0.15.1 --> torch==2.0.0
# torchtext==0.15.2 --> torch==2.0.1

# !pip install gensim==4.3.1 nltk==3.8.1  torchtext==0.10.0 | grep -v "already satisfied"

# !pip install gensim==4.2.0 nltk torchtext  | grep -v "already satisfied"

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Dataset load

In [2]:
import gensim
import nltk
import numpy as np
import pandas as pd
import gzip
import torch
import torch.nn as nn
from nltk.corpus import brown
from nltk.tokenize import word_tokenize
# import sklearn.model_selection 
from torchtext.utils import download_from_url
from torchtext.data import Field, BucketIterator, TabularDataset
import random
import json

nltk.download('brown')
nltk.download('punkt')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Output, save, and load brown embeddings

# model = gensim.models.Word2Vec(brown.sents())
# model.save('brown.embedding')

# w2v = gensim.models.Word2Vec.load('brown.embedding')

question_context_field = Field(tokenize=word_tokenize, init_token='<sos>', eos_token='<eos>', lower=True)
answer_field = Field(tokenize=word_tokenize, init_token='<sos>', eos_token='<eos>', lower=True)

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [32]:
def prepare_text(sentence):
    return sentence

def loadDF(path):    
    data_file_path = download_from_url(path, root="data")
    
    with open(data_file_path, 'r') as f:
        squad_data = json.load(f)
        
    data = []
    examples = []
    for item in squad_data['data']:
        for paragraph in item['paragraphs']:
            context = paragraph['context']
            for qa in paragraph['qas']:
                question = qa['question']                
                # question_context = f"{question} {separator_token} {context}"
                question_context = f"{question} {context}"
                answer = qa['answers'][0]['text']
                
                data.append((question_context, answer))                                
                
    df = pd.DataFrame.from_records(data, columns=['question_context', 'answer'])        
    df['question_context'] = df['question_context'].apply(prepare_text)
    df['answer'] = df['answer'].apply(prepare_text)

    return df


def load_datasets(df, random_seed=42):
    df.to_csv("data/data.csv", index=False)
        
    fields = [('question_context', question_context_field), ('answer', answer_field)]
    dataset = TabularDataset("data/data.csv", format='csv', fields=fields)
    
    question_context_field.build_vocab(dataset, min_freq=1)
    answer_field.build_vocab(dataset, min_freq=1)

    train_data, valid_data = dataset.split(split_ratio=0.8, random_state=random.seed(random_seed))
    
    return train_data, valid_data


def decode_answer(tokens):
    return tokens_to_string(tokens, answer_field)

def encode_question(question_string):
    return string_to_tensor(question_string.lower(), question_context_field)
    
def tokens_to_string(tokens, field):
    eos_idx = field.vocab.stoi[field.eos_token]
    
    return " ".join([field.vocab.itos[token] for token in tokens if token != eos_idx])

def string_to_tensor(string, field):
    tokens = field.tokenize(string)

    tokens = ['<sos>'] + tokens + ['<eos>'] 

    token_ids = field.numericalize([tokens])
    
    return token_ids

    # tokens_to_string(token_ids.squeeze().tolist(), question_context_field)    
    
class PrintToBoth:
    def __init__(self, filename):
        self.file = open(filename, 'w')

    def print(self, s):
        print(s)
        print(s, file=self.file, flush=True)    

In [5]:
df = loadDF("https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json")

train_data, valid_data = load_datasets(df)

In [30]:
print(len(question_context_field.vocab))

print(len(answer_field.vocab))

102396
42794


In [31]:
df

Unnamed: 0,question_context,answer
0,To whom did the Virgin Mary allegedly appear i...,Saint Bernadette Soubirous
1,What is in front of the Notre Dame Main Buildi...,a copper statue of Christ
2,The Basilica of the Sacred heart at Notre Dame...,the Main Building
3,What is the Grotto at Notre Dame? Architectura...,a Marian place of prayer and reflection
4,What sits on top of the Main Building at Notre...,a golden statue of the Virgin Mary
...,...,...
87594,In what US state did Kathmandu first establish...,Oregon
87595,What was Yangon previously known as? Kathmandu...,Rangoon
87596,With what Belorussian city does Kathmandu have...,Minsk
87597,In what year did Kathmandu create its initial ...,1975


In [8]:
print(df.iloc[20]['question_context'])

print(df.iloc[20]['answer'])

What entity provides help with the management of time for new students at Notre Dame? All of Notre Dame's undergraduate students are a part of one of the five undergraduate colleges at the school or are in the First Year of Studies program. The First Year of Studies program was established in 1962 to guide incoming freshmen in their first year at the school before they have declared a major. Each student is given an academic advisor from the program who helps them to choose classes that give them exposure to any major in which they are interested. The program also includes a Learning Resource Center which provides time management, collaborative learning, and subject tutoring. This program has been recognized previously, by U.S. News & World Report, as outstanding.
Learning Resource Center


## Neural Network

In [9]:
import random

class Encoder(nn.Module):
    
    def __init__(self, input_size, hidden_size, embedding_size, num_layers=1, dropout=0):
        
        super(Encoder, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        
        self.hidden = torch.zeros(1, 1, hidden_size)        # How to use it ????
        # self.embedding_dim = embedding_size
        
        self.embedding = nn.Embedding(input_size, embedding_size)
        
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers, dropout=dropout)

        self.dropout = nn.Dropout(dropout)
    
    def forward(self, i):
        embedded = self.embedding(i)
        
        embedded = self.dropout(embedded)

        output, (hidden, cell) = self.lstm(embedded)
                
        return output, hidden, cell
    

class Decoder(nn.Module):
      
    def __init__(self, hidden_size, output_size, embedding_size, num_layers=1, dropout=0):
        
        super(Decoder, self).__init__()
        
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embedding_size = embedding_size
        
        # self.embedding provides a vector representation of the target to our model
        # self.embedding = nn.Embedding(self.hidden_size, self.hidden_size)  # From lesson
        
        self.embedding = nn.Embedding(output_size, embedding_size)
        # self.embedding = nn.Embedding(hidden_size, embedding_size) # Why ?
        
        
        # self.lstm, accepts the embeddings and outputs a hidden state
        # self.lstm = nn.LSTM(self.embedding_size, self.hidden_size, num_layers, dropout=(0 if num_layers == 1 else dropout))
        self.lstm = nn.LSTM(embedding_size, hidden_size, num_layers, dropout=dropout)

        # self.ouput, predicts on the hidden state via a linear output layer
        self.fc = nn.Linear(hidden_size, output_size)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, input, hidden, cell):
        
        # print('[Decoder] input shape:', input.shape)
        
        input = input.unsqueeze(0)
        
        embedded = self.embedding(input)
        
        embedded = self.dropout(embedded)
            
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell)) 
        
        
        prediction = self.fc(output.squeeze(0))
        
        # print("prediction", prediction)
                
        return prediction, hidden, cell
        
        

class Seq2Seq(nn.Module):
        
    def __init__(self, encoder, decoder):
        
        super(Seq2Seq, self).__init__()
        
        self.encoder = encoder
        self.decoder = decoder        

    def forward(self, src, trg, teacher_forcing_ratio = 0.5):              
        #src = [src len, batch size]
        #trg = [trg len, batch size]
                
        batch_size = trg.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_size
                       
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(device)
        
        _, hidden, cell = self.encoder(src)
        
        # Start with <sos> tokens
        input = trg[0, :]
                
        for t in range(1, trg_len):

            output, hidden, cell = self.decoder(input, hidden, cell)

            outputs[t] = output
            
            # get highest predicted token
            top1 = output.argmax(1)
            
            # print("Most likely token: ", top1)
            
            use_teacher_forcing = random.random() < teacher_forcing_ratio

            input = trg[t] if use_teacher_forcing else top1
    
        return outputs


class Seq2SeqInference(nn.Module):
        
    def __init__(self, encoder, decoder, answer_field):
        
        super(Seq2SeqInference, self).__init__()
        
        self.encoder = encoder
        self.decoder = decoder        

    def forward(self, src, max_length=20):
        #src = [src len, 1]
        #trg = [trg len, 1]
        # expected output shape = [max_length, 1]
        
        if src.shape[1] != 1:
            raise ValueError(f"src.shape[1] != 1: {src.shape[1]}")
            
        batch_size = 1 # src.shape[1]

        trg_vocab_size = self.decoder.output_size
                                   
        logits = torch.zeros(max_length + 1, batch_size, trg_vocab_size).to(device)
        # outputs = torch.zeros(max_length + 1, batch_size).to(device)
        
        _, hidden, cell = self.encoder(src)
        
        # Start with <sos> tokens
        # TODO: create array of <sos> tokens      
        sos_idx = answer_field.vocab.stoi[answer_field.init_token] 
        
        # input = trg[0, :]
        
        # print("input shape", trg.shape)
        input = torch.tensor([sos_idx]).to(device)        
        # print("Start token: ", input)
        
        inferred_tokens = []
                
        # for t in range(1, trg_len):
        for t in range(1, max_length + 1):

            output, hidden, cell = self.decoder(input, hidden, cell)

            logits[t] = output
            
            # Get highest predicted token
            top1 = output.argmax(1)
            
            # print(f"Top1: {top1} {top1.item()}")
            # print(f"Most likely token: {top1.item()} ({answer_field.vocab.itos[top1.item()]})")
            inferred_tokens.append(top1)
                        
            input = top1                        
        
        inferred_tokens_tensor = torch.tensor(inferred_tokens).view(-1, 1)
        
        logits_dim = logits.shape[-1]
        logits = logits[1:].view(-1, logits_dim)
        
        return inferred_tokens_tensor, logits


## Model training

In [40]:
from torch.utils.data import DataLoader, TensorDataset

# Best hyperparameters: {'encoder_dropout': 0.6039111450304973, 'decoder_dropout': 0.33154046338693915, 'embedding_size': 468, 'hidden_size': 938, 'num_layers': 1, 'momentum': 0.5494925017261035, 'learning_rate': 0.04835390488901119, 'weight_decay': 0.00014738274625279338, 'batch_size': 132}
# Validation loss: 6.840339520820101

# Define the model and other parameters
encoder_input_size = len(question_context_field.vocab) 
encoder_embedding_size= 468 #256 
encoder_dropout = 0.6039111450304973 #0.1 

decoder_output_size = len(answer_field.vocab)
decoder_embedding_size = 256 #362 #300
decoder_dropout = 0.33154046338693915 #0.1 

hidden_size = 938 #256 #512 #1602
num_layers = 1
batch_size = 132 #64 #85 #128

learning_rate = 0.04835390488901119 #0.003 
weight_decay= 0.00014738274625279338 #1e-3
momentum= 0.5494925017261035 #0.5

num_epochs =  100

encoder = Encoder(encoder_input_size, hidden_size, encoder_embedding_size, num_layers, encoder_dropout)
decoder = Decoder(hidden_size, decoder_output_size, decoder_embedding_size, num_layers, decoder_dropout)
    
model = Seq2Seq(encoder, decoder).to(device)

train_iterator, valid_iterator = BucketIterator.splits(
    (train_data, valid_data),
   batch_size=batch_size,
   sort_within_batch=True,
    sort_key = lambda x: len(x.question_context),
    device=device)


# Init model weights
for name, param in model.named_parameters():
    nn.init.uniform_(param.data, -0.08, 0.08)


Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(102396, 468)
    (lstm): LSTM(468, 938, dropout=0.6039111450304973)
    (dropout): Dropout(p=0.6039111450304973, inplace=False)
  )
  (decoder): Decoder(
    (embedding): Embedding(42794, 256)
    (lstm): LSTM(256, 938, dropout=0.33154046338693915)
    (fc): Linear(in_features=938, out_features=42794, bias=True)
    (dropout): Dropout(p=0.33154046338693915, inplace=False)
  )
)

In [41]:
import torch
import torch.nn as nn
import torch.optim as optim
import time
        
now = datetime.datetime.now()
train_output_filename = f'train_output_{now.strftime("%Y%m%d_%H%M%S")}.txt'        
tee_print = PrintToBoth(train_output_filename)       

# learning_rate = 0.001
pad_idx = answer_field.vocab.stoi[answer_field.pad_token] 
criterion = nn.CrossEntropyLoss(ignore_index = pad_idx)

# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True, threshold=0.01) # YOUR CODE HERE

# num_epochs =  20
clip = 5
valid_loss_min = None

for epoch in range(1, num_epochs + 1):
    start_time = time.time()
    model.train()
    train_loss = 0.0
    avg_train_loss = 0.0

    # Training loop
    for batch_idx, batch in enumerate(train_iterator):
        src = batch.question_context.to(device)
        trg = batch.answer.to(device)
        
#         print('src:', src.shape)
#         print('trg:', trg.shape)        
        
        optimizer.zero_grad()
        
        # print (src.shape)
        # print (trg.shape)
        
        # Pass the source sequences through the encoder
        output = model(src, trg)        #  [trg length, batch size, output dim]
        # print('output:', output.shape)
        
        output_dim = output.shape[-1]
        
        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)
        
#         print('output 2:', output.shape)
#         print('trg 2:', trg.shape)
        
#         print('output content:', output)
#         print('trg content:', trg)
        
        loss = criterion(output, trg)
        
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        
        optimizer.step()
        
        train_loss += loss.item()
        
    average_train_loss = train_loss / len(train_iterator)
    
    end_time = time.time()    
    train_elapsed_time = end_time - start_time
    
    # Evaluation
    start_time = time.time()
    model.eval()
    with torch.no_grad():
        valid_loss = 0.0

        for batch_idx, batch in enumerate(valid_iterator):
            src = batch.question_context.to(device)
            trg = batch.answer.to(device)                    
            
            output = model(src, trg, 0) #turn off teacher forcing

            #trg = [trg len, batch size]
            #output = [trg len, batch size, output dim]

            output_dim = output.shape[-1]

            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

            #trg = [(trg len - 1) * batch size]
            #output = [(trg len - 1) * batch size, output dim]

            loss = criterion(output, trg)
            
            valid_loss += loss.item()
            
        average_val_loss = valid_loss / len(valid_iterator)
                    
    end_time = time.time()    
    val_elapsed_time = end_time - start_time
    
    # print(f'Eval Time: {elapsed_time}')
    # print(f'Epoch: {epoch+1:02} | Eval Time: {val_elapsed_time}s')
    scheduler.step(average_val_loss)    
    
    torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': train_loss,
            }, f'checkpoints/train-{epoch}.pt')
    
    if valid_loss_min is None or (
            (valid_loss_min - average_val_loss) / valid_loss_min > 0.01   # valid_loss_min - 0.01 * valid_loss_min    > average_val_loss
    ):
        # print(f"New minimum validation loss: {average_val_loss:.6f}. Saving model ...")
        tee_print.print(f"New minimum validation loss: {average_val_loss:.6f}. Saving model ...")
        
        torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': train_loss,
            }, 'checkpoints/best_val_loss.pt')

        valid_loss_min = average_val_loss
        
    tee_print.print(f"Epoch: {epoch:02}, Train Loss: {average_train_loss:.3f}, Val Loss: {average_val_loss:.3f} Train Time: {train_elapsed_time}s Eval Time: {val_elapsed_time}s")



New minimum validation loss: 7.164206. Saving model ...
Epoch: 01, Train Loss: 7.660, Val Loss: 7.164 Train Time: 188.57793474197388s Eval Time: 20.22123908996582s
New minimum validation loss: 6.980491. Saving model ...
Epoch: 02, Train Loss: 7.049, Val Loss: 6.980 Train Time: 187.96233129501343s Eval Time: 20.14483880996704s
New minimum validation loss: 6.890448. Saving model ...
Epoch: 03, Train Loss: 6.922, Val Loss: 6.890 Train Time: 189.05081629753113s Eval Time: 20.188480854034424s
Epoch: 04, Train Loss: 6.864, Val Loss: 6.863 Train Time: 189.4222068786621s Eval Time: 20.15251111984253s
Epoch: 05, Train Loss: 6.825, Val Loss: 6.844 Train Time: 189.15589547157288s Eval Time: 20.124694347381592s
Epoch: 06, Train Loss: 6.800, Val Loss: 6.846 Train Time: 189.11357736587524s Eval Time: 20.227267742156982s
New minimum validation loss: 6.820378. Saving model ...
Epoch: 07, Train Loss: 6.783, Val Loss: 6.820 Train Time: 188.4856460094452s Eval Time: 20.12617588043213s
Epoch: 08, Train Lo

KeyboardInterrupt: 

In [None]:
# New minimum validation loss: 9.777593. Saving model ...
# Epoch: 01, Train Loss: 10.325, Val Loss: 9.778 Train Time: 138.35026669502258s Eval Time: 15.636022806167603s
# New minimum validation loss: 8.429811. Saving model ...
# Epoch: 02, Train Loss: 8.829, Val Loss: 8.430 Train Time: 137.56066799163818s Eval Time: 15.643161058425903s
# New minimum validation loss: 8.051929. Saving model ...
# Epoch: 03, Train Loss: 8.262, Val Loss: 8.052 Train Time: 137.84890174865723s Eval Time: 15.51107907295227s
# New minimum validation loss: 7.829260. Saving model ...
# Epoch: 04, Train Loss: 7.918, Val Loss: 7.829 Train Time: 138.03220987319946s Eval Time: 15.569308996200562s
# New minimum validation loss: 7.725741. Saving model ...
# Epoch: 05, Train Loss: 7.784, Val Loss: 7.726 Train Time: 138.2921051979065s Eval Time: 15.532844543457031s
# New minimum validation loss: 7.636809. Saving model ...
# Epoch: 06, Train Loss: 7.686, Val Loss: 7.637 Train Time: 137.98424077033997s Eval Time: 15.647210597991943s
# Epoch: 07, Train Loss: 7.605, Val Loss: 7.561 Train Time: 138.06058549880981s Eval Time: 15.66235065460205s
# New minimum validation loss: 7.492020. Saving model ...
# Epoch: 08, Train Loss: 7.532, Val Loss: 7.492 Train Time: 137.3553535938263s Eval Time: 15.439005374908447s
# Epoch: 09, Train Loss: 7.467, Val Loss: 7.433 Train Time: 139.83897972106934s Eval Time: 15.809000015258789s
# New minimum validation loss: 7.379917. Saving model ...
# Epoch: 10, Train Loss: 7.412, Val Loss: 7.380 Train Time: 137.8189413547516s Eval Time: 15.560516834259033s
# Epoch: 11, Train Loss: 7.360, Val Loss: 7.332 Train Time: 136.65642929077148s Eval Time: 15.613484621047974s
# New minimum validation loss: 7.288950. Saving model ...
# Epoch: 12, Train Loss: 7.313, Val Loss: 7.289 Train Time: 137.94713592529297s Eval Time: 15.612228393554688s


## HyperParameter Optimization

In [12]:
!pip install  sqlalchemy==1.4.8  optuna | grep -v "already satisfied"

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [None]:
from sqlalchemy import orm
import optuna
import time

encoder_input_size = len(question_context_field.vocab) 
decoder_output_size = len(answer_field.vocab)
pad_idx = answer_field.vocab.stoi[answer_field.pad_token] 

def objective(trial):
    torch.cuda.empty_cache()
    
    encoder_dropout = trial.suggest_float('encoder_dropout', 0.1, 0.7)
    decoder_dropout = trial.suggest_float('decoder_dropout', 0.1, 0.7)
    embedding_size = trial.suggest_int('embedding_size', 200, 512)
    hidden_size = trial.suggest_int('hidden_size', 64, 1024)
    num_layers = trial.suggest_int('num_layers', 1, 3)
    momentum = trial.suggest_uniform('momentum', 0.1, 0.9)
        
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)
    weight_decay = trial.suggest_loguniform('weight_decay', 1e-6, 1e-2)
    # opt = trial.suggest_categorical('opt', ['sgd', 'adam'])
    # momentum = trial.suggest_uniform('momentum', 0.1, 0.9)
    
    batch_size = trial.suggest_int('batch_size', 32, 1024)
    
    print(f'Starting trial {trial.number}:\tHyperparameters={trial.params}') 
    
    train_iterator, valid_iterator = BucketIterator.splits(
        (train_data, valid_data),
       batch_size=batch_size,
       sort_within_batch=True,
        sort_key = lambda x: len(x.question_context),
        device=device)
    
    encoder = Encoder(encoder_input_size, hidden_size, embedding_size, num_layers, encoder_dropout)
    decoder = Decoder(hidden_size, decoder_output_size, embedding_size, num_layers, decoder_dropout)
    model = Seq2Seq(encoder, decoder).to(device)    
    
    # Init model weights
    for name, param in model.named_parameters():
        nn.init.uniform_(param.data, -0.08, 0.08)
    
    criterion = nn.CrossEntropyLoss(ignore_index = pad_idx) # https://pytorch.org/docs/2.0/generated/torch.nn.CrossEntropyLoss.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss
    # optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5, verbose=True, threshold=0.01) # YOUR CODE HERE
    
    num_epochs = 5
    clip = 5
    valid_loss_min = None

    for epoch in range(num_epochs):
        start_time = time.time()
        model.train()
        train_loss = 0.0
        avg_train_loss = 0.0

        # Training loop
        for batch_idx, batch in enumerate(train_iterator):
            src = batch.question_context.to(device)
            trg = batch.answer.to(device)

    #         print('src:', src.shape)
    #         print('trg:', trg.shape)        

            optimizer.zero_grad()


            # print (src.shape)
            # print (trg.shape)

            # Pass the source sequences through the encoder
            output = model(src, trg)        #  [trg length, batch size, output dim]
            # print('output:', output.shape)

            output_dim = output.shape[-1]

            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

    #         print('output 2:', output.shape)
    #         print('trg 2:', trg.shape)

    #         print('output content:', output)
    #         print('trg content:', trg)

            loss = criterion(output, trg)

            loss.backward()

            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

            optimizer.step()

            # print(train_loss)

            train_loss += loss.item()


        average_train_loss = train_loss / len(train_iterator)

        end_time = time.time()    
        train_elapsed_time = end_time - start_time

        # print(f'Epoch: {epoch+1:02} | Train Time: {train_elapsed_time}s')

        # Evaluation on the test dataset
        start_time = time.time()
        model.eval()
        with torch.no_grad():
            valid_loss = 0.0

            for batch_idx, batch in enumerate(valid_iterator):
                src = batch.question_context.to(device)
                trg = batch.answer.to(device)                    

                output = model(src, trg, 0) #turn off teacher forcing

                #trg = [trg len, batch size]
                #output = [trg len, batch size, output dim]

                output_dim = output.shape[-1]

                output = output[1:].view(-1, output_dim)
                trg = trg[1:].view(-1)

                #trg = [(trg len - 1) * batch size]
                #output = [(trg len - 1) * batch size, output dim]

                loss = criterion(output, trg)

                valid_loss += loss.item()

            average_val_loss = valid_loss / len(valid_iterator)

        end_time = time.time()    
        val_elapsed_time = end_time - start_time

        # print(f'Eval Time: {elapsed_time}')
        # print(f'Epoch: {epoch+1:02} | Eval Time: {val_elapsed_time}s')

        scheduler.step(average_val_loss)
        
        if valid_loss_min is None or (
                (valid_loss_min - average_val_loss) / valid_loss_min > 0.01
        ):
            print(f"New minimum validation loss: {average_val_loss:.6f}. Saving model ...")

            torch.save({
                    'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'loss': train_loss,
                }, f'checkpoints/hpo_{trial.number}_model.pt",')

            valid_loss_min = average_val_loss

        print(f"Epoch: {epoch+1:02}, Train Loss: {average_train_loss:.3f}, Val Loss: {average_val_loss:.3f} Train Time: {train_elapsed_time}s Eval Time: {val_elapsed_time}s")    
        
    return valid_loss_min
    
# Define study and optimize hyperparameters
study = optuna.create_study(direction='minimize')

study.optimize(objective, n_trials=100, catch=(torch.cuda.OutOfMemoryError,))

[I 2023-11-18 23:08:48,921] A new study created in memory with name: no-name-1f73075e-d6d6-42e0-bb28-89e1c8bbfc17
  momentum = trial.suggest_uniform('momentum', 0.1, 0.9)
  weight_decay = trial.suggest_loguniform('weight_decay', 1e-6, 1e-2)


Starting trial 0:	Hyperparameters={'encoder_dropout': 0.10507486873471816, 'decoder_dropout': 0.40833098697311565, 'embedding_size': 240, 'hidden_size': 932, 'num_layers': 2, 'momentum': 0.41814445054158134, 'learning_rate': 0.00011683107152760196, 'weight_decay': 0.0037540186215223427, 'batch_size': 71}
New minimum validation loss: 10.646195. Saving model ...
Epoch: 01, Train Loss: 10.651, Val Loss: 10.646 Train Time: 237.41277265548706s Eval Time: 22.522602796554565s
Epoch: 02, Train Loss: 10.635, Val Loss: 10.624 Train Time: 237.06071305274963s Eval Time: 22.41866135597229s
Epoch: 03, Train Loss: 10.615, Val Loss: 10.600 Train Time: 235.52900218963623s Eval Time: 22.411000728607178s
Epoch: 04, Train Loss: 10.595, Val Loss: 10.575 Train Time: 235.67363333702087s Eval Time: 22.463497400283813s


[I 2023-11-18 23:30:33,565] Trial 0 finished with value: 10.646194859554893 and parameters: {'encoder_dropout': 0.10507486873471816, 'decoder_dropout': 0.40833098697311565, 'embedding_size': 240, 'hidden_size': 932, 'num_layers': 2, 'momentum': 0.41814445054158134, 'learning_rate': 0.00011683107152760196, 'weight_decay': 0.0037540186215223427, 'batch_size': 71}. Best is trial 0 with value: 10.646194859554893.


Epoch: 05, Train Loss: 10.574, Val Loss: 10.549 Train Time: 237.03408646583557s Eval Time: 22.42497444152832s
Starting trial 1:	Hyperparameters={'encoder_dropout': 0.665348632640786, 'decoder_dropout': 0.687053155548927, 'embedding_size': 444, 'hidden_size': 782, 'num_layers': 2, 'momentum': 0.6949344175628585, 'learning_rate': 0.00899846356647276, 'weight_decay': 0.006562449189699982, 'batch_size': 790}


[W 2023-11-18 23:30:35,325] Trial 1 failed with parameters: {'encoder_dropout': 0.665348632640786, 'decoder_dropout': 0.687053155548927, 'embedding_size': 444, 'hidden_size': 782, 'num_layers': 2, 'momentum': 0.6949344175628585, 'learning_rate': 0.00899846356647276, 'weight_decay': 0.006562449189699982, 'batch_size': 790} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.90 GiB (GPU 0; 22.20 GiB total capacity; 18.99 GiB already allocated; 141.12 MiB free; 20.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/conda/lib/

Starting trial 2:	Hyperparameters={'encoder_dropout': 0.12269443211360304, 'decoder_dropout': 0.27968200261748355, 'embedding_size': 415, 'hidden_size': 194, 'num_layers': 1, 'momentum': 0.6686117786661031, 'learning_rate': 0.009988780741110507, 'weight_decay': 0.0011716847959572971, 'batch_size': 990}


[W 2023-11-18 23:30:37,368] Trial 2 failed with parameters: {'encoder_dropout': 0.12269443211360304, 'decoder_dropout': 0.27968200261748355, 'embedding_size': 415, 'hidden_size': 194, 'num_layers': 1, 'momentum': 0.6686117786661031, 'learning_rate': 0.009988780741110507, 'weight_decay': 0.0011716847959572971, 'batch_size': 990} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 5.37 GiB (GPU 0; 22.20 GiB total capacity; 17.00 GiB already allocated; 1.20 GiB free; 19.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/conda/

Starting trial 3:	Hyperparameters={'encoder_dropout': 0.31464974385354144, 'decoder_dropout': 0.47639274120392805, 'embedding_size': 417, 'hidden_size': 617, 'num_layers': 2, 'momentum': 0.34083918942049596, 'learning_rate': 0.007282064530770312, 'weight_decay': 5.698672313824346e-06, 'batch_size': 672}


[W 2023-11-18 23:30:39,011] Trial 3 failed with parameters: {'encoder_dropout': 0.31464974385354144, 'decoder_dropout': 0.47639274120392805, 'embedding_size': 417, 'hidden_size': 617, 'num_layers': 2, 'momentum': 0.34083918942049596, 'learning_rate': 0.007282064530770312, 'weight_decay': 5.698672313824346e-06, 'batch_size': 672} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.09 GiB (GPU 0; 22.20 GiB total capacity; 15.39 GiB already allocated; 2.48 GiB free; 18.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 69, in objective
    output = model(src, trg)        #  [trg length, 

Starting trial 4:	Hyperparameters={'encoder_dropout': 0.5005741809297067, 'decoder_dropout': 0.3813415180682884, 'embedding_size': 341, 'hidden_size': 243, 'num_layers': 2, 'momentum': 0.772342884257848, 'learning_rate': 0.00015577481472992278, 'weight_decay': 0.0013004440297274059, 'batch_size': 112}
New minimum validation loss: 10.611036. Saving model ...
Epoch: 01, Train Loss: 10.641, Val Loss: 10.611 Train Time: 135.4051811695099s Eval Time: 15.971313238143921s
Epoch: 02, Train Loss: 10.600, Val Loss: 10.562 Train Time: 134.778546333313s Eval Time: 15.887039184570312s
Epoch: 03, Train Loss: 10.556, Val Loss: 10.507 Train Time: 134.14688801765442s Eval Time: 15.866497993469238s
New minimum validation loss: 10.446149. Saving model ...
Epoch: 04, Train Loss: 10.511, Val Loss: 10.446 Train Time: 134.47113823890686s Eval Time: 15.913512706756592s


[I 2023-11-18 23:43:24,781] Trial 4 finished with value: 10.446149346175467 and parameters: {'encoder_dropout': 0.5005741809297067, 'decoder_dropout': 0.3813415180682884, 'embedding_size': 341, 'hidden_size': 243, 'num_layers': 2, 'momentum': 0.772342884257848, 'learning_rate': 0.00015577481472992278, 'weight_decay': 0.0013004440297274059, 'batch_size': 112}. Best is trial 4 with value: 10.446149346175467.


Epoch: 05, Train Loss: 10.459, Val Loss: 10.376 Train Time: 135.02861952781677s Eval Time: 15.879432916641235s
Starting trial 5:	Hyperparameters={'encoder_dropout': 0.2733795509951824, 'decoder_dropout': 0.343888067266337, 'embedding_size': 456, 'hidden_size': 508, 'num_layers': 2, 'momentum': 0.256622034391055, 'learning_rate': 0.003168780250203357, 'weight_decay': 0.0008225111358391471, 'batch_size': 310}


[W 2023-11-18 23:43:27,447] Trial 5 failed with parameters: {'encoder_dropout': 0.2733795509951824, 'decoder_dropout': 0.343888067266337, 'embedding_size': 456, 'hidden_size': 508, 'num_layers': 2, 'momentum': 0.256622034391055, 'learning_rate': 0.003168780250203357, 'weight_decay': 0.0008225111358391471, 'batch_size': 310} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 1.68 GiB (GPU 0; 22.20 GiB total capacity; 17.38 GiB already allocated; 1.31 GiB free; 19.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.10/sit

Starting trial 6:	Hyperparameters={'encoder_dropout': 0.2853476218666638, 'decoder_dropout': 0.5584316370001761, 'embedding_size': 507, 'hidden_size': 786, 'num_layers': 2, 'momentum': 0.1355060402391354, 'learning_rate': 0.07311490005585702, 'weight_decay': 2.6235359269192113e-05, 'batch_size': 724}


[W 2023-11-18 23:43:29,319] Trial 6 failed with parameters: {'encoder_dropout': 0.2853476218666638, 'decoder_dropout': 0.5584316370001761, 'embedding_size': 507, 'hidden_size': 786, 'num_layers': 2, 'momentum': 0.1355060402391354, 'learning_rate': 0.07311490005585702, 'weight_decay': 2.6235359269192113e-05, 'batch_size': 724} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.58 GiB (GPU 0; 22.20 GiB total capacity; 18.17 GiB already allocated; 887.12 MiB free; 20.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/conda/

Starting trial 7:	Hyperparameters={'encoder_dropout': 0.5155779276085126, 'decoder_dropout': 0.4020495189941875, 'embedding_size': 512, 'hidden_size': 113, 'num_layers': 1, 'momentum': 0.7829295396101653, 'learning_rate': 0.00012648102378313243, 'weight_decay': 1.076582401579101e-06, 'batch_size': 666}


[W 2023-11-18 23:43:30,701] Trial 7 failed with parameters: {'encoder_dropout': 0.5155779276085126, 'decoder_dropout': 0.4020495189941875, 'embedding_size': 512, 'hidden_size': 113, 'num_layers': 1, 'momentum': 0.7829295396101653, 'learning_rate': 0.00012648102378313243, 'weight_decay': 1.076582401579101e-06, 'batch_size': 666} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.29 GiB (GPU 0; 22.20 GiB total capacity; 16.37 GiB already allocated; 863.12 MiB free; 20.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.

Starting trial 8:	Hyperparameters={'encoder_dropout': 0.5727601038025544, 'decoder_dropout': 0.47151524367295394, 'embedding_size': 502, 'hidden_size': 440, 'num_layers': 1, 'momentum': 0.8657430956780438, 'learning_rate': 0.010950242624732218, 'weight_decay': 0.0002677884860718867, 'batch_size': 613}


[W 2023-11-18 23:43:32,160] Trial 8 failed with parameters: {'encoder_dropout': 0.5727601038025544, 'decoder_dropout': 0.47151524367295394, 'embedding_size': 502, 'hidden_size': 440, 'num_layers': 1, 'momentum': 0.8657430956780438, 'learning_rate': 0.010950242624732218, 'weight_decay': 0.0002677884860718867, 'batch_size': 613} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.84 GiB (GPU 0; 22.20 GiB total capacity; 16.49 GiB already allocated; 889.12 MiB free; 20.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.1

Starting trial 9:	Hyperparameters={'encoder_dropout': 0.3527430631179923, 'decoder_dropout': 0.3791869782445797, 'embedding_size': 331, 'hidden_size': 375, 'num_layers': 1, 'momentum': 0.6392604549916208, 'learning_rate': 0.09714692918736852, 'weight_decay': 0.0002236830893184973, 'batch_size': 508}


[W 2023-11-18 23:43:33,300] Trial 9 failed with parameters: {'encoder_dropout': 0.3527430631179923, 'decoder_dropout': 0.3791869782445797, 'embedding_size': 331, 'hidden_size': 375, 'num_layers': 1, 'momentum': 0.6392604549916208, 'learning_rate': 0.09714692918736852, 'weight_decay': 0.0002236830893184973, 'batch_size': 508} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.35 GiB (GPU 0; 22.20 GiB total capacity; 14.99 GiB already allocated; 787.12 MiB free; 20.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.10/

Starting trial 10:	Hyperparameters={'encoder_dropout': 0.20699750452565896, 'decoder_dropout': 0.26934464593124813, 'embedding_size': 413, 'hidden_size': 895, 'num_layers': 3, 'momentum': 0.6443418821750319, 'learning_rate': 0.011054897577873276, 'weight_decay': 0.0005083733165769736, 'batch_size': 1004}


[W 2023-11-18 23:43:35,595] Trial 10 failed with parameters: {'encoder_dropout': 0.20699750452565896, 'decoder_dropout': 0.26934464593124813, 'embedding_size': 413, 'hidden_size': 895, 'num_layers': 3, 'momentum': 0.6443418821750319, 'learning_rate': 0.011054897577873276, 'weight_decay': 0.0005083733165769736, 'batch_size': 1004} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 22.20 GiB total capacity; 15.56 GiB already allocated; 2.72 GiB free; 18.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 69, in objective
    output = model(src, trg)        #  [trg length,

Starting trial 11:	Hyperparameters={'encoder_dropout': 0.34322977143308675, 'decoder_dropout': 0.39966871653854585, 'embedding_size': 371, 'hidden_size': 618, 'num_layers': 2, 'momentum': 0.377967845819174, 'learning_rate': 2.2494431094821257e-05, 'weight_decay': 0.0045260028142242335, 'batch_size': 705}


[W 2023-11-18 23:43:37,121] Trial 11 failed with parameters: {'encoder_dropout': 0.34322977143308675, 'decoder_dropout': 0.39966871653854585, 'embedding_size': 371, 'hidden_size': 618, 'num_layers': 2, 'momentum': 0.377967845819174, 'learning_rate': 2.2494431094821257e-05, 'weight_decay': 0.0045260028142242335, 'batch_size': 705} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 5.23 GiB (GPU 0; 22.20 GiB total capacity; 16.17 GiB already allocated; 2.72 GiB free; 18.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 69, in objective
    output = model(src, trg)        #  [trg length,

Starting trial 12:	Hyperparameters={'encoder_dropout': 0.6603852326059189, 'decoder_dropout': 0.16986738272001356, 'embedding_size': 368, 'hidden_size': 697, 'num_layers': 1, 'momentum': 0.6667244061538032, 'learning_rate': 0.000381074516984359, 'weight_decay': 0.0012605640214337886, 'batch_size': 809}


[W 2023-11-18 23:43:38,670] Trial 12 failed with parameters: {'encoder_dropout': 0.6603852326059189, 'decoder_dropout': 0.16986738272001356, 'embedding_size': 368, 'hidden_size': 697, 'num_layers': 1, 'momentum': 0.6667244061538032, 'learning_rate': 0.000381074516984359, 'weight_decay': 0.0012605640214337886, 'batch_size': 809} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 22.20 GiB total capacity; 15.95 GiB already allocated; 1.20 GiB free; 19.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/conda/

Starting trial 13:	Hyperparameters={'encoder_dropout': 0.22955902660914207, 'decoder_dropout': 0.4911158328281042, 'embedding_size': 491, 'hidden_size': 975, 'num_layers': 1, 'momentum': 0.6972683887927308, 'learning_rate': 0.0006676334684955292, 'weight_decay': 0.0033224254292016676, 'batch_size': 124}




New minimum validation loss: 9.367463. Saving model ...
Epoch: 01, Train Loss: 10.423, Val Loss: 9.367 Train Time: 188.44334268569946s Eval Time: 19.47628903388977s
New minimum validation loss: 8.874996. Saving model ...
Epoch: 02, Train Loss: 9.313, Val Loss: 8.875 Train Time: 187.65702390670776s Eval Time: 19.408082485198975s
New minimum validation loss: 8.588095. Saving model ...
Epoch: 03, Train Loss: 8.907, Val Loss: 8.588 Train Time: 189.47026109695435s Eval Time: 19.43692922592163s
New minimum validation loss: 8.196831. Saving model ...
Epoch: 04, Train Loss: 8.562, Val Loss: 8.197 Train Time: 186.49147987365723s Eval Time: 19.44693946838379s
New minimum validation loss: 8.002150. Saving model ...


[I 2023-11-19 00:01:53,886] Trial 13 finished with value: 8.002150303880933 and parameters: {'encoder_dropout': 0.22955902660914207, 'decoder_dropout': 0.4911158328281042, 'embedding_size': 491, 'hidden_size': 975, 'num_layers': 1, 'momentum': 0.6972683887927308, 'learning_rate': 0.0006676334684955292, 'weight_decay': 0.0033224254292016676, 'batch_size': 124}. Best is trial 13 with value: 8.002150303880933.


Epoch: 05, Train Loss: 8.267, Val Loss: 8.002 Train Time: 188.1686635017395s Eval Time: 19.417216300964355s
Starting trial 14:	Hyperparameters={'encoder_dropout': 0.1373715296914047, 'decoder_dropout': 0.26047182943288816, 'embedding_size': 395, 'hidden_size': 189, 'num_layers': 3, 'momentum': 0.8867896504624373, 'learning_rate': 2.7766072135128198e-05, 'weight_decay': 0.007428208680892792, 'batch_size': 306}


[W 2023-11-19 00:02:43,068] Trial 14 failed with parameters: {'encoder_dropout': 0.1373715296914047, 'decoder_dropout': 0.26047182943288816, 'embedding_size': 395, 'hidden_size': 189, 'num_layers': 3, 'momentum': 0.8867896504624373, 'learning_rate': 2.7766072135128198e-05, 'weight_decay': 0.007428208680892792, 'batch_size': 306} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 1.71 GiB (GPU 0; 22.20 GiB total capacity; 15.38 GiB already allocated; 765.12 MiB free; 20.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3

Starting trial 15:	Hyperparameters={'encoder_dropout': 0.2885671633365916, 'decoder_dropout': 0.5937232621311533, 'embedding_size': 362, 'hidden_size': 787, 'num_layers': 3, 'momentum': 0.7163704242654813, 'learning_rate': 0.09004233783138954, 'weight_decay': 0.00032880978500196144, 'batch_size': 348}


[W 2023-11-19 00:02:44,500] Trial 15 failed with parameters: {'encoder_dropout': 0.2885671633365916, 'decoder_dropout': 0.5937232621311533, 'embedding_size': 362, 'hidden_size': 787, 'num_layers': 3, 'momentum': 0.7163704242654813, 'learning_rate': 0.09004233783138954, 'weight_decay': 0.00032880978500196144, 'batch_size': 348} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 1.61 GiB (GPU 0; 22.20 GiB total capacity; 18.81 GiB already allocated; 263.12 MiB free; 20.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.1

Starting trial 16:	Hyperparameters={'encoder_dropout': 0.5292350822860933, 'decoder_dropout': 0.6123409550559717, 'embedding_size': 220, 'hidden_size': 809, 'num_layers': 1, 'momentum': 0.7342863109426501, 'learning_rate': 0.0055993120385844, 'weight_decay': 1.0839056244755194e-06, 'batch_size': 721}


[W 2023-11-19 00:02:46,240] Trial 16 failed with parameters: {'encoder_dropout': 0.5292350822860933, 'decoder_dropout': 0.6123409550559717, 'embedding_size': 220, 'hidden_size': 809, 'num_layers': 1, 'momentum': 0.7342863109426501, 'learning_rate': 0.0055993120385844, 'weight_decay': 1.0839056244755194e-06, 'batch_size': 721} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.68 GiB (GPU 0; 22.20 GiB total capacity; 15.77 GiB already allocated; 695.12 MiB free; 20.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/conda/

Starting trial 17:	Hyperparameters={'encoder_dropout': 0.6702518122746408, 'decoder_dropout': 0.6019692277316478, 'embedding_size': 417, 'hidden_size': 158, 'num_layers': 2, 'momentum': 0.5093128260259621, 'learning_rate': 0.0027215101078588897, 'weight_decay': 9.805769151578178e-06, 'batch_size': 893}


[W 2023-11-19 00:02:48,473] Trial 17 failed with parameters: {'encoder_dropout': 0.6702518122746408, 'decoder_dropout': 0.6019692277316478, 'embedding_size': 417, 'hidden_size': 158, 'num_layers': 2, 'momentum': 0.5093128260259621, 'learning_rate': 0.0027215101078588897, 'weight_decay': 9.805769151578178e-06, 'batch_size': 893} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.85 GiB (GPU 0; 22.20 GiB total capacity; 12.77 GiB already allocated; 165.12 MiB free; 20.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.

Starting trial 18:	Hyperparameters={'encoder_dropout': 0.14014043002807133, 'decoder_dropout': 0.1674220731274399, 'embedding_size': 254, 'hidden_size': 744, 'num_layers': 1, 'momentum': 0.559313503017641, 'learning_rate': 0.037775260202830924, 'weight_decay': 0.0006143937477265973, 'batch_size': 850}


[W 2023-11-19 00:02:50,068] Trial 18 failed with parameters: {'encoder_dropout': 0.14014043002807133, 'decoder_dropout': 0.1674220731274399, 'embedding_size': 254, 'hidden_size': 744, 'num_layers': 1, 'momentum': 0.559313503017641, 'learning_rate': 0.037775260202830924, 'weight_decay': 0.0006143937477265973, 'batch_size': 850} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.79 GiB (GPU 0; 22.20 GiB total capacity; 13.53 GiB already allocated; 161.12 MiB free; 20.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.1

Starting trial 19:	Hyperparameters={'encoder_dropout': 0.18872017707563837, 'decoder_dropout': 0.26471499196465104, 'embedding_size': 345, 'hidden_size': 809, 'num_layers': 1, 'momentum': 0.8770181216883318, 'learning_rate': 0.0006037527750744888, 'weight_decay': 1.0067442886238327e-06, 'batch_size': 62}




New minimum validation loss: 8.083847. Saving model ...
Epoch: 01, Train Loss: 9.239, Val Loss: 8.084 Train Time: 202.3244276046753s Eval Time: 21.45066547393799s
New minimum validation loss: 7.732253. Saving model ...
Epoch: 02, Train Loss: 7.929, Val Loss: 7.732 Train Time: 202.05861020088196s Eval Time: 21.439218282699585s
New minimum validation loss: 7.562543. Saving model ...
Epoch: 03, Train Loss: 7.652, Val Loss: 7.563 Train Time: 203.92939352989197s Eval Time: 21.465781211853027s
New minimum validation loss: 7.436394. Saving model ...
Epoch: 04, Train Loss: 7.498, Val Loss: 7.436 Train Time: 201.30253672599792s Eval Time: 21.455437898635864s
New minimum validation loss: 7.340739. Saving model ...


[I 2023-11-19 00:22:13,189] Trial 19 finished with value: 7.340739211429556 and parameters: {'encoder_dropout': 0.18872017707563837, 'decoder_dropout': 0.26471499196465104, 'embedding_size': 345, 'hidden_size': 809, 'num_layers': 1, 'momentum': 0.8770181216883318, 'learning_rate': 0.0006037527750744888, 'weight_decay': 1.0067442886238327e-06, 'batch_size': 62}. Best is trial 19 with value: 7.340739211429556.


Epoch: 05, Train Loss: 7.384, Val Loss: 7.341 Train Time: 203.43801403045654s Eval Time: 21.4598548412323s
Starting trial 20:	Hyperparameters={'encoder_dropout': 0.6184171647212484, 'decoder_dropout': 0.518001303217815, 'embedding_size': 298, 'hidden_size': 323, 'num_layers': 3, 'momentum': 0.8038799036735065, 'learning_rate': 7.247032415215429e-05, 'weight_decay': 3.5126210576018026e-06, 'batch_size': 617}


[W 2023-11-19 00:22:14,348] Trial 20 failed with parameters: {'encoder_dropout': 0.6184171647212484, 'decoder_dropout': 0.518001303217815, 'embedding_size': 298, 'hidden_size': 323, 'num_layers': 3, 'momentum': 0.8038799036735065, 'learning_rate': 7.247032415215429e-05, 'weight_decay': 3.5126210576018026e-06, 'batch_size': 617} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.56 GiB (GPU 0; 22.20 GiB total capacity; 13.47 GiB already allocated; 201.12 MiB free; 20.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/python3.

Starting trial 21:	Hyperparameters={'encoder_dropout': 0.42191202885049683, 'decoder_dropout': 0.35222147502883105, 'embedding_size': 494, 'hidden_size': 216, 'num_layers': 1, 'momentum': 0.19295739923086677, 'learning_rate': 0.029475365616427017, 'weight_decay': 1.4298691564921317e-05, 'batch_size': 889}


[W 2023-11-19 00:22:16,336] Trial 21 failed with parameters: {'encoder_dropout': 0.42191202885049683, 'decoder_dropout': 0.35222147502883105, 'embedding_size': 494, 'hidden_size': 216, 'num_layers': 1, 'momentum': 0.19295739923086677, 'learning_rate': 0.029475365616427017, 'weight_decay': 1.4298691564921317e-05, 'batch_size': 889} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 4.96 GiB (GPU 0; 22.20 GiB total capacity; 9.93 GiB already allocated; 2.81 GiB free; 18.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 83, in objective
    loss = criterion(output, trg)
  File "/opt/cond

Starting trial 22:	Hyperparameters={'encoder_dropout': 0.47739729381711005, 'decoder_dropout': 0.33786012859128356, 'embedding_size': 403, 'hidden_size': 267, 'num_layers': 3, 'momentum': 0.3100514768252849, 'learning_rate': 5.539001686622702e-05, 'weight_decay': 0.00046915331394621997, 'batch_size': 922}


[W 2023-11-19 00:22:18,145] Trial 22 failed with parameters: {'encoder_dropout': 0.47739729381711005, 'decoder_dropout': 0.33786012859128356, 'embedding_size': 403, 'hidden_size': 267, 'num_layers': 3, 'momentum': 0.3100514768252849, 'learning_rate': 5.539001686622702e-05, 'weight_decay': 0.00046915331394621997, 'batch_size': 922} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 5.09 GiB (GPU 0; 22.20 GiB total capacity; 10.91 GiB already allocated; 2.86 GiB free; 18.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 69, in objective
    output = model(src, trg)        #  [trg length

Starting trial 23:	Hyperparameters={'encoder_dropout': 0.5880826173004987, 'decoder_dropout': 0.6052703992184986, 'embedding_size': 466, 'hidden_size': 1003, 'num_layers': 1, 'momentum': 0.29204042239105077, 'learning_rate': 0.00010953862251267148, 'weight_decay': 2.3167073523288925e-06, 'batch_size': 521}


[W 2023-11-19 00:22:22,113] Trial 23 failed with parameters: {'encoder_dropout': 0.5880826173004987, 'decoder_dropout': 0.6052703992184986, 'embedding_size': 466, 'hidden_size': 1003, 'num_layers': 1, 'momentum': 0.29204042239105077, 'learning_rate': 0.00010953862251267148, 'weight_decay': 2.3167073523288925e-06, 'batch_size': 521} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.49 GiB (GPU 0; 22.20 GiB total capacity; 13.72 GiB already allocated; 353.12 MiB free; 20.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/pyth

Starting trial 24:	Hyperparameters={'encoder_dropout': 0.3010150382388159, 'decoder_dropout': 0.5424255469565925, 'embedding_size': 414, 'hidden_size': 786, 'num_layers': 3, 'momentum': 0.10674102239085653, 'learning_rate': 0.022079003980209552, 'weight_decay': 9.497198441200903e-06, 'batch_size': 457}


[W 2023-11-19 00:22:27,495] Trial 24 failed with parameters: {'encoder_dropout': 0.3010150382388159, 'decoder_dropout': 0.5424255469565925, 'embedding_size': 414, 'hidden_size': 786, 'num_layers': 3, 'momentum': 0.10674102239085653, 'learning_rate': 0.022079003980209552, 'weight_decay': 9.497198441200903e-06, 'batch_size': 457} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 3.17 GiB (GPU 0; 22.20 GiB total capacity; 10.32 GiB already allocated; 2.86 GiB free; 18.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 69, in objective
    output = model(src, trg)        #  [trg length, b

Starting trial 25:	Hyperparameters={'encoder_dropout': 0.17476255006006935, 'decoder_dropout': 0.18291147572145944, 'embedding_size': 474, 'hidden_size': 665, 'num_layers': 2, 'momentum': 0.5750458745402064, 'learning_rate': 2.0038030099816665e-05, 'weight_decay': 4.204537887096255e-05, 'batch_size': 301}


[W 2023-11-19 00:24:13,908] Trial 25 failed with parameters: {'encoder_dropout': 0.17476255006006935, 'decoder_dropout': 0.18291147572145944, 'embedding_size': 474, 'hidden_size': 665, 'num_layers': 2, 'momentum': 0.5750458745402064, 'learning_rate': 2.0038030099816665e-05, 'weight_decay': 4.204537887096255e-05, 'batch_size': 301} because of the following error: OutOfMemoryError('CUDA out of memory. Tried to allocate 2.26 GiB (GPU 0; 22.20 GiB total capacity; 12.98 GiB already allocated; 349.12 MiB free; 20.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF').
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/tmp/ipykernel_537/4151047639.py", line 85, in objective
    loss.backward()
  File "/opt/conda/lib/pytho

Starting trial 26:	Hyperparameters={'encoder_dropout': 0.666860866556084, 'decoder_dropout': 0.14310546756612483, 'embedding_size': 403, 'hidden_size': 686, 'num_layers': 1, 'momentum': 0.5304037074535177, 'learning_rate': 0.016803731754976535, 'weight_decay': 0.007561913049800378, 'batch_size': 62}




New minimum validation loss: 7.281041. Saving model ...
Epoch: 01, Train Loss: 7.809, Val Loss: 7.281 Train Time: 190.14013409614563s Eval Time: 19.358969926834106s
New minimum validation loss: 7.045174. Saving model ...
Epoch: 02, Train Loss: 7.126, Val Loss: 7.045 Train Time: 188.74524021148682s Eval Time: 19.301857709884644s
Epoch: 03, Train Loss: 6.984, Val Loss: 6.979 Train Time: 189.23022866249084s Eval Time: 19.335997819900513s
New minimum validation loss: 6.916367. Saving model ...
Epoch: 04, Train Loss: 6.911, Val Loss: 6.916 Train Time: 190.07164096832275s Eval Time: 19.33453106880188s


[I 2023-11-19 00:42:04,122] Trial 26 finished with value: 6.916367074204839 and parameters: {'encoder_dropout': 0.666860866556084, 'decoder_dropout': 0.14310546756612483, 'embedding_size': 403, 'hidden_size': 686, 'num_layers': 1, 'momentum': 0.5304037074535177, 'learning_rate': 0.016803731754976535, 'weight_decay': 0.007561913049800378, 'batch_size': 62}. Best is trial 26 with value: 6.916367074204839.


Epoch: 05, Train Loss: 6.869, Val Loss: 6.885 Train Time: 188.61841201782227s Eval Time: 19.39446473121643s
Starting trial 27:	Hyperparameters={'encoder_dropout': 0.21880156466541706, 'decoder_dropout': 0.36376582554937464, 'embedding_size': 280, 'hidden_size': 960, 'num_layers': 1, 'momentum': 0.36099383777669003, 'learning_rate': 0.00032189008410052556, 'weight_decay': 4.067770746713994e-06, 'batch_size': 242}




New minimum validation loss: 10.654181. Saving model ...
Epoch: 01, Train Loss: 10.660, Val Loss: 10.654 Train Time: 171.24145770072937s Eval Time: 19.288434982299805s


In [19]:
print('Best trial:', study.best_trial)
print('Best hyperparameters:', study.best_params)
print('Validation loss:', study.best_value)

Best trial: FrozenTrial(number=31, state=TrialState.COMPLETE, values=[6.840339520820101], datetime_start=datetime.datetime(2023, 11, 19, 1, 42, 47, 67924), datetime_complete=datetime.datetime(2023, 11, 19, 2, 0, 58, 397423), params={'encoder_dropout': 0.6039111450304973, 'decoder_dropout': 0.33154046338693915, 'embedding_size': 468, 'hidden_size': 938, 'num_layers': 1, 'momentum': 0.5494925017261035, 'learning_rate': 0.04835390488901119, 'weight_decay': 0.00014738274625279338, 'batch_size': 132}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'encoder_dropout': FloatDistribution(high=0.7, log=False, low=0.1, step=None), 'decoder_dropout': FloatDistribution(high=0.7, log=False, low=0.1, step=None), 'embedding_size': IntDistribution(high=512, log=False, low=200, step=1), 'hidden_size': IntDistribution(high=1024, log=False, low=64, step=1), 'num_layers': IntDistribution(high=3, log=False, low=1, step=1), 'momentum': FloatDistribution(high=0.9, log=False, low=0.1, s

In [None]:
# [I 2023-11-19 00:01:53,886] Trial 13 finished with value: 8.002150303880933 and parameters: {'encoder_dropout': 0.22955902660914207, 'decoder_dropout': 0.4911158328281042, 'embedding_size': 491, 'hidden_size': 975, 'num_layers': 1, 'momentum': 0.6972683887927308, 'learning_rate': 0.0006676334684955292, 'weight_decay': 0.0033224254292016676, 'batch_size': 124}. Best is trial 13 with value: 8.002150303880933.


# [I 2023-11-19 00:22:13,189] Trial 19 finished with value: 7.340739211429556 and parameters: {'encoder_dropout': 0.18872017707563837, 'decoder_dropout': 0.26471499196465104, 'embedding_size': 345, 'hidden_size': 809, 'num_layers': 1, 'momentum': 0.8770181216883318, 'learning_rate': 0.0006037527750744888, 'weight_decay': 1.0067442886238327e-06, 'batch_size': 62}. Best is trial 19 with value: 7.340739211429556.

# [I 2023-11-19 00:42:04,122] Trial 26 finished with value: 6.916367074204839 and parameters: {'encoder_dropout': 0.666860866556084, 'decoder_dropout': 0.14310546756612483, 'embedding_size': 403, 'hidden_size': 686, 'num_layers': 1, 'momentum': 0.5304037074535177, 'learning_rate': 0.016803731754976535, 'weight_decay': 0.007561913049800378, 'batch_size': 62}. Best is trial 26 with value: 6.916367074204839.


# Best hyperparameters: {'encoder_dropout': 0.6039111450304973, 'decoder_dropout': 0.33154046338693915, 'embedding_size': 468, 'hidden_size': 938, 'num_layers': 1, 'momentum': 0.5494925017261035, 'learning_rate': 0.04835390488901119, 'weight_decay': 0.00014738274625279338, 'batch_size': 132}
# Validation loss: 6.840339520820101



## Running Inference

In [42]:
test_encoder = Encoder(encoder_input_size, hidden_size, encoder_embedding_size, num_layers, encoder_dropout)
test_decoder = Decoder(hidden_size, decoder_output_size, decoder_embedding_size, num_layers, decoder_dropout)
    
inference_model = Seq2SeqInference(test_encoder, test_decoder, answer_field)
inference_model.to(device)

pad_idx = answer_field.vocab.stoi[answer_field.pad_token] 
test_criterion = nn.CrossEntropyLoss(ignore_index = pad_idx) # https://pytorch.org/docs/2.0/generated/torch.nn.CrossEntropyLoss.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss
test_criterion.to(device)

# checkpoint = torch.load('checkpoints/best_val_loss.pt')
checkpoint = torch.load('checkpoints/train-25.pt')
inference_model.load_state_dict(checkpoint['model_state_dict'])


test_iterator = BucketIterator(
        # dataset=valid_data,
        dataset=train_data,
       batch_size=1,
       sort_within_batch=True,
        sort_key = lambda x: len(x.question_context),
        device=device)


In [46]:
# Smoke text

with torch.no_grad():
    valid_loss = 0.0

    for batch_idx, batch in enumerate(test_iterator):
        src = batch.question_context.to(device)
        trg = batch.answer.to(device)
        
        # print("src shape", src.shape)
        # print("src", src)
        # print("trg", trg)
        
        tokens, logits = inference_model(src, 5)

        #trg = [trg len, batch size]
        #logits = [max length, batch, output dim]

#         print("logits shape", logits.shape)
#         print("seq2seq-logits-raw", logits)
        
#         logits_dim = logits.shape[-1]
#         logits = logits[1:].view(-1, logits_dim)
        
        # trg = trg[1:].view(-1)

        # print("seq2seq-logits", logits)
        # print("seq2seq-trg", trg)
        # print("tokens shape", tokens.shape)
        # print("tokens", tokens)
                
        # response_text = tokens_to_string(tokens.squeeze().tolist())
        response_text = decode_answer(tokens.squeeze().tolist())
        
        print(response_text)
        
        #trg = [(trg len - 1) * batch size]
        #output = [(trg len - 1) * batch size, output dim]

        # TODO: question: should 'output' really be the MLP result tensor ?
#         loss = test_criterion(output, trg)

#         valid_loss += loss.item()
        
        break # TODO: remove line





the


## Chatbot experience

In [47]:
def respond_question(question):
    with torch.no_grad():
        question_tensor = encode_question(question)

        tokens, logits = inference_model(src, 30)

        response_text = decode_answer(tokens.squeeze().tolist())

        return response_text
    

In [48]:
respond_question("A week comprise seven days. What day do you prefer ?")


'the'

In [49]:
respond_question("What would change the rotational inertia of a body under Newton's First Law of Motion? Torque is the rotation equivalent of force in the same way that angle is the rotational equivalent for position, angular velocity for velocity, and angular momentum for momentum. As a consequence of Newton's First Law of Motion, there exists rotational inertia that ensures that all bodies maintain their angular momentum unless acted upon by an unbalanced torque. Likewise, Newton's Second Law of Motion can be used to derive an analogous equation for the instantaneous angular acceleration of the rigid body")

'the'

In [None]:
respond_question("What would change the rotational inertia of a body under Newton's First Law of Motion? Torque is the rotation equivalent of force in the same way that angle is the rotational equivalent for position, angular velocity for velocity, and angular momentum for momentum. As a consequence of Newton's First Law of Motion, there exists rotational inertia that ensures that all bodies maintain their angular momentum unless acted upon by an unbalanced torque. Likewise, Newton's Second Law of Motion can be used to derive an analogous equation for the instantaneous angular acceleration of the rigid body")

In [None]:
respond_question("What changes macroscopic closed system energies? The connection between macroscopic nonconservative forces and microscopic conservative forces is described by detailed treatment with statistical mechanics. In macroscopic closed systems, nonconservative forces act to change the internal energies of the system, and are often associated with the transfer of heat. According to the Second law of thermodynamics, nonconservative forces necessarily result in energy transformations within closed systems from ordered to more random conditions as entropy increases.")