# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [1]:
# Install Python packages specified in the "requirements.txt" file using the pip.
!pip install -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Collecting torch==1.12.0
  Downloading torch-1.12.0-cp37-cp37m-manylinux1_x86_64.whl (776.3 MB)
[K     |████████████████████████████████| 776.3 MB 16 kB/s s eta 0:00:01    |███                             | 70.4 MB 10.4 MB/s eta 0:01:08     |█████████▉                      | 239.7 MB 66.8 MB/s eta 0:00:09███████▌              | 423.3 MB 58.0 MB/s eta 0:00:07�██████████▋        | 572.0 MB 63.2 MB/s eta 0:00:04MB/s eta 0:00:03     |██████████████████████████████▊ | 745.0 MB 48.1 MB/s eta 0:00:01
[?25hCollecting torchdata==0.4.0
  Downloading torchdata-0.4.0-cp37-cp37m-manylinux2014_x86_64.whl (4.4 MB)
[K     |████████████████████████████████| 4.4 MB 70.4 MB/s eta 0:00:01
[?25hCollecting torchtext==0.13.0
  Downloading torchtext-0.13.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
[K     |████████████████████████████████| 1.9 MB 47.9 MB/s eta 0:00:01
Collecting portalocker>=2.0.0
  Downloading portalocker-2.7.0

In [1]:
# Import nltk library for nlp tasks
import nltk
# Import the 'brown' corpus from nltk
from nltk.corpus import brown
# Import torchtext library for handling text data in PyTorch
import torchtext
# Import torchdata module for data handling in PyTorch
import torchdata
# Import PyTorch library
import torch
# Import neural network module from PyTorch
import torch.nn as nn
# Import pandas library
import pandas as pd
# Import string module
import string
# Import random module
import random

In [2]:
# Check if a CUDA-compatible GPU is available, and set the device accordingly.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

In [3]:
# Create a SnowballStemmer from nltk library for stemming English words.
stemmer = nltk.stem.snowball.SnowballStemmer('english')
stemmer

<nltk.stem.snowball.SnowballStemmer at 0x7f039c8a89d0>

In [4]:
# Download 'brown' corpus from nltk
nltk.download('brown')
# Download 'punkt' tokenizer model from nltk
nltk.download('punkt')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [5]:
# Define a function to load dataset into a Pandas DataFrame for processing
def loadDF(path):
    # Use torchtext library to load SQuAD1 dataset into train and testing datasets
    train_dataset, valid_dataset = torchtext.datasets.SQuAD1(root=path, split=('train','dev'))
    
    # Initialize dictionaries to store Questions and Answers for both training and testing
    train_dictionary = {'Questions': [], 'Answers': []}
    
    # Iterate over training dataset and extract Questions and the first Answer for each example
    for x, Qs, As, y in train_dataset:
        train_dictionary['Questions'].append(Qs)
        train_dictionary['Answers'].append(As[0])
    
    # Initialize dictionaries to store Questions and Answers for testing
    test_dictionary = {'Questions': [], 'Answers': []}

    # Iterate over the testing dataset and extract Questions and the first Answer for each example
    for x, Qs, As, y in valid_dataset:
        test_dictionary['Questions'].append(Qs)
        test_dictionary['Answers'].append(As[0])
        
    # Create Pandas DataFrames from dictionaries for both training and testing datasets
    train_df = pd.DataFrame(train_dictionary)    
    test_df = pd.DataFrame(test_dictionary)    
    
    # Concatenate training and testing DataFrames and return the result
    return train_df.append(test_df)

In [6]:
# Load dataset into a Pandas DataFrame using 'loadDF' function with the specified path
df = loadDF('data')
# Select first 6000 rows of DataFrame for further processing
df = df.iloc[:6000, :]

In [7]:
# Display the first few rows of DataFrame to inspect the loaded data
df.head()

Unnamed: 0,Questions,Answers
0,To whom did the Virgin Mary allegedly appear i...,Saint Bernadette Soubirous
1,What is in front of the Notre Dame Main Building?,a copper statue of Christ
2,The Basilica of the Sacred heart at Notre Dame...,the Main Building
3,What is the Grotto at Notre Dame?,a Marian place of prayer and reflection
4,What sits on top of the Main Building at Notre...,a golden statue of the Virgin Mary


In [8]:
# Define a function for cleaning text data
def cleanText(text):
    # Apply stemming to each word in the text using the previously defined stemmer
    text = ' '.join(stemmer.stem(word) for word in text.split())
    # Convert the text to lowercase and remove punctuation
    text = ''.join([ch.lower() for ch in text if ch not in string.punctuation])
    # Tokenize the cleaned text using a regular expression tokenizer from nltk
    tokens = nltk.tokenize.RegexpTokenizer(r'\w+').tokenize(text)

    # Return the cleaned and tokenized text
    return tokens

In [9]:
# Apply 'cleanText' function to 'Questions' column of the DataFrame
df['Questions'] = df['Questions'].apply(cleanText)
# Apply 'cleanText' function to 'Answers' column of the DataFrame
df['Answers'] = df['Answers'].apply(cleanText)

In [10]:
# Display the first few rows of DataFrame after applying 'cleanText' function to 'Questions' and 'Answers' columns
df.head()

Unnamed: 0,Questions,Answers
0,"[to, whom, did, the, virgin, mari, alleg, appe...","[saint, bernadett, soubir]"
1,"[what, is, in, front, of, the, notr, dame, mai...","[a, copper, statu, of, christ]"
2,"[the, basilica, of, the, sacr, heart, at, notr...","[the, main, build]"
3,"[what, is, the, grotto, at, notr, dame]","[a, marian, place, of, prayer, and, reflect]"
4,"[what, sit, on, top, of, the, main, build, at,...","[a, golden, statu, of, the, virgin, mari]"


In [11]:
# Combine the tokenized words in each row of 'Questions' column into a single string and create a list
list1 = df['Questions'].apply(lambda x: " ".join(x)).to_list()
# Combine the tokenized words in each row of 'Answers' column into a single string and create a list
list2 = df['Answers'].apply(lambda x: " ".join(x)).to_list()

In [12]:
# Create a list of pairs by zipping elements from 'list1' and 'list2'
list_pairs = [list(i) for i in zip(list1, list2)]
list_pairs

[['to whom did the virgin mari alleg appear in 1858 in lourd france',
  'saint bernadett soubir'],
 ['what is in front of the notr dame main building',
  'a copper statu of christ'],
 ['the basilica of the sacr heart at notr dame is besid to which structure',
  'the main build'],
 ['what is the grotto at notr dame', 'a marian place of prayer and reflect'],
 ['what sit on top of the main build at notr dame',
  'a golden statu of the virgin mari'],
 ['when did the scholast magazin of notr dame begin publishing',
  'septemb 1876'],
 ['how often is notr dame the juggler published', 'twice'],
 ['what is the daili student paper at notr dame called', 'the observ'],
 ['how mani student news paper are found at notr dame', 'three'],
 ['in what year did the student paper common sens begin public at notr dame',
  '1987'],
 ['where is the headquart of the congreg of the holi cross', 'rome'],
 ['what is the primari seminari of the congreg of the holi cross',
  'moreau seminari'],
 ['what is the olde

In [13]:
# Initialize variables to keep track of the maximum number of words in Questions and Answers
max_ques, max_ans = 0, 0

# Iterate over each pair in the list of pairs and update the maximum lengths
for p in list_pairs:
    max_ques = len(p[0].split()) if len(p[0].split()) > max_ques else max_ques
    max_ans = len(p[1].split()) if len(p[1].split()) > max_ans else max_ans

# Display the maximum number of words in Questions and Answers
max_ques, max_ans

(29, 43)

In [14]:
# Define special tokens for Start of Sequence (SOS) and End of Sequence (EOS).
SOS = 0
EOS = 1

In [15]:
# Define a Vocabulary class to manage the mapping between words and indices
class Vocabulary:
    def __init__(self):
        # Initialize dictionaries for word-to-index and index-to-word mappings with SOS and EOS tokens
        self.word2index = {"": SOS, "": EOS}
        self.index2word = {SOS: "", EOS: ""}
        # Initialize a counter for the number of unique words in the vocabulary
        self.words_count = len(self.word2index)

    def add_words(self, sentence):
        # Add words from a sentence to the vocabulary if they are not already present.
        for word in sentence.split(" "):
            if word not in self.word2index:
                self.word2index[word] = self.words_count
                self.index2word[self.words_count] = word
                self.words_count += 1

In [16]:
# Create instances of the Vocabulary class for both Questions and Answers.
ques_vocab = Vocabulary()
ans_vocab = Vocabulary()

In [17]:
# Iterate over each pair in the list of pairs and add words to the respective vocabularies.
for p in list_pairs:
    ques_vocab.add_words(p[0])
    ans_vocab.add_words(p[1])

In [18]:
# Define a function to convert a text sequence to a PyTorch tensor using a given vocabulary
def toTensor(vocabulary, text):
    # Convert words in the text to corresponding indices using the vocabulary
    indices = [vocabulary.word2index[word] for word in text.split(' ')]
    # Append the index of the empty string to mark the end of the sequence
    indices.append(vocabulary.word2index[''])
    # Convert the list of indices to a PyTorch tensor of type long, move it to the specified device, and reshape it
    return torch.Tensor(indices).long().to(device).view(-1, 1)

In [19]:
# Convert each question in the list of pairs to a PyTorch tensor using the Questions vocabulary
SRC = [toTensor(ques_vocab, p[0]) for p in list_pairs]
# Convert each answer in the list of pairs to a PyTorch tensor using the Answers vocabulary
TRG = [toTensor(ans_vocab, p[1]) for p in list_pairs]

In [20]:
# Define the Encoder class as a subclass of nn.Module
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        
        # Initialize the input size and hidden size attributes
        self.input_size = input_size
        self.hidden_size = hidden_size
        
        # Create an embedding layer for input sequences
        self.embedding = nn.Embedding(self.input_size, self.hidden_size)
        # Create an LSTM layer with input and hidden size set to the hidden size
        self.lstm = nn.LSTM(self.hidden_size, self.hidden_size)

    def forward(self, x, hidden, cell):
        # Embed the input sequence
        x = self.embedding(x)
        # Reshape the embedded sequence
        x = x.view(1, 1, -1)
        # Pass the reshaped sequence through the LSTM layer, updating the hidden and cell states
        x, (hidden, cell) = self.lstm(x, (hidden, cell))
        # Return the output, updated hidden state, and updated cell state
        return x, hidden, cell

In [21]:
# Define the Decoder class as a subclass of nn.Module
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        
        # Initialize the hidden size and output size attributes
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Create an embedding layer for output sequences
        self.embedding = nn.Embedding(output_size, self.hidden_size)
        # Create an LSTM layer with input and hidden size set to the hidden size
        self.lstm = nn.LSTM(self.hidden_size, self.hidden_size)
        # Create a fully connected layer to map the hidden size to the output size
        self.fc = nn.Linear(self.hidden_size, self.output_size)
        # Apply LogSoftmax activation to the output
        self.output = nn.LogSoftmax(dim=1)

    def forward(self, x, hidden, cell):
        # Embed the input sequence
        x = self.embedding(x)
        # Reshape the embedded sequence
        x = x.view(1, 1, -1)
        # Pass the reshaped sequence through the LSTM layer, updating the hidden and cell states
        x, (hidden, cell) = self.lstm(x, (hidden, cell))
        # Map the output of the LSTM through the fully connected layer and apply LogSoftmax
        x = self.output(self.fc(x[0]))
        # Return the output, updated hidden state, and updated cell state
        return x, hidden, cell

In [22]:
class Seq2Seq(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Seq2Seq, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.encoder = Encoder(self.input_size, self.hidden_size)
        self.decoder = Decoder(self.hidden_size, self.output_size)
        
    def forward(self, src, trg, src_len, trg_len, teacher_force=1):
        
        output = {'decoder_output':[]}
        
        encoder_hidden = torch.zeros([1, 1, self.hidden_size]).to(device) # 1 = number of LSTM layers
        cell = torch.zeros([1, 1, self.hidden_size]).to(device)  
        
        for i in range(src_len):
            encoder_output, encoder_hidden, cell = self.encoder(src[i], encoder_hidden, cell)

        decoder_input = torch.Tensor([[0]]).long().to(device) # 0 = SOS_token
        decoder_hidden = encoder_hidden
        
        for i in range(trg_len):
            decoder_output, decoder_hidden, cell = self.decoder(decoder_input, decoder_hidden, cell)
            output['decoder_output'].append(decoder_output)
            
            if self.training: # Model not in eval mode
                decoder_input = target_tensor[i] if random.random() > teacher_force else decoder_output.argmax(1)
            else:
                _, top_index = decoder_output.data.topk(1)
                decoder_input = top_index.squeeze().detach()
                
        return output
    
    
# Define the Seq2Seq class as a subclass of nn.Module
class Seq2Seq(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Seq2Seq, self).__init__()
        
        # Initialize input size, hidden size, and output size attributes
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Create instances of the Encoder and Decoder classes
        self.encoder = Encoder(self.input_size, self.hidden_size)
        self.decoder = Decoder(self.hidden_size, self.output_size)
        
    def forward(self, src, trg, src_len, trg_len, teacher_force=1):
        # Initialize an empty dictionary to store the decoder output
        output = {'decoder_output': []}
        
        # Initialize hidden and cell states for the encoder
        encoder_hidden = torch.zeros([1, 1, self.hidden_size]).to(device)
        cell = torch.zeros([1, 1, self.hidden_size]).to(device)
        
        # Pass the source sequence through the encoder
        for i in range(src_len):
            encoder_output, encoder_hidden, cell = self.encoder(src[i], encoder_hidden, cell)

        # Initialize the input to the decoder as the SOS token
        decoder_input = torch.Tensor([[0]]).long().to(device)
        decoder_hidden = encoder_hidden
        
        # Loop through the target sequence length for decoding
        for i in range(trg_len):
            # Pass the decoder input through the decoder and update hidden and cell states
            decoder_output, decoder_hidden, cell = self.decoder(decoder_input, decoder_hidden, cell)
            # Append the decoder output to the dictionary
            output['decoder_output'].append(decoder_output)
            
            if self.training:
                # Use teacher forcing with probability 'teacher_force' during training
                decoder_input = trg[i] if random.random() > teacher_force else decoder_output.argmax(1)
            else:
                # During inference, use the top-k prediction as the next input
                _, top_index = decoder_output.data.topk(1)
                decoder_input = top_index.squeeze().detach()
                
        return output

In [23]:
# Access the number of unique words in the Questions vocabulary
ques_vocab.words_count

6031

In [24]:
# Access the number of unique words in the Answers vocabulary.
ans_vocab.words_count

5338

In [25]:
# Define the dimensions for the encoder and decoder
encoder_input_size = ques_vocab.words_count  # Define input vocabulary size for the encoder
hidden_size = 128  # Define hidden size for both encoder and decoder 
decoder_output_size = ans_vocab.words_count  # Define output vocabulary size for the decoder

In [26]:
# Instantiate the Seq2Seq model with the specified input size, hidden size, and output size
seq2seq_model = Seq2Seq(encoder_input_size, hidden_size, decoder_output_size)

In [27]:
# Display the Seq2Seq model object
seq2seq_model

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(6031, 128)
    (lstm): LSTM(128, 128)
  )
  (decoder): Decoder(
    (embedding): Embedding(5338, 128)
    (lstm): LSTM(128, 128)
    (fc): Linear(in_features=128, out_features=5338, bias=True)
    (output): LogSoftmax(dim=1)
  )
)

In [28]:
# Define hyperparameters

# Set the learning rate for the Adam optimizer
learning_rate = 0.001
# Initialize the Adam optimizer with the parameters of the Seq2Seq model
optimizer = torch.optim.Adam(seq2seq_model.parameters(), lr=learning_rate)
# Define the loss criterion as CrossEntropyLoss
criterion = nn.CrossEntropyLoss()
# Specify the number of epochs for training
epochs = 64
# Set the batch size for training
batch_size = 128

In [29]:
# Import the train_test_split function from scikit-learn library
from sklearn.model_selection import train_test_split

# Define a function for splitting the dataset into training and testing sets
def train_test_split_call(SRC, TRG, test_size=0.2, random_state=42):
    '''
    Input: SRC, our list of questions from the dataset
            TRG, our list of responses from the dataset

    Output: Training and test datasets for SRC & TRG

    '''
    # Use train_test_split to split the SRC and TRG lists into training and testing datasets
    SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset = train_test_split(
        SRC, TRG, test_size=test_size, random_state=random_state
    )
    # Return the training and testing datasets for SRC and TRG
    return SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset

In [30]:
# Define a function for training the Seq2Seq model
def train(model, SRC, TRG, epochs, batch_size, optimizer, criterion):
    # Move the model to the specified device (cuda or cpu)
    model.to(device)
    # Initialize variables to track training and testing loss
    total_train_loss = 0
    total_test_loss = 0
    total_loss = 0

    # Split the dataset into training and testing sets
    SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset = train_test_split_call(
        SRC, TRG)

    # Loop through the specified number of epochs
    for e in range(1, epochs + 1):
        # Set the model in training mode
        model.train()
        # Loop through the training dataset
        for i in range(0, len(SRC_train_dataset)):
            src = SRC_train_dataset[i]
            trg = TRG_train_dataset[i]

            # Forward pass through the model
            output = model(src, trg, src.size(0), trg.size(0))

            # Calculate the current loss for each element in the output sequence
            current_loss = 0
            for (s, t) in zip(output["decoder_output"], trg):
                current_loss += criterion(s, t)

            # Accumulate the total loss and update the model parameters
            total_loss += current_loss
            total_train_loss += (current_loss.item() / trg.size(0))

            if i % batch_size == 0 or i == (len(SRC_train_dataset) - 1):
                total_loss.backward()
                optimizer.step()
                optimizer.zero_grad()
                total_loss = 0

        # Set the model in evaluation mode
        model.eval()
        # Loop through the test dataset
        for i in range(0, len(SRC_test_dataset)):
            src = SRC_test_dataset[i]
            trg = TRG_test_dataset[i]

            # Forward pass through the model
            output = model(src, trg, src.size(0), trg.size(0))

            # Calculate the current loss for each element in the output sequence
            current_loss = 0
            for (s, t) in zip(output["decoder_output"], trg):
                current_loss += criterion(s, t)

            # Accumulate the total test loss.
            total_test_loss += (current_loss.item() / trg.size(0))

        # Print the average training and testing loss every two epochs
        if e % 2 == 0:
            train_loss_average = total_train_loss / (len(SRC_train_dataset) * 2)
            test_loss_average = total_test_loss / (len(SRC_test_dataset) * 2)
            print("{}/{} Epoch  -  Training Loss = {:.5f}  -  Testing Loss = {:.5f}".format(e, epochs, train_loss_average, test_loss_average))
            total_train_loss = 0
            total_test_loss = 0

In [31]:
# Train the Seq2Seq model using the provided training function
train(seq2seq_model, SRC, TRG, epochs, batch_size, optimizer, criterion)

2/64 Epoch  -  Training Loss = 6.19209  -  Testing Loss = 5.99154
4/64 Epoch  -  Training Loss = 5.43636  -  Testing Loss = 5.97434
6/64 Epoch  -  Training Loss = 5.31771  -  Testing Loss = 5.99534
8/64 Epoch  -  Training Loss = 5.16459  -  Testing Loss = 5.97688
10/64 Epoch  -  Training Loss = 4.96462  -  Testing Loss = 5.98167
12/64 Epoch  -  Training Loss = 4.73511  -  Testing Loss = 5.98262
14/64 Epoch  -  Training Loss = 4.48985  -  Testing Loss = 5.99094
16/64 Epoch  -  Training Loss = 4.25741  -  Testing Loss = 6.03699
18/64 Epoch  -  Training Loss = 3.98601  -  Testing Loss = 6.05018
20/64 Epoch  -  Training Loss = 3.71335  -  Testing Loss = 6.06371
22/64 Epoch  -  Training Loss = 3.43409  -  Testing Loss = 6.14066
24/64 Epoch  -  Training Loss = 3.17336  -  Testing Loss = 6.17858
26/64 Epoch  -  Training Loss = 2.91537  -  Testing Loss = 6.21766
28/64 Epoch  -  Training Loss = 2.71792  -  Testing Loss = 6.26185
30/64 Epoch  -  Training Loss = 2.47240  -  Testing Loss = 6.21944

In [32]:
# Save the trained Seq2Seq model to a file using PyTorch's torch.save()
model_path = 'seq2seq_model.pt'
torch.save(seq2seq_model, model_path)

In [33]:
# Load the trained Seq2Seq model from the saved file using PyTorch's torch.load()
seq2seq_model = torch.load(model_path, map_location=torch.device('cuda'))
# Set the model in evaluation mode
seq2seq_model.eval()

Seq2Seq(
  (encoder): Encoder(
    (embedding): Embedding(6031, 128)
    (lstm): LSTM(128, 128)
  )
  (decoder): Decoder(
    (embedding): Embedding(5338, 128)
    (lstm): LSTM(128, 128)
    (fc): Linear(in_features=128, out_features=5338, bias=True)
    (output): LogSoftmax(dim=1)
  )
)

In [34]:
# Define a function to evaluate the Seq2Seq model on a given question
def evaluate(seq2seq_model, question, SRC, TRG, max_ans):
    
    try:
        # Convert the input question to a PyTorch tensor using the Questions vocabulary
        question = toTensor(ques_vocab, " ".join(cleanText(question)))
    except:
        print("Words Encountered Does Not Exist!")
        return
    
    # Initialize a list to store the predicted answer words
    answer_words = []
    
    # Perform the forward pass through the Seq2Seq model with the given question
    output = seq2seq_model(question, None, question.size(0), max_ans)

    # Loop through the decoder output to generate the predicted answer words
    for tensor in output['decoder_output']:
        _, top_token = tensor.data.topk(1)
        # Break if the top token corresponds to the EOS token
        if top_token.item() == 1:
            break
        else:
            # Retrieve the word corresponding to the top token and append it to the list
            word = ans_vocab.index2word[top_token.item()]
            answer_words.append(word)
            
    # Print the generated answer
    print("Answer: ", ' '.join(answer_words), "\n")

In [35]:
# Display a prompt for the user to type questions
print("Type 'exit' to end the chat:\n")

# Continue the chat loop until the user types 'exit'
while True:
    # Get a question input from the user
    question = input("Question: ")
    
    # Check if the user wants to exit the chat
    if question.strip() == "exit":
        break
    
    # Call the evaluate function to generate and print the model's response
    evaluate(seq2seq_model, question, SRC, TRG, max_ans)

Type 'exit' to end the chat:

Question: In what year was the Theodore M. Hesburgh Library at Notre Dame finished?
Answer:  1963 

Question: Which college did Notre Dame add in 1921?
Answer:  colleg of commerc 

Question: Over how many years did the change to national standards undertaken at Notre Dame in the early 20th century take place?
Answer:  three year 

Question: Those who attended a Jesuit college may have been forbidden from joining which Law School due to the curricula at the Jesuit institution?
Answer:  harvard law school 

Question: The Notre Dame football team got a new head coach in 1918, who was it?
Answer:  knute rockn 

Question: When did the Scholastic Magazine of Notre dame begin publishing?
Answer:  septemb 1876 

Question: What sits on top of the Main Building at Notre Dame?
Answer:  a golden statu of in virgin in 

Question: What is the Grotto at Notre Dame?
Answer:  a marian of place prayer and reflect 

Question: The Basilica of the Sacred heart at Notre Dame is