# A Seq2seq Chatbot utilizing an Attention-based Encoder-Decoder Architecture


- **Author: Eleftherios P. Loukas - eleftheriosloukas@gmail.com**

## Environment

We uploaded all the local data in our Google Drive. 

We will use Google Colaboratory since a GPU accelerator is provided without cost, along with a 26GB RAM too, which both will help us with our computations.

Let's mount our data to Google Colaboratory so the notebook can have access to them.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
import os
print(os.getcwd())

/content


In [0]:
!pwd
path_to_mount = '/content/drive/My Drive/Colab Notebooks/ncsr/'

# Change current working directory
os.chdir(path_to_mount)
!ls

/content/drive/My Drive/Colab Notebooks/ncsr
chatbot-tf-notebook.ipynb  encoder_serialized.pt  training_checkpoints
data			   old			  version-keras-char-level
decoder_serialized.pt	   pytorch-chatbot.ipynb  version-keras-word-level


Let's import all the libraries we need.

In [0]:
# PyTorch
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

# Etc
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import re
import random
import glob
import json


# Use GPU if available
if (torch.cuda.is_available()):
    device = torch.device('cuda')
    print("Running on GPU")
else: 
    device = torch.device('cpu')
    print("Running on CPU")

Running on GPU


## JSON Parsing

***This process is only done once.***

***After preprocessing the data, we save it to a local .tsv file and then load it each time. If you don't want to see the whole process, skip to the 'Vocabulary Classes' section.***

-----


Great, we can now see our notebook and the data folder.

By exploring our data, we can see that we are interested in the text files inside the *dialogue folder* and more specifically in the '*turns*' value.

If we visualize the files, we can understand that they are JSON formatted.

However, they are ill-formatted, meaning that the files themselves are not JSONs.

Instead, **each line is a JSON object itself.**

Let's have a look at all the files and parse them with the *json* and *glob* library.

In [0]:
# Get absolute paths of files
dialogues_regex_folder_path = "data/dialogues/*.txt"

# Get the absolute paths for each file 
list_of_files = glob.glob(path_to_mount + dialogues_regex_folder_path)
print(list_of_files[:3]) # Visualize the first 3
print(len(list_of_files)) # 47

['/content/drive/My Drive/Colab Notebooks/ncsr/data/dialogues/AGREEMENT_BOT.txt', '/content/drive/My Drive/Colab Notebooks/ncsr/data/dialogues/APARTMENT_FINDER.txt', '/content/drive/My Drive/Colab Notebooks/ncsr/data/dialogues/CHECK_STATUS.txt']
47


In [0]:
# Parsing
list_of_dicts = [] # Init

# Loop for each file
for filename in list_of_files:
  with open(filename) as f:
      for line in f: # Loop for each line (inside each file)
          list_of_dicts.append(json.loads(line)) # insert in a dictionary


In [0]:
# Visualize the dictionaries
print(list_of_dicts[0])
print(list_of_dicts[1].keys)
print(list_of_dicts[332])
print(list_of_dicts[:3])

{'id': 'c399a493', 'user_id': 'c05f0462', 'bot_id': 'c96edf42', 'domain': 'AGREEMENT_BOT', 'task_id': 'a9203a2c', 'turns': ['Hello how may I help you?', 'i am awesome', 'of course you are', 'and i own rental properties on the moon', 'i doubt you own a property in the moon', 'just kidding. i own them on Earth', "that's a nice joke", 'because i am a billionaire!', "i don't seem to know you", 'and i programmed you', 'i am the programmer']}
<built-in method keys of dict object at 0x7f903fc2d900>
{'id': '77d8f493', 'user_id': '3205aff7', 'bot_id': 'f3420389', 'domain': 'AGREEMENT_BOT', 'task_id': 'd47b54df', 'turns': ['Hello how may I help you?', 'I must say that an agreement bot is really useless', 'I have never heard a more correct statement in my entire life!', 'yes what useless trip agreement bots are *tripe', "I couldn't have said it better myself. Is there anything else I can enthusiastically agree with today?", 'Well I can repeat my sentiment 4 times about agreement bots being utterl

Great! We have a dictionary out of our raw text dataset.

As we can see, we are only interested in the 'turns' value since this contains the array with the QAs.

So, we will dump out all the extra properties and create a new dictionary containing only the useful data.

In [0]:
# Create a new dict containing only useful data
new_list_of_dicts = [] 

for old_dict in list_of_dicts:
  foodict = {k: v for k, v in old_dict.items() if (k == 'turns')} 
  new_list_of_dicts.append(foodict)

print(len(new_list_of_dicts))

# Just to be sure we don't make bad use of the old variable,
# we will make the old dict equal to the new one.
# In the end, they are all the same.
list_of_dicts = []
list_of_dicts = new_list_of_dicts 

print(list_of_dicts[:2])

37884
[{'turns': ['Hello how may I help you?', 'i am awesome', 'of course you are', 'and i own rental properties on the moon', 'i doubt you own a property in the moon', 'just kidding. i own them on Earth', "that's a nice joke", 'because i am a billionaire!', "i don't seem to know you", 'and i programmed you', 'i am the programmer']}, {'turns': ['Hello how may I help you?', 'I am the king of the world', 'I agree that you are the king of the world', 'I can have any woman I want!', 'I agree that you can have any woman you desire.', 'Even you bot, if I were in to AIs', 'Agreed.', "Really? you're awfully agreeable aren't you", 'I agree that I am awfully agreeable, yes.', 'Having an agreement bot seems like a useless thing to have. I need some spice in my life!', 'I really agree with that. I am rather useles.']}]


## Data Augmentation & Preparation

As we see, we now have a new list of dicts which is a list of 37884 dictionaries (if all 47 texts are loaded) that inside contain only one property ('turns'), and each property contains an array with the QA dataset.

Notice that the dialog is instantiated by the bot first, and then the human responds. This goes on and on like ping pong and essentially the dialog is over.

**If we observe the data more carefully, we can see that the last sentence may be given by both the bot and the user!**

This will come in handy when preparing our input, target dataset.

Let's assign the dialogs into 2 matrices:
- questions 
- answers

But first, some corner cases need to be defined:
- A first 'artificial' user 'greeting' to the bot
- An 'artificial' bot 'bye' to the user, if the user was the last one in the dialogue.

In [0]:
# Init matrices
questions = []
answers = []

# We assume that the first answer by the bot (aka "Hello, how may I help you?") 
# is returned after a user greeting.
# This is used in order to ensure that the dataset will be even 
# and each question is paired with an answer.
# That's why we create a mini random catalog 
# of artificial 'ghost' user greetings.
matrix_greetings = ["Hey", "Hi"]

# A similar situation happens in the corner case 
# when the last sentence is from the user.
# As said, each sentence from the user should be paired
# with a sentence from the bot.
# That's why we will in this case add an artificial one.
matrix_byes = ["Ok", "Okie", "Bye"]

# For each dictionary in the list
for dictionary in list_of_dicts:
  matrix_QA = dictionary['turns']
  
  # Append a first random greeting, as explained above
  questions.append(random.choice(matrix_greetings))
    
  # In order to split the QAs to 2 matrices (questions & answers),
  # we will use a flag to indicate if the sentence 
  # is given from the bot or from the user
  bot_flag = True # Init

  # For each Q/A in the matrix
  for sentence in matrix_QA:

    if bot_flag == True:
      answers.append(sentence) # Used for bot's answers
      bot_flag = False # Switch
      continue
    else:
      questions.append(sentence) # Used for user's questions
      bot_flag = True # Switch
      continue

  # The last loop (ideally) ends with a bot's answer,
  # thus making bot_flag equal to False.
  # Although, with data visualization and exploring,
  # we can see that this does not happen all the time.

  # Corner case: If the last answers was from the user, 
  # then we need to add one artificial 'ghost' response 
  # from the bot to make the dataset even.
  if bot_flag == True: 
    answers.append(random.choice(matrix_byes))


In [0]:
assert len(questions) == len(answers), "ERROR: The length of the questions and answer matrices are different."
# If it does not return any warning/error, then everything is good.

print(len(questions)) # We have 238051 QAs (if we load all 47 texts)

238051


In [0]:
"""
    Write to tsv file so we just load this each time
"""
import csv

filepath_to_save = '/tmp/output.tsv' # Change accordingly
with open(filepath_to_save, 'wt') as out_file:
    # Instantiate object
    tsv_writer = csv.writer(out_file, delimiter='\t')

    # Loop QAs & write to file
    for i in range(len(questions)):
        tsv_writer.writerow([questions[i], answers[i]])

-------------------- 

## Vocabulary classes
We build a class for both questions and answers.

It will be used for 2 separate encoding and decoding objects, since
each of them is defined by the same parameters, but with different values.

In [0]:
#### HELPERS

### Helper class for word indexing
SOS_TOKEN = 0 # Start of sentence
EOS_TOKEN = 1 # End of sentence

# Let's define a QA (Questions/Answers) class
# since each class has its own 'language'.

class QA_Lang:
    """ 
    # The constructor should be specified by its:
    # - word2index, a dictionary that maps each word to each index
    # - index2word, a dictionary that maps each index to each word
    # - n_words, the number of words in the dictionary
    """
    def __init__(self):
        self.word2index = {}
        self.index2word = {0: 'SOS', 1: 'EOS'} # Reserved for start and end token
        self.n_words = 2 # Initialize with start and end token

    # Use each sentence and instantiate the class properties
    def add_sentence(self, sentence):
        for word in sentence.split(' '): # For each word in the sentence
            if word not in self.word2index: # If word is not seen
                # Add new word
                self.word2index[word] = self.n_words
                self.index2word[self.n_words] = word
                self.n_words += 1
            


## Text Preprocessing
Let's remove non-alphabet/punctuation characters and make them all ASCII encoded.

In [0]:
# Preprocessing helper function
def preprocess_text(sentence):
    """
    Preprocesses text to lowercase ASCII alphabet-only characters
    without punctuation
    """

    # Conver sentence to lowercase, after removing whitespaces
    sentence = sentence.lower().strip()

    # Convert Unicode string to plain ASCII characters
    normalized_sentence = [c for c in unicodedata.normalize('NFD', sentence) if
                           unicodedata.category(c) != 'Mn']

    # Append the normalized sentence
    sentence = ''
    sentence = ''.join(normalized_sentence)
    
    # Remove punctuation and non-alphabet characters
    sentence = re.sub(r"([.!?])", r" \1", sentence)
    sentence = re.sub(r"[^a-zA-Z.!?]+", r" ", sentence)

    return sentence

In [0]:
# Visualize the path once again
print(os.getcwd())

/content/drive/My Drive/Colab Notebooks/ncsr


## Load file 

Read the already-prepared tsv file from the local storage and clean it using the above-defined method.

The *preprocess_text() method must be compiled.*

In [0]:
# Reading helper function
def readQA():
    """
    Reads the tab-separated data from the storage and cleans it
    """

    print('Reading lines from file...')

    # Read text from file and split into lines
    # Remember that .tsv file separates pairs with the tab character and
    # each pair is separated with a newline character

    data_path = os.getcwd() + "/data/dataset.tsv" # Change to your own
    lines = open(data_path, encoding='utf-8').read().strip().split('\n')

    # Split lines into pairs, normalize
    TAB_CHARACTER = '\t'

    pairs = [[preprocess_text(sentence) \
              for sentence in line.split(TAB_CHARACTER)] \
              for line in lines]
    
    ''' 
    # Find maximum length of pairs
    count1 = count2 = 0
    max_words = 0
    for i in range(len(pairs)):
        count1 = len(pairs[i][0].split())
        count2 = len(pairs[i][1].split())
        result = count1 + count2
        if result > max_words:
            max_words = result

    print(max_words) # 304
    '''
    
    questions = QA_Lang()
    answers = QA_Lang()

    return questions, answers, pairs


## Filtering
The maximum number of words in a dialog is 304!

If we manually look at our data, we can see that normally most sentences have max ~15 words. We can also validate it with a histogram.

If we don't filter the long sentences that compose such dialogs, we will dramatically hurt our training performance.

Since so many words is not a usual thing, we can try to filter some sentences based on their word count.

In [0]:
MAX_LENGTH = 35 # Arbitrary, try different values!

# Filtering helper function
def filter(pairs):
    """
    Filters sentences based on the max length defined above.
    """
    new_pairs = []

    for pair in pairs:
        question_length = len(pair[0].split(' '))
        answer_length = len(pair[1].split(' '))

        if question_length < MAX_LENGTH and answer_length < MAX_LENGTH:
            new_pairs.append(pair)

    return new_pairs

## Preparing the dataset
Let's combine all the above little methods in one.

In [0]:
def prepare_data():
    """
    Prepares the data, combining all of the above methods and returns:
    questions, answers objects and the pairs of sentences
    """
    # Read sentence pairs
    questions, answers, pairs = readQA()
    print("Read " + str(len(pairs)) + " sentence pairs")

    # Filter pairs
    pairs = filter(pairs)
    print("Filtered down to " + str(len(pairs)) + " sentence pairs")

    # Count words and instantiate the 'language' objects 
    for pair in pairs:
        questions.add_sentence(pair[0])
        answers.add_sentence(pair[1])

    print("The questions object is defined by " +
                        str(questions.n_words) + " words")
    
    print("The answers object is defined by " +
                        str(answers.n_words) + " words")

    return questions, answers, pairs

Finally, let's call the method.

In [0]:
# Load and prepare the dataset, printing some characteristics
questions, answers, pairs = prepare_data()

Reading lines from file...
Read 238051 sentence pairs
Filtered down to 236832 sentence pairs
The questions object is defined by 18847 words
The answers object is defined by 21561 words


In [0]:
# Visualize 3 random pairs of Q&A
for _ in range(3):
    print(random.choice(pairs))

['hi', 'hello how may i help you ?']
['what a disappointment . this bot is trash .', 'im sorry but i am just a bot to clarify rules']
['yes please maybe . townhouse to rent ?', 'brookview at citrus park has some spacious bedroom townhomes starting at .']


## NN Design: Attention-based seq2seq Model 
Let's build a class for our Encoder and our Attention-based Decoder
As stated in the report, this will be based on [PyTorch's](https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py) and [TensorFlow's intermediate tutorials](https://github.com/tensorflow/nmt/tree/master/nmt) on Neural Machine Translation.

In [0]:
##### SEQ2SEQ MODEL

class EncoderRNN(nn.Module):
    """
    The encoder is a GRU in our case.
    It takes the questions matrix as input. For each word in the 
    sentence, it produces a vector and a hidden state; The last one
    will be passed to the decoder in order to initialize it.
    """
    # Initialize encoder
    def __init__(self, input_size, hidden_size): 
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        # Embedding layers convert the padded sentences into appropriate vectors
        # The input size is equal to the questions vocabulary
        self.embedding = nn.Embedding(input_size, hidden_size)
        
        # We use a GRU because it's simpler and more efficient (training-wise)
        # than an LSTM
        self.gru = nn.GRU(hidden_size, hidden_size)

    # Forward passes
    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded

        # Pass the hidden state and the encoder output to the next word input
        output, hidden = self.gru(output, hidden) 

        return output, hidden

    # PyTorch Forward Passes
    def init_hidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

##### ATTENTION-BASED DECODER
"""
(Description taken from PyTorch Tutorial, as referenced)

Calculate a set of attention weights.

Multiply attention weights by the encoder output vectors to create a weighted
combination. The result would contain information about that specific part of
the input sequence, and thus help the decoder choose the right output words.

To calculate the attention weights, we'll use a feed-forward layer that uses
the decoder's input and hidden state as inputs.

We will have to choose a max sentence length (input length, for encoder outputs),
wherein sentences of the max length will use all attention weights, while shorter
sentences would only use the first few.
"""
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        # Initialize the constructor
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        # Combine Fully Connected Layer
        self.attention = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attention_combine = nn.Linear(self.hidden_size * 2,
                                           self.hidden_size)
        # Use dropout
        self.dropout = nn.Dropout(self.dropout_p)

        # Follow with a GRU and a FC layer
        # We use a GRU because it's simpler and more efficient (training-wise)
        # than an LSTM
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        # Forward passes as from the repo
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attention_weights = F.softmax(self.attention(torch.cat((embedded[0],
                                                                hidden[0]), 1)),
                                                                 dim=1)
        
        attention_applied = torch.bmm(attention_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attention_applied[0]), 1)
        output = self.attention_combine(output).unsqueeze(0)

        # Follow with a ReLU activation function after dropout
        output = F.relu(output)

        # Then, use the GRU
        output, hidden = self.gru(output, hidden)

        # And use softmax as the activation function
        output = F.log_softmax(self.out(output[0]), dim=1)

        return output, hidden, attention_weights

    def init_hidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

## NN Preprocessing
Neural Networks require fixed-size integer vectors in order to operate.

That's why we will one-hot encode our sentences using the appropriate vocabulary (the encoder's or the decoder's one) each time.

In [0]:
##### NETWORK PREPROCESSING HELPERS

def tensor_from_sentence(lang, sentence):
    """
    Given an input sentence and a 'language' object, 
    it creates an appropriate tensor with the EOS_TOKEN in the end.
    """

    # For each sentence, get a list of the word indices
    indices = [lang.word2index[word] for word in sentence.split(' ')]
    indices.append(EOS_TOKEN) # That will help the decoder know when to stop

    # Convert to a PyTorch tensor
    sentence_tensor = torch.tensor(indices, dtype=torch.long, device=device).view(-1, 1)

    return sentence_tensor

def tensors_from_pair(pair):
    """
    Given our 2D dataset as a list, it calls the 'tensor_from_sentence' method
    and returns the appropriate input/target tensors
    """
    
    input_tensor = tensor_from_sentence(questions, pair[0])
    target_tensor = tensor_from_sentence(answers, pair[1])

    return (input_tensor, target_tensor)

Some display helpers will be used in the training.

In [0]:
##### DISPLAY HELPERS
"""
Helper functions for printing time elapsed and estimated remaining time for
training.
"""
import time
import math

def as_minutes(s):
    m = math.floor(s / 60)
    s -= m * 60

    return '%dm %ds' % (m, s)

def time_since(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s

    return '%s (- %s)' % (as_minutes(s), as_minutes(rs))

## NN Training
We will exploit the teacher forcing policy for training.

Also, we need to specify the encoder-decoder pipeline, along with any initialization needed.

In [0]:
# Training helper method
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer,
            decoder_optimizer, criterion, max_length = MAX_LENGTH):
    """
    This method is responsible for the NN training. Specifically:

    - Runs input sentence through encoder
    - Keeps track of every output and the last hidden state
    - Then, the decoder is given the start of sentence token (SOS) 
            as its first input, and the last hidden state of the encoder
            as its first hidden state. We also utilize teacher forcing;
            The decoder uses the real target outputs as each next input.
    - Returns the current loss
    """

    # Train one iteration
    encoder_hidden = encoder.init_hidden()

    # Set gradients to zero 
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Get input and target length
    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    # Init outputs to a zeros array equal to MAX_LENGTH 
    # and the encoder's latent dimensionality
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    # Initialize the loss
    loss = 0 

    # Encode input
    for encoder_input in range(input_length):
        # Include hidden state from the last input when encoding current input
        encoder_output, encoder_hidden = encoder(input_tensor[encoder_input], encoder_hidden)
        encoder_outputs[encoder_input] = encoder_output[0, 0]

    # Decoder uses SOS token as first input
    decoder_input = torch.tensor([[SOS_TOKEN]], device=device)

    # Decoder uses last hidden state of encoder as first hidden state
    decoder_hidden = encoder_hidden

    # Teacher forcing: Feed the actual target as the next input instead of the predicted one
    for d_i in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input,
                                                                    decoder_hidden,
                                                                    encoder_outputs)

        loss += criterion(decoder_output, target_tensor[d_i])

        decoder_input = target_tensor[d_i] # Teacher forcing

    # Compute costs for each trainable parameter (dloss/dx)
    loss.backward()

    # Backpropagate & update parameters
    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

For a predefined number of iterations, we will train our Neural Network, using the above helper train() method.

In [0]:
def train_iters(encoder, decoder, n_iters, print_every=1000, learning_rate=0.01):
    """
    Calls the train() method for a number of iterations.
    It tracks the time progress while initializing optimizers and cost function.
    In the same time, it creates the sets of the training pairs.
    """

    start = time.time() # Get start time
    print_loss_total = 0 # Reset after each print_every
    
    # Set optimizers
    #encoder_optimizer = optim.Adam(encoder.parameters(), amsgrad = True, lr=learning_rate)
    #decoder_optimizer = optim.Adam(encoder.parameters(), amsgrad = True, lr=learning_rate)
    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    # Shuffle the training pairs
    training_pairs = [tensors_from_pair(random.choice(pairs)) for i in range(n_iters)]

    # Set the cost function
    criterion = nn.NLLLoss() # Also known as the multiclass cross-entropy 
    
    # For each iteration
    for i in range(1, n_iters + 1):
        training_pair = training_pairs[i - 1] # Create a training pair

        # Extract input and target tensor from the pair
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        # Train for each pair
        loss = train(input_tensor, target_tensor, encoder, decoder,
                encoder_optimizer, decoder_optimizer, criterion)

        print_loss_total += loss

        # Print progress
        if i % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0 # Reset
            print('%s (%d %d%%) %.4f' % (time_since(start, i / n_iters),
                             i, i / n_iters * 100, print_loss_avg))

Let's train our Neural Net.

In [0]:
##### TRAIN 
hidden_size = 512 # Change arbitrarily depending on the results

# Instantiate Encoder and Attention Decoder
encoder = EncoderRNN(questions.n_words, hidden_size).to(device)
attention_decoder = AttnDecoderRNN(hidden_size, answers.n_words, dropout_p=0.2).to(device)

# Train for n_iters random samples
# The dataset holds 238051 dialogs while we filter some of them
# Obviously, deep learning computations need really high-performance hardware.
# Let's experiment with a number of iterations; I guess we just need a proof of concept.
n_iters = 70000 # Seems good after many experiments (check report)

In [0]:
"""
    Call training for a number of iterations, while printing every tenth of that

---- COMMENT THE FOLLOWING LINE IF TESTING WITH ALREADY TRAINED MODELS ---
"""

train_iters(encoder, attention_decoder, n_iters, print_every=(n_iters//15))

"""
---- UNCOMMENT THE FOLLOWING LINE IF TESTING WITH ALREADY TRAINED MODELS ---
# Specify path name
encoder_name = 'encoder_serialized.pt'
decoder_name = 'decoder_serialized.pt'

## Load previously trained models
encoder = torch.load(encoder_name)
attention_decoder = torch.load(decoder_name)
"""

5m 33s (- 77m 44s) (4666 6%) 3.2671
10m 56s (- 71m 10s) (9332 13%) 3.0837
16m 21s (- 65m 28s) (13998 19%) 2.9771
21m 40s (- 59m 38s) (18664 26%) 2.9630
27m 8s (- 54m 17s) (23330 33%) 2.8874
32m 33s (- 48m 50s) (27996 39%) 2.7967
38m 0s (- 43m 27s) (32662 46%) 2.8116
43m 30s (- 38m 4s) (37328 53%) 2.8098
48m 58s (- 32m 40s) (41994 59%) 2.7323
54m 26s (- 27m 13s) (46660 66%) 2.7327
59m 46s (- 21m 44s) (51326 73%) 2.6190
65m 11s (- 16m 18s) (55992 79%) 2.6361
70m 38s (- 10m 52s) (60658 86%) 2.6164
76m 7s (- 5m 26s) (65324 93%) 2.6393
81m 34s (- 0m 0s) (69990 99%) 2.6089


"\n---- UNCOMMENT THE FOLLOWING LINE IF TESTING WITH ALREADY TRAINED MODELS ---\n# Specify path name\nencoder_name = 'encoder_serialized.pt'\ndecoder_name = 'decoder_serialized.pt'\n\n## Load previously trained models\nencoder = torch.load(encoder_name)\nattention_decoder = torch.load(decoder_name)\n"

Remember that this time will be exponentially larger when training on a CPU.
Kudos to Google Colab!

## Inference
Of course, there is no standard metric for such applications. 
We could perform BLEU but it's not in our context.

Let's manually test our input sentences by building an inference method.

In [0]:
# Inference helper method
def inference(encoder, decoder, sentence, max_length=MAX_LENGTH):
    """
    Returns the decoded string after doing a forward pass in the seq2seq model.
    """
      
    with torch.no_grad(): # Stop autograd from tracking history on Tensors

        sentence = preprocess_text(sentence) # Preprocess sentence

        input_tensor = tensor_from_sentence(questions, sentence) # One-hot tensor
        input_length = input_tensor.size()[0]

        # Init encoder hidden state
        encoder_hidden = encoder.init_hidden()

        # Init encoder outputs
        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        # Forward pass in the encoder
        for encoder_input in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[encoder_input],
                                                     encoder_hidden)
            encoder_outputs[encoder_input] += encoder_output[0, 0]

        # Start of sentence token
        decoder_input = torch.tensor([[SOS_TOKEN]], device=device)

        # Decoder's initial hidden state is encoder's last hidden state
        decoder_hidden = encoder_hidden

        # Init the results array
        decoded_words = []

        # Forward pass in the decoder
        for d_i in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                    decoder_input, decoder_hidden, encoder_outputs)
            
            _, top_i = decoder_output.data.topk(1) 

            if top_i.item() == EOS_TOKEN: # If EOS is predicted
                break # Break and return the sentence to the user
            else:
                # Append prediction by using index2word
                decoded_words.append(answers.index2word[top_i.item()])

            # Use prediction as input
            decoder_input = top_i.squeeze().detach()

        return ' '.join(decoded_words) # Return the predicted sentence string 

In [0]:

print("Enter /q to quit")
while (1):
  
  user_input = input("User: ")

  user_input = str(user_input)

  if user_input == '/q':
    print("Quitting chat..")
    break;
  else:
    print("Bot: " + str(inference(encoder, attention_decoder, user_input)))


Enter /q to quit
User: hello
Bot: hi ! how can i help you today ?
User: Do you have anything to offer me today? I need some guidance. What to do today?
Bot: i have sent them to your email
User: Instructions or.. what??!
Bot: you can t own properties on the moon .
User: I did not speak about the moon, mate. You did.
Bot: i agree
User: God, nevermind. Can you maybe maybe help me build a really awesome time machine??????
Bot: i am sorry but that is impossible request
User: It's like going to the moon
Bot: i agree
User: Can you order a margarita pizza for me at least?
Bot: i can help with that .
User: Where will you send it?
Bot: i will send you the link .
User: What link?
Bot: it s a scam website .
User: Then, do not order from it!
Bot: i will send you to the contact .
User: No
Bot: ok
User: Thank you, bye byee!
Bot: bye
User: /q
Quitting chat..


## Serialization
If we serialize our data, then we can load (or deserialize) the whole model from a local file. That way, we skip waiting all of the training time.

*Note: In case you want to deserialize and inference the model, you need to run some above cells like the preprocessing methods, etc.*


In [0]:
encoder_name = 'encoder_serialized.pt'
decoder_name = 'decoder_serialized.pt'

# Serialize the encoder/decoder objects in your local directory
print('Saving model...')
torch.save(encoder, encoder_name)
torch.save(attention_decoder, decoder_name)

Saving model...


  "type " + obj.__name__ + ". It won't be checked "
  "type " + obj.__name__ + ". It won't be checked "
