
In [None]:
%matplotlib inline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



Chatbot Tutorial
================
**Author:** `Matthew Inkawhich <https://github.com/MatthewInkawhich>`_



Preparations
------------

To start, download the data ZIP file
`here <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__
and put it in a ``data/`` directory under the current directory.

After that, let’s import some necessities.




In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import torch
from torch.jit import script, trace
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import csv
import random
import re
import os
import unicodedata
import codecs
from io import open
import itertools
import math


USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda" if USE_CUDA else "cpu")

In [None]:
!wget http://www.cs.cornell.edu/~cristian/data/cornell_movie_dialogs_corpus.zip 

--2021-02-20 18:56:36--  http://www.cs.cornell.edu/~cristian/data/cornell_movie_dialogs_corpus.zip
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.36
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.36|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9916637 (9.5M) [application/zip]
Saving to: ‘cornell_movie_dialogs_corpus.zip’


2021-02-20 18:56:37 (11.9 MB/s) - ‘cornell_movie_dialogs_corpus.zip’ saved [9916637/9916637]



In [None]:
!nvidia-smi

Sat Feb 20 18:56:37 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!unzip -q /content/cornell_movie_dialogs_corpus.zip

Load & Preprocess Data
----------------------

The next step is to reformat our data file and load the data into
structures that we can work with.

The `Cornell Movie-Dialogs
Corpus <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__
is a rich dataset of movie character dialog:

-  220,579 conversational exchanges between 10,292 pairs of movie
   characters
-  9,035 characters from 617 movies
-  304,713 total utterances

This dataset is large and diverse, with great variation in
language formality, time periods, sentiment, etc. Our hope is that this
diversity makes our model robust to many forms of inputs and queries.

First, we’ll take a look at some lines of our datafile to see the
original format.




In [None]:
corpus_name = "cornell movie-dialogs corpus"
corpus = os.path.join("/content/", corpus_name)

def printLines(file, n=10):
    with open(file, 'rb') as datafile:
        lines = datafile.readlines()
    for line in lines[:n]:
        print(line)

printLines(os.path.join(corpus, "movie_lines.txt"))

b'L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n'
b'L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!\n'
b'L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.\n'
b'L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?\n'
b"L925 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Let's go.\n"
b'L924 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ Wow\n'
b"L872 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Okay -- you're gonna need to learn how to lie.\n"
b'L871 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ No\n'
b'L870 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I\'m kidding.  You know how sometimes you just become this "persona"?  And you don\'t know how to quit?\n'
b'L869 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Like my fear of wearing pastels?\n'


Create formatted data file
~~~~~~~~~~~~~~~~~~~~~~~~~~

For convenience, we'll create a nicely formatted data file in which each line
contains a tab-separated *query sentence* and a *response sentence* pair.

The following functions facilitate the parsing of the raw
*movie_lines.txt* data file.

-  ``loadLines`` splits each line of the file into a dictionary of
   fields (lineID, characterID, movieID, character, text)
-  ``loadConversations`` groups fields of lines from ``loadLines`` into
   conversations based on *movie_conversations.txt*
-  ``extractSentencePairs`` extracts pairs of sentences from
   conversations
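
As a quick illustration of the parsing steps above, the sketch below splits two raw lines on the `` +++$+++ `` separator and pairs consecutive utterances. The sample lines are copied from the output shown earlier and are treated as one conversation purely for illustration; the names ``FIELDS``, ``parse_line`` and ``consecutive_pairs`` are hypothetical and not part of the tutorial code.

```python
# Minimal sketch of the parsing idea; helper names are hypothetical.
FIELDS = ["lineID", "characterID", "movieID", "character", "text"]
SEPARATOR = " +++$+++ "

def parse_line(raw):
    """Split one raw corpus line into a dict keyed by field name."""
    values = raw.rstrip("\n").split(SEPARATOR)
    return dict(zip(FIELDS, values))

def consecutive_pairs(utterances):
    """Pair each utterance with the one that follows it, skipping empty text."""
    pairs = []
    for first, second in zip(utterances, utterances[1:]):
        query, response = first["text"].strip(), second["text"].strip()
        if query and response:
            pairs.append([query, response])
    return pairs

raw_lines = [
    "L984 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ She okay?\n",
    "L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.\n",
]
parsed = [parse_line(raw) for raw in raw_lines]
print(parsed[0]["character"])      # CAMERON
print(consecutive_pairs(parsed))   # [['She okay?', 'I hope so.']]
```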




In [None]:
# Splits each line of the file into a dictionary of fields
def loadLines(fileName, fields):
    lines = {}
    with open(fileName, 'r', encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            lineObj = {}
            for i, field in enumerate(fields):
                lineObj[field] = values[i]
            lines[lineObj['lineID']] = lineObj
    return lines


# Groups fields of lines from `loadLines` into conversations based on *movie_conversations.txt*
def loadConversations(fileName, lines, fields):
    conversations = []
    with open(fileName, 'r', encoding='iso-8859-1') as f:
        for line in f:
            values = line.split(" +++$+++ ")
            # Extract fields
            convObj = {}
            for i, field in enumerate(fields):
                convObj[field] = values[i]
            # Convert string to list (convObj["utteranceIDs"] == "['L598485', 'L598486', ...]")
            utterance_id_pattern = re.compile('L[0-9]+')
            lineIds = utterance_id_pattern.findall(convObj["utteranceIDs"])
            # Reassemble lines
            convObj["lines"] = []
            for lineId in lineIds:
                convObj["lines"].append(lines[lineId])
            conversations.append(convObj)
    return conversations


# Extracts pairs of sentences from conversations
def extractSentencePairs(conversations):
    qa_pairs = []
    for conversation in conversations:
        # Iterate over all the lines of the conversation
        for i in range(len(conversation["lines"]) - 1):  # We ignore the last line (no answer for it)
            inputLine = conversation["lines"][i]["text"].strip()
            targetLine = conversation["lines"][i+1]["text"].strip()
            # Filter wrong samples (if one of the sentences is empty)
            if inputLine and targetLine:
                qa_pairs.append([inputLine, targetLine])
    return qa_pairs

Now we’ll call these functions and create the file. We’ll call it
*formatted_movie_lines.txt*.
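
To make the file format concrete, here is a small sketch of the tab-separated round trip using an in-memory buffer in place of the real file. The sample pairs are made up for illustration. It also shows one subtlety of ``csv.writer``: with its default settings, a field that contains a double quote is wrapped in quotes and the inner quotes are doubled, so the file is best read back with ``csv.reader`` if the exact text matters.

```python
import csv
import io

pairs = [["She okay?", "I hope so."], ['He said "no".', "Wow"]]

buffer = io.StringIO()   # stands in for the output file
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(pairs)
print(repr(buffer.getvalue()))   # the second row shows the added quoting

# Reading with csv.reader undoes the quoting applied by csv.writer.
buffer.seek(0)
restored = list(csv.reader(buffer, delimiter="\t"))
print(restored == pairs)   # True
```

A plain ``split('\t')`` on each line would keep the extra quote characters; in this tutorial that is harmless, because the later normalization step strips every non-letter character anyway.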




In [None]:
# Define path to new file
datafile = os.path.join(corpus, "formatted_movie_lines.txt")

delimiter = '\t'
# Unescape the delimiter
delimiter = str(codecs.decode(delimiter, "unicode_escape"))

# Initialize lines dict, conversations list, and field ids
lines = {}
conversations = []
MOVIE_LINES_FIELDS = ["lineID", "characterID", "movieID", "character", "text"]
MOVIE_CONVERSATIONS_FIELDS = ["character1ID", "character2ID", "movieID", "utteranceIDs"]

# Load lines and process conversations
print("\nProcessing corpus...")
lines = loadLines(os.path.join(corpus, "movie_lines.txt"), MOVIE_LINES_FIELDS)
print("\nLoading conversations...")
conversations = loadConversations(os.path.join(corpus, "movie_conversations.txt"),
                                  lines, MOVIE_CONVERSATIONS_FIELDS)

# Write new csv file
print("\nWriting newly formatted file...")
with open(datafile, 'w', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, delimiter=delimiter, lineterminator='\n')
    for pair in extractSentencePairs(conversations):
        writer.writerow(pair)

# Print a sample of lines
print("\nSample lines from file:")
printLines(datafile)


Processing corpus...

Loading conversations...

Writing newly formatted file...

Sample lines from file:
b"Can we make this quick?  Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad.  Again.\tWell, I thought we'd start with pronunciation, if that's okay with you.\n"
b"Well, I thought we'd start with pronunciation, if that's okay with you.\tNot the hacking and gagging and spitting part.  Please.\n"
b"Not the hacking and gagging and spitting part.  Please.\tOkay... then how 'bout we try out some French cuisine.  Saturday?  Night?\n"
b"You're asking me out.  That's so cute. What's your name again?\tForget it.\n"
b"No, no, it's my fault -- we didn't have a proper introduction ---\tCameron.\n"
b"Cameron.\tThe thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister.  I can't date until she does.\n"
b"The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser.  My sister.  I can't dat

Load and trim data
~~~~~~~~~~~~~~~~~~

Our next order of business is to create a vocabulary and load
query/response sentence pairs into memory.

Note that we are dealing with sequences of **words**, which do not have
an implicit mapping to a discrete numerical space. Thus, we must create
one by mapping each unique word that we encounter in our dataset to an
index value.

For this we define a ``Voc`` class, which keeps a mapping from words to
indexes, a reverse mapping of indexes to words, a count of each word and
a total word count. The class provides methods for adding a word to the
vocabulary (``addWord``), adding all words in a sentence
(``addSentence``) and trimming infrequently seen words (``trim``). More
on trimming later.
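
The bookkeeping described above can be sketched with three plain dictionaries. This is only an illustration of the idea, not the ``Voc`` class itself; the function name ``add_sentence`` is hypothetical.

```python
# Toy word-to-index mapping; indexes 0-2 are reserved for PAD, SOS and EOS,
# matching the token values used in this tutorial.
word2index = {}
index2word = {0: "PAD", 1: "SOS", 2: "EOS"}
word2count = {}

def add_sentence(sentence):
    """Register every space-separated word, assigning the next free index."""
    for word in sentence.split(" "):
        if word not in word2index:
            index = len(index2word)
            word2index[word] = index
            index2word[index] = word
            word2count[word] = 1
        else:
            word2count[word] += 1

add_sentence("you know how you are")
print(word2index)   # {'you': 3, 'know': 4, 'how': 5, 'are': 6}
print(word2count)   # {'you': 2, 'know': 1, 'how': 1, 'are': 1}
```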




In [None]:
# Default word tokens
PAD_token = 0  # Used for padding short sentences
SOS_token = 1  # Start-of-sentence token
EOS_token = 2  # End-of-sentence token

class Voc:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3  # Count SOS, EOS, PAD

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:
            self.word2count[word] += 1

    # Remove words below a certain count threshold
    def trim(self, min_count):
        if self.trimmed:
            return
        self.trimmed = True

        keep_words = []

        for k, v in self.word2count.items():
            if v >= min_count:
                keep_words.append(k)

        print('keep_words {} / {} = {:.4f}'.format(
            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)
        ))

        # Reinitialize dictionaries
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS"}
        self.num_words = 3 # Count default tokens

        for word in keep_words:
            self.addWord(word)

Now we can assemble our vocabulary and query/response sentence pairs.
Before we are ready to use this data, we must perform some
preprocessing.

First, we must convert the Unicode strings to ASCII using
``unicodeToAscii``. Next, we should convert all letters to lowercase and
trim all non-letter characters except for basic punctuation
(``normalizeString``). Finally, to aid in training convergence, we will
filter out pairs in which either sentence has ``MAX_LENGTH`` or more
words (``filterPairs``).
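
To see what these steps do to a single string, here is a self-contained sketch that mirrors the regular expressions used in this tutorial. The function names ``strip_accents`` and ``normalize`` are illustrative stand-ins for ``unicodeToAscii`` and ``normalizeString``.

```python
import re
import unicodedata

MAX_LENGTH = 10  # same threshold as in this tutorial

def strip_accents(s):
    """Decompose accented characters and drop the combining marks."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.category(c) != "Mn")

def normalize(s):
    s = strip_accents(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)      # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", " ", s)   # replace everything else with a space
    return re.sub(r"\s+", " ", s).strip()  # collapse repeated whitespace

print(normalize("You're sweet."))            # you re sweet .
print(normalize("Café, s'il vous plaît!"))   # cafe s il vous plait !

# Punctuation marks count as words when the length filter is applied.
print(len(normalize("You're sweet.").split(" ")) < MAX_LENGTH)   # True
```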




In [None]:
MAX_LENGTH = 10  # Maximum sentence length to consider

# Turn a Unicode string to plain ASCII, thanks to
# https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    s = re.sub(r"\s+", r" ", s).strip()
    return s

# Read query/response pairs and return a voc object
def readVocs(datafile, corpus_name):
    print("Reading lines...")
    # Read the file and split into lines (the context manager closes the file)
    with open(datafile, encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]
    voc = Voc(corpus_name)
    return voc, pairs

# Returns True iff both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p):
    # Strict '<' leaves room for the EOS token that is appended to each sequence
    return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH

# Filter pairs using filterPair condition
def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

# Using the functions defined above, return a populated voc object and pairs list
def loadPrepareData(corpus, corpus_name, datafile, save_dir):
    print("Start preparing training data ...")
    voc, pairs = readVocs(datafile, corpus_name)
    print("Read {!s} sentence pairs".format(len(pairs)))
    pairs = filterPairs(pairs)
    print("Trimmed to {!s} sentence pairs".format(len(pairs)))
    print("Counting words...")
    for pair in pairs:
        voc.addSentence(pair[0])
        voc.addSentence(pair[1])
    print("Counted words:", voc.num_words)
    return voc, pairs


# Load/Assemble voc and pairs
save_dir = os.path.join("data", "save")
voc, pairs = loadPrepareData(corpus, corpus_name, datafile, save_dir)
# Print some pairs to validate
print("\npairs:")
for pair in pairs[:10]:
    print(pair)

Start preparing training data ...
Reading lines...
Read 221282 sentence pairs
Trimmed to 64271 sentence pairs
Counting words...
Counted words: 18008

pairs:
['there .', 'where ?']
['you have my word . as a gentleman', 'you re sweet .']
['hi .', 'looks like things worked out tonight huh ?']
['you know chastity ?', 'i believe we share an art instructor']
['have fun tonight ?', 'tons']
['well no . . .', 'then that s all you had to say .']
['then that s all you had to say .', 'but']
['but', 'you always been this selfish ?']
['do you listen to this crap ?', 'what crap ?']
['what good stuff ?', 'the real you .']


In [None]:
len(pairs)

64271

Another tactic that helps achieve faster convergence during training is
trimming rarely used words out of our vocabulary. Shrinking the feature
space also simplifies the function that the model must learn to
approximate. We will do this as a two-step process:

1) Trim words that occur fewer than ``MIN_COUNT`` times using the
   ``voc.trim`` function.

2) Filter out pairs with trimmed words.
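
The two steps can be sketched on a toy corpus with ``collections.Counter``. This is a compact illustration of the idea, not the tutorial's implementation; ``trim_rare_words`` and ``toy_pairs`` are hypothetical names.

```python
from collections import Counter

def trim_rare_words(pairs, min_count):
    """Drop every pair that contains a word seen fewer than min_count times."""
    # Step 1: count words over all sentences and keep the frequent ones.
    counts = Counter(word for pair in pairs for sentence in pair
                     for word in sentence.split(" "))
    kept_words = {word for word, count in counts.items() if count >= min_count}
    # Step 2: keep only pairs whose words all survived step 1.
    kept_pairs = [pair for pair in pairs
                  if all(word in kept_words
                         for sentence in pair for word in sentence.split(" "))]
    return kept_words, kept_pairs

toy_pairs = [
    ["hi there", "hi"],
    ["hi there", "there there"],
    ["hi friend", "hi"],
]
kept_words, kept_pairs = trim_rare_words(toy_pairs, min_count=3)
print(sorted(kept_words))   # ['hi', 'there']
print(kept_pairs)           # [['hi there', 'hi'], ['hi there', 'there there']]
```

Note that a single rare word removes the whole pair, which is why the pair count drops faster than the word count might suggest.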




In [None]:
MIN_COUNT = 3    # Minimum word count threshold for trimming

def trimRareWords(voc, pairs, MIN_COUNT):
    # Trim words used under the MIN_COUNT from the voc
    voc.trim(MIN_COUNT)
    # Filter out pairs with trimmed words
    keep_pairs = []
    for pair in pairs:
        input_sentence = pair[0]
        output_sentence = pair[1]
        keep_input = True
        keep_output = True
        # Check input sentence
        for word in input_sentence.split(' '):
            if word not in voc.word2index:
                keep_input = False
                break
        # Check output sentence
        for word in output_sentence.split(' '):
            if word not in voc.word2index:
                keep_output = False
                break

        # Only keep pairs that do not contain trimmed word(s) in their input or output sentence
        if keep_input and keep_output:
            keep_pairs.append(pair)

    print("Trimmed from {} pairs to {}, {:.4f} of total".format(len(pairs), len(keep_pairs), len(keep_pairs) / len(pairs)))
    return keep_pairs


# Trim voc and pairs
pairs = trimRareWords(voc, pairs, MIN_COUNT)

keep_words 7823 / 18005 = 0.4345
Trimmed from 64271 pairs to 53165, 0.8272 of total
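
The batching helpers in the next cell rely on ``itertools.zip_longest`` to pad and transpose a batch in one step. The sketch below shows that trick on three hand-written index lists; the values are arbitrary and the variable names are illustrative.

```python
import itertools

PAD, EOS = 0, 2  # same values as PAD_token and EOS_token in this tutorial

# Three already-indexed sentences of different lengths, each ending in EOS.
batch = [[7, 25, 170, EOS], [212, 118, EOS], [40, EOS]]

# zip_longest pads the short sequences and transposes at the same time:
# the result has one row per time step and one column per sentence.
padded = [list(step) for step in itertools.zip_longest(*batch, fillvalue=PAD)]

# The mask is True wherever a real token (not padding) is present.
mask = [[token != PAD for token in step] for step in padded]

for row in padded:
    print(row)
# [7, 212, 40]
# [25, 118, 2]
# [170, 2, 0]
# [2, 0, 0]
```

The resulting layout is ``(max_length, batch_size)``, which matches the tensors printed by the validation example. The Transformer modules defined later document their inputs as ``[batch size, src len]``, so a transpose is needed somewhere between the two.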


In [None]:
def indexesFromSentence(voc, sentence):
    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]

def indexesFromSentenceOp(voc, sentence):
    return [SOS_token]+[voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]


def zeroPadding(l, fillvalue=PAD_token):
    return list(itertools.zip_longest(*l, fillvalue=fillvalue))

def binaryMatrix(l, value=PAD_token):
    m = []
    for i, seq in enumerate(l):
        m.append([])
        for token in seq:
            if token == value:  # compare against the padding value passed in
                m[i].append(0)
            else:
                m[i].append(1)
    return m

# Returns padded input sequence tensor and lengths
def inputVar(l, voc):
    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    padVar = torch.LongTensor(padList)
    return padVar, lengths

# Returns padded target sequence tensor, padding mask, and max target length
def outputVar(l, voc):
    indexes_batch = [indexesFromSentenceOp(voc, sentence) for sentence in l]
    max_target_len = max([len(indexes) for indexes in indexes_batch])
    padList = zeroPadding(indexes_batch)
    mask = binaryMatrix(padList)
    mask = torch.BoolTensor(mask)
    padVar = torch.LongTensor(padList)
    return padVar, mask, max_target_len

# Returns all items for a given batch of pairs
def batch2TrainData(voc, pair_batch):
    pair_batch.sort(key=lambda x: len(x[0].split(" ")), reverse=True)
    input_batch, output_batch = [], []
    for pair in pair_batch:
        input_batch.append(pair[0])
        output_batch.append(pair[1])
    inp, lengths = inputVar(input_batch, voc)
    output, mask, max_target_len = outputVar(output_batch, voc)
    return inp, lengths, output, mask, max_target_len


# Example for validation
small_batch_size = 5
batches = batch2TrainData(voc, [random.choice(pairs) for _ in range(small_batch_size)])
input_variable, lengths, target_variable, mask, max_target_len = batches

print("input_variable:", input_variable)
print("lengths:", lengths)
print("target_variable:", target_variable)
print("mask:", mask)
print("max_target_len:", max_target_len)

input_variable: tensor([[   7,   25,  170, 2639,  345],
        [ 212,  118,  706,    4,   66],
        [  40,   40,  115,    4,    2],
        [  53,   60, 7253,    4,    0],
        [ 532,  169,    6,    2,    0],
        [ 533,    7,    2,    0,    0],
        [   6,   66,    0,    0,    0],
        [   2,    2,    0,    0,    0]])
lengths: tensor([8, 8, 6, 5, 3])
target_variable: tensor([[   1,    1,    1,    1,    1],
        [   7, 1497, 7254, 2639,  334],
        [ 534,   66,    4, 5404, 2530],
        [ 208, 1500,    2,    4,   24],
        [ 488,   66,    0,   36,    6],
        [ 535,    2,    0,  601,    2],
        [   6,    0,    0,   37,    0],
        [   2,    0,    0,  187,    0],
        [   0,    0,    0,    4,    0],
        [   0,    0,    0,    2,    0]])
mask: tensor([[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True, Fal

In [None]:
[SOS_token]

[1]

### Transformer Encoder

In [None]:
class TransEncoder(nn.Module):
    def __init__(self, 
                 input_dim, 
                 hid_dim, 
                 n_layers, 
                 n_heads, 
                 pf_dim,
                 dropout, 
                 device,
                 max_length = 100):
        super().__init__()

        self.device = device
        
        self.tok_embedding = nn.Embedding(input_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        
        self.layers = nn.ModuleList([TransEncoderLayer(hid_dim, 
                                                  n_heads, 
                                                  pf_dim,
                                                  dropout, 
                                                  device) 
                                     for _ in range(n_layers)])
        
        self.dropout = nn.Dropout(dropout)
        
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)
        
    def forward(self, src, src_mask):
        
        #src = [batch size, src len]
        #src_mask = [batch size, 1, 1, src len]
        
        batch_size = src.shape[0]
        src_len = src.shape[1]
        
        pos = torch.arange(0, src_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
        
        #pos = [batch size, src len]
        
        src = self.dropout((self.tok_embedding(src) * self.scale) + self.pos_embedding(pos))
        
        #src = [batch size, src len, hid dim]
        
        for layer in self.layers:
            src = layer(src, src_mask)
            
        #src = [batch size, src len, hid dim]
            
        return src

In [None]:
class TransEncoderLayer(nn.Module):
    def __init__(self, 
                 hid_dim, 
                 n_heads, 
                 pf_dim,  
                 dropout, 
                 device):
        super().__init__()
        
        self.self_attn_layer_norm = nn.LayerNorm(hid_dim)
        self.ff_layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, 
                                                                     pf_dim, 
                                                                     dropout)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, src, src_mask):
        
        #src = [batch size, src len, hid dim]
        #src_mask = [batch size, 1, 1, src len] 
                
        #self attention
        _src, _ = self.self_attention(src, src, src, src_mask)
        
        #dropout, residual connection and layer norm
        src = self.self_attn_layer_norm(src + self.dropout(_src))
        
        #src = [batch size, src len, hid dim]
        
        #positionwise feedforward
        _src = self.positionwise_feedforward(src)
        
        #dropout, residual and layer norm
        src = self.ff_layer_norm(src + self.dropout(_src))
        
        #src = [batch size, src len, hid dim]
        
        return src

### Multihead Attention Layer
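
Before the full layer, here is a plain-Python sketch of the computation a single head performs: scaled dot-product attention with an optional mask. It uses lists instead of tensors so the arithmetic stays visible. The function names and the toy vectors are illustrative, and the sketch leaves out the learned projections, dropout and the multi-head reshaping.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention on lists of vectors.

    mask[i][j] == 0 hides key j from query i, in the same way as
    masked_fill(mask == 0, -1e10) in the layer.
    """
    scale = math.sqrt(len(keys[0]))  # sqrt(head_dim)
    outputs = []
    for i, q in enumerate(queries):
        energy = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        if mask is not None:
            energy = [e if mask[i][j] else -1e10 for j, e in enumerate(energy)]
        weights = softmax(energy)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[1.0, 0.0]]

print(attention(queries, keys, values))                 # a mix of both values
print(attention(queries, keys, values, mask=[[1, 0]]))  # [[10.0, 0.0]]
```

With the second key masked out, its weight underflows to zero after the softmax, so the output is exactly the first value vector.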

In [None]:
class MultiHeadAttentionLayer(nn.Module):
    def __init__(self, hid_dim, n_heads, dropout, device):
        super().__init__()
        
        assert hid_dim % n_heads == 0
        
        self.hid_dim = hid_dim
        self.n_heads = n_heads
        self.head_dim = hid_dim // n_heads
        
        self.fc_q = nn.Linear(hid_dim, hid_dim)
        self.fc_k = nn.Linear(hid_dim, hid_dim)
        self.fc_v = nn.Linear(hid_dim, hid_dim)
        
        self.fc_o = nn.Linear(hid_dim, hid_dim)
        
        self.dropout = nn.Dropout(dropout)
        
        self.scale = torch.sqrt(torch.FloatTensor([self.head_dim])).to(device)
        
    def forward(self, query, key, value, mask = None):
        
        batch_size = query.shape[0]
        
        #query = [batch size, query len, hid dim]
        #key = [batch size, key len, hid dim]
        #value = [batch size, value len, hid dim]
                
        Q = self.fc_q(query)
        K = self.fc_k(key)
        V = self.fc_v(value)
        
        #Q = [batch size, query len, hid dim]
        #K = [batch size, key len, hid dim]
        #V = [batch size, value len, hid dim]
                
        Q = Q.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        K = K.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        V = V.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
        
        #Q = [batch size, n heads, query len, head dim]
        #K = [batch size, n heads, key len, head dim]
        #V = [batch size, n heads, value len, head dim]
                
        energy = torch.matmul(Q, K.permute(0, 1, 3, 2)) / self.scale
        
        #energy = [batch size, n heads, query len, key len]
        
        if mask is not None:
            energy = energy.masked_fill(mask == 0, -1e10)
        
        attention = torch.softmax(energy, dim = -1)
                
        #attention = [batch size, n heads, query len, key len]
                
        x = torch.matmul(self.dropout(attention), V)
        
        #x = [batch size, n heads, query len, head dim]
        
        x = x.permute(0, 2, 1, 3).contiguous()
        
        #x = [batch size, query len, n heads, head dim]
        
        x = x.view(batch_size, -1, self.hid_dim)
        
        #x = [batch size, query len, hid dim]
        
        x = self.fc_o(x)
        
        #x = [batch size, query len, hid dim]
        
        return x, attention

### Position-wise Feedforward Network

In [None]:
class PositionwiseFeedforwardLayer(nn.Module):
    def __init__(self, hid_dim, pf_dim, dropout):
        super().__init__()
        
        self.fc_1 = nn.Linear(hid_dim, pf_dim)
        self.fc_2 = nn.Linear(pf_dim, hid_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        
        #x = [batch size, seq len, hid dim]
        
        x = self.dropout(torch.relu(self.fc_1(x)))
        
        #x = [batch size, seq len, pf dim]
        
        x = self.fc_2(x)
        
        #x = [batch size, seq len, hid dim]
        
        return x

### Transformer Decoder

We apply softmax to the final output layer so that the decoder returns probabilities rather than logits, because the loss function defined later applies `torch.log` to them.
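
The loss itself is not shown in this section, so the following is only a sketch under the assumption that it takes the negative log of the probability assigned to the target token. In plain Python, that quantity matches the usual cross-entropy computed directly from logits; all function names here are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll_from_probs(probs, target):
    """Negative log-likelihood of the target index given probabilities."""
    return -math.log(probs[target])

def cross_entropy_from_logits(logits, target):
    """Same quantity computed directly from logits via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]
target = 0
loss_a = nll_from_probs(softmax(logits), target)
loss_b = cross_entropy_from_logits(logits, target)
print(math.isclose(loss_a, loss_b))  # True
```

One design consequence: taking the log of an explicit softmax can hit `log(0)` when a probability underflows. PyTorch's `F.log_softmax` and `nn.CrossEntropyLoss` work from logits and avoid that, which is worth keeping in mind if the loss ever produces `inf` or `nan`.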

In [None]:
class TransDecoder(nn.Module):
    def __init__(self, 
                 output_dim, 
                 hid_dim, 
                 n_layers, 
                 n_heads, 
                 pf_dim, 
                 dropout, 
                 device,
                 max_length = 100):
        super().__init__()
        
        self.device = device
        
        self.tok_embedding = nn.Embedding(output_dim, hid_dim)
        self.pos_embedding = nn.Embedding(max_length, hid_dim)
        
        self.layers = nn.ModuleList([TransDecoderLayer(hid_dim, 
                                                  n_heads, 
                                                  pf_dim, 
                                                  dropout, 
                                                  device)
                                     for _ in range(n_layers)])
        
        self.fc_out = nn.Linear(hid_dim, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
        self.scale = torch.sqrt(torch.FloatTensor([hid_dim])).to(device)
        
    def forward(self, trg, enc_src, trg_mask, src_mask):
        
        #trg = [batch size, trg len]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, 1, trg len, trg len]
        #src_mask = [batch size, 1, 1, src len]
                
        batch_size = trg.shape[0]
        trg_len = trg.shape[1]
        
        pos = torch.arange(0, trg_len).unsqueeze(0).repeat(batch_size, 1).to(self.device)
                            
        #pos = [batch size, trg len]
            
        trg = self.dropout((self.tok_embedding(trg) * self.scale) + self.pos_embedding(pos))
                
        #trg = [batch size, trg len, hid dim]
        
        for layer in self.layers:
            trg, attention = layer(trg, enc_src, trg_mask, src_mask)
        
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        
        output = self.fc_out(trg)
        output = F.softmax(output, dim=2)  # probabilities over the vocabulary
        
        #output = [batch size, trg len, output dim]
            
        return output, attention

In [None]:
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                             [0, 0.2, 0.7, 0.1, 0], 
                             [0, 0.2, 0.7, 0.1, 0]]).unsqueeze(1).unsqueeze(2)
print(predict.shape)
predict[1].shape

torch.Size([3, 1, 1, 5])


torch.Size([1, 1, 5])

In [None]:
class TransDecoderLayer(nn.Module):
    def __init__(self, 
                 hid_dim, 
                 n_heads, 
                 pf_dim, 
                 dropout, 
                 device):
        super().__init__()
        
        self.self_attn_layer_norm = nn.LayerNorm(hid_dim)
        self.enc_attn_layer_norm = nn.LayerNorm(hid_dim)
        self.ff_layer_norm = nn.LayerNorm(hid_dim)
        self.self_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.encoder_attention = MultiHeadAttentionLayer(hid_dim, n_heads, dropout, device)
        self.positionwise_feedforward = PositionwiseFeedforwardLayer(hid_dim, 
                                                                     pf_dim, 
                                                                     dropout)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, trg, enc_src, trg_mask, src_mask):
        
        #trg = [batch size, trg len, hid dim]
        #enc_src = [batch size, src len, hid dim]
        #trg_mask = [batch size, 1, trg len, trg len]
        #src_mask = [batch size, 1, 1, src len]
        
        #self attention
        _trg, _ = self.self_attention(trg, trg, trg, trg_mask)
        
        #dropout, residual connection and layer norm
        trg = self.self_attn_layer_norm(trg + self.dropout(_trg))
            
        #trg = [batch size, trg len, hid dim]
            
        #encoder attention
        _trg, attention = self.encoder_attention(trg, enc_src, enc_src, src_mask)
        # query, key, value
        
        #dropout, residual connection and layer norm
        trg = self.enc_attn_layer_norm(trg + self.dropout(_trg))
                    
        #trg = [batch size, trg len, hid dim]
        
        #positionwise feedforward
        _trg = self.positionwise_feedforward(trg)
        
        #dropout, residual and layer norm
        trg = self.ff_layer_norm(trg + self.dropout(_trg))
        
        #trg = [batch size, trg len, hid dim]
        #attention = [batch size, n heads, trg len, src len]
        
        return trg, attention

### Final Sequence to Sequence Model
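
The target mask built by this model combines two conditions: a key position must not be padding, and it must not lie in the future of the query position. The plain-Python sketch below shows the resulting grid for one sequence. It mirrors the `trg_pad_mask & trg_sub_mask` step without tensors; the function is an illustrative stand-in for the `make_trg_mask` method, and the token values are arbitrary.

```python
PAD = 0  # padding index, as in PAD_token

def make_trg_mask(trg):
    """Combine a padding mask with a lower-triangular (causal) mask.

    trg is one target sequence of token indexes; the result is a
    trg_len x trg_len grid where entry [i][j] is True when position i
    may attend to position j.
    """
    trg_len = len(trg)
    pad_mask = [token != PAD for token in trg]   # hides PAD keys
    return [[pad_mask[j] and j <= i for j in range(trg_len)]
            for i in range(trg_len)]

trg = [1, 7, 534, 2, 0]   # SOS, two words, EOS, PAD
for row in make_trg_mask(trg):
    print([int(flag) for flag in row])
# [1, 0, 0, 0, 0]
# [1, 1, 0, 0, 0]
# [1, 1, 1, 0, 0]
# [1, 1, 1, 1, 0]
# [1, 1, 1, 1, 0]
```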

In [None]:
class TransSeq2Seq(nn.Module):
    def __init__(self, 
                 encoder, 
                 decoder, 
                 src_pad_idx, 
                 trg_pad_idx, 
                 device):
        super().__init__()
        
        self.encoder = encoder
        self.decoder = decoder
        self.src_pad_idx = src_pad_idx
        self.trg_pad_idx = trg_pad_idx
        self.device = device
        
    def make_src_mask(self, src):
        
        #src = [batch size, src len]
        
        src_mask = (src != self.src_pad_idx).unsqueeze(1).unsqueeze(2)

        #src_mask = [batch size, 1, 1, src len]

        return src_mask
    
    def make_trg_mask(self, trg):
        
        #trg = [batch size, trg len]
        trg_pad_mask = (trg != self.trg_pad_idx).unsqueeze(1).unsqueeze(2)
        
        #trg_pad_mask = [batch size, 1, 1, trg len] (boolean)
        
        trg_len = trg.shape[1]
        
        trg_sub_mask = torch.tril(torch.ones((trg_len, trg_len), device = self.device)).bool()
        
        #trg_sub_mask = [trg len, trg len]
            
        trg_mask = trg_pad_mask & trg_sub_mask
        
        #trg_mask = [batch size, 1, trg len, trg len]
        
        return trg_mask

    def forward(self, src, trg):
        
        #src = [batch size, src len]
        #trg = [batch size, trg len]
                
        src_mask = self.make_src_mask(src)
        trg_mask = self.make_trg_mask(trg)
        
        #src_mask = [batch size, 1, 1, src len]
        #trg_mask = [batch size, 1, trg len, trg len]
        
        enc_src = self.encoder(src, src_mask)
        
        #enc_src = [batch size, src len, hid dim]
                
        output, attention = self.decoder(trg, enc_src, trg_mask, src_mask)
        
        #output = [batch size, trg len, output dim]
        #attention = [batch size, n heads, trg len, src len]
        
        return output, attention

In [None]:
# INPUT_DIM = len(SRC.vocab)
# OUTPUT_DIM = len(TRG.vocab)
INPUT_DIM = voc.num_words
OUTPUT_DIM = voc.num_words
HID_DIM = 128
ENC_LAYERS = 3
DEC_LAYERS = 3
ENC_HEADS = 8
DEC_HEADS = 8
ENC_PF_DIM = 512
DEC_PF_DIM = 512
ENC_DROPOUT = 0.2
DEC_DROPOUT = 0.2

enc = TransEncoder(INPUT_DIM, 
              HID_DIM, 
              ENC_LAYERS, 
              ENC_HEADS, 
              ENC_PF_DIM, 
              ENC_DROPOUT, 
              device)

dec = TransDecoder(OUTPUT_DIM, 
              HID_DIM, 
              DEC_LAYERS, 
              DEC_HEADS, 
              DEC_PF_DIM, 
              DEC_DROPOUT, 
              device)

In [None]:
SRC_PAD_IDX = PAD_token #SRC.vocab.stoi[SRC.pad_token]
TRG_PAD_IDX = PAD_token #TRG.vocab.stoi[TRG.pad_token]

model = TransSeq2Seq(enc, dec, SRC_PAD_IDX, TRG_PAD_IDX, device).to(device)

In [None]:
def initialize_weights(m):
    if hasattr(m, 'weight') and m.weight.dim() > 1:
        nn.init.xavier_uniform_(m.weight.data)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 4,427,154 trainable parameters


In [None]:
model.apply(initialize_weights);
LEARNING_RATE = 0.0005

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)


Define Training Procedure
-------------------------

Masked loss
~~~~~~~~~~~

Since we are dealing with batches of padded sequences, we cannot simply
consider all elements of the tensor when calculating loss. We define
``maskNLLLoss`` to calculate our loss based on our decoder’s output
tensor, the target tensor, and a binary mask tensor describing the
padding of the target tensor. This loss function calculates the average
negative log likelihood of the elements that correspond to a *1* in the
mask tensor.
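
As a worked illustration of the arithmetic, here is a minimal framework-free sketch of a masked negative log likelihood. The helper name `masked_nll` and the toy probabilities are assumptions made for this example; the notebook's own loss operates on PyTorch tensors.

```python
import math

def masked_nll(probs, targets, mask):
    """Average -log p(target) over positions where mask is True.

    probs:   list of per-position probability lists (each sums to 1)
    targets: list of target token indices, one per position
    mask:    list of booleans, False for padded positions
    """
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    kept = [l for l, m in zip(losses, mask) if m]
    return sum(kept) / len(kept), len(kept)

# Three positions over a 3-word vocabulary; the last position is padding.
probs = [
    [0.5, 0.25, 0.25],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],   # prediction at a padded position, ignored by the mask
]
targets = [0, 1, 0]
mask = [True, True, False]

loss, n_total = masked_nll(probs, targets, mask)
print(f"loss={loss:.4f} over {n_total} tokens")
```

Without the mask, the confident prediction at the padded position would enter the average and make the loss look better than it really is.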




In [None]:
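def maskNLLLoss(inp, target, mask):
    # inp: [N, vocab size] probabilities (not raw logits); target: [N]; mask: [N] bool
    nTotal = mask.sum()
    # Negative log likelihood of the target token at each position
    crossEntropy = -torch.log(torch.gather(inp, 1, target.view(-1, 1)).squeeze(1))
    # Average over the non-padded positions only
    loss = crossEntropy.masked_select(mask).mean()
    loss = loss.to(device)
    return loss, nTotal.item()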

In [None]:
def train_transformer(input_variable, 
                      lengths,
                      target_variable, 
                      mask, 
                      max_target_len, 
                      model, 
                      optimizer, 
                      batch_size,                       
                      clip=1., 
                      max_length=MAX_LENGTH,
                      ):

    # Make sure dropout is active (eval_transformer switches the model to eval mode)
    model.train()

    # Zero gradients
    optimizer.zero_grad()

    # Batches arrive as [seq len, batch size]; the transformer expects
    # [batch size, seq len]
    input_variable = input_variable.permute(1,0).to(device)
    target_variable = target_variable.permute(1,0).to(device)
    mask = mask.permute(1,0).to(device)

    # Teacher forcing: feed the target without its last token
    output, _ = model(input_variable, target_variable[:,:-1])
    # output = [batch size, trg len - 1, output dim]

    # Flatten so every target position becomes one row
    output_dim = output.shape[-1]
    output = output.contiguous().view(-1, output_dim)

    # The labels are the target shifted left by one (the leading SOS is dropped)
    target_variable = target_variable[:,1:].contiguous().view(-1)
    loss, nTotal = maskNLLLoss2(output, target_variable, mask[:,1:].reshape(-1))
    loss.backward()

    # Clip gradients: gradients are modified in place
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

    # Adjust model weights
    optimizer.step()
    return loss.item()
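
The slicing in the training step implements teacher forcing: the decoder reads the target without its final token and is scored against the target shifted left by one. A small plain-Python sketch of that alignment follows, using made-up token strings and a hypothetical helper `shift_for_teacher_forcing` (neither is part of the notebook):

```python
def shift_for_teacher_forcing(target, mask):
    """Split one padded target sequence into decoder input, labels and label mask."""
    decoder_input = target[:-1]   # what the decoder sees (last token dropped)
    labels = target[1:]           # what it must predict at each step
    label_mask = mask[1:]         # the mask is shifted the same way as the labels
    return decoder_input, labels, label_mask

target = ["SOS", "who", "does", "?", "EOS", "PAD", "PAD"]
mask = [True, True, True, True, True, False, False]

decoder_input, labels, label_mask = shift_for_teacher_forcing(target, mask)
for seen, wanted, counted in zip(decoder_input, labels, label_mask):
    print(f"{seen:>5} -> {wanted:<5} counted in loss: {counted}")
```

Each decoder position is paired with the next token of the reply, and positions whose label is padding are excluded from the loss.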

In [None]:
def eval_transformer(input_variable, 
                     lengths, 
                     target_variable, 
                     mask, 
                     max_target_len, model, 
                     optimizer, 
                     batch_size,                       
                     clip=1., 
                     max_length=MAX_LENGTH,
                      ):

    # Evaluation mode disables dropout; no gradients or weight updates are needed
    model.eval()

    # Batches arrive as [seq len, batch size]; the transformer expects
    # [batch size, seq len]
    input_variable = input_variable.permute(1,0).to(device)
    target_variable = target_variable.permute(1,0).to(device)
    mask = mask.permute(1,0).to(device)

    with torch.no_grad():
        # Same teacher-forced forward pass as in training
        output, _ = model(input_variable, target_variable[:,:-1])
        output_dim = output.shape[-1]
        output = output.contiguous().view(-1, output_dim)

        # The labels are the target shifted left by one
        target_variable = target_variable[:,1:].contiguous().view(-1)
        loss, nTotal = maskNLLLoss2(output, target_variable, mask[:,1:].reshape(-1))

    return loss.item()

In [None]:
'''
    Sample model output
'''

training_batches = [batch2TrainData(voc, [random.choice(pairs) for _ in range(64)])
                      for _ in range(200)]
input_variable, lengths, target_variable, mask, max_target_len = training_batches[10]
print(input_variable.shape,lengths.shape, target_variable.shape, mask.shape, max_target_len)
target_variable = target_variable.permute(1,0)
input_variable = input_variable.permute(1,0)
mask = mask.permute(1,0)
#optimizer.zero_grad()
model.eval()
with torch.no_grad():
    output, _ = model(input_variable.to(device), target_variable[:,:-1].to(device))

torch.Size([10, 64]) torch.Size([64]) torch.Size([11, 64]) torch.Size([11, 64]) 11


In [None]:
target_variable[1]

tensor([1084,   76, 3420,   53,  706,  115, 6887,    4,    2,    0])

In [None]:
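trg_mask_local = model.make_trg_mask(target_variable[:,:-1].to(device))
# Note: the padding mask below is also built from the target (not input_variable),
# so both masks share the same padding pattern
src_mask_local = model.make_src_mask(target_variable[:,:-1].to(device))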

trg_mask_local.shape, src_mask_local.shape

(torch.Size([64, 1, 10, 10]), torch.Size([64, 1, 1, 10]))

In [None]:
print(trg_mask_local[3])
print(src_mask_local[3])

tensor([[[ True, False, False, False, False, False, False, False, False, False],
         [ True,  True, False, False, False, False, False, False, False, False],
         [ True,  True,  True, False, False, False, False, False, False, False],
         [ True,  True,  True,  True, False, False, False, False, False, False],
         [ True,  True,  True,  True,  True, False, False, False, False, False],
         [ True,  True,  True,  True,  True,  True, False, False, False, False],
         [ True,  True,  True,  True,  True,  True,  True, False, False, False],
         [ True,  True,  True,  True,  True,  True,  True,  True, False, False],
         [ True,  True,  True,  True,  True,  True,  True,  True,  True, False],
         [ True,  True,  True,  True,  True,  True,  True,  True,  True, False]]],
       device='cuda:0')
tensor([[[ True,  True,  True,  True,  True,  True,  True,  True,  True, False]]],
       device='cuda:0')
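
The structure of the printed target mask can be reproduced by hand. The following plain-Python sketch is an illustration only, not the notebook's tensor code: `make_trg_mask_rows` is a hypothetical helper and `PAD = 0` is assumed to match `PAD_token`. It combines the causal (lower-triangular) constraint with the padding constraint in the same way as `trg_pad_mask & trg_sub_mask`.

```python
PAD = 0  # assumed padding index, standing in for PAD_token

def make_trg_mask_rows(trg, pad_idx=PAD):
    """Row i, column j is True when target position i may attend to position j:
    j must not lie in the future (j <= i) and token j must not be padding."""
    n = len(trg)
    not_pad = [tok != pad_idx for tok in trg]          # padding mask, one flag per column
    return [[j <= i and not_pad[j] for j in range(n)]  # causal part combined with padding
            for i in range(n)]

# A toy target of length 5 whose last token is padding
trg = [1, 7, 9, 2, PAD]
for row in make_trg_mask_rows(trg):
    print(["T" if allowed else "." for allowed in row])
```

The last row equals the row before it, as in the printed tensor, because the padded column is blocked for every query position.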


In [None]:
max_out = output.argmax(2)
#max_out[1]
#offset = 2
#trg_tokens = [ voc.index2word[index.item()] for index in max_out[offset]]
#actual_trg_tokens = [ voc.index2word[index.item()] for index in target_variable[offset,:-1]]

for offset in range(4):    
    print([ voc.index2word[index.item()] for index in input_variable[offset,:]])
    print( [ voc.index2word[index.item()] for index in target_variable[offset,:]])
    print(mask[offset])
    print([ voc.index2word[index.item()] for index in max_out[offset]])
    print([ voc.index2word[index.item()] for index in max_out[offset]])
    print('---------')
#print()

['he', 'made', 'you', 'give', 'him', 'a', 'blow', 'job', '.', 'EOS']
['SOS', 'no', '.', 'EOS', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD']
tensor([ True,  True,  True,  True, False, False, False, False, False, False,
        False])
['EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS']
['EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS']
---------
['this', 'is', 'becoming', 'a', 'serious', 'breach', 'of', 'security', '.', 'EOS']
['SOS', 'he', 'didn', 't', 'recognize', 'me', '.', 'EOS', 'PAD', 'PAD', 'PAD']
tensor([ True,  True,  True,  True,  True,  True,  True,  True, False, False,
        False])
['EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS']
['EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS', 'EOS']
---------
['crabtree', '?', 'he', 'doesn', 't', 'even', 'know', 'james', '.', 'EOS']
['SOS', 'who', 'does', '?', 'EOS', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD']
tensor([ True,  True,  True,  True,  True, False, 

In [None]:
max_out.shape

torch.Size([64, 9])

In [None]:
trg_tokens

['.', 's', 'the', 'way', '.', '.', '.', 'EOS', 'EOS']

Training iterations
~~~~~~~~~~~~~~~~~~~

It is finally time to tie the full training procedure together with the
data. The ``transform_trainIters`` function runs ``n_iteration`` training
steps given the model, optimizer, data, etc. It is fairly
self-explanatory, as the heavy lifting is done by the
``train_transformer`` function.

One thing to note is that when we save our model, we save a checkpoint
containing the model and optimizer state_dicts (parameters), the loss,
the iteration, and the vocabulary. Saving the model this way gives us
flexibility with the checkpoint: after loading it, we can use the model
parameters to run inference, or we can continue training right where we
left off.
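
A minimal sketch of that save-and-resume cycle is shown here, under stated assumptions: a tiny `nn.Linear` stands in for the real model, an in-memory `io.BytesIO` buffer stands in for the checkpoint file, and the iteration and loss values are made up. Only the dictionary keys mirror the ones used in this notebook.

```python
import io
import torch
import torch.nn as nn

# Stand-ins for the real model and optimizer (assumed for illustration)
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)

# Take one optimization step so the optimizer has state worth saving
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

# Save everything needed to resume; the buffer stands in for a file path
buffer = io.BytesIO()
torch.save({
    'iteration': 10,
    'model': model.state_dict(),
    'optim': optimizer.state_dict(),
    'loss': 3.21,
}, buffer)

# Later: rebuild the objects first, then restore their state
buffer.seek(0)
checkpoint = torch.load(buffer)
restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint['model'])
new_optimizer = torch.optim.Adam(restored.parameters(), lr=0.0001)
new_optimizer.load_state_dict(checkpoint['optim'])

# Loading the optimizer state also restores the saved learning rate
lr_after_load = new_optimizer.param_groups[0]['lr']

# To continue with a different learning rate, set it after loading
for param_group in new_optimizer.param_groups:
    param_group['lr'] = 0.0001

start_iteration = checkpoint['iteration'] + 1
print(lr_after_load, new_optimizer.param_groups[0]['lr'], start_iteration)
```

Because the optimizer's saved state carries its own learning rate, a new rate has to be written into `param_groups` after `load_state_dict`, which is what the later cells in this notebook do before resuming training.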




In [None]:
# Same computation as maskNLLLoss; train_transformer and eval_transformer call it under this name
def maskNLLLoss2(inp, target, mask):
    nTotal = mask.sum()
    crossEntropy = -torch.log(torch.gather(inp, 1, target.view(-1, 1)).squeeze(1))
    loss = crossEntropy.masked_select(mask).mean()
    loss = loss.to(device)
    return loss, nTotal.item()

In [None]:
def transform_trainIters(model_name, voc, pairs, 
                         model, 
                         optimizer, 
                         save_dir, 
                         n_iteration, 
                         batch_size, 
                         print_every, 
                         save_every, 
                         clip, 
                         corpus_name=None, 
                         loadFilename=None,
                         best_valid_loss = float('inf')):

    # Load batches for each iteration
    training_batches = [batch2TrainData(voc, [random.choice(pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]

    # Initializations
    print('Initializing ...')
    start_iteration = 1
    print_loss = 0
    
    if loadFilename:
        start_iteration = checkpoint['iteration'] + 1
    model.train()
    # Training loop
    print("Training...")
    for iteration in range(start_iteration, n_iteration + 1):
        training_batch = training_batches[iteration - 1]
        # Extract fields from batch
        input_variable, lengths, target_variable, mask, max_target_len = training_batch

        # Run a training iteration with batch
        loss = train_transformer(input_variable,
                                 lengths, 
                                 target_variable, 
                                 mask, 
                                 max_target_len, 
                                 model, 
                                 optimizer, 
                                 batch_size, 
                                 clip)
        print_loss += loss

        # Save a checkpoint whenever the batch loss improves on the best seen so far

        if loss < best_valid_loss:
            #print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Best Average loss: {:.4f}".format(iteration, iteration / n_iteration * 100, loss))
            best_valid_loss = loss
            torch.save({
                'iteration': iteration,
                'model': model.state_dict(),
                'optim':optimizer.state_dict(),
                'loss': best_valid_loss,
                'voc_dict': voc.__dict__
            }, 'checkpoint_new2.pt')

        if iteration % print_every == 0:
            print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Average loss: {:.4f}".format(iteration, iteration / n_iteration * 100, print_loss_avg))
            print_loss = 0

In [None]:
test_batches = [batch2TrainData(voc, [random.choice(test_pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]

In [None]:
def transform_train_evalIters(model_name, 
                         voc,
                         train_pairs,
                         test_pairs, 
                         model, 
                         optimizer, 
                         save_dir, 
                         n_iteration, 
                         batch_size, 
                         print_every, 
                         save_every, 
                         clip, 
                         corpus_name=None, 
                         loadFilename=None,
                         best_valid_loss = float('inf')):

    # Load batches for each iteration
    training_batches = [batch2TrainData(voc, [random.choice(train_pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]
    test_batches = [batch2TrainData(voc, [random.choice(test_pairs) for _ in range(batch_size)])
                      for _ in range(n_iteration)]

    # Initializations
    print('Initializing ...')
    start_iteration = 1
    print_loss = 0
    print_val_loss = 0
    
    if loadFilename:
        start_iteration = checkpoint['iteration'] + 1
    
    # Training loop
    print("Training...")
    for iteration in range(start_iteration, n_iteration + 1):
        training_batch = training_batches[iteration - 1]
        test_batch = test_batches[iteration - 1]
        # Extract fields from batch
        input_variable, lengths, target_variable, mask, max_target_len = training_batch
        test_input_variable, test_lengths, test_target_variable, test_mask, test_max_target_len = test_batch

        # Run a training iteration with batch
        loss = train_transformer(input_variable,
                                 lengths, 
                                 target_variable, 
                                 mask, 
                                 max_target_len, 
                                 model, 
                                 optimizer, 
                                 batch_size, 
                                 clip)
        print_loss += loss
        val_loss = eval_transformer(test_input_variable,
                            test_lengths, 
                            test_target_variable, 
                            test_mask, 
                            test_max_target_len, 
                            model, 
                            optimizer, 
                            batch_size, 
                            clip)
        print_val_loss += val_loss
        # Save a checkpoint whenever the validation loss improves on the best seen so far

        if val_loss < best_valid_loss:
            #print_loss_avg = print_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Train Loss: {:.4f}, Best Val loss: {:.4f}".format(iteration, iteration / n_iteration * 100, loss, val_loss))
            best_valid_loss = val_loss
            torch.save({
                'iteration': iteration,
                'model': model.state_dict(),
                'optim':optimizer.state_dict(),
                'loss': best_valid_loss,
                'voc_dict': voc.__dict__
            }, 'checkpoint_new2.pt')

        if iteration % print_every == 0:
            print_loss_avg = print_loss / print_every
            print_val_loss_avg = print_val_loss / print_every
            print("Iteration: {}; Percent complete: {:.1f}%; Train loss: {:.4f}; Val loss: {:.4f}".format(iteration, iteration / n_iteration * 100, print_loss_avg, print_val_loss_avg))
            print_loss = 0
            print_val_loss = 0

In [None]:
class GreedySearchDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super(GreedySearchDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, input_seq, input_length, max_length):
        # Forward input through encoder model
        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
        # Prepare encoder's final hidden layer to be first hidden input to the decoder
        decoder_hidden = encoder_hidden[:self.decoder.n_layers]
        # Initialize decoder input with SOS_token
        decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
        # Initialize tensors to append decoded words to
        all_tokens = torch.zeros([0], device=device, dtype=torch.long)
        all_scores = torch.zeros([0], device=device)
        # Iteratively decode one word token at a time
        for _ in range(max_length):
            # Forward pass through decoder
            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)
            # Obtain most likely word token and its softmax score
            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)
            # Record token and score
            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)
            all_scores = torch.cat((all_scores, decoder_scores), dim=0)
            # Prepare current token to be next decoder input (add a dimension)
            decoder_input = torch.unsqueeze(decoder_input, 0)
        # Return collections of word tokens and scores
        return all_tokens, all_scores

In [None]:
def evaluate(encoder, decoder, searcher, voc, sentence, max_length=MAX_LENGTH):
    ### Format input sentence as a batch
    # words -> indexes
    indexes_batch = [indexesFromSentence(voc, sentence)]
    # Create lengths tensor
    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])
    # Transpose dimensions of batch to match models' expectations
    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)
    # Use appropriate device
    input_batch = input_batch.to(device)
    lengths = lengths.to(device)
    # Decode sentence with searcher
    tokens, scores = searcher(input_batch, lengths, max_length)
    # indexes -> words
    decoded_words = [voc.index2word[token.item()] for token in tokens]
    return decoded_words


def evaluateInput(encoder, decoder, searcher, voc):
    input_sentence = ''
    while True:
        try:
            # Get input sentence
            input_sentence = input('> ')
            # Check if it is quit case
            if input_sentence == 'q' or input_sentence == 'quit': break
            # Normalize sentence
            input_sentence = normalizeString(input_sentence)
            # Evaluate sentence
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)
            # Format and print response sentence
            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]
            print('Bot:', ' '.join(output_words))

        except KeyError:
            print("Error: Encountered unknown word.")

In [None]:
def evaluate_samples(model, voc, pairs, input_text=None, max_len = 10):
    if input_text is None:
        input_sentence, exp_op = random.choice(pairs)
        print("Input:{}".format(input_sentence))
        print("Output:{}".format(exp_op))
    else:
        input_sentence = input_text

    #input_sentence = "the world is not enough"

    input_sentence = normalizeString(input_sentence)
    indexes_batch = [SOS_token]+indexesFromSentence(voc, input_sentence)+ [EOS_token]

    #indexes_batch = [indexesFromSentence(voc, input_sentence)]

    input_batch = torch.LongTensor(indexes_batch).unsqueeze(0).to(device)
    #print(input_sentence, indexes_batch, input_batch.shape)
    model.eval()
    #print(input_batch)
    src_mask = model.make_src_mask(input_batch).to(device)
    with torch.no_grad():
        enc_src = model.encoder(input_batch, src_mask)

    #trg_indexes = [trg_field.vocab.stoi[trg_field.init_token]]
    trg_indexes = [SOS_token]#]).to(device)
    
    for i in range(max_len):
        trg_tensor = torch.LongTensor(trg_indexes).unsqueeze(0).to(device)
        trg_mask = model.make_trg_mask(trg_tensor)
        with torch.no_grad():
            output, attention = model.decoder(trg_tensor, enc_src, trg_mask, src_mask)
        # Greedy choice: take the most likely token at the last position
        pred_token = output.argmax(2)[:,-1].item()
        trg_indexes.append(pred_token)
        #print(trg_indexes)
        if pred_token == EOS_token:
            break
    trg_tokens = [ voc.index2word[index] for index in trg_indexes]
    print(trg_tokens[1:])
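
The decoding loop above is a greedy search: start from the start-of-sentence token, repeatedly append the highest-scoring next token, and stop at end-of-sentence or after `max_len` steps. The sketch below isolates that control flow in plain Python. The toy scoring function is a stand-in for the decoder forward pass, and the indices `SOS = 1` and `EOS = 2` are assumptions made for the example.

```python
SOS, EOS = 1, 2  # assumed special-token indices, standing in for SOS_token / EOS_token

def greedy_decode(next_token_scores, max_len=10):
    """Generate tokens one at a time, always taking the highest-scoring one.

    next_token_scores: callable mapping the tokens generated so far to a list
    of scores over the vocabulary (a stand-in for the decoder forward pass).
    """
    tokens = [SOS]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(pred)
        if pred == EOS:
            break
    return tokens[1:]  # drop the leading SOS

# Toy "decoder": prefers token 3, then token 4, then EOS,
# based only on how many tokens exist so far
def toy_scores(tokens):
    preferred = {1: 3, 2: 4}.get(len(tokens), EOS)
    return [1.0 if i == preferred else 0.0 for i in range(5)]

print(greedy_decode(toy_scores))
```

The `max_len` cap matters because nothing forces a model to ever emit the end-of-sentence token.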

Run Model
---------

Finally, it is time to run our model!

Regardless of whether we want to train or test the chatbot model, we
must initialize the individual encoder and decoder models. In the
following block, we set our desired configurations, choose to start from
scratch or set a checkpoint to load from, and build and initialize the
models. Feel free to play with different model configurations to
optimize performance.




Run Training
~~~~~~~~~~~~

Run the following block if you want to train the model.

First we set training parameters, then we initialize our optimizers, and
finally we call the ``trainIters`` function to run our training
iterations.




In [None]:
# Configure training/optimization
clip = 50.0
teacher_forcing_ratio = 1.0
learning_rate = 0.0001
decoder_learning_ratio = 5.0
n_iteration = 4000
print_every = 1
save_every = 500

# Ensure dropout layers are in train mode
model.train()
#decoder.train()

# Initialize optimizers
print('Building optimizers ...')
encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)

if loadFilename:
    encoder_optimizer.load_state_dict(encoder_optimizer_sd)
    decoder_optimizer.load_state_dict(decoder_optimizer_sd)

# If you have cuda, configure cuda to call
for state in encoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

for state in decoder_optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()
    
# Run training iterations
print("Starting Training!")
trainIters(model_name, voc, pairs, model, optimizer,
           embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size,
           print_every, save_every, clip, corpus_name, loadFilename)

In [None]:
# Configure training/optimization
model_name = 'cb_model'
attn_model = 'dot'
#attn_model = 'general'
#attn_model = 'concat'
batch_size = 64
model.apply(initialize_weights);
LEARNING_RATE = 0.0005

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

clip = 50.0
n_iteration = 1000
print_every = 1
save_every = 5

# Ensure dropout layers are in train mode
model.train()
#decoder.train()

# Initialize optimizers
print('Building optimizers ...')
    
# Run training iterations
print("Starting Training!")
loadFilename = None  # start from scratch; a non-empty string would be treated as a checkpoint path


#model = torch.load("/content/tut13-model.pt")
# Load batches for each iteration
train_size = int(len(pairs) * 0.9)
train_pairs = pairs[:train_size]
test_pairs = pairs[train_size:]

# transform_trainIters(model_name, voc, pairs, model, optimizer, 
#            embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size,
#            print_every, save_every, 1)
# transform_train_evalIters(model_name,
#                      voc, 
#                      train_pairs, 
#                      test_pairs,
#                      model, 
#                      optimizer, 
#                      save_dir, 
#                      n_iteration, 
#                      batch_size,
#                      print_every, 
#                      save_every, 
#                      clip=1)

transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1)

Building optimizers ...
Starting Training!
Initializing ...
Training...
Iteration: 1; Percent complete: 0.1%; Best Average loss: 9.0234
Iteration: 1; Percent complete: 0.1%; Average loss: 9.0234
Iteration: 2; Percent complete: 0.2%; Best Average loss: 8.8124
Iteration: 2; Percent complete: 0.2%; Average loss: 8.8124
Iteration: 3; Percent complete: 0.3%; Best Average loss: 8.6647
Iteration: 3; Percent complete: 0.3%; Average loss: 8.6647
Iteration: 4; Percent complete: 0.4%; Best Average loss: 8.5551
Iteration: 4; Percent complete: 0.4%; Average loss: 8.5551
Iteration: 5; Percent complete: 0.5%; Best Average loss: 8.4930
Iteration: 5; Percent complete: 0.5%; Average loss: 8.4930
Iteration: 6; Percent complete: 0.6%; Best Average loss: 8.4162
Iteration: 6; Percent complete: 0.6%; Average loss: 8.4162
Iteration: 7; Percent complete: 0.7%; Best Average loss: 8.3157
Iteration: 7; Percent complete: 0.7%; Average loss: 8.3157
Iteration: 8; Percent complete: 0.8%; Best Average loss: 8.2725
Ite

In [None]:
checkpoint = torch.load("/content/checkpoint_new2.pt")
print(checkpoint["loss"])
# torch.save({
#         'iteration': 1000,
#         'model': model.state_dict(),
#         'optim':optimizer.state_dict(),
#         'loss': 2.7950,
#         'voc_dict': voc.__dict__
#     }, 'checkpoint_new3.pt')


3.0985634326934814


In [None]:
print(checkpoint["loss"])


2.7950243949890137


In [None]:
LEARNING_RATE = 0.0005

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
optimizer.load_state_dict(checkpoint['optim'])
for param_group in optimizer.param_groups:
    param_group['lr'] = LEARNING_RATE

In [None]:
model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=64
print_every=5
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

Initializing ...
Training...
Iteration: 1; Percent complete: 0.0%; Best Average loss: 3.5849
Iteration: 2; Percent complete: 0.1%; Best Average loss: 3.5243
Iteration: 4; Percent complete: 0.1%; Best Average loss: 3.4722
Iteration: 5; Percent complete: 0.2%; Average loss: 3.5904
Iteration: 10; Percent complete: 0.3%; Average loss: 3.6853
Iteration: 11; Percent complete: 0.4%; Best Average loss: 3.3410
Iteration: 15; Percent complete: 0.5%; Average loss: 3.5901
Iteration: 20; Percent complete: 0.7%; Average loss: 3.6492
Iteration: 25; Percent complete: 0.8%; Average loss: 3.6666
Iteration: 30; Percent complete: 1.0%; Average loss: 3.6341
Iteration: 35; Percent complete: 1.2%; Average loss: 3.5993
Iteration: 40; Percent complete: 1.3%; Average loss: 3.6017
Iteration: 43; Percent complete: 1.4%; Best Average loss: 3.2856
Iteration: 45; Percent complete: 1.5%; Average loss: 3.5700
Iteration: 47; Percent complete: 1.6%; Best Average loss: 3.2595
Iteration: 50; Percent complete: 1.7%; Averag
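
In the log above, lines tagged `Best Average loss` appear at irregular iterations, while plain `Average loss` lines appear every `print_every` steps. The definition of `transform_trainIters` is not shown in this section, so the following is only a sketch of one way such logging could work, assuming a checkpoint is written whenever the loss improves on the best value seen so far. `BestLossTracker` and the `save_fn` callback are hypothetical names; the loss values are copied from the log above.

```python
class BestLossTracker:
    """Remember the lowest loss seen so far and trigger a save on improvement."""

    def __init__(self, best_loss=float('inf'), save_fn=None):
        self.best_loss = best_loss
        self.save_fn = save_fn  # called as save_fn(iteration, loss)

    def update(self, iteration, loss):
        """Return True (and save) if `loss` beats the current best."""
        if loss < self.best_loss:
            self.best_loss = loss
            if self.save_fn is not None:
                self.save_fn(iteration, loss)
            return True
        return False


saved = []  # stands in for writing a checkpoint file
tracker = BestLossTracker(best_loss=3.3410,
                          save_fn=lambda it, loss: saved.append((it, loss)))

history = [(40, 3.6017), (43, 3.2856), (45, 3.5700), (47, 3.2595)]
improved = [it for it, loss in history if tracker.update(it, loss)]
print(improved, tracker.best_loss)
```

Seeding the tracker with `checkpoint["loss"]`, as the `best_valid_loss` argument suggests, would keep a resumed run from overwriting a better checkpoint with a worse one.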

In [None]:
checkpoint = torch.load("/content/checkpoint_new2.pt")
print(checkpoint["loss"])

2.932177782058716


In [None]:
LEARNING_RATE = 0.0001

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
optimizer.load_state_dict(checkpoint['optim'])
for param_group in optimizer.param_groups:
    param_group['lr'] = LEARNING_RATE

In [None]:
model.load_state_dict(checkpoint['model'])

n_iteration=2000
batch_size=64
print_every=5
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=3,
                     best_valid_loss=checkpoint["loss"])

Initializing ...
Training...
Iteration: 5; Percent complete: 0.2%; Average loss: 3.2711
Iteration: 10; Percent complete: 0.5%; Average loss: 3.2683
Iteration: 15; Percent complete: 0.8%; Average loss: 3.2562
Iteration: 20; Percent complete: 1.0%; Average loss: 3.3796
Iteration: 25; Percent complete: 1.2%; Average loss: 3.2859
Iteration: 30; Percent complete: 1.5%; Average loss: 3.3440
Iteration: 35; Percent complete: 1.8%; Average loss: 3.3275
Iteration: 40; Percent complete: 2.0%; Average loss: 3.2597
Iteration: 45; Percent complete: 2.2%; Average loss: 3.3482
Iteration: 50; Percent complete: 2.5%; Average loss: 3.2720
Iteration: 55; Percent complete: 2.8%; Average loss: 3.4324
Iteration: 60; Percent complete: 3.0%; Average loss: 3.2800
Iteration: 65; Percent complete: 3.2%; Average loss: 3.3008
Iteration: 70; Percent complete: 3.5%; Average loss: 3.2758
Iteration: 75; Percent complete: 3.8%; Average loss: 3.3632
Iteration: 80; Percent complete: 4.0%; Average loss: 3.1681
Iteration: 8

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
for param_group in optimizer.param_groups:
    print(param_group['lr'])
#model.load_state_dict(checkpoint['model'])

2.6540706157684326
5e-05


In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
LEARNING_RATE = 0.0005

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
optimizer.load_state_dict(checkpoint['optim'])
for param_group in optimizer.param_groups:
    param_group['lr'] = LEARNING_RATE

2.6540706157684326


In [None]:
model.load_state_dict(checkpoint['model'])

n_iteration=2000
batch_size=128
print_every=5
transform_train_evalIters(model_name,
                     voc, 
                     train_pairs,
                     test_pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

Initializing ...
Training...
Iteration: 5; Percent complete: 0.2%; Train loss: 2.9311; Val loss: 3.1837
Iteration: 10; Percent complete: 0.5%; Train loss: 3.1354; Val loss: 3.2074
Iteration: 15; Percent complete: 0.8%; Train loss: 3.0906; Val loss: 3.1579
Iteration: 20; Percent complete: 1.0%; Train loss: 3.0778; Val loss: 3.2811
Iteration: 25; Percent complete: 1.2%; Train loss: 3.0510; Val loss: 3.2019
Iteration: 30; Percent complete: 1.5%; Train loss: 3.0187; Val loss: 3.1897
Iteration: 35; Percent complete: 1.8%; Train loss: 3.0893; Val loss: 3.1903
Iteration: 40; Percent complete: 2.0%; Train loss: 3.0995; Val loss: 3.2411
Iteration: 45; Percent complete: 2.2%; Train loss: 3.0954; Val loss: 3.2139
Iteration: 50; Percent complete: 2.5%; Train loss: 3.1144; Val loss: 3.2345
Iteration: 55; Percent complete: 2.8%; Train loss: 3.0699; Val loss: 3.0811
Iteration: 60; Percent complete: 3.0%; Train loss: 3.0450; Val loss: 3.2084
Iteration: 65; Percent complete: 3.2%; Train loss: 2.9859; V

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
checkpoint['loss']

2.4181485176086426

In [None]:
for param_group in optimizer.param_groups:
    print(param_group['lr'])

0.0005


In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
LEARNING_RATE = 0.00005

optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
optimizer.load_state_dict(checkpoint['optim'])
for param_group in optimizer.param_groups:
    param_group['lr'] = LEARNING_RATE

2.799795389175415


In [None]:
#model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=64
print_every=5
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

Initializing ...
Training...
Iteration: 5; Percent complete: 0.2%; Average loss: 3.3589
Iteration: 10; Percent complete: 0.3%; Average loss: 3.4537
Iteration: 15; Percent complete: 0.5%; Average loss: 3.2718
Iteration: 20; Percent complete: 0.7%; Average loss: 3.1882
Iteration: 25; Percent complete: 0.8%; Average loss: 3.1847
Iteration: 30; Percent complete: 1.0%; Average loss: 3.2884
Iteration: 35; Percent complete: 1.2%; Average loss: 3.2566
Iteration: 40; Percent complete: 1.3%; Average loss: 3.1727
Iteration: 45; Percent complete: 1.5%; Average loss: 3.1963
Iteration: 50; Percent complete: 1.7%; Average loss: 3.1633
Iteration: 55; Percent complete: 1.8%; Average loss: 3.2836
Iteration: 60; Percent complete: 2.0%; Average loss: 3.1956
Iteration: 65; Percent complete: 2.2%; Average loss: 3.1270
Iteration: 70; Percent complete: 2.3%; Average loss: 3.1445
Iteration: 75; Percent complete: 2.5%; Average loss: 3.1037
Iteration: 80; Percent complete: 2.7%; Average loss: 3.1270
Iteration: 8

In [None]:
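# Snapshot the current model and optimizer state by hand.
# Note: 'iteration' and 'loss' are hard-coded values, not read from the training loop.
torch.save({
        'iteration': 1000,
        'model': model.state_dict(),
        'optim': optimizer.state_dict(),
        'loss': 2.7950,
        'voc_dict': voc.__dict__
    }, 'checkpoint_new3_128_2.2.pt')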

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=64
print_every=5
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

2.4181485176086426
Initializing ...
Training...
Iteration: 5; Percent complete: 0.2%; Average loss: 2.8181
Iteration: 10; Percent complete: 0.3%; Average loss: 2.7084
Iteration: 15; Percent complete: 0.5%; Average loss: 2.7435
Iteration: 20; Percent complete: 0.7%; Average loss: 2.8540
Iteration: 25; Percent complete: 0.8%; Average loss: 2.7350
Iteration: 30; Percent complete: 1.0%; Average loss: 2.7103
Iteration: 35; Percent complete: 1.2%; Average loss: 2.7876
Iteration: 40; Percent complete: 1.3%; Average loss: 2.7587
Iteration: 45; Percent complete: 1.5%; Average loss: 2.7734
Iteration: 50; Percent complete: 1.7%; Average loss: 2.7445
Iteration: 55; Percent complete: 1.8%; Average loss: 2.6868
Iteration: 60; Percent complete: 2.0%; Average loss: 2.8906
Iteration: 65; Percent complete: 2.2%; Average loss: 2.7560
Iteration: 70; Percent complete: 2.3%; Average loss: 2.8406
Iteration: 75; Percent complete: 2.5%; Average loss: 2.8034
Iteration: 80; Percent complete: 2.7%; Average loss: 

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=128
print_every=5
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

2.200247287750244
Initializing ...
Training...
Iteration: 5; Percent complete: 0.2%; Average loss: 2.5826
Iteration: 10; Percent complete: 0.3%; Average loss: 2.5759
Iteration: 15; Percent complete: 0.5%; Average loss: 2.5952
Iteration: 20; Percent complete: 0.7%; Average loss: 2.5368
Iteration: 25; Percent complete: 0.8%; Average loss: 2.6054
Iteration: 30; Percent complete: 1.0%; Average loss: 2.5911
Iteration: 35; Percent complete: 1.2%; Average loss: 2.5704
Iteration: 40; Percent complete: 1.3%; Average loss: 2.5368
Iteration: 45; Percent complete: 1.5%; Average loss: 2.5473
Iteration: 50; Percent complete: 1.7%; Average loss: 2.5108
Iteration: 55; Percent complete: 1.8%; Average loss: 2.5643
Iteration: 60; Percent complete: 2.0%; Average loss: 2.5583
Iteration: 65; Percent complete: 2.2%; Average loss: 2.5362
Iteration: 70; Percent complete: 2.3%; Average loss: 2.5348
Iteration: 75; Percent complete: 2.5%; Average loss: 2.5741
Iteration: 80; Percent complete: 2.7%; Average loss: 2

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=1024
print_every=10
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=1,
                     best_valid_loss=checkpoint["loss"])

1.8213812112808228
Initializing ...
Training...
Iteration: 10; Percent complete: 0.3%; Average loss: 1.9729
Iteration: 20; Percent complete: 0.7%; Average loss: 1.9696
Iteration: 30; Percent complete: 1.0%; Average loss: 1.9628
Iteration: 40; Percent complete: 1.3%; Average loss: 1.9451
Iteration: 50; Percent complete: 1.7%; Average loss: 1.9691
Iteration: 60; Percent complete: 2.0%; Average loss: 1.9503
Iteration: 70; Percent complete: 2.3%; Average loss: 1.9415
Iteration: 80; Percent complete: 2.7%; Average loss: 1.9200
Iteration: 90; Percent complete: 3.0%; Average loss: 1.9221
Iteration: 100; Percent complete: 3.3%; Average loss: 1.9352
Iteration: 110; Percent complete: 3.7%; Average loss: 1.9359
Iteration: 120; Percent complete: 4.0%; Average loss: 1.9361
Iteration: 130; Percent complete: 4.3%; Average loss: 1.9253
Iteration: 140; Percent complete: 4.7%; Average loss: 1.9210
Iteration: 150; Percent complete: 5.0%; Average loss: 1.9357
Iteration: 160; Percent complete: 5.3%; Averag

In [None]:
checkpoint = torch.load('/content/checkpoint_new2.pt')
print(checkpoint['loss'])
model.load_state_dict(checkpoint['model'])
n_iteration=3000
batch_size=1024
print_every=20
transform_trainIters(model_name,
                     voc, 
                     pairs,
                     model, 
                     optimizer, 
                     save_dir, 
                     n_iteration, 
                     batch_size,
                     print_every, 
                     save_every, 
                     clip=2,
                     best_valid_loss=checkpoint["loss"])

1.5827564001083374
Initializing ...
Training...
Iteration: 20; Percent complete: 0.7%; Average loss: 1.6567
Iteration: 40; Percent complete: 1.3%; Average loss: 1.6457
Iteration: 60; Percent complete: 2.0%; Average loss: 1.6535
Iteration: 80; Percent complete: 2.7%; Average loss: 1.6529
Iteration: 100; Percent complete: 3.3%; Average loss: 1.6470
Iteration: 120; Percent complete: 4.0%; Average loss: 1.6453
Iteration: 140; Percent complete: 4.7%; Average loss: 1.6553
Iteration: 150; Percent complete: 5.0%; Best Average loss: 1.5749
Iteration: 160; Percent complete: 5.3%; Average loss: 1.6515
Iteration: 180; Percent complete: 6.0%; Average loss: 1.6521
Iteration: 200; Percent complete: 6.7%; Average loss: 1.6568
Iteration: 220; Percent complete: 7.3%; Average loss: 1.6400
Iteration: 240; Percent complete: 8.0%; Average loss: 1.6535
Iteration: 260; Percent complete: 8.7%; Average loss: 1.6487
Iteration: 280; Percent complete: 9.3%; Average loss: 1.6490
Iteration: 297; Percent complete: 9.

In [None]:
!cp /content/checkpoint_new2.pt /content/drive/MyDrive/EVA4/ENDS9/checkpoint_new3_128_1.44.pt

In [None]:
!cp /content/checkpoint_new3_128_2.2.pt /content/drive/MyDrive/EVA4/ENDS9
#!cp /content/tut6-model.pt /content/drive/MyDrive/EVA4/ENDS9/tut6_2.32-model.pt

In [None]:
!cp /content/checkpoint_new2_1_05.pt /content/drive/MyDrive/EVA4/ENDS9/
checkpoint = torch.load('/content/checkpoint_new2_1_05.pt')
print(checkpoint['loss'])

1.0572753842173535


In [None]:
model.load_state_dict(checkpoint['model'])
n_iteration=6000
batch_size=512
print_every=5
# Use the same argument order as the other transform_trainIters calls.
transform_trainIters(model_name, voc, pairs, model, optimizer,
                     save_dir, n_iteration, batch_size,
                     print_every, save_every, clip=1,
                     best_valid_loss=checkpoint["loss"])

Run Evaluation
~~~~~~~~~~~~~~

To chat with your model, run the following block.




In [None]:
# Set dropout layers to eval mode
encoder.eval()
decoder.eval()

# Initialize search module
searcher = GreedySearchDecoder(encoder, decoder)

# Begin chatting (uncomment and run the following line to begin)
# evaluateInput(encoder, decoder, searcher, voc)

In [None]:
checkpoint=torch.load("/content/checkpoint_new2.pt")
print(checkpoint['loss'])
model.load_state_dict(checkpoint['model'])

1.4438446760177612


<All keys matched successfully>

In [None]:
for _ in range(20):
    # For interactive use, read a prompt and pass it via input_text:
    # input_text = input(">")
    # evaluate_samples(model, voc, pairs, input_text=input_text)
    evaluate_samples(model, voc, pairs)

Input:ale . i don t mind .
Output:right miss .
['right', 'miss', '.', 'EOS']
Input:sun glasses ?
Output:to hide a black eye .
['to', 'hide', 'a', 'black', 'eye', '.', 'EOS']
Input:i know . jean pierre did .
Output:you were behind the door ?
['you', 'were', 'behind', 'the', 'door', '?', 'EOS']
Input:china .
Output:when s he coming back . . . ?
['he', 's', 'a', 'fever', '.', 'EOS']
Input:please . . .
Output:which way do you want to go ?
['.', '.', '.', 'yes', '.', 'EOS']
Input:it was fifteen seconds .
Output:i don t think so .
['i', 'don', 't', 'know', '!', 'EOS']
Input:can i give you a hand beautiful ?
Output:i m just going to my car ?
['i', 'm', 'not', 'going', 'to', 'kill', 'her', '.', 'EOS']
Input:well tell the asshole to shut up .
Output:right . hey shut up . okay sir .
['what', '?', 'EOS']
Input:you re not serious .
Output:i m always serious .
['i', 'know', '.', 'EOS']
Input:hello ?
Output:it s me . . . johana .
['my', 'name', 's', 'libbets', '.', 'EOS']
Input:i m going home donny 
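
Each sample above prints the input sentence, an `Output:` line, and a Python list of tokens ending in `'EOS'`. Assuming the list is the model's decoded prediction (the definition of `evaluate_samples` is not shown in this section), a small helper can turn it into a readable reply by cutting at the first end-of-sentence token. `tokens_to_sentence` is a hypothetical name, and the `'PAD'` token string is an assumption about the vocabulary.

```python
def tokens_to_sentence(tokens, eos_token='EOS', pad_token='PAD'):
    """Join decoded tokens into one string, stopping at the first EOS."""
    words = []
    for token in tokens:
        if token == eos_token:
            break  # everything after EOS is ignored
        if token != pad_token:
            words.append(token)  # skip padding, keep real words
    return ' '.join(words)


# Token list copied from the sample output above.
reply = tokens_to_sentence(['to', 'hide', 'a', 'black', 'eye', '.', 'EOS'])
print(reply)
```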

Conclusion
----------

That’s all for this one, folks. Congratulations, you now know the
fundamentals of building a generative chatbot model! If you’re
interested, you can try tailoring the chatbot’s behavior by tweaking the
model and training parameters and customizing the data that you train
the model on.

Check out the other tutorials for more cool deep learning applications
in PyTorch!


