# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a sequence-to-sequence text generation architecture with an LSTM as its memory unit. You will also learn how pretrained word embeddings can be used to improve the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Using pretrained embeddings is optional. The starter code below trains Word2Vec embeddings on the NLTK Brown corpus with Gensim, so you can compare the performance of a model that uses them against one that does not.



---



A sequence-to-sequence (Seq2Seq) model has two components:
- An Encoder consisting of an embedding layer and an LSTM.
- A Decoder consisting of an embedding layer, an LSTM, and a linear output layer.

The Encoder reads the input sequence and passes its final hidden and cell states to the Decoder. The Decoder starts from those states and outputs a sequence of token predictions one step at a time, using the previous token (either the ground truth or its own prediction) as the next input.
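Written as equations, the data flow of the model built below looks roughly like this. The notation is introduced here for illustration only: $e$ and $d$ are the encoder and decoder embedding lookups, stacked LSTM layers are folded into a single LSTM symbol, $s_t$ is the decoder's top-layer output, and $W, b$ are the weights of the linear output layer.

$$
\begin{aligned}
(h_t, c_t) &= \mathrm{LSTM}_{\text{enc}}\big(e(x_t),\ (h_{t-1}, c_{t-1})\big), && t = 1, \dots, T \\
(s_0, m_0) &= (h_T, c_T) \\
(s_t, m_t) &= \mathrm{LSTM}_{\text{dec}}\big(d(y_{t-1}),\ (s_{t-1}, m_{t-1})\big), && t = 1, 2, \dots \\
p(y_t \mid y_{<t}, x) &= \mathrm{softmax}(W s_t + b)
\end{aligned}
$$

Here $(h_0, c_0) = (0, 0)$ and $y_0$ is the `<sos>` token; the only link between the two halves is the state handoff $(s_0, m_0) = (h_T, c_T)$.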

## Dependencies

- PyTorch
- torchtext
- NumPy
- Pandas
- NLTK
- gzip (Python standard library)
- Gensim
- spaCy


Please choose a dataset from the torchtext website. We recommend looking at the SQuAD dataset first; the starter code below uses the Multi30k German-English translation dataset. The available datasets are listed here:

- https://pytorch.org/text/stable/datasets.html





### Installation

In [1]:
!pip install torch==1.8.0 torchtext==0.9.0

Defaulting to user installation because normal site-packages is not writeable
Collecting torch==1.8.0
  Downloading torch-1.8.0-cp37-cp37m-manylinux1_x86_64.whl (735.5 MB)
Collecting torchtext==0.9.0
  Downloading torchtext-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (7.1 MB)
ERROR: torchvision 0.10.0 has requirement torch==1.9.0, but you'll have torch 1.8.0 which is incompatible.
Installing collected packages: torch, torchtext
Successfully installed torch-1.8.0 torchtext-0.9.0


In [2]:
!pip install numpy pandas

Defaulting to user installation because normal site-packages is not writeable


In [3]:
!pip install scipy gensim nltk 

Defaulting to user installation because normal site-packages is not writeable


In [4]:
!pip install -U pip setuptools wheel

Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
Collecting setuptools
  Downloading setuptools-67.7.2-py3-none-any.whl (1.1 MB)
Collecting wheel
  Downloading wheel-0.40.0-py3-none-any.whl (64 kB)
Installing collected packages: pip, setuptools, wheel
Successfully installed pip-23.1.2 setuptools-67.7.2 wheel-0.40.0


In [5]:
!pip install -U spacy

Defaulting to user installation because normal site-packages is not writeable
Collecting spacy
  Downloading spacy-3.5.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl (29 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Downloading spacy_loggers-1.0.4-py3-none-any.whl (11 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading murmurhash-1.0.9-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading cymem-2.0.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy)
  Downloading preshed-3.0.8-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_

In [6]:
!python -m spacy download en_core_web_sm

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [7]:
!python -m spacy download de_core_news_sm

Defaulting to user installation because normal site-packages is not writeable
Collecting de-core-news-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.5.0/de_core_news_sm-3.5.0-py3-none-any.whl (14.6 MB)
Installing collected packages: de-core-news-sm
Successfully installed de-core-news-sm-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('de_core_news_sm')


### Libraries

In [1]:
import pandas as pd
import numpy as np
import gzip
from typing import List
import random
import time
import math


import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau
from sklearn.metrics import accuracy_score

import torchtext
from torchtext.legacy.data import Field, BucketIterator
from torchtext.legacy.datasets import Multi30k

import gensim
import spacy

import nltk
from nltk.corpus import brown

nltk.download("brown")  # Brown corpus, used below to train Word2Vec embeddings

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.


True

In [2]:
SEED = 47  # for reproducibility

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Data Preprocessing 

In [3]:
# Train Word2Vec embeddings on the Brown corpus, save them, and load them back

model = gensim.models.Word2Vec(brown.sents())
model.save("brown.embedding")

w2v = gensim.models.Word2Vec.load("brown.embedding")
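As written, these embeddings are trained but never loaded into the model: both `nn.Embedding` layers defined below start from random weights. A minimal sketch of turning them into an embedding matrix follows. It assumes `word_vectors` behaves like gensim's `KeyedVectors` (`w2v.wv`), i.e. it supports `token in word_vectors` and `word_vectors[token]`; a plain dict works the same way. The helper name `build_embedding_matrix` is hypothetical. Because Brown is an English corpus, it only fits the decoder's `TRG` vocabulary.

```python
import numpy as np


def build_embedding_matrix(itos, word_vectors, dim, pad_idx=None, seed=0):
    """Build a [len(itos), dim] float32 matrix for an embedding layer.

    itos: tokens ordered by vocabulary index (e.g. TRG.vocab.itos).
    word_vectors: mapping-like object supporting `token in word_vectors`
        and `word_vectors[token]` (assumed: gensim KeyedVectors or a dict).
    Tokens without a pretrained vector keep small random values; the
    padding row, if given, is set to zeros.
    Returns the matrix and the number of tokens found in word_vectors.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(scale=0.1, size=(len(itos), dim)).astype(np.float32)
    hits = 0
    for idx, token in enumerate(itos):
        if token in word_vectors:
            matrix[idx] = np.asarray(word_vectors[token], dtype=np.float32)
            hits += 1
    if pad_idx is not None:
        matrix[pad_idx] = 0.0  # padding positions should carry no signal
    return matrix, hits
```

Once the vocabularies are built, this could be called as `build_embedding_matrix(TRG.vocab.itos, w2v.wv, w2v.wv.vector_size, pad_idx=TRG.vocab.stoi[TRG.pad_token])` and copied into the decoder with `decoder.embedding.weight.data.copy_(torch.from_numpy(matrix))`. That requires `DEC_EMB_DIM` to match the Word2Vec vector size, which defaults to 100 in Gensim rather than the 300 used below. Brown is also mixed-case while `TRG` lowercases its tokens, so some capitalized Brown words will not be matched.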

In [4]:
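# Tokenization using spaCy
spacy_de = spacy.load("de_core_news_sm")
spacy_en = spacy.load("en_core_web_sm")


def tokenize_de(text):
    # reverse the German tokens (following Sutskever et al., 2014), which puts
    # the first source words closest to the first decoding steps
    return [tok.text for tok in spacy_de.tokenizer(text)][::-1]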


def tokenize_en(text):
    return [tok.text for tok in spacy_en.tokenizer(text)]

In [5]:
# define English preprocessing pipeline (after tokenization)
def en_prepareText(tokens):
    STOP_WORDS = spacy_en.Defaults.stop_words

    # remove stopwords
    tokens = [token for token in tokens if token not in STOP_WORDS]

    # lemmatize the tokens
    doc = spacy_en(" ".join(tokens))
    tokens = [token.lemma_ for token in doc]
    return tokens


# define German preprocessing pipeline (after tokenization)
def ger_prepareText(tokens):
    GER_STOP_WORDS = spacy_de.Defaults.stop_words

    # remove stopwords
    tokens = [token for token in tokens if token not in GER_STOP_WORDS]

    # lemmatize the tokens
    doc = spacy_de(" ".join(tokens))
    tokens = [token.lemma_ for token in doc]

    return tokens

In [6]:
# German Datafield
SRC = Field(
    tokenize=tokenize_de,
    init_token="<sos>",
    eos_token="<eos>",
    lower=True,
    # preprocessing=ger_prepareText,
)
# English Datafield
TRG = Field(
    tokenize=tokenize_en,
    init_token="<sos>",
    eos_token="<eos>",
    lower=True,
    # preprocessing=en_prepareText,
)


def loadDF(SRC, TRG):
    """
    Load the Multi30k German-English train, validation, and test splits
    as torchtext legacy datasets.

    Args:
        SRC: the Field used for the German source sentences
        TRG: the Field used for the English target sentences

    Returns:
        train_data, valid_data, test_data
    """

    train_data, valid_data, test_data = Multi30k.splits(
        exts=(".de", ".en"), fields=(SRC, TRG)
    )

    return train_data, valid_data, test_data

In [7]:
def buildVocab(SRC, TRG, train_dataset):
    """
    Input: SRC, the Field for the German source texts
           TRG, the Field for the English target texts
           train_dataset, the training split used to count token frequencies

    Output: SRC and TRG vocabularies

    """

    # Build the vocabulary for the source and target languages
    # Selecting only words that appear more than once
    SRC.build_vocab(train_dataset, min_freq=2)
    TRG.build_vocab(train_dataset, min_freq=2)

    # Print the number of unique tokens in the source and target vocabularies
    print("Source vocabulary size:", len(SRC.vocab))
    print("Target vocabulary size:", len(TRG.vocab))

    # Print the 10 most common tokens in the source vocabulary
    print(SRC.vocab.freqs.most_common(10))

    # Print the 10 most common tokens in the target vocabulary
    print(TRG.vocab.freqs.most_common(10))

    return SRC.vocab, TRG.vocab
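`build_vocab(..., min_freq=2)` keeps only tokens that occur at least twice in the training split and maps everything else to `<unk>`. A simplified pure-Python sketch of that idea follows; the special-token order and the frequency-then-alphabetical ordering are assumed to resemble torchtext's legacy `Vocab`, not copied from it, and both helper names are hypothetical.

```python
from collections import Counter


def build_vocab_sketch(sentences, min_freq=2,
                       specials=("<unk>", "<pad>", "<sos>", "<eos>")):
    """Return (itos, stoi) built from tokenized sentences.

    Special tokens come first; other tokens are kept only if they occur
    at least `min_freq` times, ordered by descending frequency and then
    alphabetically.
    """
    counts = Counter(token for sent in sentences for token in sent)
    kept = sorted(
        (tok for tok, n in counts.items() if n >= min_freq and tok not in specials),
        key=lambda tok: (-counts[tok], tok),
    )
    itos = list(specials) + kept
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi


def numericalize(tokens, stoi, unk_idx=0):
    """Map tokens to indices, sending out-of-vocabulary tokens to <unk>."""
    return [stoi.get(tok, unk_idx) for tok in tokens]
```

Raising `min_freq` shrinks the vocabulary (and the decoder's output layer) at the cost of more `<unk>` tokens.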

In [8]:
def split_into_batches(dataset, BATCH_SIZE):
    """
    Create batched iterators for the train, validation, and test splits.
    The BucketIterator groups sentences of similar length into the same batch
    to minimize padding.

    Input: dataset (Tuple), the (train, valid, test) datasets to batch
           BATCH_SIZE, the number of examples per batch

    Output: train, validation, and test iterators; each batch has src and trg
            attributes of shape [sentence length, batch size]
    """
    train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
        dataset,
        batch_size=BATCH_SIZE,
        sort_within_batch=True,
        sort_key=lambda x: len(x.src),
        device=device,
    )

    return train_iterator, valid_iterator, test_iterator
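The benefit of bucketing is less padding per batch. The plain-Python sketch below illustrates the idea only; it is not torchtext's actual implementation, and `bucket_batches` and `padding_in` are hypothetical names.

```python
import random


def bucket_batches(sequences, batch_size, shuffle=True, seed=0):
    """Group sequence indices into batches of similar length.

    Sorts indices by sequence length, cuts them into consecutive batches,
    then optionally shuffles the order of the batches so training does not
    always see the shortest sentences first.
    """
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if shuffle:
        random.Random(seed).shuffle(batches)
    return batches


def padding_in(batches, sequences):
    """Count pad tokens needed when each batch is padded to its longest member."""
    total = 0
    for batch in batches:
        longest = max(len(sequences[i]) for i in batch)
        total += sum(longest - len(sequences[i]) for i in batch)
    return total


seqs = [[0] * n for n in [5, 1, 4, 2, 5, 1]]
naive = [[0, 1], [2, 3], [4, 5]]  # batches in dataset order
print(padding_in(bucket_batches(seqs, 2), seqs), "vs", padding_in(naive, seqs))
```

With these six toy sequences, bucketing needs 2 pad tokens where batching in dataset order needs 10.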

In [9]:
train_data, valid_data, test_data = loadDF(SRC, TRG)

In [10]:
print(f"Number of training examples: {len(train_data.examples)}")
print(f"Number of validation examples: {len(valid_data.examples)}")
print(f"Number of testing examples: {len(test_data.examples)}")

Number of training examples: 29000
Number of validation examples: 1014
Number of testing examples: 1000


In [11]:
DATASET = (train_data, valid_data, test_data)
BATCH_SIZE = 64  # 128

train_batch, valid_batch, test_batch = split_into_batches(DATASET, BATCH_SIZE)

### Model Architecture

#### Encoder

In [12]:
class Encoder(nn.Module):
    """
    Input :
        - source batch
    Layer :
        source batch -> Embedding -> Dropout -> LSTM
    Output :
        - LSTM hidden state: the final hidden state for each layer, stacked on top of each other
        - LSTM cell state: the final cell state for each layer, stacked on top of each other

    Parameters
    ----------
    input_size : int
        Input dimension, should equal the source vocab size.

    embd_size : int
        Embedding layer's dimension.

    hidden_size : int
        LSTM hidden/cell state's dimension.

    n_layers : int
        Number of LSTM layers.

    drop : float
        Dropout probability, applied to the embeddings and between LSTM layers.
    """

    def __init__(self, input_size, embd_size, hidden_size, n_layers, drop):
        super(Encoder, self).__init__()

        self.input_size = input_size
        self.embd_size = embd_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.dropout = nn.Dropout(drop)

        # self.embedding provides a vector representation of the inputs to our model
        self.embedding = nn.Embedding(input_size, embd_size)

        # self.lstm, accepts the vectorized input and passes a hidden state
        self.lstm = nn.LSTM(embd_size, hidden_size, n_layers, dropout=drop)

    def forward(self, src):
        """
        Parameters
        ----------
        src : the source batch, [src length, batch size]

        Returns
        -------
        hidden: the final hidden state, [n_layers * n_directions, batch size, hidden size]
        cell: the final cell state, [n_layers * n_directions, batch size, hidden size]

        The per-step top-layer outputs, [src length, batch size, hidden size * n_directions],
        are computed by the LSTM but not returned.
        """
        # embedded: [src length, batch size, embedding size]
        embedded = self.dropout(self.embedding(src))

        outputs, (hidden, cell) = self.lstm(embedded)

        return hidden, cell

#### Decoder

In [13]:
class Decoder(nn.Module):
    def __init__(self, input_size, embd_size, hidden_size, output_size, n_layers, drop):
        """
        Input :
            - current input token for each sequence in the target batch
              (the <sos> token at the first step)
            - LSTM hidden state (from the encoder at the first step)
            - LSTM cell state (from the encoder at the first step)
        Layer :
            target token -> Embedding --
                                        |
            encoder hidden state ------ |--> LSTM -> Linear
                                        |
            encoder cell state   -------

        Output :
            - prediction
            - LSTM hidden state
            - LSTM cell state

        Parameters
        ----------
        input_size : int
            Input dimension of the embedding layer, should equal the target vocab size.

        embd_size : int
            Embedding layer's dimension.

        hidden_size : int
            LSTM hidden/cell state's dimension.

        output_size : int
            Output dimension, should equal the target vocab size.

        n_layers : int
            Number of LSTM layers.

        drop : float
            Dropout probability, applied to the embeddings and between LSTM layers.
        """

        super(Decoder, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.embd_size = embd_size
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.dropout = nn.Dropout(drop)

        # self.embedding provides a vector representation of the target to our model
        self.embedding = nn.Embedding(input_size, embd_size)

        # self.lstm, accepts the embeddings and outputs a hidden state
        self.lstm = nn.LSTM(embd_size, hidden_size, n_layers, dropout=drop)

        # self.fcLayer maps each LSTM output to scores over the target vocabulary
        self.fcLayer = nn.Linear(hidden_size, output_size)

    def forward(self, target, hidden, cell):
        """
        Parameters
        ----------
        target : 1D torch.LongTensor
            Batch of target-vocabulary token indices for the current step,
            shape [batch size].

        hidden, cell : 3D torch.FloatTensor
            Hidden and cell state of the LSTM layer. Each state's shape
            [n layers * n directions, batch size, hidden dim]

        Returns
        -------
        prediction : 2D torch.FloatTensor
            For each sequence in the batch, unnormalized scores (logits) over
            the target vocabulary, [batch size, output dim]

        hidden, cell : 3D torch.FloatTensor
            Hidden and cell state of the LSTM layer. Each state's shape
            [n layers * n directions, batch size, hidden dim]
        """

        # [1, batch size]: add a time dimension of length 1, so embedded is [1, batch size, emb dim]
        target = target.unsqueeze(0)
        embedded = self.dropout(self.embedding(target))

        outputs, (hidden, cell) = self.lstm(embedded, (hidden, cell))

        prediction = self.fcLayer(outputs.squeeze(0))

        return prediction, hidden, cell

#### Seq2Seq

In [14]:
class Seq2Seq(nn.Module):
    """
    Parameters
    ----------
    encoder :
        Produces the final hidden and cell states that initialize the decoder

    decoder :
        Produces the predicted output sentence one token at a time

    device :
        The device (GPU if available, else CPU) on which the outputs tensor is created
    """

    def __init__(self, encoder, decoder, device):
        super(Seq2Seq, self).__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

        assert (
            encoder.hidden_size == decoder.hidden_size
        ), "Hidden dimensions of encoder and decoder must be equal!"
        assert (
            encoder.n_layers == decoder.n_layers
        ), "Encoder and decoder must have equal number of layers!"

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        trg_len, batch_size = trg.shape
        trg_vocab_size = self.decoder.output_size  # len(TRG.vocab)

        # 3D tensor to store the decoder outputs; outputs[0] stays zero since decoding starts at t = 1
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)

        # decoder initial hidden and cell state = last encoder's hidden and cell state
        hidden, cell = self.encoder(src)

        # first input to the decoder is the <sos> token
        input = trg[0, :]

        for t in range(1, trg_len):
            # inputs: input token embedding, previous hidden and previous cell states
            # outputs: prediction, hidden state, cell state
            prediction, hidden, cell = self.decoder(input, hidden, cell)

            # store the decoder result in the outputs tensor
            outputs[t] = prediction

            # applying the teacher force method based on the teacher_forcing_ratio
            teacher_force = np.random.random() < teacher_forcing_ratio

            if teacher_force:
                # use the actual next token as the next input
                input = trg[t]
            else:
                # select only the highest predicted token from the predictions
                top1 = prediction.argmax(1)
                input = top1

        return outputs

#### Model Parameters

In [15]:
source_vocab, target_vocab = buildVocab(SRC, TRG, train_data)

Source vocabulary size: 7853
Target vocabulary size: 5893
[('.', 28809), ('ein', 18851), ('einem', 13711), ('in', 11895), ('eine', 9909), (',', 8938), ('und', 8925), ('mit', 8843), ('auf', 8745), ('mann', 7805)]
[('a', 49165), ('.', 27623), ('in', 14886), ('the', 10955), ('on', 8035), ('man', 7781), ('is', 7525), ('and', 7379), ('of', 6871), ('with', 6179)]


In [16]:
# adjustable parameters
INPUT_DIM_ENC = len(source_vocab)
INPUT_DIM_DEC = len(target_vocab)
OUTPUT_DIM = len(target_vocab)
ENC_EMB_DIM = 300  # 256
DEC_EMB_DIM = 300  # 256
HIDDEN_SIZE = 1024  # 512
N_LAYERS = 2

ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5


learning_rate = 0.001 #0.01
NUM_EPOCHS = 40 #10
CLIP = 1

best_valid_loss = float("inf")  # initialize a best validation loss to beat
#PATIENCE = 0
#MAX_PATIENCE = 5
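The commented-out `PATIENCE` and `MAX_PATIENCE` variables suggest patience-based early stopping, which the training loop below does not implement. A minimal sketch of that logic follows, assuming training should stop once the validation loss has failed to improve on its best value for `max_patience` consecutive epochs. The `EarlyStopping` class is hypothetical.

```python
import math


class EarlyStopping:
    """Track validation loss and signal when to stop training.

    step() returns True once the loss has not improved on the best value
    seen so far for `max_patience` consecutive epochs.
    """

    def __init__(self, max_patience=5):
        self.max_patience = max_patience
        self.best_loss = math.inf
        self.patience = 0

    def step(self, valid_loss):
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.patience = 0  # improvement: reset the counter
        else:
            self.patience += 1  # no improvement this epoch
        return self.patience >= self.max_patience
```

Inside the epoch loop this could be used as `if stopper.step(valid_loss): break`, placed after the checkpoint check, with `stopper = EarlyStopping(max_patience=5)` created before the loop.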

In [17]:
encoder = Encoder(INPUT_DIM_ENC, ENC_EMB_DIM, HIDDEN_SIZE, N_LAYERS, ENC_DROPOUT)
decoder = Decoder(
    INPUT_DIM_DEC, DEC_EMB_DIM, HIDDEN_SIZE, OUTPUT_DIM, N_LAYERS, DEC_DROPOUT
)
seq2seq = Seq2Seq(encoder, decoder, device).to(device)

In [18]:
seq2seq

Seq2Seq(
  (encoder): Encoder(
    (dropout): Dropout(p=0.5, inplace=False)
    (embedding): Embedding(7853, 300)
    (lstm): LSTM(300, 1024, num_layers=2, dropout=0.5)
  )
  (decoder): Decoder(
    (dropout): Dropout(p=0.5, inplace=False)
    (embedding): Embedding(5893, 300)
    (lstm): LSTM(300, 1024, num_layers=2, dropout=0.5)
    (fcLayer): Linear(in_features=1024, out_features=5893, bias=True)
  )
)

In [19]:
def model_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print(f"The model has {model_params(seq2seq):,} trainable parameters")

The model has 37,820,317 trainable parameters
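The 37,820,317 figure can be checked by hand. Each `nn.LSTM` layer holds input-to-hidden and hidden-to-hidden weights for its four gates plus two bias vectors, i.e. $4H(I + H) + 8H$ parameters for input size $I$ and hidden size $H$. The sketch below (helper names hypothetical) adds the embedding and linear layers; dropout has no parameters.

```python
def lstm_params(input_size, hidden_size, n_layers):
    """Parameter count of a unidirectional nn.LSTM with biases."""
    total = 0
    for layer in range(n_layers):
        layer_input = input_size if layer == 0 else hidden_size
        # weight_ih: [4H, I], weight_hh: [4H, H], bias_ih and bias_hh: [4H] each
        total += 4 * hidden_size * (layer_input + hidden_size) + 8 * hidden_size
    return total


def seq2seq_params(src_vocab, trg_vocab, emb_dim, hidden_size, n_layers):
    """Parameter counts of the Encoder and Decoder defined above (same emb dim for both)."""
    encoder = src_vocab * emb_dim + lstm_params(emb_dim, hidden_size, n_layers)
    decoder = (
        trg_vocab * emb_dim
        + lstm_params(emb_dim, hidden_size, n_layers)
        + hidden_size * trg_vocab + trg_vocab  # fcLayer weight and bias
    )
    return encoder, decoder


enc, dec = seq2seq_params(7853, 5893, 300, 1024, 2)
print(f"encoder {enc:,} + decoder {dec:,} = {enc + dec:,}")
# prints: encoder 16,183,996 + decoder 21,636,321 = 37,820,317
```

About 6 million of the total sit in the decoder's output layer, which grows linearly with the target vocabulary size.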


#### Training

In [20]:
# Optimizer
optimizer = optim.Adam(seq2seq.parameters(), lr=learning_rate)

#Learning rate scheduler 
#scheduler = ReduceLROnPlateau(optimizer, factor=0.1, patience=5, verbose=True)

# Loss function
TRG_PAD_IDX = TRG.vocab.stoi[TRG.pad_token]
criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)
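`ignore_index=TRG_PAD_IDX` makes the loss skip padded target positions, and with the default `reduction='mean'` the average is taken over the remaining tokens only. The NumPy sketch below reproduces that computation for illustration; `masked_cross_entropy` is a hypothetical name, not part of PyTorch.

```python
import numpy as np


def masked_cross_entropy(logits, targets, ignore_index):
    """Mean cross-entropy over positions whose target != ignore_index.

    logits: [N, V] unnormalized scores; targets: [N] integer class ids.
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    mask = targets != ignore_index
    # negative log-likelihood of the correct class at non-pad positions only
    nll = -log_probs[np.arange(len(targets))[mask], targets[mask]]
    return nll.mean()
```

As a consequence, the training and test losses (and the perplexities derived from them) are averages per real target token, not per padded position.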

In [21]:
def train(model, batch_iterator, optimizer, criterion, clip):
    """Train the model for one epoch and return the average batch loss."""
    model.train()

    epoch_loss = 0

    for _, batch in enumerate(batch_iterator):
        # getting the source and target sentences from the batch
        src = batch.src.to(device)  # [src len, batch size]
        trg = batch.trg.to(device)  # [trg len, batch size]

        output = model(src, trg)  # [trg len, batch size, output size]

        # drop time step 0 (the <sos> position, which the model never predicts)
        # and flatten the time and batch dimensions for the loss
        flatten_output = output[1:].view(
            -1, output.shape[-1]
        )  # [(trg len - 1) * batch size, output size]
        flatten_trg = trg[1:].view(-1)  # [(trg len - 1) * batch size]

        optimizer.zero_grad()

        # calculate the loss
        loss = criterion(flatten_output, flatten_trg)

        # backward pass
        loss.backward()

        # clip the gradient to prevent exploding gradient problem
        nn.utils.clip_grad_norm_(model.parameters(), clip)

        # update the parameters
        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(batch_iterator)

#### Evaluation

In [22]:
def evaluate(model, batch_iterator, criterion):
    model.eval()

    val_epoch_loss = 0

    with torch.no_grad():
        for _, batch in enumerate(batch_iterator):
            src = batch.src
            trg = batch.trg

            output = model(src, trg, 0)  # removing the teacher forcing

            # drop time step 0 (the <sos> position, which the model never predicts)
            # and flatten the time and batch dimensions for the loss
            flatten_output = output[1:].view(
                -1, output.shape[-1]
            )  # [(trg len - 1) * batch size, output size]
            flatten_trg = trg[1:].view(-1)  # [(trg len - 1) * batch size]

            # calculate the loss
            loss = criterion(flatten_output, flatten_trg)

            val_epoch_loss += loss.item()

    return val_epoch_loss / len(batch_iterator)

In [23]:
def save_checkpoint(state, filename="my_checkpoint.pth.tar"):
    print("=> Saving checkpoint")
    torch.save(state, filename)


def load_checkpoint(checkpoint, model, optimizer):
    print("=> Loading checkpoint")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])

In [24]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [25]:
for epoch in range(NUM_EPOCHS):
    start_time = time.time()

    train_loss = train(seq2seq, train_batch, optimizer, criterion, CLIP)

    valid_loss = evaluate(seq2seq, valid_batch, criterion)

    end_time = time.time()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(seq2seq.state_dict(), "seq2seq_model.pt")

    print(f"Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s")
    print(f"\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}")
    #print(f"\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f}")

Epoch: 01 | Time: 2m 12s
	Train Loss: 4.557 | Train PPL:  95.330
Epoch: 02 | Time: 2m 12s
	Train Loss: 3.818 | Train PPL:  45.525
Epoch: 03 | Time: 2m 13s
	Train Loss: 3.458 | Train PPL:  31.742
Epoch: 04 | Time: 2m 13s
	Train Loss: 3.200 | Train PPL:  24.525
Epoch: 05 | Time: 2m 13s
	Train Loss: 2.987 | Train PPL:  19.833
Epoch: 06 | Time: 2m 12s
	Train Loss: 2.823 | Train PPL:  16.821
Epoch: 07 | Time: 2m 13s
	Train Loss: 2.685 | Train PPL:  14.661
Epoch: 08 | Time: 2m 13s
	Train Loss: 2.542 | Train PPL:  12.706
Epoch: 09 | Time: 2m 12s
	Train Loss: 2.411 | Train PPL:  11.149
Epoch: 10 | Time: 2m 13s
	Train Loss: 2.286 | Train PPL:   9.832
Epoch: 11 | Time: 2m 13s
	Train Loss: 2.189 | Train PPL:   8.926
Epoch: 12 | Time: 2m 13s
	Train Loss: 2.083 | Train PPL:   8.026
Epoch: 13 | Time: 2m 13s
	Train Loss: 1.998 | Train PPL:   7.377
Epoch: 14 | Time: 2m 14s
	Train Loss: 1.899 | Train PPL:   6.680
Epoch: 15 | Time: 2m 14s
	Train Loss: 1.821 | Train PPL:   6.179
Epoch: 16 | Time: 2m 12s


In [26]:
# Evaluating the trained model
seq2seq.load_state_dict(torch.load("seq2seq_model.pt"))

test_loss = evaluate(seq2seq, test_batch, criterion)

print(f"| Test Loss: {test_loss:.3f} | Test PPL: {math.exp(test_loss):7.3f} |")

| Test Loss: 3.646 | Test PPL:  38.334 |


### Results using the trained model

In [27]:
example_index = 17
example = test_data.examples[example_index]


print("source sentence: ", " ".join(example.src))
print("target sentence: ", " ".join(example.trg))

source sentence:  . schlange der in geschäft einem in steht baseballmütze schwarzen einer mit und pulli grauen einem in frau eine
target sentence:  a woman in a gray sweater and black baseball cap is standing in line at a shop .


In [28]:
src_tensor = SRC.process([example.src]).to(device)
trg_tensor = TRG.process([example.trg]).to(device)
print(trg_tensor.shape)

seq2seq.eval()
with torch.no_grad():
    outputs = seq2seq(src_tensor, trg_tensor, teacher_forcing_ratio=0)

outputs.shape

torch.Size([20, 1])


torch.Size([20, 1, 5893])

In [29]:
output_idx = outputs[1:].squeeze(1).argmax(1)
" ".join([TRG.vocab.itos[idx] for idx in output_idx])

'a woman in a black coat and a black hat is standing in a doorway in a city .'
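The inference cell above passes the reference translation in as `trg`, so the number of output steps is fixed by the reference (20 here) rather than by the model emitting `<eos>`. A command-line bot has no reference, so decoding has to stop on `<eos>` or a length limit. Below is a framework-agnostic sketch of greedy decoding, assuming a hypothetical `step(token, state)` callable that returns `(scores, new_state)`. With this notebook's model, `step` would wrap `seq2seq.decoder(...)` on a one-element token tensor, `state` would start as the `(hidden, cell)` pair from `seq2seq.encoder(src_tensor)`, and the argmax would be `prediction.argmax(1)`.

```python
def greedy_decode(step, state, sos_idx, eos_idx, max_len=50):
    """Generate token ids one at a time, always taking the highest-scoring token.

    step(token, state) -> (scores, new_state), where scores is a sequence of
    per-vocabulary scores for the next token. Decoding stops at <eos>
    (which is not included in the result) or after max_len tokens.
    """
    token = sos_idx
    output = []
    for _ in range(max_len):
        scores, state = step(token, state)
        token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if token == eos_idx:
            break
        output.append(token)
    return output
```

The returned ids can be mapped back through `TRG.vocab.itos` as in the cell above. For raw user input, the text would presumably first go through `SRC.preprocess` (tokenize, reverse, lowercase) and then `SRC.process`, so that it matches the training-time preprocessing.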

### Code and Tutorial attribution

- Ben Trevett tutorial on understanding torchtext library and the machine translation task. https://github.com/bentrevett/pytorch-seq2seq
- Machine Translation using Recurrent Neural Network and PyTorch by Avinash Barnwal https://medium.com/analytics-vidhya/machine-translation-using-recurrent-neural-network-and-pytorch-implementation-5e59ca919e85