# Creating Sequence to Sequence Models

----------------------------------

Here we show how to implement sequence to sequence models. Specifically, we will build an English to German translation model.

The code for this section has been upgraded to use the "Neural Machine Translation" models provided by the official TensorFlow repositories here:

https://github.com/tensorflow/nmt

This project will show you how to download, use/modify/add to the hyperparameters, and configure your own data to use the project files.

While the official tutorials show you how to do this via the commandline, this tutorial will show you how to use the internal code provided to train your own model from scratch.

We start by loading the necessary libraries:

In [1]:
import os
import re
import sys
import json
import math
import time
import string
import requests
import io
import numpy as np
import collections
import random
import pickle
import string
import matplotlib.pyplot as plt
import tensorflow as tf
from zipfile import ZipFile
from collections import Counter
from tensorflow.python.ops import lookup_ops
from tensorflow.python.framework import ops
ops.reset_default_graph()

local_repository = 'temp/seq2seq'

  return f(*args, **kwds)


The following block of code will import the whole NMT `models` repository into the temp folder.

In [2]:
# models can be retrieved from github: https://github.com/tensorflow/models.git
# put the models dir under python search lib path.

if not os.path.exists(local_repository):
    from git import Repo
    tf_model_repository = 'https://github.com/tensorflow/nmt/'
    Repo.clone_from(tf_model_repository, local_repository)
    sys.path.insert(0, 'temp/seq2seq/nmt/')

# May also try to use 'attention model' by importing the attention model:
# from temp.seq2seq.nmt import attention_model as attention_model
from temp.seq2seq.nmt import model as model
from temp.seq2seq.nmt.utils import vocab_utils as vocab_utils
import temp.seq2seq.nmt.model_helper as model_helper
import temp.seq2seq.nmt.utils.iterator_utils as iterator_utils
import temp.seq2seq.nmt.utils.misc_utils as utils
import temp.seq2seq.nmt.train as train

Next we set some parameters about the vocabulary size, what punctuation we'll remove and where the data will be stored.

In [3]:
# Model Parameters
vocab_size = 10000
punct = string.punctuation

# Data Parameters
data_dir = 'temp'
data_file = 'eng_ger.txt'
model_path = 'seq2seq_model'
full_model_dir = os.path.join(data_dir, model_path)

We will use the 'hyper-parameter' format that TensorFlow provides.  This type of parameter storage (in external json or xml files) allows us to iterate through different types of architectures (in different files) programatically. For this demonstration, we will use the wmt16.json provided to us and make a few changes below:

In [9]:
# Load hyper-parameters for translation model. (Good defaults are provided in Repository).
hparams = tf.contrib.training.HParams()
param_file = 'temp/seq2seq/nmt/standard_hparams/wmt16.json'
# Can also try: (For different architectures)
# 'temp/seq2seq/nmt/standard_hparams/iwslt15.json'
# 'temp/seq2seq/nmt/standard_hparams/wmt16_gnmt_4_layer.json',
# 'temp/seq2seq/nmt/standard_hparams/wmt16_gnmt_8_layer.json',

with open(param_file, "r") as f:
    params_json = json.loads(f.read())

for key, value in params_json.items():
    hparams.add_hparam(key, value)
hparams.add_hparam('num_gpus', 0)
hparams.add_hparam('num_encoder_layers', hparams.num_layers)
hparams.add_hparam('num_decoder_layers', hparams.num_layers)
hparams.add_hparam('num_encoder_residual_layers', 0)
hparams.add_hparam('num_decoder_residual_layers', 0)
hparams.add_hparam('init_op', 'uniform')
hparams.add_hparam('random_seed', None)
hparams.add_hparam('num_embeddings_partitions', 0)
hparams.add_hparam('warmup_steps', 0)
hparams.add_hparam('length_penalty_weight', 0)
hparams.add_hparam('sampling_temperature', 0.0)
hparams.add_hparam('num_translations_per_input', 1)
hparams.add_hparam('warmup_scheme', 't2t')
hparams.add_hparam('epoch_step', 0)
hparams.num_train_steps = 5000

# Not use any pretrained embeddings
hparams.add_hparam('src_embed_file', '')
hparams.add_hparam('tgt_embed_file', '')
hparams.add_hparam('num_keep_ckpts', 5)
hparams.add_hparam('avg_ckpts', False)

# Remove attention
hparams.attention = None

Make the model and data directories if they do not exist already.

In [4]:
# Make Model Directory
if not os.path.exists(full_model_dir):
    os.makedirs(full_model_dir)

# Make data directory
if not os.path.exists(data_dir):
    os.makedirs(data_dir)

Next we load the english-german translation data.  We either load from disk, or from the internet if it doesn't exist on disk. (And save it for future use).

In [5]:
print('Loading English-German Data')
# Check for data, if it doesn't exist, download it and save it
if not os.path.isfile(os.path.join(data_dir, data_file)):
    print('Data not found, downloading Eng-Ger sentences from www.manythings.org')
    sentence_url = 'http://www.manythings.org/anki/deu-eng.zip'
    r = requests.get(sentence_url)
    z = ZipFile(io.BytesIO(r.content))
    file = z.read('deu.txt')
    # Format Data
    eng_ger_data = file.decode()
    eng_ger_data = eng_ger_data.encode('ascii', errors='ignore')
    eng_ger_data = eng_ger_data.decode().split('\n')
    # Write to file
    with open(os.path.join(data_dir, data_file), 'w') as out_conn:
        for sentence in eng_ger_data:
            out_conn.write(sentence + '\n')
else:
    eng_ger_data = []
    with open(os.path.join(data_dir, data_file), 'r') as in_conn:
        for row in in_conn:
            eng_ger_data.append(row[:-1])
print('Done!')

Loading English-German Data
Data not found, downloading Eng-Ger sentences from www.manythings.org
Done!


Now we remove punctuation and split up the translation data into lists of words for both the english and german sentences.

In [6]:
# Remove punctuation
eng_ger_data = [''.join(char for char in sent if char not in punct) for sent in eng_ger_data]
# Split each sentence by tabs    
eng_ger_data = [x.split('\t') for x in eng_ger_data if len(x) >= 1]
[english_sentence, german_sentence] = [list(x) for x in zip(*eng_ger_data)]
english_sentence = [x.lower().split() for x in english_sentence]
german_sentence = [x.lower().split() for x in german_sentence]

In order to use the faster, data-pipeline functions from TensorFlow, we will want to write the formatted data to disk in an appropriate format.

The format that the translation models expect are in the form:

 - train_prefix.source_suffix = train.en
 - train_prefix.target_suffix = train.de
 - etc.. the suffix will determine the language (en = english, de = deutsch), and the prefix determines the type of dataset (train, test).

In [10]:
# We need to write them to separate text files for the text-line-dataset operations.
train_prefix = 'train'
src_suffix = 'en'  # English
tgt_suffix = 'de'  # Deutsch (German)
source_txt_file = train_prefix + '.' + src_suffix
hparams.add_hparam('src_file', source_txt_file)
target_txt_file = train_prefix + '.' + tgt_suffix
hparams.add_hparam('tgt_file', target_txt_file)
with open(source_txt_file, 'w') as f:
    for sent in english_sentence:
        f.write(' '.join(sent) + '\n')

with open(target_txt_file, 'w') as f:
    for sent in german_sentence:
        f.write(' '.join(sent) + '\n')

Next we need to parse off some (~100) testing sentence translations.  We arbitrarily choose around 100 sentences.  Then we also write them to the appropriate files.

In [11]:
# Partition some sentences off for testing files
test_prefix = 'test_sent'
hparams.add_hparam('dev_prefix', test_prefix)
hparams.add_hparam('train_prefix', train_prefix)
hparams.add_hparam('test_prefix', test_prefix)
hparams.add_hparam('src', src_suffix)
hparams.add_hparam('tgt', tgt_suffix)

num_sample = 100
total_samples = len(english_sentence)
# Get around 'num_sample's every so often in the src/tgt sentences
ix_sample = [x for x in range(total_samples) if x % (total_samples // num_sample) == 0]
test_src = [' '.join(english_sentence[x]) for x in ix_sample]
test_tgt = [' '.join(german_sentence[x]) for x in ix_sample]

# Write test sentences to file
with open(test_prefix + '.' + src_suffix, 'w') as f:
    for eng_test in test_src:
        f.write(eng_test + '\n')

with open(test_prefix + '.' + tgt_suffix, 'w') as f:
    for ger_test in test_src:
        f.write(ger_test + '\n')

Next we process the vocabularies of both the english and german sentences.  Then we save the vocabulary lists to the appropriate files.

In [12]:
print('Processing the vocabularies.')
# Process the English Vocabulary
all_english_words = [word for sentence in english_sentence for word in sentence]
all_english_counts = Counter(all_english_words)
eng_word_keys = [x[0] for x in all_english_counts.most_common(vocab_size-3)]  # -3 because UNK, S, /S is also in there
eng_vocab2ix = dict(zip(eng_word_keys, range(1, vocab_size)))
eng_ix2vocab = {val: key for key, val in eng_vocab2ix.items()}
english_processed = []
for sent in english_sentence:
    temp_sentence = []
    for word in sent:
        try:
            temp_sentence.append(eng_vocab2ix[word])
        except KeyError:
            temp_sentence.append(0)
    english_processed.append(temp_sentence)


# Process the German Vocabulary
all_german_words = [word for sentence in german_sentence for word in sentence]
all_german_counts = Counter(all_german_words)
ger_word_keys = [x[0] for x in all_german_counts.most_common(vocab_size-3)]  # -3 because UNK, S, /S is also in there
ger_vocab2ix = dict(zip(ger_word_keys, range(1, vocab_size)))
ger_ix2vocab = {val: key for key, val in ger_vocab2ix.items()}
german_processed = []
for sent in german_sentence:
    temp_sentence = []
    for word in sent:
        try:
            temp_sentence.append(ger_vocab2ix[word])
        except KeyError:
            temp_sentence.append(0)
    german_processed.append(temp_sentence)


# Save vocab files for data processing
source_vocab_file = 'vocab' + '.' + src_suffix
hparams.add_hparam('src_vocab_file', source_vocab_file)
eng_word_keys = ['<unk>', '<s>', '</s>'] + eng_word_keys

target_vocab_file = 'vocab' + '.' + tgt_suffix
hparams.add_hparam('tgt_vocab_file', target_vocab_file)
ger_word_keys = ['<unk>', '<s>', '</s>'] + ger_word_keys

# Write out all unique english words
with open(source_vocab_file, 'w') as f:
    for eng_word in eng_word_keys:
        f.write(eng_word + '\n')

# Write out all unique german words
with open(target_vocab_file, 'w') as f:
    for ger_word in ger_word_keys:
        f.write(ger_word + '\n')

# Add vocab size to hyper parameters
hparams.add_hparam('src_vocab_size', vocab_size)
hparams.add_hparam('tgt_vocab_size', vocab_size)

# Add out-directory
out_dir = 'temp/seq2seq/nmt_out'
hparams.add_hparam('out_dir', out_dir)
if not tf.gfile.Exists(out_dir):
    tf.gfile.MakeDirs(out_dir)


Processing the vocabularies.


We will be creating the training, inferring, and evaluation graphs separately next.

First we create the training graph.  We do this with a class and make the arguments a named-tuple.  This code is from the nmt repository. See the file in the repository named 'model_helper.py' for more.

In [13]:
class TrainGraph(collections.namedtuple("TrainGraph", ("graph", "model", "iterator", "skip_count_placeholder"))):
    pass


def create_train_graph(scope=None):
    graph = tf.Graph()
    with graph.as_default():
        src_vocab_table, tgt_vocab_table = vocab_utils.create_vocab_tables(hparams.src_vocab_file,
                                                                           hparams.tgt_vocab_file,
                                                                           share_vocab=False)

        src_dataset = tf.data.TextLineDataset(hparams.src_file)
        tgt_dataset = tf.data.TextLineDataset(hparams.tgt_file)
        skip_count_placeholder = tf.placeholder(shape=(), dtype=tf.int64)

        iterator = iterator_utils.get_iterator(src_dataset, tgt_dataset, src_vocab_table, tgt_vocab_table,
                                               batch_size=hparams.batch_size,
                                               sos=hparams.sos,
                                               eos=hparams.eos,
                                               random_seed=None,
                                               num_buckets=hparams.num_buckets,
                                               src_max_len=hparams.src_max_len,
                                               tgt_max_len=hparams.tgt_max_len,
                                               skip_count=skip_count_placeholder)
        final_model = model.Model(hparams,
                                  iterator=iterator,
                                  mode=tf.contrib.learn.ModeKeys.TRAIN,
                                  source_vocab_table=src_vocab_table,
                                  target_vocab_table=tgt_vocab_table,
                                  scope=scope)

    return TrainGraph(graph=graph, model=final_model, iterator=iterator, skip_count_placeholder=skip_count_placeholder)


train_graph = create_train_graph()

# creating train graph ...
  num_bi_layers = 2, num_bi_residual_layers=0
  cell 0  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 0  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
Instructions for updating:
batch_dim is deprecated, use batch_axis instead
  cell 0  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 2  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  cell 3  LSTM, forget_bias=1  DropoutWrapper, dropout=0.2   DeviceWrapper, device=/cpu:0
  learning_rate=1, warmup_steps=0, warmup_scheme=t2t
  decay_

We now do a very similar creation of the evaluation graph below.

In [14]:
# Create the evaluation graph
class EvalGraph(collections.namedtuple("EvalGraph", ("graph", "model", "src_file_placeholder", "tgt_file_placeholder",
                                                     "iterator"))):
    pass


def create_eval_graph(scope=None):
    graph = tf.Graph()

    with graph.as_default():
        src_vocab_table, tgt_vocab_table = vocab_utils.create_vocab_tables(
            hparams.src_vocab_file, hparams.tgt_vocab_file, hparams.share_vocab)
        src_file_placeholder = tf.placeholder(shape=(), dtype=tf.string)
        tgt_file_placeholder = tf.placeholder(shape=(), dtype=tf.string)
        src_dataset = tf.data.TextLineDataset(src_file_placeholder)
        tgt_dataset = tf.data.TextLineDataset(tgt_file_placeholder)
        iterator = iterator_utils.get_iterator(
            src_dataset,
            tgt_dataset,
            src_vocab_table,
            tgt_vocab_table,
            hparams.batch_size,
            sos=hparams.sos,
            eos=hparams.eos,
            random_seed=hparams.random_seed,
            num_buckets=hparams.num_buckets,
            src_max_len=hparams.src_max_len_infer,
            tgt_max_len=hparams.tgt_max_len_infer)
        final_model = model.Model(hparams,
                                  iterator=iterator,
                                  mode=tf.contrib.learn.ModeKeys.EVAL,
                                  source_vocab_table=src_vocab_table,
                                  target_vocab_table=tgt_vocab_table,
                                  scope=scope)
    return EvalGraph(graph=graph,
                     model=final_model,
                     src_file_placeholder=src_file_placeholder,
                     tgt_file_placeholder=tgt_file_placeholder,
                     iterator=iterator)


eval_graph = create_eval_graph()

# creating eval graph ...
  num_bi_layers = 2, num_bi_residual_layers=0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 2  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 3  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
# Trainable variables
  embeddings/encoder/embedding_encoder:0, (10000, 1024), /device:GPU:0
  embeddings/decoder/embedding_decoder:0, (10000, 1024), /device:GPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (2048, 4096), /device:CPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (4096,), /device:CPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/f

And now the same for the inference graph:

In [15]:
# Inference graph
class InferGraph(
    collections.namedtuple("InferGraph", ("graph", "model", "src_placeholder", "batch_size_placeholder", "iterator"))):
    pass


def create_infer_graph(scope=None):
    graph = tf.Graph()
    with graph.as_default():
        src_vocab_table, tgt_vocab_table = vocab_utils.create_vocab_tables(hparams.src_vocab_file,
                                                                           hparams.tgt_vocab_file,
                                                                           hparams.share_vocab)
        reverse_tgt_vocab_table = lookup_ops.index_to_string_table_from_file(hparams.tgt_vocab_file,
                                                                             default_value=vocab_utils.UNK)

        src_placeholder = tf.placeholder(shape=[None], dtype=tf.string)
        batch_size_placeholder = tf.placeholder(shape=[], dtype=tf.int64)
        src_dataset = tf.data.Dataset.from_tensor_slices(src_placeholder)
        iterator = iterator_utils.get_infer_iterator(src_dataset,
                                                     src_vocab_table,
                                                     batch_size=batch_size_placeholder,
                                                     eos=hparams.eos,
                                                     src_max_len=hparams.src_max_len_infer)
        final_model = model.Model(hparams,
                                  iterator=iterator,
                                  mode=tf.contrib.learn.ModeKeys.INFER,
                                  source_vocab_table=src_vocab_table,
                                  target_vocab_table=tgt_vocab_table,
                                  reverse_target_vocab_table=reverse_tgt_vocab_table,
                                  scope=scope)
    return InferGraph(graph=graph,
                      model=final_model,
                      src_placeholder=src_placeholder,
                      batch_size_placeholder=batch_size_placeholder,
                      iterator=iterator)


infer_graph = create_infer_graph()

# creating infer graph ...
  num_bi_layers = 2, num_bi_residual_layers=0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 0  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 1  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 2  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
  cell 3  LSTM, forget_bias=1  DeviceWrapper, device=/cpu:0
# Trainable variables
  embeddings/encoder/embedding_encoder:0, (10000, 1024), /device:GPU:0
  embeddings/decoder/embedding_decoder:0, (10000, 1024), /device:GPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (2048, 4096), /device:CPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/fw/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (4096,), /device:CPU:0
  dynamic_seq2seq/encoder/bidirectional_rnn/

For more illustrative output during training, we provide a short list of arbitrary source/target translations that will be output during training iterations.

In [17]:
# Create sample data for evaluation
sample_ix = [25, 125, 240, 450]
sample_src_data = [' '.join(english_sentence[x]) for x in sample_ix]
sample_tgt_data = [' '.join(german_sentence[x]) for x in sample_ix]
print([x for x in zip(sample_src_data, sample_tgt_data)])

[('hug me', 'drck mich'), ('goodbye', 'auf wiedersehen'), ('get lost', 'mach dich fort'), ('come over', 'komm hierher')]


Next we load the training graph.

In [18]:
config_proto = utils.get_config_proto()

train_sess = tf.Session(config=config_proto, graph=train_graph.graph)
eval_sess = tf.Session(config=config_proto, graph=eval_graph.graph)
infer_sess = tf.Session(config=config_proto, graph=infer_graph.graph)

# Load the training graph
with train_graph.graph.as_default():
    loaded_train_model, global_step = model_helper.create_or_load_model(train_graph.model,
                                                                        hparams.out_dir,
                                                                        train_sess,
                                                                        "train")


summary_writer = tf.summary.FileWriter(os.path.join(hparams.out_dir, 'Training'), train_graph.graph)

  created train model with fresh parameters, time 0.36s


Load the evaluation graph:

In [19]:
for metric in hparams.metrics:
    hparams.add_hparam("best_" + metric, 0)
    best_metric_dir = os.path.join(hparams.out_dir, "best_" + metric)
    hparams.add_hparam("best_" + metric + "_dir", best_metric_dir)
    tf.gfile.MakeDirs(best_metric_dir)


eval_output = train.run_full_eval(hparams.out_dir, infer_graph, infer_sess, eval_graph, eval_sess,
                                  hparams, summary_writer, sample_src_data, sample_tgt_data)

eval_results, _, acc_blue_scores = eval_output

  created infer model with fresh parameters, time 0.27s
  # 0
    src: hug me
    ref: drck mich
    nmt: kaninchen kaninchen kaninchen notwendig
  created eval model with fresh parameters, time 0.30s
  eval dev: perplexity 10735.97, time 2s, Mon Jul 30 20:20:23 2018.
  eval test: perplexity 10735.97, time 2s, Mon Jul 30 20:20:25 2018.
  created infer model with fresh parameters, time 0.30s


Now we can initialize the training:

 - set the global training step.
 - initialize the training time.
 - initialize the training graph.

In [20]:
# Training Initialization
last_stats_step = global_step
last_eval_step = global_step
last_external_eval_step = global_step

steps_per_eval = 10 * hparams.steps_per_stats
steps_per_external_eval = 5 * steps_per_eval

avg_step_time = 0.0
step_time, checkpoint_loss, checkpoint_predict_count = 0.0, 0.0, 0.0
checkpoint_total_count = 0.0
speed, train_ppl = 0.0, 0.0

utils.print_out("# Start step %d, lr %g, %s" %
                (global_step, loaded_train_model.learning_rate.eval(session=train_sess),
                 time.ctime()))
skip_count = hparams.batch_size * hparams.epoch_step
utils.print_out("# Init train iterator, skipping %d elements" % skip_count)

train_sess.run(train_graph.iterator.initializer,
              feed_dict={train_graph.skip_count_placeholder: skip_count})

# Start step 0, lr 1, Mon Jul 30 20:21:56 2018
# Init train iterator, skipping 0 elements


Now we start the training!!  This may take a while to run (~ 12 hours run time on a Intel Core 1-7 CPU, 16GB RAM), but may run faster on a GPU setup.

In [None]:
# Run training
while global_step < hparams.num_train_steps:
    start_time = time.time()
    try:
        step_result = loaded_train_model.train(train_sess)
        (_, step_loss, step_predict_count, step_summary, global_step, step_word_count,
         batch_size, __, ___) = step_result
        hparams.epoch_step += 1
    except tf.errors.OutOfRangeError:
        # Next Epoch
        hparams.epoch_step = 0
        utils.print_out("# Finished an epoch, step %d. Perform external evaluation" % global_step)
        train.run_sample_decode(infer_graph,
                                infer_sess,
                                hparams.out_dir,
                                hparams,
                                summary_writer,
                                sample_src_data,
                                sample_tgt_data)
        dev_scores, test_scores, _ = train.run_external_eval(infer_graph,
                                                             infer_sess,
                                                             hparams.out_dir,
                                                             hparams,
                                                             summary_writer)
        train_sess.run(train_graph.iterator.initializer, feed_dict={train_graph.skip_count_placeholder: 0})
        continue

    summary_writer.add_summary(step_summary, global_step)

    # Statistics
    step_time += (time.time() - start_time)
    checkpoint_loss += (step_loss * batch_size)
    checkpoint_predict_count += step_predict_count
    checkpoint_total_count += float(step_word_count)

    # print statistics
    if global_step - last_stats_step >= hparams.steps_per_stats:
        last_stats_step = global_step
        avg_step_time = step_time / hparams.steps_per_stats
        train_ppl = utils.safe_exp(checkpoint_loss / checkpoint_predict_count)
        speed = checkpoint_total_count / (1000 * step_time)

        utils.print_out("  global step %d lr %g "
                        "step-time %.2fs wps %.2fK ppl %.2f %s" %
                        (global_step,
                         loaded_train_model.learning_rate.eval(session=train_sess),
                         avg_step_time, speed, train_ppl, train._get_best_results(hparams)))

        if math.isnan(train_ppl):
            break

        # Reset timer and loss.
        step_time, checkpoint_loss, checkpoint_predict_count = 0.0, 0.0, 0.0
        checkpoint_total_count = 0.0

    if global_step - last_eval_step >= steps_per_eval:
        last_eval_step = global_step
        utils.print_out("# Save eval, global step %d" % global_step)
        utils.add_summary(summary_writer, global_step, "train_ppl", train_ppl)

        # Save checkpoint
        loaded_train_model.saver.save(train_sess, os.path.join(hparams.out_dir, "translate.ckpt"),
                                      global_step=global_step)

        # Evaluate on dev/test
        train.run_sample_decode(infer_graph,
                                infer_sess,
                                out_dir,
                                hparams,
                                summary_writer,
                                sample_src_data,
                                sample_tgt_data)
        dev_ppl, test_ppl = train.run_internal_eval(eval_graph,
                                                    eval_sess,
                                                    out_dir,
                                                    hparams,
                                                    summary_writer)

    if global_step - last_external_eval_step >= steps_per_external_eval:
        last_external_eval_step = global_step

        # Save checkpoint
        loaded_train_model.saver.save(train_sess, os.path.join(hparams.out_dir, "translate.ckpt"),
                                      global_step=global_step)

        train.run_sample_decode(infer_graph,
                                infer_sess,
                                out_dir,
                                hparams,
                                summary_writer,
                                sample_src_data,
                                sample_tgt_data)
        dev_scores, test_scores, _ = train.run_external_eval(infer_graph,
                                                             infer_sess,
                                                             out_dir,
                                                             hparams,
                                                             summary_writer)

You may look for similar output to the following:
```
  global step 102 lr 1 step-time 6.48s wps 0.24K ppl 1661.30 bleu 0.00
  global step 202 lr 1 step-time 6.48s wps 0.25K ppl 282.66 bleu 0.00
  global step 302 lr 1 step-time 6.71s wps 0.26K ppl 205.97 bleu 0.00
  global step 402 lr 1 step-time 7.47s wps 0.24K ppl 170.30 bleu 0.00
  global step 502 lr 1 step-time 7.51s wps 0.24K ppl 135.71 bleu 0.00
  global step 602 lr 1 step-time 7.59s wps 0.24K ppl 116.17 bleu 0.00
  global step 702 lr 1 step-time 7.55s wps 0.24K ppl 97.85 bleu 0.00
  global step 802 lr 1 step-time 7.76s wps 0.24K ppl 86.67 bleu 0.00
  global step 902 lr 1 step-time 7.94s wps 0.23K ppl 72.19 bleu 0.00
  global step 1002 lr 1 step-time 7.75s wps 0.24K ppl 66.03 bleu 0.00
# Save eval, global step 1002
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-1002
2018-07-30 00:22:18.155019: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 00:22:18.155026: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 00:22:18.155053: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-1002, time 0.31s
  # 3
    src: come over
    ref: komm hierher
    nmt: komm auf
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-1002
2018-07-30 00:22:18.637235: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 00:22:18.637272: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  loaded eval model parameters from temp/seq2seq/nmt_out/translate.ckpt-1002, time 0.31s
  eval dev: perplexity 68.29, time 2s, Mon Jul 30 00:22:21 2018.
  eval test: perplexity 68.29, time 2s, Mon Jul 30 00:22:23 2018.
  global step 1102 lr 1 step-time 7.60s wps 0.24K ppl 57.65 bleu 0.00
  global step 1202 lr 1 step-time 7.77s wps 0.24K ppl 54.63 bleu 0.00
  global step 1302 lr 1 step-time 7.86s wps 0.23K ppl 46.55 bleu 0.00
# Finished an epoch, step 1332. Perform external evaluation
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-1002
2018-07-30 01:04:55.469826: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 01:04:55.469826: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 01:04:55.469827: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-1002, time 0.32s
  # 2
    src: get lost
    ref: mach dich fort
    nmt: <unk> sie
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-1002
2018-07-30 01:04:55.921978: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-1002, time 0.32s
# External evaluation, global step 1002
2018-07-30 01:04:55.921978: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 01:04:55.921980: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  decoding to output temp/seq2seq/nmt_out/output_dev.
  done, num sentences 101, num translations per input 1, time 12s, Mon Jul 30 01:05:08 2018.
  bleu dev: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
# External evaluation, global step 1002
  decoding to output temp/seq2seq/nmt_out/output_test.
  done, num sentences 101, num translations per input 1, time 14s, Mon Jul 30 01:05:22 2018.
  bleu test: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
  global step 1402 lr 1 step-time 6.79s wps 0.23K ppl 31.64 bleu 0.00
  global step 1502 lr 1 step-time 6.48s wps 0.25K ppl 26.35 bleu 0.00
  global step 1602 lr 1 step-time 7.00s wps 0.24K ppl 26.87 bleu 0.00
  global step 1702 lr 1 step-time 7.47s wps 0.24K ppl 30.74 bleu 0.00
  global step 1802 lr 1 step-time 7.86s wps 0.23K ppl 31.18 bleu 0.00
  global step 1902 lr 1 step-time 7.84s wps 0.24K ppl 30.12 bleu 0.00
  global step 2002 lr 1 step-time 7.76s wps 0.23K ppl 26.81 bleu 0.00
# Save eval, global step 2002
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-2002
2018-07-30 02:26:54.592550: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 02:26:54.592550: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 02:26:54.592550: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-2002, time 0.27s
  # 2
    src: get lost
    ref: mach dich fort
    nmt: <unk> dich
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-2002
2018-07-30 02:26:55.015783: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 02:26:55.015783: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  loaded eval model parameters from temp/seq2seq/nmt_out/translate.ckpt-2002, time 0.26s
  eval dev: perplexity 164.59, time 3s, Mon Jul 30 02:26:58 2018.
  eval test: perplexity 164.59, time 3s, Mon Jul 30 02:27:01 2018.
  global step 2102 lr 1 step-time 7.63s wps 0.24K ppl 25.65 bleu 0.00
  global step 2202 lr 1 step-time 7.86s wps 0.23K ppl 25.78 bleu 0.00
  global step 2302 lr 1 step-time 7.84s wps 0.23K ppl 23.49 bleu 0.00
  global step 2402 lr 1 step-time 7.82s wps 0.24K ppl 23.29 bleu 0.00
  global step 2502 lr 1 step-time 7.63s wps 0.24K ppl 20.36 bleu 0.00
  global step 2602 lr 1 step-time 7.89s wps 0.23K ppl 20.28 bleu 0.00
# Finished an epoch, step 2662. Perform external evaluation
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-2002
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-2002, time 0.30s
  # 2
2018-07-30 03:52:25.999086: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 03:52:25.999092: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 03:52:25.999119: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
    src: get lost
    ref: mach dich fort
    nmt: <unk> dich
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-2002
2018-07-30 03:52:26.473327: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 03:52:26.473327: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 03:52:26.473331: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-2002, time 0.30s
# External evaluation, global step 2002
  decoding to output temp/seq2seq/nmt_out/output_dev.
  done, num sentences 101, num translations per input 1, time 16s, Mon Jul 30 03:52:42 2018.
  bleu dev: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
# External evaluation, global step 2002
  decoding to output temp/seq2seq/nmt_out/output_test.
  done, num sentences 101, num translations per input 1, time 16s, Mon Jul 30 03:52:59 2018.
  bleu test: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
  global step 2702 lr 1 step-time 7.29s wps 0.23K ppl 14.87 bleu 0.00
  global step 2802 lr 0.5 step-time 6.50s wps 0.24K ppl 9.26 bleu 0.00
  global step 2902 lr 0.5 step-time 6.65s wps 0.25K ppl 9.15 bleu 0.00
  global step 3002 lr 0.25 step-time 7.27s wps 0.24K ppl 10.38 bleu 0.00
# Save eval, global step 3002
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-3002
2018-07-30 04:31:33.978539: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 04:31:33.978539: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 04:31:33.978539: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-3002, time 0.27s
  # 3
    src: come over
    ref: komm hierher
    nmt: kommt
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-3002
2018-07-30 04:31:34.377060: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 04:31:34.377101: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded eval model parameters from temp/seq2seq/nmt_out/translate.ckpt-3002, time 0.27s
  eval dev: perplexity 296.65, time 3s, Mon Jul 30 04:31:37 2018.
  eval test: perplexity 296.65, time 2s, Mon Jul 30 04:31:40 2018.
  global step 3102 lr 0.25 step-time 7.68s wps 0.24K ppl 10.89 bleu 0.00
  global step 3202 lr 0.25 step-time 7.81s wps 0.24K ppl 11.07 bleu 0.00
  global step 3302 lr 0.125 step-time 7.64s wps 0.24K ppl 9.78 bleu 0.00
  global step 3402 lr 0.125 step-time 7.85s wps 0.24K ppl 10.30 bleu 0.00
  global step 3502 lr 0.0625 step-time 7.76s wps 0.23K ppl 9.66 bleu 0.00
  global step 3602 lr 0.0625 step-time 7.64s wps 0.24K ppl 9.43 bleu 0.00
  global step 3702 lr 0.0625 step-time 7.83s wps 0.24K ppl 10.13 bleu 0.00
  global step 3802 lr 0.03125 step-time 7.63s wps 0.24K ppl 9.35 bleu 0.00
  global step 3902 lr 0.03125 step-time 7.65s wps 0.24K ppl 9.89 bleu 0.00
# Finished an epoch, step 3992. Perform external evaluation
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-3002
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-3002, time 0.33s
  # 1
2018-07-30 06:39:49.759226: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 06:39:49.759226: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 06:39:49.759226: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
    src: goodbye
    ref: auf wiedersehen
    nmt: <unk>
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-3002
2018-07-30 06:39:50.172528: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
2018-07-30 06:39:50.172528: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-3002, time 0.32s
2018-07-30 06:39:50.172528: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
# External evaluation, global step 3002
  decoding to output temp/seq2seq/nmt_out/output_dev.
  done, num sentences 101, num translations per input 1, time 14s, Mon Jul 30 06:40:05 2018.
  bleu dev: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
# External evaluation, global step 3002
  decoding to output temp/seq2seq/nmt_out/output_test.
  done, num sentences 101, num translations per input 1, time 13s, Mon Jul 30 06:40:19 2018.
  bleu test: 0.0
  saving hparams to temp/seq2seq/nmt_out/hparams
  global step 4002 lr 0.015625 step-time 8.15s wps 0.22K ppl 9.38 bleu 0.00
# Save eval, global step 4002
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-4002
  loaded infer model parameters from temp/seq2seq/nmt_out/translate.ckpt-4002, time 0.29s
2018-07-30 06:41:33.303050: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  # 1
2018-07-30 06:41:33.303078: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
2018-07-30 06:41:33.303080: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
    src: goodbye
    ref: auf wiedersehen
    nmt: <unk>
INFO:tensorflow:Restoring parameters from temp/seq2seq/nmt_out/translate.ckpt-4002
2018-07-30 06:41:33.653274: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.en is already initialized.
  loaded eval model parameters from temp/seq2seq/nmt_out/translate.ckpt-4002, time 0.26s
2018-07-30 06:41:33.653296: I tensorflow/core/kernels/lookup_util.cc:373] Table trying to initialize from file vocab.de is already initialized.
  eval dev: perplexity 342.19, time 3s, Mon Jul 30 06:41:36 2018.
  eval test: perplexity 342.19, time 3s, Mon Jul 30 06:41:40 2018.
```