# Neural Machine Translation with TensorFlow

[The TensorFlow team recently released a tutorial](https://github.com/tensorflow/nmt) on [neural machine translation](https://en.wikipedia.org/wiki/Neural_machine_translation). Their tutorial shows off some of the functionality in the [TensorFlow seq2seq library](https://www.tensorflow.org/api_guides/python/contrib.seq2seq).

The code presented in the tutorial is "lightweight, high-quality, production-ready, and incorporated with the latest research ideas." With a pitch like that, how could we not be interested?

This notebook will show you how to work with that model in Datalab.

## Getting access to the code

To begin with, we need to copy the code from the [tensorflow/nmt](https://github.com/tensorflow/nmt) repository onto the persistent disk attached to the [GCE instance](https://cloud.google.com/compute/docs/instances/) hosting this notebook.

Fortunately, any functionality that's available in a Jupyter notebook is also available in Datalab. That means we can access a shell on the instance from the notebook by prefixing a command with `!`.

Let us clone [tensorflow/nmt](https://github.com/tensorflow/nmt) into the directory specified below:

In [2]:
TFNMT_DIR = '/tmp/tf-nmt'

In [3]:
!git clone https://github.com/tensorflow/nmt $TFNMT_DIR

Cloning into '/tmp/tf-nmt'...
remote: Counting objects: 694, done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 694 (delta 9), reused 7 (delta 3), pack-reused 671
Receiving objects: 100% (694/694), 940.15 KiB | 2.71 MiB/s, done.
Resolving deltas: 100% (458/458), done.
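
As a rough illustration of what the `!` line above does, the same clone could be driven from plain Python with `subprocess`. This is only a sketch, not a description of how Datalab implements `!`; the helper names `clone_command` and `clone_if_missing` are hypothetical, and skipping the clone when the target directory already exists is an assumption added here so the cell can be re-run safely.

```python
import os
import subprocess

TFNMT_DIR = '/tmp/tf-nmt'
REPO_URL = 'https://github.com/tensorflow/nmt'


def clone_command(repo_url, target_dir):
    """Return the argv list equivalent to `!git clone <repo_url> <target_dir>`."""
    return ['git', 'clone', repo_url, target_dir]


def clone_if_missing(repo_url, target_dir):
    """Clone repo_url into target_dir unless target_dir already exists.

    Returns True if a clone was run, False if it was skipped.
    """
    if os.path.isdir(target_dir):
        # Hypothetical convenience: do not clone over an existing directory.
        return False
    # check_call raises CalledProcessError if git exits with a non-zero status.
    subprocess.check_call(clone_command(repo_url, target_dir))
    return True
```

Calling `clone_if_missing(REPO_URL, TFNMT_DIR)` would then stand in for the `!git clone` cell.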


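To run the code in that tutorial as-is from this notebook, the working directory has to be the root of the cloned repository (`/tmp/tf-nmt`), because the training job is launched with `python -m nmt.nmt` and Python resolves the `nmt` package relative to the current directory. We make that change with `cd $TFNMT_DIR` immediately before starting the training job.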
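
If a cell should not leave the notebook in a different directory afterwards, the directory change can also be scoped in plain Python. The following is a minimal sketch; the helper name `working_directory` is hypothetical and is not part of tensorflow/nmt or Datalab.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)
```

Code placed inside `with working_directory('/tmp/tf-nmt'):` runs with the repository root as its working directory, and the previous directory is restored when the block exits.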

## Running the translation models

Let us begin with a modest goal: to get their pre-defined models working in this environment.

Once we have managed to do so, we can start thinking about messing with the guts of their models and defining our own.

We will do the following things:

1. Obtain training and test data

2. Train an inattentive model (one without an attention mechanism) defined by tensorflow/nmt

3. Use the model we trained to perform inference

### Getting data

Fortunately, the tutorial authors have provided a script we can use to download the training data:

In [4]:
!cat $TFNMT_DIR/nmt/scripts/download_iwslt15.sh

#!/bin/sh
# Download small-scale IWSLT15 Vietnames to English translation data for NMT
# model training.
#
# Usage:
#   ./download_iwslt15.sh path-to-output-dir
#
# If output directory is not specified, "./iwslt15" will be used as the default
# output directory.
OUT_DIR="${1:-iwslt15}"
SITE_PREFIX="https://nlp.stanford.edu/projects/nmt/data"

mkdir -v -p $OUT_DIR

# Download iwslt15 small dataset from standford website.
echo "Download training dataset train.en and train.vi."
curl -o "$OUT_DIR/train.en" "$SITE_PREFIX/iwslt15.en-vi/train.en"
curl -o "$OUT_DIR/train.vi" "$SITE_PREFIX/iwslt15.en-vi/train.vi"

echo "Download dev dataset tst2012.en and tst2012.vi."
curl -o "$OUT_DIR/tst2012.en" "$SITE_PREFIX/iwslt15.en-vi/tst2012.en"
curl -o "$OUT_DIR/tst2012.vi" "$SITE_PREFIX/iwslt15.en-vi/tst2012.vi"

echo "Download test dataset tst2013.en and tst2013.vi."
curl -o "$OUT_DIR/tst2013.en" "$SITE_PREFIX/iwslt15.en-vi/tst2013.en"
curl -o "$OUT_DIR/tst2013.vi" "$SITE_PRE

Let us make a directory in which to store this data and then use the script to download it there:

In [5]:
DATA_DIR = '{}/data'.format(TFNMT_DIR)

In [7]:
!$TFNMT_DIR/nmt/scripts/download_iwslt15.sh $DATA_DIR

mkdir: created directory ‘/tmp/tf-nmt/data’
Download training dataset train.en and train.vi.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.9M  100 12.9M    0     0  35.1M      0 --:--:-- --:--:-- --:--:-- 35.2M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.2M  100 17.2M    0     0  53.8M      0 --:--:-- --:--:-- --:--:-- 54.0M
Download dev dataset tst2012.en and tst2012.vi.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  136k  100  136k    0     0  2851k      0 --:--:-- --:--:-- --:--:-- 2914k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
      

In [8]:
!ls $DATA_DIR

train.en  tst2012.en  tst2013.en  vocab.en
train.vi  tst2012.vi  tst2013.vi  vocab.vi


We now have the data we need to continue.
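
Before training, it can be worth confirming that each source file lines up with its target file, since the two sides of a parallel corpus are expected to pair up line by line. The sketch below is an optional sanity check, not something the tutorial itself does; the helper names `count_lines` and `check_parallel` are hypothetical, and the files are assumed to be UTF-8 text.

```python
import io
import os


def count_lines(path):
    """Count lines in a UTF-8 text file without loading it all into memory."""
    with io.open(path, encoding='utf-8') as f:
        return sum(1 for _ in f)


def check_parallel(data_dir, prefix, src='vi', tgt='en'):
    """Return the shared line count of <prefix>.<src> and <prefix>.<tgt>.

    Raises ValueError if the two files differ in length.
    """
    n_src = count_lines(os.path.join(data_dir, '{}.{}'.format(prefix, src)))
    n_tgt = count_lines(os.path.join(data_dir, '{}.{}'.format(prefix, tgt)))
    if n_src != n_tgt:
        raise ValueError('{}: {} lines in .{} but {} lines in .{}'.format(
            prefix, n_src, src, n_tgt, tgt))
    return n_src
```

For the files listed above, this could be called as `check_parallel(DATA_DIR, 'train')`, and likewise for `'tst2012'` and `'tst2013'`.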

### Training

We begin by designating a directory into which TensorFlow can store the model checkpoints:

In [9]:
MODEL_DIR = '{}/model'.format(TFNMT_DIR)

Now we can run the training job as indicated in the NMT tutorial, with only a few modifications:

In [11]:
cd $TFNMT_DIR

/tmp/tf-nmt


In [12]:
!python -m nmt.nmt \
    --src=vi --tgt=en \
    --vocab_prefix=$DATA_DIR/vocab  \
    --train_prefix=$DATA_DIR/train \
    --dev_prefix=$DATA_DIR/tst2012  \
    --test_prefix=$DATA_DIR/tst2013 \
    --out_dir=$MODEL_DIR \
    --num_train_steps=12000 \
    --steps_per_stats=100 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu

# Job id 0
# hparams:
  src=vi
  tgt=en
  train_prefix=/tmp/tf-nmt/data/train
  dev_prefix=/tmp/tf-nmt/data/tst2012
  test_prefix=/tmp/tf-nmt/data/tst2013
  out_dir=/tmp/tf-nmt/model
# Vocab file /tmp/tf-nmt/data/vocab.vi exists
# Vocab file /tmp/tf-nmt/data/vocab.en exists
  saving hparams to /tmp/tf-nmt/model/hparams
  saving hparams to /tmp/tf-nmt/model/best_bleu/hparams
  attention=
  attention_architecture=standard
  batch_size=128
  beam_width=0
  best_bleu=0
  best_bleu_dir=/tmp/tf-nmt/model/best_bleu
  bpe_delimiter=None
  check_special_token=True
  colocate_gradients_with_ops=True
  decay_factor=0.98
  decay_steps=10000
  dev_prefix=/tmp/tf-nmt/data/tst2012
  dropout=0.2
  encoder_type=uni
  eos=</s>
  epoch_step=0
  forget_bias=1.0
  infer_batch_size=32
  init_op=uniform
  init_weight=0.1
  learning_rate=1.0
  learning_rate_warmup_factor=1.0
  learning_rate_warmup_steps=0
  length_penalty_weight=0.0
  log_device_placement=False
  max_gradient_norm=5.0
  max_train=0
  metrics=