# Use OpenNMT-py to learn an NMT model for the dataset in tutorial 8

### 1. Transform to OpenNMT input format:

- one sentence per line
- separate files for input and output

In [15]:
from data_helper import *
train_path="data/fra_cleaned.txt"
valid_path="data/fra_cleaned.txt"

max_length = 10
train_data = read_translation_pairs_from_file(train_path, max_length=max_length)
valid_data = read_translation_pairs_from_file(valid_path, max_length=max_length)
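# valid_path equals train_path, so the validation set is every 10th training pair (it overlaps the training data).
valid_data = valid_data[::10]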

In [19]:
import os

out_dir = "translation_data_for_opennmt"
os.makedirs(out_dir, exist_ok=True)  # safe to re-run; os.mkdir raises if the directory already exists

# One space-joined sentence per line; line i of *_input.txt pairs with line i of *_output.txt.
for split, data in (("train", train_data), ("valid", valid_data)):
    with open(f"{out_dir}/{split}_input.txt", "w") as writer_in, \
         open(f"{out_dir}/{split}_output.txt", "w") as writer_out:
        for inputs, outputs in data:
            writer_in.write(" ".join(inputs) + "\n")
            writer_out.write(" ".join(outputs) + "\n")
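
Since the format relies on line i of the input file matching line i of the output file, a quick alignment check before preprocessing can catch mistakes early. The following is a minimal sketch; `check_parallel` is a hypothetical helper, not part of `data_helper` or OpenNMT-py.

```python
def read_lines(path):
    """Read a text file as a list of lines without trailing newlines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

def check_parallel(input_path, output_path):
    """Return the number of sentence pairs, or raise ValueError if the files are misaligned."""
    src_lines = read_lines(input_path)
    tgt_lines = read_lines(output_path)
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"{input_path} has {len(src_lines)} lines but {output_path} has {len(tgt_lines)}"
        )
    for line_no, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        # An empty line on either side means a sentence pair is incomplete.
        if not src.strip() or not tgt.strip():
            raise ValueError(f"empty sentence on line {line_no}")
    return len(src_lines)
```

Calling `check_parallel("translation_data_for_opennmt/train_input.txt", "translation_data_for_opennmt/train_output.txt")` after the cell above should return the number of training pairs.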

### 2. OpenNMT commands

The commands below use the OpenNMT-py command-line interface as of the 2019.10 release.

They may differ from the current interface: OpenNMT-py 2.0 prefers YAML files for defining the inputs.
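
For readers on OpenNMT-py 2.x, the data paths used in this tutorial might be expressed in a YAML file roughly as sketched below. This is an assumption-laden sketch rather than a tested configuration: the key names follow the 2.x quickstart as far as I recall it, `render_config` is a hypothetical helper, and everything should be checked against the documentation of the installed version.

```python
# Hypothetical template for an OpenNMT-py 2.x data config.
# Key names are assumed from the 2.x quickstart; verify them for your version.
CONFIG_TEMPLATE = """\
save_data: {data_dir}/translation
src_vocab: {data_dir}/translation.vocab.src
tgt_vocab: {data_dir}/translation.vocab.tgt
overwrite: true

data:
    corpus_1:
        path_src: {data_dir}/train_input.txt
        path_tgt: {data_dir}/train_output.txt
    valid:
        path_src: {data_dir}/valid_input.txt
        path_tgt: {data_dir}/valid_output.txt
"""

def render_config(data_dir):
    """Fill the template with the directory holding the four text files."""
    return CONFIG_TEMPLATE.format(data_dir=data_dir)

print(render_config("dataset/translation_data_for_opennmt"))
```

In the 2.x workflow, a file like this is passed with `-config` to the `onmt_build_vocab` and `onmt_train` entry points, which take the place of `preprocess.py` and `train.py` below.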

In [None]:
python preprocess.py  \
        -train_src dataset/translation_data_for_opennmt/train_input.txt \
        -train_tgt dataset/translation_data_for_opennmt/train_output.txt \
        -valid_src dataset/translation_data_for_opennmt/valid_input.txt \
        -valid_tgt dataset/translation_data_for_opennmt/valid_output.txt \
        -save_data dataset/translation_data_for_opennmt/translation \
        -src_seq_length 10000 \
        -tgt_seq_length 10000 \
        -src_seq_length_trunc 20 \
        -tgt_seq_length_trunc 20 \
        -shard_size 100000 \
        -src_vocab_size 20000 \
        -tgt_vocab_size 20000 \
        -overwrite

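The training command below configures:

- a bidirectional RNN encoder and an RNN decoder (`-encoder_type brnn`)
- attention (`-global_attention mlp`)
- a bridge between encoder and decoder (`-bridge`)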

In [None]:
CUDA_VISIBLE_DEVICES=3 python -u train.py -save_model models/translation_test \
           -data dataset/translation_data_for_opennmt/translation \
           -global_attention mlp \
           -word_vec_size 128 \
           -rnn_size 256 \
           -layers 1 \
           -encoder_type brnn \
           -train_steps 10000 \
           -max_grad_norm 2 \
           -dropout 0. \
           -batch_size 16 \
           -valid_batch_size 16 \
           -optim adagrad \
           -learning_rate 0.15 \
           -adagrad_accumulator_init 0.1 \
           -bridge \
           -seed 229 \
           -world_size 1 \
           -gpu_ranks 0 \
           -valid_steps 500
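
To give an intuition for what the `-bridge` flag above adds, here is a toy sketch in plain Python: a linear layer followed by a nonlinearity, applied to the encoder's final state to produce the decoder's initial state. The function name, the toy weights, and the choice of ReLU are assumptions for illustration; this is not OpenNMT-py's actual code.

```python
def bridge(encoder_final_state, weight, bias):
    """Map the encoder's final state to the decoder's initial state (linear layer + ReLU)."""
    decoder_init = []
    for row, b in zip(weight, bias):
        pre_activation = sum(w * h for w, h in zip(row, encoder_final_state)) + b
        decoder_init.append(max(0.0, pre_activation))  # ReLU is an assumed nonlinearity
    return decoder_init

# Toy 3-dimensional state (the command above uses -rnn_size 256).
state = [0.5, -1.0, 2.0]
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]  # identity matrix, chosen only for readability
bias = [0.0, 0.0, -0.5]
print(bridge(state, weight, bias))  # [0.5, 0.0, 1.5]
```

Without a bridge, the decoder would start directly from the encoder's final state; with it, the model learns the mapping between the two.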

In [None]:
python translate.py -gpu 3 \
     -batch_size 20 \
     -beam_size 4 \
     -model models/translation_test_step_10000.pt \
     -src dataset/translation_data_for_opennmt/valid_input.txt \
     -output valid_decoding.txt \
     -min_length 1 \
     -max_length 15 \
     -verbose 

### 3. Checking the decoding results

The decoded sentences are written to valid_decoding.txt, one per line.

In [28]:
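import nltk

# Read the decoded hypotheses and the reference translations, one sentence per line.
with open("./valid_decoding.txt") as f:
    val_decoding = [line.strip() for line in f]
with open("./translation_data_for_opennmt/valid_output.txt") as f:
    val_ground = [line.strip() for line in f]

def evaluate_bleu(target, output, weights=(0.25, 0.25, 0.25, 0.25)):
    """Average sentence-level BLEU over all (reference, hypothesis) pairs."""
    assert len(target) == len(output)
    N = len(target)

    sum_bleu = 0.0
    for i in range(N):
        # target[i] and output[i] are plain strings, so NLTK builds n-grams over characters here.
        # Pass target[i].split() and output[i].split() instead for word-level BLEU.
        bleu = nltk.translate.bleu_score.sentence_bleu([target[i]], output[i], weights=weights)
        sum_bleu += bleu
    return sum_bleu / N
print(evaluate_bleu(val_ground, val_decoding))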

0.8390290942622496


In [29]:
print(evaluate_bleu(val_ground, val_ground))

1.0


In [32]:
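# weights=(0, 0, 0, 1) scores 4-gram precision only; the default uniform weights give cumulative BLEU-4.
print("BLEU-4", evaluate_bleu(val_ground, val_decoding, weights=(0, 0, 0, 1)))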

BLEU-4 0.8030787963970216
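
When reading these scores, keep in mind that the sentences are passed to NLTK as strings, so n-grams are counted over characters rather than words. The sketch below uses a hypothetical stand-alone `modified_precision` function and made-up example sentences (neither comes from NLTK or the dataset) to show how much the clipped n-gram precision at the core of BLEU can differ between the two views.

```python
from collections import Counter

def ngram_counts(sequence, n):
    """Count the n-grams of a sequence; a str is treated as a sequence of characters."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against a single reference."""
    hyp = ngram_counts(hypothesis, n)
    ref = ngram_counts(reference, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return clipped / total

reference = "il est parti hier soir"   # made-up example sentence
hypothesis = "il est parti ce soir"    # made-up example sentence

char_p2 = modified_precision(reference, hypothesis, 2)                  # character bigrams
word_p2 = modified_precision(reference.split(), hypothesis.split(), 2)  # word bigrams
print(word_p2)  # 0.5
print(char_p2 > word_p2)  # True
```

Character n-grams are far more forgiving, so character-level scores are not comparable to word-level BLEU figures reported elsewhere.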


Way better than our naive implementation!

Likely reasons:
- The code is better engineered.
- Dedicated attention with better padding and masking.
- A more suitable optimizer: Adagrad turned out to be more efficient than Adam for RNN-based seq2seq models. For transformers, Adam with warm-up steps works better.
- Bridge: a feed-forward layer applied to the encoder's final state, whose output initializes the decoder.
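
The padding and masking point can be illustrated with a small sketch: before the attention softmax, scores at padded source positions are set to negative infinity so that they receive exactly zero weight. The function below is a hypothetical pure-Python illustration, not OpenNMT-py's implementation, and it assumes at least one position is unmasked.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over `scores`; positions where `mask` is False (padding) get zero weight."""
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# A source sentence of length 2, padded to length 4 within its batch.
scores = [2.0, 1.0, 0.5, 3.0]
mask = [True, True, False, False]
weights = masked_softmax(scores, mask)
print(weights)
```

Without the mask, the padded position with score 3.0 would attract most of the attention weight even though it carries no content.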

Also, the hyperparameters are *The Chosen Params*.