<a href="https://colab.research.google.com/github/gorkemozkaya/nmt-en-tr/blob/master/Turkish_English_NMT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Running the pre-trained NMT models in Python

This notebook illustrates how to load the pre-trained models and use them to translate new Turkish or English sentences; the cells below use the English-to-Turkish (`en2tr`) model. It is mostly based on the [hello_t2t](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb) notebook from the official [tensor2tensor](https://github.com/tensorflow/tensor2tensor) repository.

In [1]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [4]:
!pip install --quiet tensorflow==1.14 tensor2tensor==1.13.4 tensorflow_datasets==3.2.1

[?25l[K     |                                | 10 kB 22.3 MB/s eta 0:00:01[K     |▏                               | 20 kB 25.2 MB/s eta 0:00:01[K     |▎                               | 30 kB 12.5 MB/s eta 0:00:01[K     |▍                               | 40 kB 4.9 MB/s eta 0:00:01[K     |▌                               | 51 kB 4.7 MB/s eta 0:00:01[K     |▋                               | 61 kB 5.5 MB/s eta 0:00:01[K     |▊                               | 71 kB 5.5 MB/s eta 0:00:01[K     |▊                               | 81 kB 5.6 MB/s eta 0:00:01[K     |▉                               | 92 kB 6.1 MB/s eta 0:00:01[K     |█                               | 102 kB 5.3 MB/s eta 0:00:01[K     |█                               | 112 kB 5.3 MB/s eta 0:00:01[K     |█▏                              | 122 kB 5.3 MB/s eta 0:00:01[K     |█▎                              | 133 kB 5.3 MB/s eta 0:00:01[K     |█▍                              | 143 kB 5.3 MB/s eta 0:00:01[K  

In [5]:
import os

# Enable TF Eager execution
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
tfe = tf.contrib.eager
tfe.enable_eager_execution()

import numpy as np
from tensor2tensor import problems
from tensor2tensor import models  # imported so the built-in models (e.g. "transformer") get registered
from tensor2tensor.utils import trainer_lib
from tensor2tensor.data_generators import text_encoder

from tensor2tensor.utils import t2t_model
from tensor2tensor.utils import registry

import textwrap

## Clone the repository and import the module

In [6]:
!test -d nmt-en-tr || git clone https://github.com/gorkemozkaya/nmt-en-tr.git

Cloning into 'nmt-en-tr'...
remote: Enumerating objects: 126, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 126 (delta 0), reused 0 (delta 0), pack-reused 123
Receiving objects: 100% (126/126), 26.05 KiB | 2.17 MiB/s, done.
Resolving deltas: 100% (43/43), done.


In [7]:
import sys
sys.path.append("nmt-en-tr")
import nmt_en_tr

## Downloading and loading the pre-trained model

In [8]:
!wget -nc -q https://github.com/gorkemozkaya/nmt-en-tr/releases/download/pretrained_model/en2tr.zip
!unzip -n -qq en2tr.zip

In [9]:
model_path = 'en2tr'

data_dir = os.path.join(model_path, 'data')
ckpt_dir = os.path.join(model_path, 'model')

en2tr_problem = problems.problem("translate_en_tr")
encoders = en2tr_problem.feature_encoders(data_dir)

ckpt_path = tf.train.latest_checkpoint(ckpt_dir)



In [10]:
# Set up helper functions for encoding and decoding
def encode(input_str):
  """Input str to features dict, ready for inference."""
  inputs = encoders["inputs"].encode(input_str) + [text_encoder.EOS_ID]  # append EOS id (1)
  batch_inputs = tf.reshape(inputs, [1, -1, 1])  # make it 3D: [batch, length, 1]
  return {"inputs": batch_inputs}

def decode(integers):
  """List of ints to str."""
  integers = list(np.squeeze(integers))
  if text_encoder.EOS_ID in integers:
    integers = integers[:integers.index(text_encoder.EOS_ID)]  # drop EOS and anything after it
  return encoders["targets"].decode(integers)  # the output side uses the target encoder
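The helpers above follow tensor2tensor's conventions: the encoded ids get an end-of-sentence id (1) appended and are reshaped to `[batch, length, 1]`, and decoding cuts the model output at the first EOS. Since the real subword encoder is only available after the model download, here is a dependency-light sketch of the same logic using a hypothetical toy vocabulary; `TOY_VOCAB`, `toy_encode` and `toy_decode` are illustrative names, not part of tensor2tensor or this repository.

```python
import numpy as np

EOS_ID = 1  # tensor2tensor reserves id 1 for EOS (text_encoder.EOS_ID)

# Hypothetical toy vocabulary standing in for the real subword encoder.
TOY_VOCAB = {"hello": 5, "world": 6}
TOY_INV = {v: k for k, v in TOY_VOCAB.items()}

def toy_encode(sentence):
  """Map words to ids, append EOS, and shape as [batch=1, length, 1]."""
  ids = [TOY_VOCAB[w] for w in sentence.split()] + [EOS_ID]
  return np.reshape(ids, [1, -1, 1])

def toy_decode(ids):
  """Flatten the ids, cut at the first EOS, and map ids back to words."""
  ids = list(np.squeeze(ids))
  if EOS_ID in ids:
    ids = ids[:ids.index(EOS_ID)]
  return " ".join(TOY_INV[int(i)] for i in ids)

print(toy_encode("hello world").shape)  # (1, 3, 1)
# A decoder output of shape [1, length] with EOS followed by padding (0):
print(toy_decode(np.array([[5, 6, 1, 0, 0]])))  # hello world
```

Anything after the first EOS (typically padding) is discarded, which is why `decode` truncates before calling the encoder's `decode`.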

In [11]:
# Create hparams and the model
model_name = "transformer"
hparams_set = "transformer_tpu"

# Other setup
Modes = tf.estimator.ModeKeys

hparams = trainer_lib.create_hparams(hparams_set, data_dir=data_dir, problem_name="translate_en_tr")

# NOTE: Only create the model once when restoring from a checkpoint; it's a
# Layer and so subsequent instantiations will have different variable scopes
# that will not match the checkpoint.
translate_model = registry.model(model_name)(hparams, Modes.EVAL)

In [12]:
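# Restore and translate!
def translate(inputs, beam_size=1, alpha=0.0, **kwargs):
  """Translate one sentence with the restored model.

  beam_size=1 (the default of T2TModel.infer) decodes greedily; pass e.g.
  beam_size=5, alpha=0.6 to use beam search with a length penalty.
  """
  encoded_inputs = encode(inputs)
  with tfe.restore_variables_on_create(ckpt_path):
    model_output = translate_model.infer(
        encoded_inputs, beam_size=beam_size, alpha=alpha, **kwargs)["outputs"]
  if len(model_output.shape) == 2:
    # [batch, length]: a single decoded sequence
    return decode(model_output)
  else:
    # [batch, top_beams, length]: decode every returned beam of the first example
    return [decode(x) for x in model_output[0]]

def translate_and_display(text):
  """Print the input and its translation, wrapped at 80 characters."""
  output = translate(text)
  print('\n  '.join(textwrap.wrap("Input: {}".format(text), 80)))
  print()
  print('\n  '.join(textwrap.wrap("Output: {}".format(output), 80)))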
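The `alpha` argument accepted by `translate` (and by tensor2tensor's `infer`) only matters during beam search; greedy decoding (`beam_size=1`) ignores it. As far as we understand tensor2tensor's `beam_search` module, each hypothesis's total log-probability is divided by the GNMT-style length penalty ((5 + length) / 6) ** alpha, so a larger `alpha` favours longer translations. The sketch below reproduces only that rescoring step in plain Python; the two hypotheses and their scores are made up for illustration.

```python
def length_penalty(length, alpha):
  """GNMT-style length penalty ((5 + length) / 6) ** alpha."""
  return ((5.0 + length) / 6.0) ** alpha

def normalized_score(log_prob, length, alpha):
  """Beam score: total log-probability divided by the length penalty."""
  return log_prob / length_penalty(length, alpha)

# Two hypothetical finished hypotheses: (total log-prob, length in tokens).
short_hyp = (-4.0, 5)
long_hyp = (-4.6, 10)

for alpha in (0.0, 0.6):
  s = normalized_score(*short_hyp, alpha)
  l = normalized_score(*long_hyp, alpha)
  # alpha=0.0: the shorter hypothesis wins; alpha=0.6: the longer one wins.
  print(alpha, round(s, 3), round(l, 3), "long wins" if l > s else "short wins")
```

With `alpha=0.0` the penalty is always 1 and the raw log-probabilities decide, which tends to favour short outputs; `alpha=0.6` is the value the original `translate` signature suggests for beam search.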

## Translation Examples

In [13]:
inputs = "If Turkey provides a competitive, safe, and predictable business and \
investment environment, it can reach high growth rates and development levels, \
with its alternative tourism opportunities, agriculture, young, educated \
population, and entrepreneurial spirit."

translate_and_display(inputs)

Input: If Turkey provides a competitive, safe, and predictable business and
  investment environment, it can reach high growth rates and development levels,
  with its alternative tourism opportunities, agriculture, young, educated
  population, and entrepreneurial spirit.

Output: Türkiye rekabetçi, güvenli ve öngörülebilir bir iş ve yatırım ortamı
  sağladığı takdirde, alternatif turizm fırsatları, tarım, genç, eğitimli genç ve
  girişimcilik ruhuyla yüksek büyüme oranları ve kalkınma seviyelerine ulaşabilir.


In [15]:
inputs = "The businessman Arron Banks and the unofficial Brexit campaign Leave.EU have \
issued a legal threat against streaming giant Netflix in relation to The Great Hack, \
a new documentary about the Cambridge Analytica scandal and the abuse of personal data."

translate_and_display(inputs)

Input: The businessman Arron Banks and the unofficial Brexit campaign Leave.EU
  have issued a legal threat against streaming giant Netflix in relation to The
  Great Hack, a new documentary about the Cambridge Analytica scandal and the
  abuse of personal data.

Output: İşadamı Arron Banks ve gayrı resmi Brexit kampanyası devam ediyor. AB,
  Analytica skandalı ve kişisel verilerin kötüye kullanımıyla ilgili yeni bir
  belgesel olan Büyük Hack ile ilgili olarak sokak devir Netflix'i kaldırmaya
  yönelik yasal tehdit yayınladı.


In [16]:
inputs = "The threat comes as press freedom campaigners and charity groups warn \
the government in an open letter that UK courts are being used to “intimidate \
and silence” journalists working in the public interest."

translate_and_display(inputs)

Input: The threat comes as press freedom campaigners and charity groups warn the
  government in an open letter that UK courts are being used to “intimidate and
  silence” journalists working in the public interest.

Output: Söz konusu tehdit, basın özgürlüğü kampanyacıları ve yardım örgütlerinin
  hükümeti, İngiliz mahkemelerinin, kamu yararına çalışan gazetecileri "sindirmeye
  ve susturmaya" alıştığı açık bir mektupta uyarmaları üzerine geldi.


In [17]:
inputs = "Alexandria Ocasio-Cortez called for a “9/11-style commission” to \
investigate child separation on the border with Mexico on Saturday, and said \
the US government has a life-long responsibility to children it severed from \
their parents, to provide them with mental health support."


translate_and_display(inputs)

Input: Alexandria Ocasio-Cortez called for a “9/11-style commission” to
  investigate child separation on the border with Mexico on Saturday, and said the
  US government has a life-long responsibility to children it severed from their
  parents, to provide them with mental health support.

Output: Alexandria Ocasio-Cortez Cumartesi günü Meksika sınırındaki çocuk
  ayrımını araştırması için "9/11 tarzı bir komisyon" çağrısında bulundu ve ABD
  hükümetinin ailelerinden gelen çocuklara olan sağlığına yönelik ömür boyu bir
  sorumluluk taşıdığını söyledi.
