<a href="https://colab.research.google.com/github/gorkemozkaya/nmt-en-tr/blob/master/Turkish_English_NMT_tf2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Running predictions with the pre-trained NMT models in Python
This notebook shows how to load the pre-trained models shared in this repository and use them to translate new Turkish or English sentences. The models were trained with the Transformer template from TensorFlow 2's official models repository (the TensorFlow Model Garden).

First, install compatible versions of the dependencies, clone this repository together with its customized `models` and `datasets` submodules, and download the pre-trained model archive.

In [1]:
%%sh
pip install -q tensorflow==2.8.2 tensorflow-text==2.8.2 tensorflow-addons==0.17.1
[ -d nmt-en-tr ] || git clone -q --recurse-submodules -j8 https://github.com/gorkemozkaya/nmt-en-tr.git
pip3 install -q --user -r /content/nmt-en-tr/models/official/requirements.txt
pip3 install -q -e /content/nmt-en-tr/datasets
[ -e pretrained_v2.zip ] || wget -nc -q https://github.com/gorkemozkaya/nmt-en-tr/releases/download/pretrained_model_v2/pretrained_v2.zip
[ -d pretrained_v2 ] ||  unzip -n -qq pretrained_v2.zip
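Colab may ship with other TensorFlow versions preinstalled, so it can be worth confirming that the pins from the cell above actually took effect. Below is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8 or newer); `PINNED` and `mismatched_pins` are hypothetical names for illustration, not part of the repository. It reports installed distribution versions. If TensorFlow was already imported before reinstalling, restart the runtime so the running kernel picks up the pinned versions.

```python
from importlib import metadata

# Versions pinned by the install cell above.
PINNED = {
    "tensorflow": "2.8.2",
    "tensorflow-text": "2.8.2",
    "tensorflow-addons": "0.17.1",
}


def mismatched_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the distribution is not installed at all.
    """
    problems = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[package] = (expected, found)
    return problems


print(mismatched_pins(PINNED) or "All pinned versions are installed.")
```

An empty result means every pinned distribution is installed at exactly the pinned version.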



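Update the system path so that the cloned `datasets` and `models` packages and the packages installed with `--user` can be imported: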

In [2]:
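import site
import sys

# Put the --user site-packages directory first, then the cloned repository
# packages, so they take precedence over Colab's preinstalled versions.
sys.path = [site.getusersitepackages(),
            '/content/nmt-en-tr/datasets',
            '/content/nmt-en-tr/models'] + sys.path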
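Because the cell above prepends to `sys.path` every time it runs, re-executing it leaves duplicate entries behind. A minimal idempotent alternative, sketched with the standard library only (`prepend_to_path` is a hypothetical helper, not part of the repository), moves the given entries to the front and drops their earlier copies:

```python
import site
import sys


def prepend_to_path(entries, path=None):
    """Put `entries` at the front of `path` (default: sys.path), in the given order.

    Earlier occurrences of the same entries are removed, so calling this
    repeatedly leaves the list unchanged instead of growing it.
    """
    if path is None:
        path = sys.path
    entries = list(entries)
    remaining = [p for p in path if p not in entries]
    path[:] = entries + remaining  # modify in place so every reference sees it
    return path


# The user site-packages directory plus the repository paths from the cell above.
prepend_to_path([
    site.getusersitepackages(),
    '/content/nmt-en-tr/datasets',
    '/content/nmt-en-tr/models',
])
```

Mutating the list in place (`path[:] = ...`) rather than rebinding `sys.path` keeps any other code holding a reference to the same list in sync.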

**Load the tokenizer**

In [8]:
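import tensorflow as tf
import tensorflow_text as tftxt

# Joint SentencePiece model, shared by both translation directions.
with tf.io.gfile.GFile("pretrained_v2/sentencepiece_en_tr.model", "rb") as f:
  sp_model = f.read()

# add_eos=True appends the end-of-sentence id to every tokenized sequence.
tokenizer = tftxt.SentencepieceTokenizer(model=sp_model, add_eos=True)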

Build the Keras models for both translation directions

In [19]:
!pip install -q tensorflow-addons==0.17.1  # keep the version pinned in the first cell



In [4]:
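from official.core import exp_factory, task_factory
# Imported for its side effect of registering experiment configs such as
# 'transformer_tr_en_blended/base' with exp_factory.
from official.nlp.configs import wmt_transformer_experiments as wmt_te  # noqa: F401

task_config = exp_factory.get_exp_config('transformer_tr_en_blended/base').task
task_config.sentencepiece_model_path = 'pretrained_v2/sentencepiece_en_tr.model'

translation_task = task_factory.get_task(task_config)
# Both directions share the architecture and the joint vocabulary, so the same
# task builds both models; only the weights loaded later differ.
model_en_tr = translation_task.build_model()
model_tr_en = translation_task.build_model()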

**Translation wrapper:** a function that tokenizes the input, runs the model, and detokenizes the output.

In [6]:
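def translate(input_text, model):
  """Translate a single sentence with `model` and return the decoded string."""
  # SentencePiece ids for the sentence, with the EOS id appended.
  tokenized = tokenizer.tokenize(input_text)
  # The model expects a batch, so add a leading batch dimension of size 1.
  translated = model({'inputs': tf.reshape(tokenized, [1, -1])})
  # 'outputs' holds the generated token ids; detokenize the single batch entry.
  return tokenizer.detokenize(translated['outputs']).numpy()[0].decode('utf-8')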

Run each model once on a dummy input before loading the weights, so that Keras creates the model variables the checkpoint is restored into.

In [10]:
ignore = translate("test", model_en_tr)
ignore = translate("test", model_tr_en)

In [12]:
model_en_tr.load_weights("pretrained_v2/en_tr/en_tr")

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f74f04a4590>

In [17]:
model_tr_en.load_weights("pretrained_v2/tr_en/tr_en")

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f74f0333a90>
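`load_weights` is given a checkpoint *prefix* (for example `pretrained_v2/en_tr/en_tr`), not a single file, so if either call above fails it helps to check which files actually sit next to that prefix. Below is a minimal standard-library sketch; `checkpoint_files` is a hypothetical helper, and it assumes the TensorFlow checkpoint layout written by `save_weights` in the TF format (a `<prefix>.index` file plus one or more `<prefix>.data-*-of-*` shards).

```python
from pathlib import Path


def checkpoint_files(prefix):
    """Return (index_file, data_shards) for a TensorFlow checkpoint prefix.

    Assumes the TF-format layout: `<prefix>.index` plus one or more
    `<prefix>.data-XXXXX-of-YYYYY` shards. `index_file` is None when the
    index file is missing.
    """
    prefix = Path(prefix)
    index = prefix.with_name(prefix.name + ".index")
    if prefix.parent.is_dir():
        shards = sorted(prefix.parent.glob(prefix.name + ".data-*-of-*"))
    else:
        shards = []
    return (index if index.is_file() else None), shards


for prefix in ("pretrained_v2/en_tr/en_tr", "pretrained_v2/tr_en/tr_en"):
    index, shards = checkpoint_files(prefix)
    print(prefix, "index:", index, "shards:", len(shards))
```

If `index` is `None` or no shards are listed, the archive was most likely extracted somewhere other than `pretrained_v2/`.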

In [15]:
text = ("If Turkey provides a competitive, safe, and predictable business and "
        "investment environment, it can reach high growth rates and development levels, "
        "with its alternative tourism opportunities, agriculture, young, educated "
        "population, and entrepreneurial spirit.")

translate(text, model_en_tr)

'Türkiye rekabetçi, güvenli ve öngörülebilir bir iş ve yatırım ortamı sağlarsa, alternatif turizm olanakları, tarım, genç, eğitimli nüfus ve girişimci ruhuyla yüksek büyüme oranları ve kalkınma seviyelerine ulaşabilir.'

In [18]:
text = ("CHP Genel Başkanı Kemal Kılıçdaroğlu, İngiltere'de başbakanlık için "
        "yarışan Dışişleri Bakanı Liz Truss'ın sığınmacıların Ruanda'ya gönderileceği "
        "programa Türkiye gibi ülkeleri de ekleyerek genişletmeyi planladığının öne "
        "sürülmesine tepki gösterdi.")

translate(text, model_tr_en)

"Republican People's Party (CHP) Leader Kemal Kilicdaroglu reacted against the idea that Foreign Minister Liz Truss, who competed for prime minister in Britain, planned to expand countries such as Turkey in the programme where asylum seekers will be sent to Rwanda."