# Machine translation demo
This demo uses Intel's pre-trained `machine-translation-nar-en-de-0002` model to translate English text into German. More information about the model can be found in its [documentation](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/machine-translation-nar-en-de-0002/README.md).

## Downloading the model
The following command downloads the model into the current directory. Make sure you have run `pip install openvino-dev` beforehand, since that package provides the `omz_downloader` tool.

In [1]:
! omz_downloader --name machine-translation-nar-en-de-0002

################|| Downloading machine-translation-nar-en-de-0002 ||################

... 100%, 330 KB, 419 KB/s, 0 seconds passed

... 100%, 543 KB, 766 KB/s, 0 seconds passed

... 100%, 311 KB, 468 KB/s, 0 seconds passed

... 100%, 523 KB, 566 KB/s, 0 seconds passed

... 100%, 982 KB, 1020 KB/s, 0 seconds passed

... 100%, 277878 KB, 21316 KB/s, 13 seconds passed

... 100%, 1149 KB, 1063 KB/s, 1 seconds passed

... 100%, 138939 KB, 16071 KB/s, 8 seconds passed
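To confirm that the download completed, a quick check like the following can be run. It is a minimal sketch: the file list is assembled from the paths this notebook reads later, and the `.bin` weights file is assumed to sit next to the `.xml` file, as is usual for OpenVINO IR. `missing_model_files` and `REQUIRED_FILES` are hypothetical helpers, not part of OpenVINO.

```python
from pathlib import Path

# Base directory used by omz_downloader for this model (taken from the paths used in this notebook)
MODEL_DIR = Path("intel/machine-translation-nar-en-de-0002")

# Files the notebook needs; the .bin entry is an assumption (IR weights stored next to the .xml)
REQUIRED_FILES = [
    "FP32/machine-translation-nar-en-de-0002.xml",
    "FP32/machine-translation-nar-en-de-0002.bin",
    "tokenizer_src/vocab.json",
    "tokenizer_src/merges.txt",
    "tokenizer_tgt/vocab.json",
    "tokenizer_tgt/merges.txt",
]


def missing_model_files(base_dir=MODEL_DIR):
    """Return the entries of REQUIRED_FILES that do not exist as files under base_dir."""
    base = Path(base_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model and tokenizer files are present.")
```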



In [2]:
import time
from openvino.runtime import Core
import numpy as np
import itertools
from tokenizers import SentencePieceBPETokenizer

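## Loading and configuring the model
`omz_downloader` stores the model under the `intel/` folder. We read the FP32 version, compile it, create an inference request, and look up the names of the input and output tensors together with the model's input length, which limits how many tokens a sentence may contain.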

In [3]:
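# Read the FP32 IR downloaded by omz_downloader and compile it
core = Core()
model = core.read_model('intel/machine-translation-nar-en-de-0002/FP32/machine-translation-nar-en-de-0002.xml')
compiled_model = core.compile_model(model)
infer_request = compiled_model.create_infer_request()

# Names of the model's input (token IDs) and output (predicted token IDs) tensors
input_name = "tokens"
output_name = "pred"
model.output(output_name)  # raises if the model has no output with this name
# The input has a fixed length; every sentence is padded to max_tokens tokens
max_tokens = model.input(input_name).shape[1]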

## Loading tokenizers
Before a sentence is fed to the model, it must be converted into token IDs the model understands. Likewise, the token IDs the model produces must be converted back into readable text.

Here we initialize the source (English) tokenizer `src_tokenizer` and the target (German) tokenizer `tgt_tokenizer` from the vocabulary and merges files shipped with the model.

In [4]:
src_tokenizer = SentencePieceBPETokenizer.from_file(
    'intel/machine-translation-nar-en-de-0002/tokenizer_src/vocab.json',
    'intel/machine-translation-nar-en-de-0002/tokenizer_src/merges.txt'
)
tgt_tokenizer = SentencePieceBPETokenizer.from_file(
    'intel/machine-translation-nar-en-de-0002/tokenizer_tgt/vocab.json',
    'intel/machine-translation-nar-en-de-0002/tokenizer_tgt/merges.txt'
)
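The input side of this conversion (wrapping the token IDs in `<s>`/`</s>` markers and right-padding them with `<pad>` up to the model's fixed input length, as the `translate` function below does) can be sketched as pure Python. `build_model_input` is a hypothetical helper, and the numeric IDs in the example are made up; the real ones come from `src_tokenizer.token_to_id(...)`. Unlike the notebook, which uses `assert`, this sketch raises `ValueError` for sequences that are too long.

```python
def build_model_input(ids, bos_id, eos_id, pad_id, max_tokens):
    """Wrap token IDs with BOS/EOS markers and right-pad them to exactly max_tokens.

    Raises ValueError if the wrapped sequence is longer than max_tokens.
    """
    tokens = [bos_id] + list(ids) + [eos_id]
    if len(tokens) > max_tokens:
        raise ValueError(f"sequence has {len(tokens)} tokens, limit is {max_tokens}")
    return tokens + [pad_id] * (max_tokens - len(tokens))


# Hypothetical IDs for illustration only; the real ones come from
# src_tokenizer.token_to_id('<s>'), token_to_id('</s>') and token_to_id('<pad>')
print(build_model_input([17, 42, 5], bos_id=0, eos_id=2, pad_id=1, max_tokens=8))
# → [0, 17, 42, 5, 2, 1, 1, 1]
```

The padded list is then reshaped to `[1, max_tokens]`, a batch containing one sentence, before inference.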

## Performing translation
The following function receives an English sentence and translates it into German.

In [5]:
def translate(sentence):
    # Removes leading and trailing whitespaces
    sentence = sentence.strip()
    assert len(sentence) > 0
    tokens = src_tokenizer.encode(sentence).ids
    # Wrap the token IDs with the '<s>' (start) and '</s>' (end) markers the model expects
    tokens = [src_tokenizer.token_to_id('<s>')] + tokens + [src_tokenizer.token_to_id('</s>')]
    pad_length = max_tokens - len(tokens)

    # If the sentence is shorter than the maximum number of tokens, fill the remaining positions with '<pad>'.
    if pad_length > 0:
        tokens = tokens + [src_tokenizer.token_to_id('<pad>')]*pad_length
    assert len(tokens) == max_tokens, "input sentence is too long"
    encoded_sentence = np.array(tokens).reshape(1, -1)

    # Perform inference
    infer_request.infer({input_name: encoded_sentence})
    enc_translated = infer_request.get_tensor(output_name).data[:]

    # Decode the sentence
    sentence = tgt_tokenizer.decode(enc_translated[0])

    # Remove <pad> tokens, as well as '<s>' and '</s>' tokens which mark the beginning and ending of the sentence.
    for s in ['</s>', '<s>', '<pad>']:
        sentence = sentence.replace(s, '')

    # Lower-case the sentence, collapse runs of identical consecutive words (the model may repeat tokens),
    # and join the remaining words with single spaces
    sentence = sentence.lower().split()
    sentence = " ".join(key for key, _ in itertools.groupby(sentence))
    return sentence
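The post-processing step can be tried in isolation. This sketch (a hypothetical `postprocess` helper, applied to made-up input strings) repeats the logic above: it strips the special tokens, lower-cases the text, and uses `itertools.groupby` to collapse runs of identical consecutive words. Because `groupby` only merges exact matches, repeats that differ in punctuation (`jetzt jetzt.`) or occur inside a single word (`regnetnet`) are not removed, which is consistent with the first translation shown below.

```python
import itertools

# Special tokens removed from the decoded text, as in translate()
SPECIAL_TOKENS = ("</s>", "<s>", "<pad>")


def postprocess(decoded):
    """Strip special tokens, lower-case, and collapse consecutive duplicate words."""
    for token in SPECIAL_TOKENS:
        decoded = decoded.replace(token, "")
    words = decoded.lower().split()
    # groupby yields one key per run of identical consecutive words
    return " ".join(key for key, _ in itertools.groupby(words))


print(postprocess("<s> Das das Wetter ist ist gut. </s><pad><pad>"))
# → das wetter ist gut.
print(postprocess("es regnetnet jetzt jetzt."))
# → es regnetnet jetzt jetzt.
```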

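## Translating sentences
The following function runs a simple interactive loop: it reads a sentence from standard input, translates it, and prints the translation together with the time it took. Entering an empty line ends the loop.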

In [6]:
def run_translator():
    while True:
        input_sentence = input()
        # An empty line ends the loop
        if input_sentence == "":
            break

        start_time = time.perf_counter()
        translated = translate(input_sentence)
        end_time = time.perf_counter()
        print(f'Translated: {translated}')
        print(f'Time: {end_time - start_time:.2f}s')
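For benchmarking or testing, a non-interactive variant can be handy. The sketch below is an assumption rather than part of the original demo: `translate_all` is a hypothetical helper that takes any translation function (for example `translate` above), skips blank lines, and returns each translation with its wall-clock time measured by `time.perf_counter`.

```python
import time


def translate_all(sentences, translate_fn):
    """Translate each non-blank sentence; return (source, translation, seconds) tuples."""
    results = []
    for sentence in sentences:
        if not sentence.strip():
            continue  # translate() asserts on empty input, so skip blank lines
        start = time.perf_counter()
        translated = translate_fn(sentence)
        elapsed = time.perf_counter() - start
        results.append((sentence, translated, elapsed))
    return results


# Usage with the translate() function defined above:
# for src, tgt, secs in translate_all(["The weather is good"], translate):
#     print(f"{src} -> {tgt} ({secs:.2f}s)")
```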

In [7]:
run_translator()

It is rainning now.
Translated: es regnetnet jetzt jetzt.
Time: 0.34s
The weather is good
Translated: das wetter ist gut.
Time: 0.20s

