Notebook [3]

Perfect. Let's take a look at an example of machine translation, a form of text generation.

In machine translation, we map an input text (a sequence of tokens) in a source language to an output text (a sequence of tokens) in a target language. As an example, we translate the sentence "Tim Cook presented the new iPhone in Las Vegas on Tuesday." from English to German.

For this, we can use the model Helsinki-NLP/opus-mt-en-de, a transformer encoder–decoder trained for English-to-German translation on data from OPUS. OPUS is a collection of parallel corpora in which each example is expressed in two languages, e.g., an English sentence paired with its human translation in German. To be clear, Helsinki-NLP/opus-mt-en-de is a task-specific neural machine translation model, not a general-purpose LLM.

Let’s code this.

In [13]:
from transformers import pipeline

MODEL_NAME = "Helsinki-NLP/opus-mt-en-de"


def main() -> None:
    translator = pipeline(
        task="translation",
        model=MODEL_NAME,
        framework="pt",
    )

    text = "Tim Cook presented the new iPhone in Las Vegas on Tuesday."
    result = translator(text)  # pipeline returns a list of dictionaries

    for output in result:
        print(f"Translation: {output['translation_text']}")


if __name__ == "__main__":
    main()


Device set to use mps:0


Translation: Tim Cook präsentierte das neue iPhone am Dienstag in Las Vegas.


Great! To translate the generated German sentence back to English, we need a separate model (at least in classical machine translation). LLMs learn translation implicitly from multilingual data, but for now we restrict ourselves to classical machine translation, where this is not the case.

In classical machine translation, models are directional by construction: each model is trained on sentence pairs in a fixed direction and learns a specific conditional distribution, e.g., $P(\text{German} \mid \text{English})$. The inverse distribution $P(\text{English} \mid \text{German})$ is a different learning problem and therefore requires a separately trained model.
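
To make the directionality concrete, here is a short sketch in standard sequence-to-sequence notation. The symbols are introduced here for illustration and do not appear elsewhere in this notebook: $x$ is the English token sequence, $y = (y_1, \dots, y_T)$ is the German token sequence, and $y_{<t}$ denotes the tokens before position $t$. An encoder–decoder model of this kind typically factorises the conditional distribution token by token:

$$P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, x)$$

By Bayes' rule, the inverse direction is

$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}$$

The en-de model only provides $P(y \mid x)$. It gives no estimate of $P(x)$ or $P(y)$, so the inverse cannot be read off from it, which is one way to see why the de-en direction needs its own trained model.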

Let's code this using the model Helsinki-NLP/opus-mt-de-en (de-en instead of en-de).

In [14]:
from transformers import pipeline

MODEL_NAME = "Helsinki-NLP/opus-mt-de-en"


def main() -> None:
    translator = pipeline(
        task="translation",
        model=MODEL_NAME,
        framework="pt",
    )

    text = "Tim Cook präsentierte das neue iPhone am Dienstag in Las Vegas."
    result = translator(text)  # pipeline returns a list of dictionaries

    for output in result:
        print(f"Translation: {output['translation_text']}")


if __name__ == "__main__":
    main()


Device set to use mps:0


Translation: Tim Cook presented the new iPhone on Tuesday in Las Vegas.


Nice. There is an additional and potentially obvious insight here. 

As explained above, in classical machine translation each model learns a direction-specific conditional distribution. Consequently, translating a sentence from English to German and then back to English does not guarantee recovery of the original sentence (i.e., the exact sequence of tokens). During back-translation, the model is given only the German sentence and generates an English sentence according to the conditional distribution $P(\text{English} \mid \text{German})$. Because the original English wording is not recoverable from the German sentence alone, the model produces one of several valid English realisations (a realisation is a concrete surface form, represented by a specific sequence of tokens). We see exactly this in the two outputs: the original sentence ends with "in Las Vegas on Tuesday", whereas the back-translation ends with "on Tuesday in Las Vegas".
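
The difference between the original sentence and the round trip can also be checked mechanically. The following is a minimal sketch using only the Python standard library. It compares the two English sentences from the cells above at the word level, and the helper names (`tokenize`, `compare`) are made up for this illustration. Note that it splits on whitespace, which is an assumption for readability and not the subword tokenisation the translation models use internally.

```python
import difflib

# The two English sentences from the cells above.
ORIGINAL = "Tim Cook presented the new iPhone in Las Vegas on Tuesday."
ROUND_TRIP = "Tim Cook presented the new iPhone on Tuesday in Las Vegas."


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace after dropping the final period (illustrative only)."""
    return sentence.rstrip(".").split()


def compare(original: str, round_trip: str) -> dict:
    """Compare two sentences at the word level."""
    a = tokenize(original)
    b = tokenize(round_trip)
    return {
        # True only if both sentences are the same sequence of words.
        "exact_match": a == b,
        # True if both sentences use the same words, ignoring order.
        "same_words": sorted(a) == sorted(b),
        # Similarity in [0, 1]; 1.0 means identical sequences.
        "similarity": difflib.SequenceMatcher(None, a, b).ratio(),
    }


report = compare(ORIGINAL, ROUND_TRIP)
print(f"Exact match: {report['exact_match']}")
print(f"Same words:  {report['same_words']}")
print(f"Similarity:  {report['similarity']:.2f}")
```

For this pair, the word sequences differ while the set of words is the same: the round trip preserved the content but reordered the time and place phrases.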

Continue.