<a href="https://colab.research.google.com/github/mayaschwarz/cs175--lfric-to-Albert/blob/main/Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ælfric to Albert Demo Notebook

This notebook was designed to be run directly in Google Colab; your mileage may vary if you run it elsewhere.

## Setup

This code will use a model and some helper files that are stored in a GitHub repository. The repository will be cloned temporarily.

In [None]:
!git clone https://github.com/mayaschwarz/cs175--lfric-to-Albert.git git/
%cd git/

In [None]:
# This might take a few minutes
!pip install texttable > /dev/null
!pip install contractions > /dev/null
!pip install transformers > /dev/null
!pip install datasets==1.4.1 > /dev/null 2> /dev/null
!pip install sentencepiece > /dev/null
!pip install cltk==0.1.121 > /dev/null
!pip install nltk==3.5 > /dev/null
!pip install gdown > /dev/null
!pip install pyyaml==5.3.1 > /dev/null
!pip install torchvision==0.7.0 > /dev/null 2> /dev/null
!pip install OpenNMT-py > /dev/null 2> /dev/null

## A quick look at the data

One of the biggest limitations of our project was the small corpus size. The Bible corpus contains only around 30k verses, split between the Old and New Testaments.

In [None]:
from summarize_data import *
print_testament_table()

Different books of the Bible can vary stylistically. They may have been written in vastly different time periods, from different perspectives, and in different genres.

In [None]:
print_genre_table()

For accurate testing, it was important for us to get a roughly even spread of the different Bible genres in our test dataset.

In [None]:
print_genre_data_split_table()
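
As a rough illustration of what an "even spread" check could look like, the sketch below computes each genre's share of the verses in a set of books. The function name `genre_shares`, the book names, the genre labels, and the verse counts are all made up for this example and are not taken from the repository.

```python
from collections import Counter
from typing import Dict, Iterable


def genre_shares(book_genre: Dict[str, str],
                 verse_counts: Dict[str, int],
                 books: Iterable[str]) -> Dict[str, float]:
    """Return each genre's fraction of the verses in the given books."""
    totals = Counter()
    for book in books:
        totals[book_genre[book]] += verse_counts[book]
    n_verses = sum(totals.values())
    return {genre: count / n_verses for genre, count in totals.items()}


# Hypothetical toy data (not real counts from the corpus)
toy_genres = {'book_a': 'law', 'book_b': 'poetry', 'book_c': 'gospel', 'book_d': 'epistle'}
toy_counts = {'book_a': 60, 'book_b': 30, 'book_c': 20, 'book_d': 10}

test_shares = genre_shares(toy_genres, toy_counts, ['book_b', 'book_d'])
print(test_shares)
```

Comparing these shares for the test books against the shares for the whole corpus gives a quick sense of whether one genre dominates the test set.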

Different Modern and Middle English Bible versions may differ slightly in the exact verses provided, but are largely similar. On the other hand, the two Old English Bible versions we used, Aelfric's Old Testament and the West-Saxon Gospels, only contain a small subset of the total Bible books, let alone verses.

In [None]:
print_testament_table('t_alf')
print()
print_testament_table('t_wsg')

While these two versions vary drastically, we combined their verses in order to use as much data as possible.

In [None]:
print_genre_table('t_alf_wsg')

## Preprocessing

As shown above, Bible versions can differ in which verses and books they contain. In order to train any sequence-to-sequence model, we first need to pair together all the Bible verses shared by the relevant Bible versions. To do this, we use our `create_datasets` function. This function is our Swiss Army knife for data preprocessing. It does the following:

 - Pairs all Bible verses shared between the given Bible versions
 - Runs any number of specified text pre-processing operations
 - Sets aside the verses from pre-defined test books into a test set
 - Splits the remaining verses into training and validation sets depending on the requested training split
 - Saves the datasets to files if requested
 - Shuffles the datasets if requested
 - Returns the datasets in an easy-to-use dictionary format
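
The pairing and splitting steps listed above can be sketched in a few lines. This is a simplified illustration, not the repository's `create_datasets`: the function name `pair_and_split`, the `(book, chapter, verse)` key format, and the `'train'`/`'validation'` dictionary keys are assumptions made for the example.

```python
import random
from typing import Dict, List, Set, Tuple

VerseKey = Tuple[int, int, int]  # hypothetical key: (book, chapter, verse)


def pair_and_split(versions: Dict[str, Dict[VerseKey, str]],
                   test_books: Set[int],
                   training_fraction: float = 0.85,
                   shuffle: bool = False,
                   seed: int = 0) -> Dict[str, Dict[str, List[str]]]:
    """Pair verses shared by all versions, then split them into datasets."""
    # Keep only the verses that appear in every version
    shared = sorted(set.intersection(*(set(v) for v in versions.values())))

    # Verses from the pre-defined test books are held out entirely
    test_keys = [k for k in shared if k[0] in test_books]
    rest = [k for k in shared if k[0] not in test_books]
    if shuffle:
        random.Random(seed).shuffle(rest)

    # Split the remainder into training and validation sets
    cut = int(len(rest) * training_fraction)
    splits = {'train': rest[:cut], 'validation': rest[cut:], 'test': test_keys}

    # Verses stay aligned by index across versions within each split
    return {name: {version: [verses[k] for k in keys]
                   for version, verses in versions.items()}
            for name, keys in splits.items()}


toy = {
    'a': {(1, 1, 1): 'a1', (1, 1, 2): 'a2', (2, 1, 1): 'a3', (3, 1, 1): 'a4'},
    'b': {(1, 1, 1): 'b1', (1, 1, 2): 'b2', (2, 1, 1): 'b3'},
}
print(pair_and_split(toy, test_books={2}, training_fraction=0.5))
```

In the toy data, verse `(3, 1, 1)` exists only in version `'a'`, so it is dropped before splitting.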

In [None]:
from src.data_manager import *
versions = get_bible_versions_by_file_name(['t_kjv', 't_bbe'])
datasets = create_datasets(
    bible_versions = versions,
    training_fraction = 0.85,
    preprocess_operations = [
        preprocess_expand_contractions(),
        preprocess_filter_num_words(max_num_words = 35, min_num_words = 4),
        preprocess_filter_num_sentences(max_num_sentences = 1),
        preprocess_remove_punctuation(preserve_periods = True),
        preprocess_lowercase()
    ],
    write_files = True,
    shuffle = False
)
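
The operations passed in `preprocess_operations` above are composable steps. One way such steps could work is sketched below, under the assumption that each operation takes a verse and returns either the transformed verse or `None` to drop it. The names `filter_num_words`, `lowercase`, and `apply_operations` are hypothetical and do not mirror the repository's implementation.

```python
from typing import Callable, List, Optional

# Assumption: an operation maps a verse to a new verse, or to None to drop it
Operation = Callable[[str], Optional[str]]


def filter_num_words(min_num_words: int, max_num_words: int) -> Operation:
    """Build an operation that drops verses outside the word-count range."""
    def op(text: str) -> Optional[str]:
        return text if min_num_words <= len(text.split()) <= max_num_words else None
    return op


def lowercase() -> Operation:
    """Build an operation that lowercases a verse."""
    return lambda text: text.lower()


def apply_operations(verses: List[str], operations: List[Operation]) -> List[str]:
    """Run every operation in order, keeping only verses that survive all of them."""
    kept = []
    for verse in verses:
        for op in operations:
            verse = op(verse)
            if verse is None:
                break  # this verse was filtered out
        else:
            kept.append(verse)
    return kept


sample = ['In the beginning God created', 'Amen.']
print(apply_operations(sample, [filter_num_words(4, 35), lowercase()]))
```

This sketch handles a single list of verses; with paired Bible versions, a verse dropped from one version would also need to be dropped from the others to keep the pairs aligned.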

In [None]:
datasets['test']['t_kjv'][:3]

Without any pre-process operations, the results would contain much more content:

In [None]:
datasets = create_datasets(
    bible_versions = versions,
    training_fraction = 0.85,
    write_files = True,
    shuffle = False
)

In [None]:
datasets['test']['t_kjv'][:4]

By saving the split datasets to files, the same data can be reused for consistent results and reproducibility. The data can also be loaded quickly.

In [None]:
!wc -l data/split/*
print()
datasets = load_datasets()
datasets['test']['t_kjv'][:4]
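
For readers curious how a one-verse-per-line layout supports this, here is a minimal sketch of saving and reloading a split. The helper names `save_split` and `load_split` and the `<split>.<version>.txt` file naming are assumptions for illustration; the actual layout under `data/split/` may differ.

```python
from pathlib import Path
from typing import Dict, List


def save_split(directory: Path, split: str, data: Dict[str, List[str]]) -> None:
    """Write one file per Bible version, with one verse per line."""
    directory.mkdir(parents=True, exist_ok=True)
    for version, verses in data.items():
        path = directory / f'{split}.{version}.txt'
        path.write_text(''.join(verse + '\n' for verse in verses), encoding='utf-8')


def load_split(directory: Path, split: str) -> Dict[str, List[str]]:
    """Read back every version file belonging to the given split."""
    data = {}
    for path in sorted(directory.glob(f'{split}.*.txt')):
        # Strip the '<split>.' prefix and the '.txt' suffix to recover the version
        version = path.name[len(split) + 1:-len('.txt')]
        data[version] = path.read_text(encoding='utf-8').splitlines()
    return data
```

Because line `i` of every version file holds the same verse, the pairing survives the round trip without storing any verse identifiers.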

## Encoder-Decoder Model

### Setting up the Environment

In [None]:
# standard library
from argparse import Namespace

import cltk
import onmt
import onmt.inputters
import onmt.translate
import onmt.model_builder
from onmt.translate.translator import build_translator
import pyonmttok
import texttable
import torch

# local libraries
from src.data_manager import *
from src.paths import *

In [None]:
# Define hyperparameter constants across all the models for translation
MIN_SENTENCE_LENGTH = 1
MAX_SENTENCE_LENGTH = 60
BEAM_SIZE = 5

#### Tokenization and Data Processing

When training the models, we found that they learned best when given sentences that had been tokenized and had special characters replaced with their common Modern English equivalents (normalization to a canonical form).

**OLD ENGLISH**
> Ðonne beoð swilce gedreccednyssa ... -> Donne beod swilce gedreccednyssa ...

**MIDDLE ENGLISH**
> as 3e lykeþ best -> as ye liketh best.

The models also performed better (a 2-4% improvement in score) when case information and punctuation were encoded. We enable case encoding using the `case_markup` flag of `pyonmttok.Tokenizer`.
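
As a rough illustration of the idea behind case encoding, the sketch below lowercases tokens and records their original casing in separate marker tokens, so the vocabulary stays small while the case can still be restored. The markers `<C>` and `<U>` and both function names are hypothetical; `pyonmttok` uses its own marker tokens and handles more cases than this.

```python
from typing import List

CAPITALIZED, UPPERCASE = '<C>', '<U>'  # hypothetical marker tokens


def encode_case(tokens: List[str]) -> List[str]:
    """Lowercase tokens, emitting a marker before capitalized or uppercase ones."""
    encoded = []
    for token in tokens:
        lowered = token.lower()
        if token == lowered:
            encoded.append(token)
        elif token == lowered.capitalize():
            encoded += [CAPITALIZED, lowered]
        elif token == lowered.upper():
            encoded += [UPPERCASE, lowered]
        else:
            encoded.append(token)  # mixed case is left untouched in this sketch
    return encoded


def decode_case(tokens: List[str]) -> List[str]:
    """Undo encode_case by applying each marker to the token that follows it."""
    decoded, pending = [], None
    for token in tokens:
        if token in (CAPITALIZED, UPPERCASE):
            pending = token
        elif pending == CAPITALIZED:
            decoded.append(token.capitalize())
            pending = None
        elif pending == UPPERCASE:
            decoded.append(token.upper())
            pending = None
        else:
            decoded.append(token)
    return decoded


tokens = ['And', 'the', 'LORD', 'said', ',', 'I', 'am']
print(encode_case(tokens))
```

With this scheme, `LORD`, `Lord`, and `lord` all share one vocabulary entry, and the markers tell the decoder which form to produce.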

In [None]:
from cltk.corpus.middle_english.alphabet import normalize_middle_english
from cltk.phonology.old_english.phonology import Word

TOKENIZER = pyonmttok.Tokenizer("aggressive", case_markup=True)

def _normalize(text: str, language_code: str):
    """
    Given the language code, applies appropriate data normalization to the text.
    """
    if language_code == 'ang':
        # old english
        DONT_NORMALIZE = '!?.&,:;"'
        normalized_words = list()
        for word in text.split():
            if len(word) == 0:
                continue

            if word[-1] in DONT_NORMALIZE:
                normalized_words.append(Word(word[:-1]).ascii_encoding() + word[-1])
            else:
                normalized_words.append(Word(word).ascii_encoding())

        return ' '.join(normalized_words)
    elif language_code == 'enm':
        # middle english
        return normalize_middle_english(text, to_lower=False, alpha_conv=True, punct=False)
    return text

def preprocess_data(data: [str], lang_code: str) -> [str]:
    """
    Tokenizes a list of sentences and returns each as a space-separated token string.
    Format is the preprocessing step before passing to the OpenNMT-py models.

    Arguments:
      data{[str]} -- list of sentences to be tokenized
      lang_code{str} -- language code representing the data (eng, enm, ang), applies normalization

    Returns:
      [str] -- list of tokenized sentences
    """
    return [' '.join(TOKENIZER.tokenize(_normalize(sent, lang_code))[0]) for sent in data]

def postprocess_data(data: [str]) -> [str]:
    """
    Detokenizes a list of space-separated token strings and returns each as a sentence.

    Arguments:
      data{[str]} -- list of list of space-separated token strings.

    Returns:
      [str] -- list of sentences
    """
    return [TOKENIZER.detokenize(sent.split(' ')) for sent in data]
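
The Old English branch of `_normalize` relies on CLTK. The underlying idea can be sketched with the standard library alone: strip diacritics, then map the special letters to ASCII. The mapping table below is an illustrative assumption and may not match CLTK's table exactly; `ascii_normalize` is a hypothetical name.

```python
import unicodedata

# Illustrative mapping only; CLTK's own table may differ
OLD_ENGLISH_TO_ASCII = {
    'ð': 'd', 'Ð': 'D',
    'þ': 'th', 'Þ': 'Th',
    'æ': 'ae', 'Æ': 'Ae',
    'ƿ': 'w', 'Ƿ': 'W',
}


def ascii_normalize(text: str) -> str:
    """Strip diacritics and replace Old English letters with ASCII equivalents."""
    # NFD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFD', text)
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return ''.join(OLD_ENGLISH_TO_ASCII.get(c, c) for c in stripped)


print(ascii_normalize('Ðonne beoð swilce gedreccednyssa'))
```

On the Old English example shown earlier in this section, the sketch produces the same normalized form, `Donne beod swilce gedreccednyssa`.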

#### Translation
OpenNMT-py provides a translation script, `onmt_translate`, that loads the model alongside translation parameters (beam size, maximum length, etc.).

We're going to use the python interface to build the translator.

In [None]:
from onmt.translate.translator import build_translator
from argparse import Namespace

def gen_model_translator(model_path: str) -> onmt.translate.Translator:
    """
    Generates a translator object for an OpenNMT-py model, given the model path
    and global translation parameters.

    Arguments:
      model_path {str} -- path to the model

    Returns:
      onmt.translate.Translator -- Wrapper class for translation of the given model
    """
    opt = Namespace(fix_word_vecs_dec=False,
                    fix_word_vecs_enc=False,
                    alpha=0.0, 
                    ban_unk_token=False,
                    batch_type='sents', 
                    beam_size=BEAM_SIZE, 
                    beta=-0.0, 
                    block_ngram_repeat=0, 
                    coverage_penalty='none', 
                    data_type='text', 
                    dump_beam='', 
                    fp32=False, 
                    gpu=-1, 
                    int8=False,
                    ignore_when_blocking=[], 
                    length_penalty='none', 
                    max_length=MAX_SENTENCE_LENGTH, 
                    max_sent_length=None, 
                    min_length=MIN_SENTENCE_LENGTH, 
                    models=[model_path], 
                    n_best=1, 
                    output='/dev/null', 
                    phrase_table='', 
                    random_sampling_temp=1.0, 
                    random_sampling_topk=0, 
                    random_sampling_topp=0.0,
                    ratio=-0.0, 
                    replace_unk=False, 
                    report_align=False, 
                    report_time=False, 
                    seed=829, 
                    stepwise_penalty=False, 
                    tgt=None, 
                    tgt_prefix=None,
                    verbose=False)
    return build_translator(opt, report_score=False)

def translate(text: [str], lang_code: str, translator: onmt.translate.Translator) -> [str]:
    """
    Performs batch level translation and returns detokenized strings
    """
    x_tokenized = preprocess_data(text, lang_code)
    hyp = translator.translate(x_tokenized, batch_size=len(x_tokenized))
    return postprocess_data([h[0] for h in hyp[1]])

def custom_sentence_wrapper(input_lang_code: str, translator: onmt.translate.Translator) -> None:
    """
    Wrapper function to allow user to test input sentences.
    """
    while True:
        sent = input("Enter your own sentence here (or Q to quit): ")
        if not sent:
            continue
        elif sent == 'Q':
            break
        else:
            print('Translation: ', translate([sent], input_lang_code, translator)[0])
            
def format_output(predicted: [str], expected: [str]) -> str:
    """
    Returns texttable string of predicted and expected values
    """
    tableObj = texttable.Texttable() 
    tableObj.set_cols_align(["c", "c"]) 
    tableObj.set_cols_dtype(["t", "t"]) 
    tableObj.set_cols_valign(["t", "t"]) 
    tableObj.add_rows([["Expected", "Predicted"], *zip(expected, predicted)])
    return tableObj.draw()
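
To give some intuition for the `beam_size` and `max_length` parameters used above, here is a toy beam search over a hand-written next-token table. It is a simplified sketch and not OpenNMT-py's implementation, which also supports length and coverage penalties, n-gram blocking, and batching. The names `beam_search` and `toy_model` and the probabilities are made up for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, log-probability)


def beam_search(next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
                beam_size: int, max_length: int,
                eos: str = '</s>') -> List[Hypothesis]:
    """Keep the beam_size best partial hypotheses at every step."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_length):
        # Extend every live hypothesis by every possible next token
        candidates = [(seq + (tok,), score + math.log(p))
                      for seq, score in beams
                      for tok, p in next_probs(seq).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_length
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(seq: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distribution where the greedy choice is a trap."""
    if not seq:
        return {'a': 0.6, 'b': 0.4}
    if seq == ('a',):
        return {'x': 0.5, 'y': 0.5}
    if seq == ('b',):
        return {'z': 0.9, 'w': 0.1}
    return {'</s>': 1.0}


print(beam_search(toy_model, beam_size=1, max_length=5)[0])
print(beam_search(toy_model, beam_size=2, max_length=5)[0])
```

With `beam_size=1` the search commits to `a` and ends with a sequence of probability 0.3, while `beam_size=2` keeps `b` alive long enough to find `b z` with probability 0.36.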

### The Models

Here we will showcase the best trained models for each translation direction.

* Modern English <-> Modern English
* Middle English <-> Modern English
* Old English <-> Modern English

Scores for BLEU and METEOR have been rescaled from decimals to percentages for ease of reading.

In [None]:
from pathlib import Path
# download the best models
!gdown --id 1dnkVQMfF72Qv-KUmNeAkJVyUSXsxh8ht --output onmt-models.zip
!unzip onmt-models.zip

# folder containing all best encoder-decoder models
models = Path('onmt-models')

#### Modern English
The Modern English models translate between the Early Modern English of the *King James Version (1611)* and the Modern English of the *Bible In Basic English (1965)*.

These were our earliest models, built while we were figuring out how to build, tweak, and design with our custom Encoder-Decoder and OpenNMT. They also serve as the comparison point for the Transformer Model.

We will showcase the capabilities using verses from the *Book of Revelation*.

In [None]:
# get the book ids
book_id = get_bible_book_id_map()

# Specify which verses we want to receive
# VerseIdentifier(book, chapter, verse)
ten_verses = [VerseIdentifier(book_id['revelation'], 6, i) for i in range(1,11)]

# get the bible versions
bbe_info, kjv_info = get_bible_versions_by_file_name(['t_bbe', 't_kjv'])
bbe_bible = get_bible_verses(bbe_info)
kjv_bible = get_bible_verses(kjv_info)

# extract the verses selected
bbe_verses = []
kjv_verses = []
for v in ten_verses:
    bbe_verses.append(bbe_bible[v])
    kjv_verses.append(kjv_bible[v])

##### King James to Basic English

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `256` and an embedding size of `256`. Dropout was used in both the hidden layers and the attention mechanism, `0.5` and `0.3` respectively.

The best-performing checkpoint was reached after 4000 training iterations with early stopping; with beam size 5 it scored `BLEU = 36.048` and `METEOR = 54.51`.

In [None]:
# the model path
kjv2bbe = models / 'kjv2bbe_step_4000.pt'

checkpoint = torch.load(kjv2bbe, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, kjv2bbe)

# generate the translator
kjv2bbe_translator = gen_model_translator(kjv2bbe)

In [None]:
hypothesis = translate(kjv_verses, 'eng', kjv2bbe_translator)
print(format_output(hypothesis, bbe_verses))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('eng', kjv2bbe_translator)

##### Basic English to King James

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `256` and an embedding size of `256`. Dropout was used in both the hidden layers and the attention mechanism, `0.5` and `0.3` respectively.

The best-performing checkpoint was reached after 4000 training iterations with early stopping; with beam size 5 it scored `BLEU = 31.26` and `METEOR = 49.73`.

In [None]:
# the model path
bbe2kjv = models / 'bbe2kjv_step_4000.pt'

checkpoint = torch.load(bbe2kjv, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, bbe2kjv)

# generate the translator
bbe2kjv_translator = gen_model_translator(bbe2kjv)

In [None]:
hypothesis = translate(bbe_verses, 'eng', bbe2kjv_translator)
# the expected output of a BBE -> KJV translation is the KJV text
print(format_output(hypothesis, kjv_verses))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('eng', bbe2kjv_translator)

#### Middle English

The Middle English models translate between the Middle English of *Wycliffe's Bible (1395?)* and the Early Modern English of the *King James Version (1611)*. King James was chosen over the Bible in Basic English because its vocabulary size was closer to that of our Middle English corpus and it performed better in initial tests.

We will showcase the capabilities using verses from the *Book of Revelation*.

In [None]:
# get the book ids
book_id = get_bible_book_id_map()

# Specify which verses we want to receive
# VerseIdentifier(book, chapter, verse)
ten_verses = [VerseIdentifier(book_id['revelation'], 6, i) for i in range(1,11)]

# get the bible versions
kjv_info, wyc_info = get_bible_versions_by_file_name(['t_kjv', 't_wyc'])
kjv_bible = get_bible_verses(kjv_info)
wyc_bible = get_bible_verses(wyc_info)

# extract the verses selected
kjv_verses = []
wyc_verses = []
for v in ten_verses:
    kjv_verses.append(kjv_bible[v])
    wyc_verses.append(wyc_bible[v])

##### Wycliffe to King James

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `1024` and an embedding size of `256`. Dropout was used in both the hidden layers and the attention mechanism, `0.5` and `0.3` respectively.

The best-performing checkpoint was reached after 2900 training iterations with early stopping; with beam size 5 it scored `BLEU = 26.56` and `METEOR = 44.81`.

In [None]:
# the model path
enm2mod = models / 'enm2mod_step_2900.pt'

checkpoint = torch.load(enm2mod, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, enm2mod)

# generate the translator
enm2mod_translator = gen_model_translator(enm2mod)

In [None]:
hypothesis = translate(wyc_verses, 'enm', enm2mod_translator)
print(format_output(hypothesis, kjv_verses))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('enm', enm2mod_translator)

##### King James to Wycliffe

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `1024` and an embedding size of `256`. Dropout was used in both the hidden layers and the attention mechanism, `0.5` and `0.3` respectively.

The best-performing checkpoint was reached after 3000 training iterations with early stopping; with beam size 5 it scored `BLEU = 28.44` and `METEOR = 45.77`.

In [None]:
# the model path
mod2enm = models / 'mod2enm_step_3000.pt'

checkpoint = torch.load(mod2enm, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, mod2enm)

# generate the translator
mod2enm_translator = gen_model_translator(mod2enm)

In [None]:
# the input is King James text, so use the 'eng' code (as in the interactive cell below)
hypothesis = translate(kjv_verses, 'eng', mod2enm_translator)
print(format_output(hypothesis, wyc_verses))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('eng', mod2enm_translator)

#### Old English

The Old English models translate between Old English, as used in:
* *Aelfric's Homilies of the Anglo-Saxon Church (993?)*
* fragments of the *Old English Hexateuch (990-1010?)*
* *West Saxon Gospels (990 to 1175?)*

and Early Modern English, as used in the *King James Version (1611)*.

We will showcase these capabilities using an excerpt from *Aelfric's Homilies*.

In [None]:
old_sentences = [
    'On þyssere andwerdan gelaðunge sind gemengde yfele and gode, swa swa clæne corn mid fulum coccele: ac on ende þyssere worulde se soða Dema hæt his englas gadrian þone coccel byrþenmælum, and awurpan into ðam unadwæscendlicum fyre.',
    'Þa leas-gewitan ða lédon heora hacelan ætforan fotum sumes geonges cnihtes, se wæs geciged Saulus.',
    'Þurh myrran is gehíwod cwelmbærnys ures flæsces; be ðam cweð seo halige gelaðung, "Mine handa drypton myrran."',
    'Næs seo eadige Maria na ofslegen ne gemartyrod lichomlice, ac gastlice.',
    "Saraí wæs his wíf gehaten, þæt is gereht, Min ealdor,ac God hi het syððan Sarra, þæt is, Ealdor,þæt heo nære synderlice hire hiredes ealdor geciged, ac forðrihte Ealdor'; þæt is to understandenne ealra gelyfedra wifa moder.",
    'Hwæt is ðis deadlice líf buton weg? Understandað nu hwilc sy on weges geswince to ateorigenne, and ðeah nelle þone weg geendigan.',
    'His frynd sind engla heapas, forðan ðe hi healdað on heora staðelfæstnysse singallice his willan.',
    'Ælc bisceop and ælc láreow is to hyrde gesett Godes folce, þæt hí sceolon þæt folc wið ðone wulf gescyldan.'
]

mod_sentences = [
    'In this present church are mingled evil and good, as clean corn with foul cockle: but at the end of this world the true Judge will bid his angels gather the cockle by burthens, and cast it into the unquenchable fire.',
    'The false witnesses then laid their coats before the feet of a young man who was called Saul.',
    'By myrrh is typified the mortality of our flesh, concerning which the holy congregation says, "My hands dropt myrrh."',
    'The blessed Mary was not slain nor martyred bodily, but spiritually.',
    "His wife was called Sarai, which is interpreted, My chief; but God called her afterwards Sarah, that is Chief; that she might not be exclusively called her family's chief, but absolutely chief; which is to be understood, mother of all believing women.",
    'What is this deathlike life but a way? Understand now what it is to faint through the toil of the way, and yet not to desire the way to end.',
    'His friends are companies of angels, because they in their steadfastness constantly observe his will.',
    "Every bishop and every teacher is placed as a shepherd over God's people, that they may shield the people against the wolf."
]

##### Old English Corpora to King James

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `512` and an embedding size of `128`. Dropout was used in both the hidden layers and the attention mechanism, `0.6` and `0.4` respectively.

The best-performing checkpoint was reached after 3000 training iterations with early stopping; with beam size 5 it scored `BLEU = 16.99` and `METEOR = 33.38`.

In [None]:
# the model path
ang2mod = models / 'ang2mod_step_3000.pt'

checkpoint = torch.load(ang2mod, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, ang2mod)

# generate the translator
ang2mod_translator = gen_model_translator(ang2mod)

In [None]:
hypothesis = translate(old_sentences, 'ang', ang2mod_translator)
print(format_output(hypothesis, mod_sentences))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('ang', ang2mod_translator)

##### King James to Old English

This model is a 2-layer LSTM encoder-decoder with a bidirectional encoder, a hidden size of `512` and an embedding size of `128`. Dropout was used in both the hidden layers and the attention mechanism, `0.6` and `0.4` respectively.

The best-performing checkpoint was reached after 2900 training iterations with early stopping; with beam size 5 it scored `BLEU = 10.94` and `METEOR = 25.51`.

In [None]:
# the model path
mod2ang = models / 'mod2ang_step_2900.pt'

checkpoint = torch.load(mod2ang, map_location=lambda storage, loc: storage)

# Need to set checkpoint flags (the training opts)
checkpoint['opt'].fix_word_vecs_enc = False
checkpoint['opt'].fix_word_vecs_dec = False

torch.save(checkpoint, mod2ang)

# generate the translator
mod2ang_translator = gen_model_translator(mod2ang)

In [None]:
hypothesis = translate(mod_sentences, 'eng', mod2ang_translator)
print(format_output(hypothesis, old_sentences))

In [None]:
# run this cell to try out the model translation yourself!
custom_sentence_wrapper('eng', mod2ang_translator)

## Transformer Model

In [None]:
# Import dependencies
from transformers import (
    BartForConditionalGeneration, BartTokenizer
)

model_path = 'bart-bbe-to-kjv-1615204968'

First, let's load one of our fine-tuned sequence-to-sequence transformer models along with a pre-trained tokenizer (this might take a minute):

In [None]:
# download and unzip the fine-tuned model
!gdown --id 1Bx9o8Rt6MDItpG6uGSvJGEVuhXTXFqkq --output models.zip
!unzip models.zip

# load the fine-tuned model and the pre-trained tokenizer
model = BartForConditionalGeneration.from_pretrained(model_path, max_length = 100)
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')

Next, let's define a transformer pipeline for translating text:

In [None]:
from transformers import pipeline

translator = pipeline('translation_bbe_to_kjv', model = model, tokenizer = tokenizer)

Finally, we can translate!

In [None]:
num_verses = 4
source_verses, target_verses = datasets['test']['t_bbe'][:num_verses], datasets['test']['t_kjv'][:num_verses]
predicted_verses = [translation['translation_text'] for translation in translator(source_verses, return_text = True)]

for (source_verse, target_verse, predicted_verse) in zip(source_verses, target_verses, predicted_verses):
    print(f'SOURCE:    {source_verse}')
    print(f'TARGET:    {target_verse}')
    print(f'PREDICTED: {predicted_verse}')
    print()

Best of all, you can make your own predictions (feel free to pass whatever you want to the translate function below!):

In [None]:
def translate(text: str) -> str:
    # translates a single string
    return translator(text, return_text = True)[0]['translation_text']

translate('And the fearless instructors gave us a good grade in the class. For just they were. And full of kindness in their soul.')

### Notable findings (Transformer)

Feel free to skip this section if you're not interested.

The transformer seems to have learned parallelism:

In [None]:
print(translate('For they were just.'))
print(translate('For they were just. And full of kindness in their soul.'))

The structure of the second sentence was carried over into the first, as shown by the first sentence being translated differently when followed by the second.

The model learned that 'lord' is often capitalized in the King James Version:

In [None]:
translate('What did the lord say to you?')

The model may translate a sentence differently depending on the ending punctuation (compare the word order with the output above):

In [None]:
translate('What did the lord say to you.')

The model still learned to be derogatory towards homosexuals:

In [None]:
translate('He was a homosexual man.')

It was, after all, trained from verses such as:
`There shall be no prostitute of the daughters of Israel, neither shall there be a sodomite of the sons of Israel.`

However, as opposed to our previous models, it seems like these later models with more training and slightly different methods were less biased against homosexuals and less prone to complete failure:

In [None]:
translate('He was a gay man.')

Previously: `He was a man of the offspring of the evil spirits;`

In [None]:
translate('The black man')

Previously: `The black man, the king of the army, the captain of the army, the captains of the army, the captains of the captains of the captains...`

However, there was still gender bias, assuming that pretty much any profession is held by men, except those associated with women:

In [None]:
print(translate('The person had a marriage.'))
print()
print(translate('The carpenter had a marriage.'))
print(translate('The tailor had a marriage.'))
print(translate('The butcher had a marriage.'))
print(translate('The blacksmith had a marriage.'))
print(translate('The real estate agent had a marriage.'))
print(translate('The journalist had a marriage.'))
print(translate('The artist had a marriage.'))
# etc., there are many more
print()
print(translate('The nurse had a marriage.'))
print(translate('The babysitter had a marriage.'))

We got inconclusive results when trying to determine whether the gender bias was inherent to the models or whether it was learned. More humorously, however:

In [None]:
translate('The avocado had a marriage.')

The model understands context, and translates the same word differently even within the same sentence (loving -> loving and loving -> loveth):

In [None]:
translate("Now I'm saving all my loving for someone who's loving me")

Finally, some quotes by Yoda:

In [None]:
translate('Once you start down the dark path, forever will it dominate your destiny. Consume you, it will.')

In [None]:
translate('Death is a natural part of life. Rejoice for those around you who transform into the Force. Mourn them do not. Miss them do not. Attachment leads to jealously.')

In [None]:
translate('On many long journeys have I gone. And waited, too, for others to return from journeys of their own. Some return; some are broken; some come back so different only their names remain.')

In [None]:
translate('No longer certain, that one ever does win a war, I am. For in fighting the battles, the bloodshed, already lost we have. Yet, open to us a path remains. That unknown to the Sith is. Through this path, victory we may yet find. Not victory in the Clone Wars, but victory for all time.')

In [None]:
translate('I can’t believe it, said Luke Skywalker. And Yoda replied, That is why you fail.')