If you're opening this notebook on Colab, you will probably need to install 🤗 Transformers, 🤗 Datasets and 🤗 Evaluate. Uncomment the following cell and run it. We also use the `sacrebleu` and `sentencepiece` libraries, as well as spaCy with its small English and French models - you may need to install these even if you already have 🤗 Transformers!

In [1]:
#! pip install transformers[sentencepiece] datasets evaluate
#! pip install sacrebleu sentencepiece
#! pip install huggingface_hub
#! pip install spacy
#! python -m spacy download en_core_web_sm
#! python -m spacy download fr_core_news_sm

In [2]:
# pip install tensorflow==2.9

In [3]:
# pip freeze > requirements.txt

In [4]:
# pip show evaluate

If you're opening this notebook locally, make sure your environment has the latest versions of those libraries installed.

To be able to share your model with the community and generate results like the one shown in the picture below via the inference API, there are a few more steps to follow.

First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!) then uncomment the following cell and input your token:

In [5]:
# from huggingface_hub import notebook_login

# notebook_login()

Then you need to install Git-LFS and set up Git if you haven't already. Uncomment the following instructions and adapt them with your name and email:

In [6]:
# !apt install git-lfs
# !git config --global user.email "you@example.com"
# !git config --global user.name "Your Name"

Make sure your version of Transformers is at least 4.16.0 since some of the functionality we use was introduced in that version:

In [7]:
import transformers

print(transformers.__version__)

4.24.0


You can find a script version of this notebook to fine-tune your model in a distributed fashion using multiple GPUs or TPUs [here](https://github.com/huggingface/transformers/tree/master/examples/seq2seq).

# Convert a CSV file to a dataset-ready format

The code below works with a specifically formatted CSV file: the function defined in the next cell converts such a CSV into a 🤗 `Dataset` with a train/test split.
Your CSV should have at least 2 columns, `en` and `xx`, where `xx` is the code of the target language.

If the CSV file has PoS tags for the source and target languages, the expected column names for them are `pos_en` and `pos_xx`.

If the CSV file has word-alignment (WA) tags, the expected column name is `wa`. Each value is a space-separated list of `source-target` word index pairs, e.g. `0-0 1-1 2-3`.
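To make the expected layout concrete, here is a minimal, stdlib-only sketch of the records that `csv_to_dataset` builds from each row. The two-row CSV and the helper `rows_to_records` are hypothetical; the real function uses pandas and 🤗 Datasets and also performs the train/test split.

```python
import csv
import io

# Hypothetical two-row CSV in the expected format (en/fr text, PoS columns, WA column)
SAMPLE_CSV = """en,fr,pos_en,pos_fr,wa
Hello world,Bonjour le monde,Hello INTJ world NOUN,Bonjour INTJ le DET monde NOUN,0-0 1-2
Thank you,Merci,Thank VERB you PRON,Merci INTJ,0-0 1-0
"""


def rows_to_records(csv_text, source_lang="en", target_lang="fr", pos_tags=False, wa_tags=False):
    """Build one record per CSV row, mirroring the columns csv_to_dataset creates."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {"translation": {source_lang: row[source_lang], target_lang: row[target_lang]}}
        if pos_tags:
            record["pos"] = {
                source_lang: row[f"pos_{source_lang}"],
                target_lang: row[f"pos_{target_lang}"],
            }
        if wa_tags:
            record["wa"] = row["wa"]  # kept as the raw "src-trg" string
        records.append(record)
    return records


records = rows_to_records(SAMPLE_CSV, pos_tags=True, wa_tags=True)
print(records[0]["translation"])  # {'en': 'Hello world', 'fr': 'Bonjour le monde'}
print(records[0]["wa"])           # 0-0 1-2
```

Each record has the same shape as a row of the `Dataset` used below: a `translation` dict keyed by language code, plus optional `pos` and `wa` fields.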

In [8]:
import pandas as pd
from datasets import Dataset


# source_lang accepted value: 'en'
# target_lang accepted values: 'fr' | 'zh'
# Set pos_tags=True if the file has PoS tags for both languages.
# Set wa_tags=True if the file has WA tags.
# store=True is intended to create a JSON dump of the file for later use,
# but the function below does not use this flag yet.

def csv_to_dataset(filename, source_lang, target_lang, pos_tags=False, wa_tags=False, store=False):
    data = pd.read_csv(filename)
    new_df = pd.DataFrame()
    # One {source_lang: ..., target_lang: ...} dict per sentence pair
    new_df['translation'] = [{source_lang: x, target_lang: y} for x, y in zip(data[source_lang], data[target_lang])]
    if pos_tags:
        new_df['pos'] = [{source_lang: x, target_lang: y} for x, y in zip(data[f'pos_{source_lang}'], data[f'pos_{target_lang}'])]
    if wa_tags:
        # Space-separated "src-trg" index pairs, e.g. "0-0 1-1 2-3"
        new_df['wa'] = data['wa']
    # Random 80/20 train/test split
    return Dataset.from_pandas(new_df).train_test_split(test_size=0.2)

In [9]:
from datasets import load_from_disk

loaded_dataset = load_from_disk('fr_dataset_split.hf')

In [10]:
loaded_dataset.remove_columns(['pos', 'wa'])

DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 1508
    })
    test: Dataset({
        features: ['translation'],
        num_rows: 378
    })
})

# Fine-tuning a model on a translation task

In this notebook, we will see how to fine-tune one of the [🤗 Transformers](https://github.com/huggingface/transformers) models for a translation task. The original Hugging Face version of this notebook uses the [WMT dataset](http://www.statmt.org/wmt16/), a machine translation dataset composed from a collection of various sources, including news commentaries and parliament proceedings; here we use the English/French dataset loaded above, which can also carry PoS and word-alignment (WA) annotations.

![Widget inference on a translation task](images/translation.png)

We will see how to easily load the dataset for this task using 🤗 Datasets and how to fine-tune a model on it using Keras.

In [11]:
model_checkpoint = "Helsinki-NLP/opus-mt-en-fr"

This notebook is built to run with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a sequence-to-sequence version in the Transformers library. Here we picked the [`Helsinki-NLP/opus-mt-en-fr`](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr) checkpoint.

## Loading the dataset

We use the [🤗 Datasets](https://github.com/huggingface/datasets) library to load the data and the 🤗 Evaluate library to get the metric we need for evaluation (to compare our model to the benchmark). The English/French data was already loaded from disk above with `load_from_disk`, so here we only need the `evaluate` function `load` to get the SacreBLEU metric.

In [12]:
from datasets import load_dataset
from evaluate import load

metric = load("sacrebleu")

To get a sense of what the data looks like, the following function will show some examples picked randomly in the dataset.

In [13]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML


def show_random_elements(dataset, num_examples=5):
    assert num_examples <= len(
        dataset
    ), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset) - 1)
        while pick in picks:
            pick = random.randint(0, len(dataset) - 1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [14]:
show_random_elements(loaded_dataset["train"])

Unnamed: 0,translation,pos,wa
0,"{'en': 'The Chickasaw began to trade with the British after the colony of Carolina was founded in 1670.', 'fr': 'Les Chicachas ont commencé à commercer avec les Britanniques après la fondation de la colonie de la province de Caroline en 1670.'}","{'en': 'The DET Chickasaw PROPN began VERB to PART trade VERB with ADP the DET British ADJ after SCONJ the DET colony NOUN of ADP Carolina PROPN was AUX founded VERB in ADP 1670 NUM . PUNCT ', 'fr': 'Les DET Chicachas PROPN ont AUX commencé VERB à ADP commercer VERB avec ADP les DET Britanniques NOUN après ADP la DET fondation NOUN de ADP la DET colonie NOUN de ADP la DET province NOUN de ADP Caroline NOUN en ADP 1670 NUM . PUNCT '}",0-0 1-1 2-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 9-13 10-14 10-17 11-18 12-19 13-12 14-11 15-20 16-21 17-22
1,"{'en': 'The genes for mitosomal components are contained in the nuclear genome.', 'fr': 'Les gènes codant les composants mitosomaux font partie du génome nucléaire.'}","{'en': 'The DET genes NOUN for ADP mitosomal ADJ components NOUN are AUX contained VERB in ADP the DET nuclear ADJ genome NOUN . PUNCT ', 'fr': 'Les DET gènes NOUN codant VERB les DET composants NOUN mitosomaux ADJ font VERB partie NOUN du ADP génome NOUN nucléaire ADJ . PUNCT '}",0-0 1-1 2-2 3-5 4-4 5-6 7-7 8-8 9-10 10-9 11-11
2,"{'en': 'After hearing of the many Sendai men already appointed Takusaburō joined the school in 1880.', 'fr': 'Après avoir entendu mentionnés de nombreux hommes de Sendai déjà nommés Takusaburō rejoint l'école en 1880.'}","{'en': 'After ADP hearing NOUN of ADP the DET many ADJ Sendai PROPN men NOUN already ADV appointed VERB Takusaburō PROPN joined VERB the DET school NOUN in ADP 1880 NUM . PUNCT ', 'fr': 'Après ADP avoir AUX entendu VERB mentionnés VERB de DET nombreux ADJ hommes NOUN de ADP Sendai PROPN déjà ADV nommés VERB Takusaburō PROPN rejoint VERB l' DET école NOUN en ADP 1880 NUM . PUNCT '}",0-0 1-2 2-3 3-4 4-5 5-8 6-6 7-9 8-10 9-11 10-12 11-13 12-14 13-15 14-16 15-17
3,"{'en': 'Renville was named after counties in Minnesota and North Dakota.', 'fr': 'Il portait le nom de comtés du Minnesota et du Dakota du Nord.'}","{'en': 'Renville PROPN was AUX named VERB after ADP counties NOUN in ADP Minnesota PROPN and CCONJ North PROPN Dakota PROPN . PUNCT ', 'fr': 'Il PRON portait VERB le DET nom NOUN de ADP comtés NOUN du ADP Minnesota PROPN et CCONJ du ADP Dakota NOUN du ADP Nord NOUN . PUNCT '}",0-0 1-1 2-2 3-3 4-5 5-6 6-7 7-8 8-12 9-10 10-13
4,"{'en': 'The Japanese governmental shipyards were overwhelmed with the volume of construction and for the first time civilian shipyards were also assigned to produce warships.', 'fr': 'Les chantiers navals gouvernementaux japonais ont été submergés par le volume de la commande et pour la première fois des chantiers navals civils ont été affectés pour produire des navires de guerre.'}","{'en': 'The DET Japanese ADJ governmental ADJ shipyards NOUN were AUX overwhelmed VERB with ADP the DET volume NOUN of ADP construction NOUN and CCONJ for ADP the DET first ADJ time NOUN civilian ADJ shipyards NOUN were AUX also ADV assigned VERB to PART produce VERB warships NOUN . PUNCT ', 'fr': 'Les DET chantiers NOUN navals ADJ gouvernementaux ADJ japonais NOUN ont AUX été AUX submergés VERB par ADP le DET volume NOUN de ADP la DET commande NOUN et CCONJ pour ADP la DET première ADJ fois NOUN des ADP chantiers NOUN navals ADJ civils ADJ ont AUX été AUX affectés VERB pour ADP produire VERB des ADP navires NOUN de ADP guerre NOUN . PUNCT '}",0-0 1-4 2-3 3-1 3-2 4-5 4-6 5-7 6-8 7-9 8-10 9-11 9-12 10-13 11-14 12-15 13-16 14-17 15-18 16-22 17-20 17-21 18-23 18-24 19-25 20-25 21-26 22-27 23-29 23-31 24-32


## Preprocessing the data

In [15]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

If you use an mBART tokenizer, you need to set the source and target languages so the texts are preprocessed properly. This is not needed for the Marian checkpoint used here. You can check the language codes [here](https://huggingface.co/facebook/mbart-large-cc25) if you are using this notebook on a different pair of languages.

In [16]:
if "mbart" in model_checkpoint:
    tokenizer.src_lang = "en_XX"
    tokenizer.tgt_lang = "fr_XX"

In [17]:
tokenizer(["Hello, this is a sentence!", "This is another sentence."])

{'input_ids': [[10537, 2, 67, 32, 15, 5776, 145, 0], [160, 32, 1036, 5776, 3, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]}

Later on, to encode word alignments (WA), we will use these token IDs instead of the words themselves to express which words in the two sentences are related.

In [18]:
import spacy

en_pos_sp = spacy.load("en_core_web_sm")
fr_pos_sp = spacy.load('fr_core_news_sm')

If you are using one of the five T5 checkpoints that require a special prefix to put before the inputs, you should adapt the following cell.

In [19]:
if model_checkpoint in ["t5-small", "t5-base", "t5-large", "t5-3b", "t5-11b"]:
    prefix = "translate English to French: "
else:
    prefix = ""

We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This ensures that an input longer than what the selected model can handle is truncated to the maximum length accepted by the model (here also capped by `max_input_length` and `max_target_length`). Padding is dealt with later on (in a data collator), so that examples are padded to the longest length in each batch rather than in the whole dataset.
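Here is a minimal pure-Python sketch of that per-batch padding, roughly what `DataCollatorForSeq2Seq` does for us (the real collator also builds `decoder_input_ids` when given the model, which this sketch omits). The `input_ids` come from the tokenizer output above; the labels and `pad_id=99` are hypothetical placeholders for real target ids and `tokenizer.pad_token_id`.

```python
def pad_batch(features, pad_id, label_pad_id=-100):
    """Right-pad a list of tokenized examples to the longest example in the batch."""
    max_input_len = max(len(f["input_ids"]) for f in features)
    max_label_len = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_input_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        # 1 for real tokens, 0 for padding so attention ignores the padded positions
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Labels are padded with -100 so the loss skips those positions
        batch["labels"].append(f["labels"] + [label_pad_id] * (max_label_len - len(f["labels"])))
    return batch


features = [
    {"input_ids": [10537, 2, 67, 32, 15, 5776, 145, 0], "labels": [5, 6, 0]},
    {"input_ids": [160, 32, 1036, 5776, 3, 0], "labels": [7, 0]},
]
batch = pad_batch(features, pad_id=99)
print(batch["input_ids"][1])       # [160, 32, 1036, 5776, 3, 0, 99, 99]
print(batch["attention_mask"][1])  # [1, 1, 1, 1, 1, 1, 0, 0]
print(batch["labels"][1])          # [7, 0, -100]
```

Only the second example is padded, and only up to 8 tokens (the longest input in this batch), not up to the longest sentence in the whole dataset.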

For PoS tags we use separate functions that decode each token and tag it with spaCy, so that every token gets a PoS id.

In [20]:
# get_pos_tags receives a sentence already tokenized by the model's tokenizer.
# That (subword) tokenization differs from spaCy's, so to produce one PoS id per
# model token (keeping the same length as input_ids) we:
# - decode each token id back to text with the model tokenizer
# - run spaCy on that text and return the PoS id (`Token.pos`) of the last
#   spaCy token, or -1 if the decoded text yields no spaCy tokens

SPACY_PIPELINES = {'en': en_pos_sp, 'fr': fr_pos_sp}

def token_to_pos(token, lang):
    if lang not in SPACY_PIPELINES:
        raise ValueError(f"Unsupported language: {lang!r}")
    decoded = list(SPACY_PIPELINES[lang](tokenizer.decode(token)))
    return decoded[-1].pos if decoded else -1

def get_pos_tags(tokenized_sent, lang):
    return [token_to_pos(token, lang) for token in tokenized_sent]
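Because `token_to_pos` depends only on the token id and the language, its result could be memoized so spaCy runs once per distinct token rather than once per occurrence. The sketch below illustrates that idea with `functools.lru_cache`; `TOY_DECODE` and `toy_tag` are hypothetical stand-ins for `tokenizer.decode` and a spaCy pipeline, and caching is a suggested optimization, not what the cell above does.

```python
from functools import lru_cache

# Hypothetical stand-ins: TOY_DECODE plays the role of tokenizer.decode and
# toy_tag the role of a spaCy pipeline returning integer PoS ids per word.
TOY_DECODE = {10: "The", 11: "cat", 12: "s", 0: ""}
TOY_POS_IDS = {"The": 1, "cat": 2, "s": 3}


def toy_tag(text):
    return [TOY_POS_IDS.get(word, 0) for word in text.split()]


@lru_cache(maxsize=None)
def token_to_pos_cached(token_id):
    # Same rule as token_to_pos: PoS id of the last tagged word, or -1 if nothing was tagged
    tags = toy_tag(TOY_DECODE.get(token_id, ""))
    return tags[-1] if tags else -1


def get_pos_tags_cached(token_ids):
    # One PoS id per model token, so the output has the same length as the input ids
    return [token_to_pos_cached(t) for t in token_ids]


print(get_pos_tags_cached([10, 11, 12, 0]))   # [1, 2, 3, -1]
print(get_pos_tags_cached([10, 10, 11]))      # [1, 1, 2]
print(token_to_pos_cached.cache_info().hits)  # 3
```

The second call is served entirely from the cache. In the notebook, the same decorator on `token_to_pos(token, lang)` would cache per (token id, language) pair, since both arguments are hashable.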

Word Alignment (WA) information can be encoded in different ways:
- `trg-ids`: create a vector of zeros with the same length as the tokenized input sentence. For each position i that has an aligned word in the target sentence, put the token ID found at the aligned target position into position i.
- `sums`: start from a copy of the tokenized input. For each position i that has an aligned word in the target sentence, add the token ID found at the aligned target position to the value at position i.
- `mult`: same as `sums`, but multiplies the two token IDs instead of adding them.
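The three encodings can be checked on toy inputs with the sketch below. `encode_wa_sketch` is a hypothetical stand-alone helper that follows the descriptions above; the token ids are made up, and pairs that point outside either sequence are skipped.

```python
def encode_wa_sketch(src_ids, trg_ids, wa, wa_type):
    """Toy version of the trg-ids / sums / mult encodings on lists of token ids."""
    if wa_type not in ("trg-ids", "sums", "mult"):
        raise ValueError(f"unknown wa_type: {wa_type!r}")
    # "0-0 1-2" -> {0: 0, 1: 2}; a repeated source index keeps its last target
    alignment = {int(s): int(t) for s, t in (pair.split("-") for pair in wa.split())}
    out = [0] * len(src_ids) if wa_type == "trg-ids" else list(src_ids)
    for s, t in alignment.items():
        if s >= len(src_ids) or t >= len(trg_ids):
            continue  # alignment falls outside one of the (possibly truncated) sequences
        if wa_type == "trg-ids":
            out[s] = trg_ids[t]
        elif wa_type == "sums":
            out[s] += trg_ids[t]
        else:
            out[s] *= trg_ids[t]
    return out


src, trg = [5, 6, 7], [20, 30, 40, 50]  # hypothetical token ids
for kind in ("trg-ids", "sums", "mult"):
    print(kind, encode_wa_sketch(src, trg, "0-0 1-2 2-3", kind))
# trg-ids [20, 40, 50]
# sums [25, 46, 57]
# mult [100, 240, 350]
```

Note that the alignment indices in the CSV refer to word positions, while `encode_wa` applies them to subword token positions, so the two can drift apart once a word is split into several tokens.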

In [21]:
def encode_wa(tokenized_input, tokenized_target, wa, wa_type):
    # Parse "src-trg" pairs such as "0-0 1-2"; if a source index appears
    # more than once, the last alignment wins
    wa_dict = {int(src): int(trg) for src, trg in (pair.split('-') for pair in wa.split())}
    n = len(tokenized_input)
    m = len(tokenized_target)
    if wa_type == 'trg-ids':
        wa_emb = [0] * n
    elif wa_type in ('sums', 'mult'):
        wa_emb = list(tokenized_input)  # copy, so the input ids are not modified
    else:
        raise ValueError(f"Unknown wa_type: {wa_type!r}")
    for k, v in wa_dict.items():
        # Skip alignments that point past the end of either (truncated) sequence,
        # but keep processing the remaining pairs
        if k >= n or v >= m:
            continue
        if wa_type == 'trg-ids':
            wa_emb[k] = tokenized_target[v]
        elif wa_type == 'sums':
            wa_emb[k] += tokenized_target[v]
        else:
            wa_emb[k] *= tokenized_target[v]
    return wa_emb

In [22]:
max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "fr"
pos_tags=False
wa_type=None

def preprocess_function(examples):
    # Reads the module-level settings source_lang, target_lang, pos_tags and wa_type
    inputs = [prefix + d[source_lang] for d in examples["translation"]]
    targets = [d[target_lang] for d in examples["translation"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]

    if pos_tags:
        # One PoS id per source / target token
        model_inputs['pos'] = [get_pos_tags(x, source_lang) for x in model_inputs['input_ids']]
        model_inputs['target_pos'] = [get_pos_tags(y, target_lang) for y in model_inputs['labels']]

    if wa_type:
        model_inputs['wa'] = [
            encode_wa(src, trg, wa, wa_type)
            for src, trg, wa in zip(model_inputs['input_ids'], model_inputs['labels'], examples["wa"])
        ]
    return model_inputs

This function works with one or several examples. In the case of several examples, the tokenizer returns a list of lists for each key (one list of token IDs per example).

To apply this function to all the sentence pairs in our dataset, we use the `map` method of the `DatasetDict` we loaded earlier, with `batched=True`. This applies the function to all the elements of all the splits, so our training and test data are preprocessed with a single command. We do this once per experiment below, each time with different settings.
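With `batched=True`, the function receives a dict of columns (each a list with one entry per example) and returns new columns in the same shape. The sketch below imitates that contract in pure Python; `toy_batched_map` and `count_source_words` are hypothetical, and the real `Dataset.map` uses a much larger default batch size (1000) and writes results to Arrow files.

```python
def toy_batched_map(rows, fn, batch_size=2):
    """Hypothetical stand-in for Dataset.map(fn, batched=True)."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # A batch is a dict of columns: each key maps to a list with one entry per example
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = fn(batch)
        # Merge the returned columns back into the individual examples
        for i, row in enumerate(chunk):
            out.append({**row, **{key: values[i] for key, values in new_columns.items()}})
    return out


def count_source_words(batch):
    return {"src_words": [len(pair["en"].split()) for pair in batch["translation"]]}


rows = [
    {"translation": {"en": "Hello world", "fr": "Bonjour le monde"}},
    {"translation": {"en": "Thank you", "fr": "Merci"}},
    {"translation": {"en": "Good morning everyone", "fr": "Bonjour à tous"}},
]
print([r["src_words"] for r in toy_batched_map(rows, count_source_words)])  # [2, 2, 3]
```

`preprocess_function` follows the same contract: it reads `examples["translation"]` as a list and returns `input_ids`, `labels` and the optional extra columns as lists of equal length.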

In [23]:
from transformers.keras_callbacks import KerasMetricCallback
import numpy as np


def metric_fn(eval_predictions):
    preds, labels = eval_predictions
    prediction_lens = [
        np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds
    ]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # We use -100 to mask labels - replace it with the tokenizer pad token when decoding
    # so that no output is emitted for these
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [[label.strip()] for label in decoded_labels]

    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    result["gen_len"] = np.mean(prediction_lens)
    return result
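The two numeric steps in `metric_fn`, mapping the `-100` label mask back to the pad id before decoding and averaging the number of non-pad tokens in each generated sequence (`gen_len`), can be checked in isolation. This is a pure-Python sketch; `PAD_ID = 99` is a hypothetical stand-in for `tokenizer.pad_token_id`.

```python
def replace_label_mask(labels, pad_id, mask_id=-100):
    """Map the -100 loss mask back to the pad id so the tokenizer can decode the labels."""
    return [[pad_id if tok == mask_id else tok for tok in seq] for seq in labels]


def mean_generated_length(preds, pad_id):
    """Average number of non-pad tokens per generated sequence (the gen_len value)."""
    lengths = [sum(1 for tok in seq if tok != pad_id) for seq in preds]
    return sum(lengths) / len(lengths)


PAD_ID = 99  # hypothetical; metric_fn uses tokenizer.pad_token_id
print(replace_label_mask([[5, 6, -100, -100]], PAD_ID))                 # [[5, 6, 99, 99]]
print(mean_generated_length([[5, 6, 0, 99], [7, 0, 99, 99]], PAD_ID))  # 2.5
```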

In [24]:
from transformers.keras_callbacks import PushToHubCallback
from tensorflow.keras.callbacks import TensorBoard
from transformers import TFAutoModelForSeq2SeqLM, DataCollatorForSeq2Seq
from transformers import AdamWeightDecay
import tensorflow as tf

## Fine-tuning the model with no extra tags

In [19]:
max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "fr"
pos_tags=False
wa_tags = False
wa_type = None

split_dataset = loaded_dataset.remove_columns(['pos', 'wa'])


no_anno_dataset = split_dataset.map(preprocess_function, batched=True)

Loading cached processed dataset at fr_dataset_split.hf/train\cache-4ac5e2a245c8bb7d.arrow
Loading cached processed dataset at fr_dataset_split.hf/test\cache-96397f53f35e1795.arrow


In [22]:
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


Note that we don't get a warning about newly initialized weights. This means we used all the weights of the pretrained model and there is no randomly initialized head in this case.

Next we set some parameters like the learning rate and the `batch_size`, and customize the weight decay.

The last two (commented-out) lines set up everything so we can push the model to the [Hub](https://huggingface.co/models) at the end of training. Uncomment them only if you followed the installation steps at the top of the notebook; you can also change the value of `push_to_hub_model_id` to something you prefer.

In [57]:
batch_size = 16
learning_rate = 2e-5
weight_decay = 0.01
num_train_epochs = 1

#model_name = model_checkpoint.split("/")[-1]
#push_to_hub_model_id = f"{model_name}-finetuned-{source_lang}-to-{target_lang}"

Then, we need a special kind of data collator, which will not only pad the inputs to the maximum length in the batch, but also the labels. Note that our data collators are multi-framework, so make sure you set `return_tensors='tf'` so you get `tf.Tensor` objects back and not something else!

In [58]:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")

generation_data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf", pad_to_multiple_of=128)

Next, we convert our datasets to `tf.data.Dataset`, which Keras understands natively. There are two ways to do this - we can use the slightly more low-level [`Dataset.to_tf_dataset()`](https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.to_tf_dataset) method, or we can use [`Model.prepare_tf_dataset()`](https://huggingface.co/docs/transformers/main_classes/model#transformers.TFPreTrainedModel.prepare_tf_dataset). The main difference between these two is that the `Model` method can inspect the model to determine which column names it can use as input, which means you don't need to specify them yourself. Make sure to specify the collator we just created as our `collate_fn`!
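The column inspection mentioned above can be illustrated with `inspect.signature`. This is a simplified sketch, assuming `prepare_tf_dataset` keeps only the dataset columns whose names match parameters of the model's `call()`; `toy_call` is a hypothetical signature, not the real `TFMarianMTModel.call`.

```python
import inspect


def toy_call(input_ids=None, attention_mask=None, decoder_input_ids=None, labels=None, training=False):
    """Hypothetical stand-in for a seq2seq model's call() signature."""


def columns_kept(columns, call_fn):
    # Keep only the columns that match a parameter name of call_fn
    accepted = set(inspect.signature(call_fn).parameters)
    return [column for column in columns if column in accepted]


print(columns_kept(["input_ids", "attention_mask", "labels", "pos", "target_pos", "wa"], toy_call))
# ['input_ids', 'attention_mask', 'labels']
```

If that assumption holds for your Transformers version, columns such as `pos`, `target_pos` and `wa` would not reach a stock `TFMarianMTModel`, so it is worth confirming which columns each `tf.data.Dataset` actually contains (e.g. via its `element_spec`) before comparing the runs below.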

We also want to compute `BLEU` metrics, which will require us to generate text from our model. To speed things up, we can compile our generation loop with XLA. This results in a *huge* speedup - up to 100X! The downside of XLA generation, though, is that it doesn't like variable input shapes, because it needs to run a new compilation for each new input shape! To compensate for that, let's use `pad_to_multiple_of` for the dataset we use for text generation. This will reduce the number of unique input shapes a lot, meaning we can get the benefits of XLA generation with only a few compilations.
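A quick sketch of why `pad_to_multiple_of` limits recompilation: rounding each batch's length up to a multiple collapses many distinct lengths into a few shapes. The batch lengths below are hypothetical; since inputs are truncated to 128 tokens in this notebook, `pad_to_multiple_of=128` yields a single shape.

```python
import math


def padded_length(length, multiple):
    """Round a sequence length up to the next multiple (what pad_to_multiple_of does)."""
    return math.ceil(length / multiple) * multiple


lengths = [7, 12, 19, 33, 41, 64, 90, 127]  # hypothetical per-batch max lengths
print(len(set(lengths)))                                  # 8 distinct shapes without rounding
print(sorted({padded_length(n, 16) for n in lengths}))    # [16, 32, 48, 64, 96, 128]
print(sorted({padded_length(n, 128) for n in lengths}))   # [128]
```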

In [59]:
train_dataset = model.prepare_tf_dataset(
    no_anno_dataset["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator,
)

validation_dataset = model.prepare_tf_dataset(
    no_anno_dataset["test"],
    batch_size=batch_size,
    shuffle=False,
    collate_fn=data_collator,
)

generation_dataset = model.prepare_tf_dataset(
    no_anno_dataset["test"],
    batch_size=8,
    shuffle=False,
    collate_fn=generation_data_collator,
)

Now we initialize our loss and optimizer and compile the model. Note that most Transformers models compute loss internally, so we can just leave the loss argument blank to use the internal loss instead. For the optimizer, we can use the `AdamWeightDecay` optimizer from the Transformers library.
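The point of `AdamWeightDecay` is *decoupled* weight decay: each weight is shrunk by `lr * weight_decay_rate * theta` separately from the Adam gradient step, rather than through an L2 term added to the loss. Below is a scalar sketch of one such step, assuming the usual bias-corrected Adam update; it is illustrative only and may differ in small details (e.g. where epsilon is applied) from the Keras implementation.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=2e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-7):
    """One scalar AdamW-style step: decoupled weight decay plus a bias-corrected Adam update."""
    # Decoupled weight decay: shrink the weight directly, independent of the gradient
    theta = theta - lr * weight_decay * theta
    # Adam moment estimates and bias correction for step t (t starts at 1)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# With a zero gradient only the decay acts: theta shrinks by a factor (1 - lr * weight_decay)
theta_decay_only, _, _ = adamw_step(1.0, grad=0.0, m=0.0, v=0.0, t=1)
# On the first step the bias-corrected Adam update is close to lr * sign(grad)
theta_first_step, _, _ = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(theta_decay_only, theta_first_step)
```

With `learning_rate = 2e-5` and `weight_decay = 0.01` as set above, the decay term alone moves each weight by only 2e-7 of its value per step.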

In [60]:
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)
model.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


Now we can train our model. We can also add a few optional callbacks here, which you can remove if they aren't useful to you. In no particular order, these are:
- PushToHubCallback will sync up our model with the Hub - this allows us to resume training from other machines, share the model after training is finished, and even test the model's inference quality midway through training!
- TensorBoard is a built-in Keras callback that logs TensorBoard metrics.
- KerasMetricCallback is a callback for computing advanced metrics. There are a number of common metrics in NLP like ROUGE which are hard to fit into your compiled training loop because they depend on decoding predictions and labels back to strings with the tokenizer, and calling arbitrary Python functions to compute the metric. The KerasMetricCallback will wrap a metric function, outputting metrics as training progresses.

If this is the first time you've seen `KerasMetricCallback`, it's worth explaining what exactly is going on here. The callback takes two main arguments - a `metric_fn` and an `eval_dataset`. It then iterates over the `eval_dataset` and collects the model's outputs for each sample, before passing the `list` of predictions and the associated `list` of labels to the user-defined `metric_fn`. If the `predict_with_generate` argument is `True`, then it will call `model.generate()` for each input sample instead of `model.predict()` - this is useful for metrics that expect generated text from the model, like `ROUGE` and `BLEU`.

This callback allows complex metrics to be computed each epoch that would not function as a standard Keras Metric. Metric values are printed each epoch, and can be used by other callbacks like `TensorBoard` or `EarlyStopping`.
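The flow described above can be summarized in a simplified, framework-free sketch: generate for every evaluation batch, gather all predictions and labels, then call the metric function once. `toy_generate` stands in for `model.generate()` and `exact_match_metric` for `metric_fn`; both are hypothetical, and the real callback also handles padding, conversion to arrays and logging.

```python
def run_metric_callback(eval_batches, generate_fn, metric_fn):
    """Simplified flow: generate for every eval batch, gather everything, score once."""
    all_preds, all_labels = [], []
    for batch in eval_batches:
        all_preds.extend(generate_fn(batch["input_ids"]))
        all_labels.extend(batch["labels"])
    return metric_fn((all_preds, all_labels))


def toy_generate(input_ids):
    # Hypothetical stand-in for model.generate(): just echoes the inputs
    return [list(seq) for seq in input_ids]


def exact_match_metric(eval_predictions):
    # Hypothetical stand-in for metric_fn: fraction of predictions identical to labels
    preds, labels = eval_predictions
    matches = sum(pred == label for pred, label in zip(preds, labels))
    return {"exact_match": matches / len(labels)}


eval_batches = [
    {"input_ids": [[1, 2], [3, 4]], "labels": [[1, 2], [3, 5]]},
    {"input_ids": [[6]], "labels": [[6]]},
]
print(run_metric_callback(eval_batches, toy_generate, exact_match_metric))
```

Because the metric is computed over the whole evaluation set at once, corpus-level scores such as BLEU come out the same regardless of the evaluation batch size.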

In [61]:
metric_callback = KerasMetricCallback(
    metric_fn=metric_fn, eval_dataset=generation_dataset, predict_with_generate=True, use_xla_generation=True, 
    generate_kwargs={"max_length": 128}
)

With the metric callback ready, now we can specify the other callbacks and fit our model:

In [62]:
tensorboard_callback = TensorBoard(log_dir="./translation_model_save/logs")

# Uncomment to push the model to the Hub (requires push_to_hub_model_id above)
# push_to_hub_callback = PushToHubCallback(
#     output_dir="./translation_model_save",
#     tokenizer=tokenizer,
#     hub_model_id=push_to_hub_model_id,
# )

# callbacks = [metric_callback, tensorboard_callback, push_to_hub_callback]
callbacks = [metric_callback, tensorboard_callback]

model.fit(
    train_dataset, validation_data=validation_dataset, epochs=num_train_epochs, callbacks=callbacks
)



<keras.callbacks.History at 0x1fbbee1a0a0>

**BLEU** metric after the run with no extra features is **18.7710**. This is our baseline.

## Running the model with PoS features

This cell can be slow (around 5-10 minutes), because `get_pos_tags` runs spaCy once for every token of every sentence.

In [25]:
max_input_length = 128
max_target_length = 128
batch_size = 16
learning_rate = 2e-5
weight_decay = 0.01
num_train_epochs = 1
source_lang = "en"
target_lang = "fr"
# pos_tags must be True so that preprocess_function adds the PoS columns
pos_tags = True
wa_tags = False
wa_type = None

split_dataset = loaded_dataset.remove_columns(['wa'])

pos_anno_dataset = split_dataset.map(preprocess_function, batched=True)


In [26]:
model_with_pos = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


In [27]:
data_collator_pos = DataCollatorForSeq2Seq(tokenizer, model=model_with_pos, return_tensors="tf")

generation_data_collator_pos = DataCollatorForSeq2Seq(tokenizer, model=model_with_pos, return_tensors="tf", pad_to_multiple_of=128)

In [28]:
train_dataset = model_with_pos.prepare_tf_dataset(
    pos_anno_dataset["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator_pos,
)

validation_dataset = model_with_pos.prepare_tf_dataset(
    pos_anno_dataset["test"],
    batch_size=batch_size,
    shuffle=False,
    collate_fn=data_collator_pos,
)

generation_dataset = model_with_pos.prepare_tf_dataset(
    pos_anno_dataset["test"],
    batch_size=8,
    shuffle=False,
    collate_fn=generation_data_collator_pos,
)

In [29]:
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)
model_with_pos.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


In [30]:
metric_callback = KerasMetricCallback(
    metric_fn=metric_fn, eval_dataset=generation_dataset, predict_with_generate=True, use_xla_generation=True, 
    generate_kwargs={"max_length": 128}
)

In [31]:
tensorboard_callback = TensorBoard(log_dir="./translation_model_save/logs")

callbacks = [metric_callback, tensorboard_callback]

model_with_pos.fit(
    train_dataset, validation_data=validation_dataset, epochs=num_train_epochs, callbacks=callbacks
)




**BLEU** metric after the run with POS features only is **22.47899**. 

## Running model with WA (vector with target ids)

In [33]:
batch_size = 16
learning_rate = 2e-5
weight_decay = 0.01
num_train_epochs = 1

max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "fr"
pos_tags = False
wa_tags = True
wa_type="trg-ids"


split_dataset = loaded_dataset.remove_columns(['pos'])
wa_trg_id_dataset = split_dataset.map(preprocess_function, batched=True)

Loading cached processed dataset at fr_dataset_split.hf/train\cache-da7d73a19c07a55b.arrow
Loading cached processed dataset at fr_dataset_split.hf/test\cache-29b0fe0d33970b62.arrow


In [30]:
model_with_wa_trg_ids = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


In [27]:
data_collator_wa_trg_ids = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_trg_ids, return_tensors="tf")

generation_data_collator_wa_trg_ids = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_trg_ids, return_tensors="tf", pad_to_multiple_of=128)

In [31]:
train_dataset = model_with_wa_trg_ids.prepare_tf_dataset(
    wa_trg_id_dataset["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator_wa_trg_ids,
)

validation_dataset = model_with_wa_trg_ids.prepare_tf_dataset(
    wa_trg_id_dataset["test"],
    batch_size=batch_size,
    shuffle=False,
    collate_fn=data_collator_wa_trg_ids,
)

generation_dataset = model_with_wa_trg_ids.prepare_tf_dataset(
    wa_trg_id_dataset["test"],
    batch_size=8,
    shuffle=False,
    collate_fn=generation_data_collator_wa_trg_ids,
)

In [34]:
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)
model_with_wa_trg_ids.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


In [35]:
metric_callback = KerasMetricCallback(
    metric_fn=metric_fn, eval_dataset=generation_dataset, predict_with_generate=True, use_xla_generation=True, 
    generate_kwargs={"max_length": 128}
)

In [36]:
tensorboard_callback = TensorBoard(log_dir="./wa_trg_ids/logs")

callbacks = [metric_callback, tensorboard_callback]

model_with_wa_trg_ids.fit(
    train_dataset, validation_data=validation_dataset, epochs=num_train_epochs, callbacks=callbacks
)



<keras.callbacks.History at 0x1d26ad1deb0>

**BLEU** metric after the run with WA alignment using target token IDs is **31.5155**.

## Running model with WA (using sum of token ids)

In [72]:
max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "fr"
pos_tags = False
wa_tags = True
wa_type ="sums"

split_dataset = loaded_dataset.remove_columns(['pos'])
wa_sums_dataset = split_dataset.map(preprocess_function, batched=True)


In [73]:
model_with_wa_sums = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Some layers from the model checkpoint at Helsinki-NLP/opus-mt-en-fr were not used when initializing TFMarianMTModel: ['final_logits_bias']
- This IS expected if you are initializing TFMarianMTModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFMarianMTModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


In [74]:
data_collator_wa_sums = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_sums, return_tensors="tf")

generation_data_collator_wa_sums = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_sums, return_tensors="tf", pad_to_multiple_of=128)

In [75]:
train_dataset = model_with_wa_sums.prepare_tf_dataset(
    wa_sums_dataset["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator_wa_sums,
)

validation_dataset = model_with_wa_sums.prepare_tf_dataset(
    wa_sums_dataset["test"],
    batch_size=batch_size,
    shuffle=False,
    collate_fn=data_collator_wa_sums,
)

generation_dataset = model_with_wa_sums.prepare_tf_dataset(
    wa_sums_dataset["test"],
    batch_size=8,
    shuffle=False,
    collate_fn=generation_data_collator_wa_sums,
)

In [76]:
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)
model_with_wa_sums.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


In [77]:
metric_callback = KerasMetricCallback(
    metric_fn=metric_fn,
    eval_dataset=generation_dataset,
    predict_with_generate=True,
    use_xla_generation=True,
    generate_kwargs={"max_length": 128},
)

In [78]:
tensorboard_callback = TensorBoard(log_dir="./wa_sums/logs")

callbacks = [metric_callback, tensorboard_callback]

model_with_wa_sums.fit(
    train_dataset, validation_data=validation_dataset, epochs=1, callbacks=callbacks
)



<keras.callbacks.History at 0x7f9d8493f070>

**BLEU** score after the run with word alignment (WA) using sums of related words' indices: **23.2457**.
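BLEU combines modified n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty that punishes translations shorter than the references. The score above comes from the notebook's `metric_fn`, defined earlier, which presumably wraps sacreBLEU. The sketch below is a simplified corpus BLEU with one reference per sentence, whitespace tokenization and no smoothing. It shows how the number is built and will not reproduce sacreBLEU scores exactly:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale: one reference per hypothesis,
    whitespace tokenization, no smoothing (any zero precision gives 0)."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the hypotheses are longer than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


refs = ["le chat est sur le tapis"]
print(corpus_bleu(["le chat est sur le tapis"], refs))  # 100.0
print(corpus_bleu(["le chat est sur un tapis"], refs))
```

Computing the statistics over the whole corpus before taking the geometric mean (rather than averaging per-sentence scores) is what makes this a corpus-level BLEU.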

## Running model with WA (using multiplication of token ids)

In [79]:
max_input_length = 128
max_target_length = 128
source_lang = "en"
target_lang = "fr"
pos_tags = False
wa_tags = True
wa_type = "mult"

split_dataset = loaded_dataset.remove_columns(['pos'])
wa_mult_dataset = split_dataset.map(preprocess_function, batched=True)


In [80]:
model_with_wa_mult = TFAutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Some layers from the model checkpoint at Helsinki-NLP/opus-mt-en-fr were not used when initializing TFMarianMTModel: ['final_logits_bias']
- This IS expected if you are initializing TFMarianMTModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFMarianMTModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-en-fr.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


In [81]:
data_collator_wa_mult = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_mult, return_tensors="tf")

generation_data_collator_wa_mult = DataCollatorForSeq2Seq(tokenizer, model=model_with_wa_mult, return_tensors="tf", pad_to_multiple_of=128)

In [82]:
train_dataset = model_with_wa_mult.prepare_tf_dataset(
    wa_mult_dataset["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=generation_data_collator_wa_mult,
)

validation_dataset = model_with_wa_mult.prepare_tf_dataset(
    wa_mult_dataset["test"],
    batch_size=batch_size,
    shuffle=False,
    collate_fn=generation_data_collator_wa_mult,
)

generation_dataset = model_with_wa_mult.prepare_tf_dataset(
    wa_mult_dataset["test"],
    batch_size=8,
    shuffle=False,
    collate_fn=generation_data_collator_wa_mult,
)

In [83]:
optimizer = AdamWeightDecay(learning_rate=learning_rate, weight_decay_rate=weight_decay)
model_with_wa_mult.compile(optimizer=optimizer)

No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.


In [84]:
metric_callback = KerasMetricCallback(
    metric_fn=metric_fn,
    eval_dataset=generation_dataset,
    predict_with_generate=True,
    use_xla_generation=True,
    generate_kwargs={"max_length": 128},
)

In [85]:
tensorboard_callback = TensorBoard(log_dir="./wa_mult/logs")

callbacks = [metric_callback, tensorboard_callback]

model_with_wa_mult.fit(
    train_dataset, validation_data=validation_dataset, epochs=1, callbacks=callbacks
)



<keras.callbacks.History at 0x7f9d89518be0>

**BLEU** score after the run with word alignment (WA) using multiplication of related words' indices: **26.4307**.

# Results

In [94]:
import pandas as pd

results_list = [
    ('No annotation', 18.7710),
    ('Part of Speech', 21.8055),
    ('Word alignment: target token ids', 22.7393),
    ('Word alignment: sum of token ids', 20.2693),
    ('Word alignment: multiplication of token ids', 18.0217)
]
# Build the table directly: DataFrame.append is deprecated and was removed in pandas 2.0
results = pd.DataFrame(results_list, columns=['Annotation', 'BLEU'])
results

|   | Annotation | BLEU |
|---|---|---|
| 0 | No annotation | 18.771 |
| 1 | Part of Speech | 21.8055 |
| 2 | Word alignment: target token ids | 22.7393 |
| 3 | Word alignment: sum of token ids | 20.2693 |
| 4 | Word alignment: multiplication of token ids | 18.0217 |
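To read the table as gains over the fine-tuning run without annotation, each score can be compared against the "No annotation" baseline. Below is a minimal sketch in plain Python that reuses the BLEU values from the table:

```python
results_list = [
    ("No annotation", 18.7710),
    ("Part of Speech", 21.8055),
    ("Word alignment: target token ids", 22.7393),
    ("Word alignment: sum of token ids", 20.2693),
    ("Word alignment: multiplication of token ids", 18.0217),
]

baseline = dict(results_list)["No annotation"]

# BLEU difference relative to the baseline, best annotation first
deltas = sorted(
    ((name, round(bleu - baseline, 4)) for name, bleu in results_list),
    key=lambda item: item[1],
    reverse=True,
)
for name, delta in deltas:
    print(f"{name:<45} {delta:+.4f}")
```

On these numbers, target-token-id alignment gives the largest gain (+3.9683), and the multiplication variant falls slightly below the baseline.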


## Translation with the models

Now that we've trained our models, let's see how to use them to translate text. If a model has been pushed to the Hub, it can be loaded back from there, which means we can resume the code from here without rerunning everything above every time.

Now let's tokenize some text and pass it to the model to generate a translation. If you're using a `T5` model, don't forget to add the task prefix (for example "translate English to French: ") at the start of the input.
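Single-target Marian models such as `Helsinki-NLP/opus-mt-en-fr` take the raw sentence, whereas T5 expects the task to be written out in front of the input. Below is a minimal sketch that adds the prefix only when needed. `add_task_prefix` and the `is_t5` flag are hypothetical names for illustration:

```python
def add_task_prefix(texts, is_t5, prefix="translate English to French: "):
    """Prepend the T5 task prefix to every input; leave Marian inputs untouched."""
    return [prefix + t for t in texts] if is_t5 else list(texts)


print(add_task_prefix(["Hello"], is_t5=True))   # ['translate English to French: Hello']
print(add_task_prefix(["Hello"], is_t5=False))  # ['Hello']
```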

In [40]:
input_text = "In Chinese painting, abstraction can be traced to the Tang dynasty painter Wang Mo (王墨), who is credited to have invented the splashed-ink painting style."

tokenized = tokenizer([input_text], return_tensors='np')
# In the line below use the variable name of the model you want to test
out = model.generate(**tokenized, max_length=128)
print(out)

tf.Tensor(
[[59513   277     8 12720 18389     2    14     6  2914  7750  1078   168
    100 21481    51    17     8 43774 21520 23696 30510  2734     2    44
     43  2692   274    20     6  1936 30324    19  2955     5 12720    17
  13967    20     6 12319 19727  2130     9  6092     3     0 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513 59513
  59513 59513 59513 59513 59513 59513 59513 59513]], shape=(1, 128), dtype=int32)


Well, that's some tokens and a lot of padding! Let's decode those to see what it says, using the `skip_special_tokens` argument to skip those padding tokens:

In [41]:
with tokenizer.as_target_tokenizer():
    print(tokenizer.decode(out[0], skip_special_tokens=True))

Dans la peinture chinoise, l'abstraction peut être tracée à la dynastie Tang peintre Wang Mo, qui est crédité d'avoir inventé le style de peinture à jet d'éclaboussures.
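Conceptually, `skip_special_tokens=True` drops ids such as padding and end-of-sequence before the remaining ids are turned back into text. Judging from the tensor above, 59513 is the padding id (Marian models typically also start decoding from it, which is why it opens the sequence), and 0 marks the end of the sentence. Below is a pure-Python sketch of that filtering step. Both ids are assumptions inferred from the output, not read from the tokenizer:

```python
PAD_ID = 59513  # assumed pad id, inferred from the generated tensor above
EOS_ID = 0      # assumed end-of-sequence id, seen just before the padding starts


def strip_special_ids(ids, special_ids=(PAD_ID, EOS_ID)):
    """Keep only ordinary token ids, as skip_special_tokens=True does before decoding."""
    special = set(special_ids)
    return [i for i in ids if i not in special]


# Shortened version of the generated sequence above
generated = [59513, 277, 8, 12720, 3, 0, 59513, 59513]
print(strip_special_ids(generated))  # [277, 8, 12720, 3]
```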




### Application to the Vital articles

In [49]:
vital = pd.read_csv("C:\\Users\\Utilisateur\\Documents\\wiki\\scrapping\\ver2\\articles_clean_ver2\\en_only_fr.csv", sep=";", encoding="iso-8859-1")
vital.head()

|   | article | sentences |
|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... |
| 1 | Abstract art | While none of his paintings remain, this style... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... |
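The `??` where the earlier test sentence has 王墨 suggests the Chinese characters were already lost before this file was read. Reading bytes as ISO-8859-1 never fails, so UTF-8 input would show up as garbled Latin characters rather than question marks. One plausible cause is that the file was written in Latin-1, which has no code points for CJK characters, with replacement enabled. The illustration below uses only Python's built-in codecs. Keeping the CSV in UTF-8 (`encoding="utf-8"` in both `to_csv` and `read_csv`) would be one way to preserve such characters, assuming the original scrape still contains them:

```python
text = "Wang Mo (王墨)"

# Latin-1 cannot represent CJK characters; errors="replace" substitutes "?"
latin1_bytes = text.encode("iso-8859-1", errors="replace")
print(latin1_bytes.decode("iso-8859-1"))  # Wang Mo (??)

# UTF-8 round-trips the same string without loss
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)  # True
```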


#### Model without any fine-tuning

In [52]:
outputs = []

for input_sentence in vital["sentences"]:
    tokenized_sentence = tokenizer([input_sentence], return_tensors='np')
    out = model.generate(**tokenized_sentence, max_length=128)
    with tokenizer.as_target_tokenizer():
        output_sentence = tokenizer.decode(out[0], skip_special_tokens=True)
        print(output_sentence)
        outputs.append(output_sentence)



Dans la peinture chinoise, l'abstraction peut être tracée à la dynastie Tang peintre Wang Mo (??), qui est crédité d'avoir inventé le style de peinture éclaboussée.
Bien qu'aucune de ses peintures ne reste, ce style est clairement vu dans certains Song Dynasty Paintings.
Le peintre bouddhiste Chan Liang Kai (??, vers 1140=1210) a appliqué le style à la peinture figurative dans son "Immortal in splashed enk" dans lequel une représentation précise est sacrifiée pour améliorer la spontanéité liée à l'esprit non-rationnel de l'éclairé.
Un peintre de feu Song nommé Yu Jian, adepte du bouddhisme de Tiantai, a créé une série de paysages d'encre éclaboussées qui a finalement inspiré de nombreux peintres japonais Zen.
Quand Turing avait 39 ans en 1951, il se tourna vers la biologie mathématique, publiant finalement son chef-d'œuvre "The Chemical Bases of Morphogenèse" en janvier 1952.
Il s'intéressait à la morphogenèse, au développement de modèles et de formes dans les organismes biologiques.
I
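The loop above calls `generate` once per sentence, and the same loop is repeated below for each fine-tuned model. Below is a minimal sketch of a reusable helper that sends sentences in batches. `translate_in_batches` is a hypothetical name, and `translate_batch` stands for any function that maps a list of English sentences to a list of translations. For example, such a function could wrap `tokenizer(batch, padding=True, return_tensors='np')`, `model.generate(..., max_length=128)` and `tokenizer.batch_decode(..., skip_special_tokens=True)`:

```python
from typing import Callable, Iterable, List


def translate_in_batches(
    sentences: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate sentences batch by batch, preserving input order."""
    sentences = list(sentences)
    outputs: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        translations = translate_batch(batch)
        # Guard against a translate function that drops or adds outputs
        if len(translations) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        outputs.extend(translations)
    return outputs


# Usage with a stand-in translator; swap in a wrapper around model.generate
calls = []


def fake_translate(batch):
    calls.append(len(batch))
    return [s.upper() for s in batch]


print(translate_in_batches(["a", "b", "c"], fake_translate, batch_size=2))  # ['A', 'B', 'C']
print(calls)  # [2, 1]
```

Passing the model-specific part in as a function lets the same helper serve the initial model and each fine-tuned variant. Sentences in a batch differ in length, so the real tokenizer call needs `padding=True`.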

In [54]:
vital["initial_model"] = outputs
vital.to_csv("translations.csv", index=False)

In [55]:
vital.head()

|   | article | sentences | initial_model |
|---|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... | Dans la peinture chinoise, l'abstraction peut ... |
| 1 | Abstract art | While none of his paintings remain, this style... | Bien qu'aucune de ses peintures ne reste, ce s... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... | Le peintre bouddhiste Chan Liang Kai (??, vers... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... | Un peintre de feu Song nommé Yu Jian, adepte d... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... | Quand Turing avait 39 ans en 1951, il se tourn... |


#### Fine-tuned model: no annotation

In [56]:
vital = pd.read_csv("translations.csv")
vital.head()

|   | article | sentences | initial_model |
|---|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... | Dans la peinture chinoise, l'abstraction peut ... |
| 1 | Abstract art | While none of his paintings remain, this style... | Bien qu'aucune de ses peintures ne reste, ce s... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... | Le peintre bouddhiste Chan Liang Kai (??, vers... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... | Un peintre de feu Song nommé Yu Jian, adepte d... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... | Quand Turing avait 39 ans en 1951, il se tourn... |


In [63]:
outputs_no_anno = []

for input_sentence in vital["sentences"]:
    tokenized_sentence = tokenizer([input_sentence], return_tensors='np')
    out = model.generate(**tokenized_sentence, max_length=128)
    with tokenizer.as_target_tokenizer():
        output_sentence = tokenizer.decode(out[0], skip_special_tokens=True)
        print(output_sentence)
        outputs_no_anno.append(output_sentence)



Dans la peinture chinoise l'abstraction peut être tracée par le peintre de la dynastie Tang Wang Mo (??) qui est crédité d'avoir inventé le style de peinture à jets d'éclaboussures.
Bien qu'aucune de ses peintures ne subsiste, ce style est clairement vu dans certains Song Dynasty Paintings.
Le peintre bouddhiste Chan Liang Kai (??, vers 1140=1210) a appliqué le style à la peinture figurative dans son « Immortal in splashed enk » dans lequel une représentation précise est sacrifiée pour renforcer la spontanéité liée à l'esprit non rationnel de l'éclairé.
Un peintre de feu Song nommé Yu Jian, adepte du Bouddhisme de Tiantai, crée une série de paysages à l'encre éclaboussées qui inspirent finalement de nombreux peintres japonais Zens............................................................................
Quand Turing avait 39 ans en 1951 il se tourna vers la biologie mathématique et publia finalement son chef-d'œuvre « The Chemical Bases of Morphogenèse » en janvier 1952.
Il s'intéres

In [65]:
vital["ft_no_anno"] = outputs_no_anno
vital.head()

|   | article | sentences | initial_model | ft_no_anno |
|---|---|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... | Dans la peinture chinoise, l'abstraction peut ... | Dans la peinture chinoise l'abstraction peut ê... |
| 1 | Abstract art | While none of his paintings remain, this style... | Bien qu'aucune de ses peintures ne reste, ce s... | Bien qu'aucune de ses peintures ne subsiste, c... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... | Le peintre bouddhiste Chan Liang Kai (??, vers... | Le peintre bouddhiste Chan Liang Kai (??, vers... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... | Un peintre de feu Song nommé Yu Jian, adepte d... | Un peintre de feu Song nommé Yu Jian, adepte d... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... | Quand Turing avait 39 ans en 1951, il se tourn... | Quand Turing avait 39 ans en 1951 il se tourna... |


In [70]:
vital

|   | article | sentences | initial_model | ft_no_anno |
|---|---|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... | Dans la peinture chinoise, l'abstraction peut ... | Dans la peinture chinoise l'abstraction peut ê... |
| 1 | Abstract art | While none of his paintings remain, this style... | Bien qu'aucune de ses peintures ne reste, ce s... | Bien qu'aucune de ses peintures ne subsiste, c... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... | Le peintre bouddhiste Chan Liang Kai (??, vers... | Le peintre bouddhiste Chan Liang Kai (??, vers... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... | Un peintre de feu Song nommé Yu Jian, adepte d... | Un peintre de feu Song nommé Yu Jian, adepte d... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... | Quand Turing avait 39 ans en 1951, il se tourn... | Quand Turing avait 39 ans en 1951 il se tourna... |
| 5 | Alan Turing | He was interested in morphogenesis, the develo... | Il s'intéressait à la morphogenèse, au dévelop... | Il s'intéressait à la morphogenèse, au dévelop... |
| 6 | Alan Turing | He suggested that a system of chemicals reacti... | Il a suggéré qu'un système de produits chimiqu... | Il suggère qu'un système de produits chimiques... |
| 7 | Alan Turing | He used systems of partial differential equati... | Il a utilisé des systèmes d'équations différen... | Il utilise des systèmes d'équations différenti... |
| 8 | Alan Turing | For example, if a catalyst A is required for a... | Par exemple, si un catalyseur A est nécessaire... | Par exemple, si un catalyseur A est nécessaire... |
| 9 | Alan Turing | Turing discovered that patterns could be creat... | Turing a découvert que des patrons pouvaient ê... | Turing découvre que des motifs peuvent être cr... |


In [69]:
vital.to_csv("translations_ft_no.csv", index=False)

#### Fine-tuned model: POS tags



#### Fine-tuned model: WA

In [None]:
vital = pd.read_csv("translations_ft_no.csv")
vital.head()

In [38]:
outputs_wa = []

for input_sentence in vital["sentences"]:
    tokenized_sentence = tokenizer([input_sentence], return_tensors='np')
    out = model_with_wa_trg_ids.generate(**tokenized_sentence, max_length=128)
    with tokenizer.as_target_tokenizer():
        output_sentence = tokenizer.decode(out[0], skip_special_tokens=True)
        print(output_sentence)
        outputs_wa.append(output_sentence)



Dans la peinture chinoise l'abstraction peut être tracée à la dynastie Tang peintre Wang Mo (??), qui est crédité d'avoir inventé le style de peinture éclaboussure-puce.................................................................................
Bien qu'aucune de ses peintures ne reste, ce style est clairement vu dans certains Song Dynasty Paintings.....................................................................................................
Le peintre bouddhiste de Chan Liang Kai (?? vers 1140=1210) a appliqué le style à la peinture figurative dans son « Immortal in splashed enk » dans lequel une représentation précise est sacrifiée pour améliorer la spontanéité liée à l'esprit non rationnel de l'éclairé.
Un peintre de feu Song nommé Yu Jian, adepte du Bouddhisme de Tiantai a créé une série de paysages d'encre éclaboussés qui ont finalement inspiré de nombreux peintres japonais Zen.
Quand Turing avait 39 ans en 1951 il se tourna vers la biologie mathématique et publia final

In [40]:
vital["ft_wa_trg"] = outputs_wa
vital

|   | article | sentences | initial_model | ft_no_anno | ft_wa_trg |
|---|---|---|---|---|---|
| 0 | Abstract art | In Chinese painting, abstraction can be traced... | Dans la peinture chinoise, l'abstraction peut ... | Dans la peinture chinoise l'abstraction peut ê... | Dans la peinture chinoise l'abstraction peut ê... |
| 1 | Abstract art | While none of his paintings remain, this style... | Bien qu'aucune de ses peintures ne reste, ce s... | Bien qu'aucune de ses peintures ne subsiste, c... | Bien qu'aucune de ses peintures ne reste, ce s... |
| 2 | Abstract art | The Chan buddhist painter Liang Kai (??, c. 11... | Le peintre bouddhiste Chan Liang Kai (??, vers... | Le peintre bouddhiste Chan Liang Kai (??, vers... | Le peintre bouddhiste de Chan Liang Kai (?? ve... |
| 3 | Abstract art | A late Song painter named Yu Jian, adept to Ti... | Un peintre de feu Song nommé Yu Jian, adepte d... | Un peintre de feu Song nommé Yu Jian, adepte d... | Un peintre de feu Song nommé Yu Jian, adepte d... |
| 4 | Alan Turing | When Turing was 39 years old in 1951, he turne... | Quand Turing avait 39 ans en 1951, il se tourn... | Quand Turing avait 39 ans en 1951 il se tourna... | Quand Turing avait 39 ans en 1951 il se tourna... |
| 5 | Alan Turing | He was interested in morphogenesis, the develo... | Il s'intéressait à la morphogenèse, au dévelop... | Il s'intéressait à la morphogenèse, au dévelop... | Il s'intéressait à la morphogenèse, au dévelop... |
| 6 | Alan Turing | He suggested that a system of chemicals reacti... | Il a suggéré qu'un système de produits chimiqu... | Il suggère qu'un système de produits chimiques... | Il suggère qu'un système de produits chimiques... |
| 7 | Alan Turing | He used systems of partial differential equati... | Il a utilisé des systèmes d'équations différen... | Il utilise des systèmes d'équations différenti... | Il utilise des systèmes d'équations différenti... |
| 8 | Alan Turing | For example, if a catalyst A is required for a... | Par exemple, si un catalyseur A est nécessaire... | Par exemple, si un catalyseur A est nécessaire... | Par exemple, si un catalyseur A est nécessaire... |
| 9 | Alan Turing | Turing discovered that patterns could be creat... | Turing a découvert que des patrons pouvaient ê... | Turing découvre que des motifs peuvent être cr... | Turing découvre que des motifs peuvent être cr... |


In [41]:
vital.to_csv("translations_ft_wa.csv", index=False)