# 📈 Transformer-based Data Augmentation in NLP 

Training set size heavily impacts a model's performance. This notebook explores data augmentation for an NLP dataset: techniques that generate additional training samples from the data you already have.

Data augmentation is already standard practice in computer vision projects 👌, but can also be leveraged in multilingual NLP problems. We'll use a limited training set to simulate a real-world use case, where we are often constrained by the size of the available data 🤦.

We'll focus on two data augmentation techniques: back-translation and contextual word-embedding insertions 🤗.

## 🛠️ Getting started

The cells below set up everything required to get started with data augmentation and fine-tuning an NLP model with the HuggingFace API.

### Setup

In [None]:
!pip install -q transformers sentencepiece datasets tokenizers nltk nlpaug 

### Imports

In [2]:
import re
import numpy as np
import pandas as pd 

import nltk
import nlpaug.flow as naf
import nlpaug.augmenter.word as naw
import plotly.graph_objects as go
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, TrainerCallback
from datasets import load_dataset, concatenate_datasets, load_from_disk, load_metric

### Download dataset
Since we're particularly interested in multilingual NLP, we'll use a well-known Dutch dataset, [DBRD](https://github.com/benjaminvdb/DBRD). The dataset contains over 110k book reviews along with associated binary sentiment polarity labels. The downstream task is to assign a sentiment to a book review.

In [None]:
max_input_len=128
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")

# filtering on samples that have a token count less than 128
# the translations are more accurate for shorter input texts 
# slight increase in performance when only considering shorter texts  
book_review_ds = load_dataset("dbrd").filter(lambda e: len(tokenizer.batch_encode_plus([e['text']]).input_ids[0]) < int(max_input_len))

In [4]:
# Limiting the size of the training dataset to simulate our low-data use case
book_review_train_ds = book_review_ds["train"].shuffle(seed=42).select(range(50))

book_review_test_ds = book_review_ds["test"] 

## Data augmentation pipelines

### ㊗️ Back-translation 
We'll use the [MarianMT](https://huggingface.co/transformers/model_doc/marian.html) model to perform back-translation. The back-translated sentences should be similar in meaning but not structurally identical. The back-translation process is as follows:

1.   Translate a Dutch book review into French
2.   Translate the resulting French text into English
3.   Translate the resulting English text back into Dutch
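
The three hops above can be sketched as a generic chain, where each hop is any callable that maps a list of texts to a list of texts. The `make_toy_translator` helper and its vocabularies are hypothetical word-lookup stand-ins so the sketch runs without downloading models; in this notebook each hop is a MarianMT pipeline.

```python
from typing import Callable, List, Sequence

Translator = Callable[[List[str]], List[str]]


def chain_translate(texts: Sequence[str], hops: Sequence[Translator]) -> List[str]:
    """Pass the texts through each translation hop in order."""
    current = list(texts)
    for hop in hops:
        current = hop(current)
    return current


def make_toy_translator(vocab: dict) -> Translator:
    """Hypothetical stand-in for a translation model: word-by-word lookup."""
    def translate(batch: List[str]) -> List[str]:
        return [" ".join(vocab.get(word, word) for word in text.split()) for text in batch]
    return translate


# Toy vocabularies (illustrative only, not real translation output).
toy_nl_fr = make_toy_translator({"mooi": "beau", "boek": "livre"})
toy_fr_en = make_toy_translator({"beau": "nice", "livre": "book"})
toy_en_nl = make_toy_translator({"nice": "leuk", "book": "boek"})

augmented = chain_translate(["mooi boek"], [toy_nl_fr, toy_fr_en, toy_en_nl])
print(augmented)  # ['leuk boek']
```

Because the hops are just a list, adding a language or reordering the chain only means changing that list.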

In [None]:
trans_pipeline_en_nl = pipeline(
    task='translation_en_to_nl',
    model='Helsinki-NLP/opus-mt-en-nl',
    tokenizer='Helsinki-NLP/opus-mt-en-nl',
    device=0)
trans_pipeline_nl_fr = pipeline(
    task='translation_nl_to_fr',
    model='Helsinki-NLP/opus-mt-nl-fr',
    tokenizer='Helsinki-NLP/opus-mt-nl-fr',
    device=0)
trans_pipeline_fr_en = pipeline(
    task='translation_fr_to_en',
    model='Helsinki-NLP/opus-mt-fr-en',
    tokenizer='Helsinki-NLP/opus-mt-fr-en',
    device=0)

In [6]:
def back_translation_nl_fr_en_nl(texts):
    # Hop 1: Dutch -> French
    fr_texts = trans_pipeline_nl_fr(texts)
    # Hop 2: French -> English
    en_texts = trans_pipeline_fr_en([el['translation_text'] for el in fr_texts])
    # Hop 3: English -> Dutch
    nl_texts = trans_pipeline_en_nl([el['translation_text'] for el in en_texts])
    return [el['translation_text'] for el in nl_texts]

backtranslate_dataset = lambda dataset: dataset.map(lambda x: {'text': back_translation_nl_fr_en_nl(x["text"])}, batch_size=10, batched=True)

In [None]:
# Back-translate the training dataset
book_review_train_ds_back = backtranslate_dataset(book_review_train_ds)

### ✨ Contextual word embedding insertions


The [nlpaug](https://github.com/makcedward/nlpaug) library bundles frequently used augmentation techniques into a Python package. We'll use the `ContextualWordEmbsAug` augmenter, which uses contextual word embeddings to find the top n similar words for augmentation.

The contextual embeddings are retrieved from a pretrained, transformer-based RoBERTa model (`pdelobelle/robbert-v2-dutch-base`), which was trained on the Dutch section of the [OSCAR](https://oscar-corpus.com/) corpus. Each word embedding depends on the surrounding words, which define the **context** of the embedding.
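
To make the insertion idea concrete, here is a simplified sketch of what "insert words at a fraction `aug_p` of positions" means. It is not nlpaug's implementation: `insert_words` and `toy_predict` are hypothetical names, and `toy_predict` always returns the same word where a real masked language model would predict one from the context.

```python
import random
from typing import Callable, List


def insert_words(text: str, predict: Callable[[List[str], int], str],
                 aug_p: float = 0.2, seed: int = 0) -> str:
    """Insert predicted words at roughly aug_p * len(words) random positions."""
    rng = random.Random(seed)
    words = text.split()
    n_insert = max(1, round(aug_p * len(words)))
    for _ in range(n_insert):
        # randint bounds are inclusive, so a word may also be appended at the end.
        position = rng.randint(0, len(words))
        words.insert(position, predict(words, position))
    return " ".join(words)


def toy_predict(words: List[str], position: int) -> str:
    """Hypothetical stand-in for a masked language model's top prediction."""
    return "echt"


original = "dit boek is heel spannend"
augmented = insert_words(original, toy_predict)
print(augmented)
```

With five words and `aug_p=0.2`, exactly one word is inserted; the original words keep their relative order.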

In [None]:
aug = naf.Sequential([
    naw.ContextualWordEmbsAug(
        model_path='pdelobelle/robbert-v2-dutch-base',
        model_type='roberta',
        aug_p=0.20,
        action="insert")
])

replace_newline = lambda dataset: dataset.map(lambda x: {'text': x["text"].replace("\n",' ')}, batched=False)
contextual_emb_aug = lambda dataset: dataset.map(lambda x: {'text': aug.augment(x["text"])},  batch_size=10, batched=True)

In [None]:
# Removing newlines in the text and performing word insertions based on contextual word embeddings
book_review_train_ds_newline = replace_newline(book_review_train_ds)
book_review_train_ds_contemb = contextual_emb_aug(book_review_train_ds_newline)

### 💪 Combination of both techniques
Digging deeper into our bag of tricks 🔥! 

This approach will combine both back-translation and contextual word-embedding insertions as follows:

1.   Insert new words using contextual word embeddings
2.   Back-translate the augmented dataset


In [None]:
# Combination of both contextual word embedding insertion and back-translation
book_review_train_ds_contemb_back = backtranslate_dataset(book_review_train_ds_contemb)

### 🏆 Honourable mentions

#### 🦜 Parrot Paraphraser
[Parrot](https://github.com/PrithivirajDamodaran/Parrot_Paraphraser) is a paraphrase augmentation framework. The library provides a pre-trained text paraphrasing model that generates paraphrases which preserve the original intent. It uses the T5 text-to-text transformer as the base model. We didn't use this library for this use case since our problem statement focuses on non-English textual data.

#### AugLy
Facebook recently open-sourced an augmentation library, [AugLy](https://github.com/facebookresearch/AugLy), that combines data augmentation techniques for several modalities including text, audio, video and images. It is geared towards data augmentations that are applicable to social media content.

## 🚀 Model 

In [None]:
metric = load_metric("accuracy")


batch_size = 8
epochs = 20
max_steps = epochs * int(((len(book_review_train_ds)*3)/batch_size)) 

run_dicts = [] # list of dicts to store both metrics and logs for all the experiment runs 
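
To make the step budget concrete, here is the arithmetic from the cell above worked out for the 50-sample training set. The dataset sizes in the loop follow from concatenating the 50 original reviews with one or three augmented copies; the single-device, no-gradient-accumulation setting is an assumption.

```python
import math

batch_size = 8
epochs = 20
n_train = 50  # size of the sub-sampled training set

# Same formula as in the cell above.
max_steps = epochs * int((n_train * 3) / batch_size)
print(max_steps)  # 360

# Evaluation is triggered every 25 steps.
eval_points = list(range(25, max_steps + 1, 25))
print(len(eval_points), eval_points[-1])  # 14 350

# Approximate number of passes over each training set within the fixed step budget,
# assuming one device and no gradient accumulation.
for name, size in [("baseline", 50), ("one augmentation", 100), ("combined", 200)]:
    steps_per_epoch = math.ceil(size / batch_size)
    print(name, steps_per_epoch, round(max_steps / steps_per_epoch, 1))
```

Since `max_steps` is the same for every run, the larger augmented datasets are seen fewer times than the baseline set, which is worth keeping in mind when comparing the curves.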

In [12]:
def compute_metrics(eval_pred):
    """
    Calculates the accuracy of the model's predictions as
    (TP + TN) / (TP + TN + FP + FN), where TP = true positives,
    TN = true negatives, FP = false positives and FN = false negatives.
    """

    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels) 


class LogAccumulatorCallback(TrainerCallback):
    """
    A class that stores both the training and the evaluation loss
    """
    
    def __init__(self):
        self.acc_logs = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        _ = logs.pop("total_flos", None)
        if state.is_local_process_zero and ('loss' in logs or 'eval_loss' in logs):
            self.acc_logs.append(logs.copy())


def train_and_evaluate(train_ds, test_ds, identifier):
    def tokenize(batch):
        return tokenizer(batch['text'], padding=True, truncation=True)
    
    train_ds = train_ds.map(tokenize, batched=True, batch_size=len(train_ds), remove_columns=["text"])
    test_ds = test_ds.map(tokenize, batched=True, batch_size=len(test_ds), remove_columns=["text"])
    
    
    training_args = TrainingArguments(
        identifier,
        evaluation_strategy="steps",
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        eval_steps=25,
        logging_steps=25,
        max_steps=max_steps,
        learning_rate=2e-5,
    )
    
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-multilingual-cased", num_labels=2)

    # Partially freeze the weights of the initial layers of the model.
    # On small datasets this usually reduces overfitting.
    # It also reduces memory usage and speeds up training.
    for block in model.distilbert.embeddings.modules():
        for param in block.parameters():
            param.requires_grad=False

    for i in [0,1,2]:
        for block in model.distilbert.transformer.layer[i].modules():
            for param in block.parameters():
                param.requires_grad=False

            
    logger = LogAccumulatorCallback()
    trainer = Trainer(
        model=model, args=training_args, 
        train_dataset=train_ds, 
        eval_dataset=test_ds,
        compute_metrics=compute_metrics,
        callbacks=[logger],
    )
    trainer.train()
    metrics = trainer.evaluate()
    
    return metrics, logger.acc_logs
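
As a sanity check on what `compute_metrics` reports, the accuracy computation can be sketched in plain Python on a handful of made-up logits. The `argmax` and `accuracy` helpers and the toy values are illustrative assumptions; the notebook itself relies on `np.argmax` and the `accuracy` metric from `datasets`.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy logits for four reviews (illustrative values), columns = [class 0, class 1].
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]
print(accuracy(toy_logits, toy_labels))  # 0.75
```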

### Model baseline

In [13]:
metrics, logs = train_and_evaluate(book_review_train_ds, book_review_test_ds, "baseline")

run_dicts.append({
    "id": "baseline",
    "metrics": metrics,
    "logs": logs
})


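| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 25 | 0.66 | 0.692808 | 0.570502 |
| 50 | 0.5455 | 0.681091 | 0.596434 |
| 75 | 0.2968 | 0.785878 | 0.623987 |
| 100 | 0.0576 | 1.094799 | 0.640194 |
| 125 | 0.0097 | 1.341968 | 0.641815 |
| 150 | 0.0049 | 1.408342 | 0.641815 |
| 175 | 0.0035 | 1.486777 | 0.646677 |
| 200 | 0.0029 | 1.502415 | 0.646677 |
| 225 | 0.0025 | 1.53735 | 0.646677 |
| 250 | 0.002 | 1.602132 | 0.645057 |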


***** Running Evaluation *****
  Num examples = 617
  Batch size = 8


### Model back-translated

In [14]:
train_ds = concatenate_datasets([book_review_train_ds, book_review_train_ds_back])
metrics, logs = train_and_evaluate(train_ds, book_review_test_ds, "backtranslated")

run_dicts.append({
    "id": "backtranslated",
    "metrics": metrics,
    "logs": logs
})

loading configuration file https://huggingface.co/distilbert-base-multilingual-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/cf37a9dc282a679f121734d06f003625d14cfdaf55c14358c4c0b8e7e2b89ac9.7a727bd85e40715bec919a39cdd6f0aba27a8cd488f2d4e0f512448dcd02bf0f
Model config DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.8.1",
  "vocab_size": 119547
}

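| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 25 | 0.6997 | 0.697789 | 0.444084 |
| 50 | 0.6368 | 0.652168 | 0.619125 |
| 75 | 0.4572 | 0.635084 | 0.646677 |
| 100 | 0.2224 | 0.806585 | 0.654781 |
| 125 | 0.0649 | 1.133113 | 0.65316 |
| 150 | 0.0113 | 1.3075 | 0.666126 |
| 175 | 0.0053 | 1.473922 | 0.654781 |
| 200 | 0.0039 | 1.515776 | 0.659643 |
| 225 | 0.0029 | 1.55109 | 0.661264 |
| 250 | 0.0027 | 1.628136 | 0.649919 |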


***** Running Evaluation *****
  Num examples = 617
  Batch size = 8

### Model contextual word embedding insertions



In [15]:
train_ds = concatenate_datasets([book_review_train_ds, book_review_train_ds_contemb])

metrics, logs = train_and_evaluate(train_ds, book_review_test_ds, "contextual_embedding")

run_dicts.append({
    "id": "contextual_embedding",
    "metrics": metrics,
    "logs": logs
})


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 25 | 0.6882 | 0.685484 | 0.567261 |
| 50 | 0.5967 | 0.650272 | 0.604538 |
| 75 | 0.3939 | 0.665281 | 0.628849 |
| 100 | 0.1432 | 0.850971 | 0.659643 |
| 125 | 0.0289 | 1.129702 | 0.658023 |
| 150 | 0.0094 | 1.262928 | 0.661264 |
| 175 | 0.0051 | 1.382803 | 0.664506 |
| 200 | 0.0037 | 1.443626 | 0.664506 |
| 225 | 0.0029 | 1.468462 | 0.667747 |
| 250 | 0.0028 | 1.49951 | 0.667747 |


***** Running Evaluation *****
  Num examples = 617
  Batch size = 8

### Model back-translated & contextual word embedding insertions

In [16]:
train_ds = concatenate_datasets([book_review_train_ds,  book_review_train_ds_contemb_back])

metrics, logs = train_and_evaluate(train_ds, book_review_test_ds, "backtranslated_contextual_embedding")

run_dicts.append({
    "id": "backtranslated_contextual_embedding",
    "metrics": metrics,
    "logs": logs
})


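| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 25 | 0.6957 | 0.698063 | 0.460292 |
| 50 | 0.642 | 0.668173 | 0.583468 |
| 75 | 0.5096 | 0.656209 | 0.617504 |
| 100 | 0.3412 | 0.669859 | 0.649919 |
| 125 | 0.16 | 0.850245 | 0.664506 |
| 150 | 0.0457 | 1.058329 | 0.679092 |
| 175 | 0.0127 | 1.221674 | 0.679092 |
| 200 | 0.0063 | 1.386224 | 0.658023 |
| 225 | 0.0044 | 1.379554 | 0.672609 |
| 250 | 0.0037 | 1.445592 | 0.672609 |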


***** Running Evaluation *****
  Num examples = 617
  Batch size = 8

### Model with combined augmented datasets, the whole merry gang together

In [17]:
train_ds = concatenate_datasets([book_review_train_ds, book_review_train_ds_back, book_review_train_ds_contemb, book_review_train_ds_contemb_back])

metrics, logs = train_and_evaluate(train_ds, book_review_test_ds, "combined_augmented_data")

run_dicts.append({
    "id": "combined_augmented_data",
    "metrics": metrics,
    "logs": logs
})


| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 25 | 0.6856 | 0.694722 | 0.444084 |
| 50 | 0.6466 | 0.654018 | 0.617504 |
| 75 | 0.5529 | 0.632713 | 0.622366 |
| 100 | 0.3999 | 0.638285 | 0.648298 |
| 125 | 0.2423 | 0.725717 | 0.664506 |
| 150 | 0.1282 | 0.844552 | 0.685575 |
| 175 | 0.0467 | 0.965625 | 0.685575 |
| 200 | 0.0135 | 1.163972 | 0.683955 |
| 225 | 0.0077 | 1.23976 | 0.677472 |
| 250 | 0.0047 | 1.294267 | 0.680713 |


***** Running Evaluation *****
  Num examples = 617
  Batch size = 8

##  📊 Visualize

In [None]:
df = pd.DataFrame(run_dicts)
df.head()

In [20]:
fig = go.Figure()


for index, row in df.iterrows():
    
    fig.add_trace(go.Scatter(
                    x=list(range(25,max_steps,25)),
                    y=pd.DataFrame(row['logs']).dropna(subset=['eval_accuracy'])['eval_accuracy'],
                    name='accuracy {}'.format(row['id'])))

fig.update_xaxes(title_text='step')
fig.update_yaxes(title_text='accuracy')

fig.show()
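
Alongside the plot, it can help to pull a single number out of each run. The sketch below finds the best logged evaluation accuracy in one entry shaped like those in `run_dicts`. The `best_eval_accuracy` helper is a hypothetical name, and the toy entry only contains the first three baseline evaluations from the table above, interleaved with training-loss entries as the callback stores them.

```python
def best_eval_accuracy(logs):
    """Return the highest 'eval_accuracy' found in a list of log dicts, or None."""
    accuracies = [entry["eval_accuracy"] for entry in logs if "eval_accuracy" in entry]
    return max(accuracies) if accuracies else None


# Illustrative entry: a shortened excerpt of the baseline run.
toy_run = {
    "id": "baseline",
    "logs": [
        {"loss": 0.66},
        {"eval_loss": 0.692808, "eval_accuracy": 0.570502},
        {"loss": 0.5455},
        {"eval_loss": 0.681091, "eval_accuracy": 0.596434},
        {"loss": 0.2968},
        {"eval_loss": 0.785878, "eval_accuracy": 0.623987},
    ],
}
print(toy_run["id"], best_eval_accuracy(toy_run["logs"]))  # baseline 0.623987
```

Applied to every row of `df`, this gives a compact per-run summary to read next to the curves.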

## 🏁 Take-aways 


You've reached the finish line! 👏  Let's sum up some of the findings.

* Both back-translation and contextual word embedding insertions improved accuracy over the baseline in the logged evaluation steps: each peaked at about 0.67, versus about 0.65 for the baseline 👌
* Creativity also helps! 🎨 Applying back-translation on top of contextual word embedding insertions scored higher still (about 0.68), and training on all augmented datasets together reached the highest logged accuracy (about 0.69).
* The goal is to use augmentation techniques that generate structurally different sentences while preserving the meaning.
* The data from the DBRD dataset was well-represented by the pretrained model, such that training without data augmentation already yielded reasonable results from only 50 training samples.

We considered 3-hop back-translation between Dutch, French and English, but you could also include other languages and more hops to generate even more samples.

You could also try out other text augmentation techniques, such as synonym replacement, random insertion, random swap and random deletion. 🕵️‍♂️
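
Two of these techniques, random swap and random deletion, need no model at all. Below is a minimal sketch, assuming whitespace tokenisation and a seeded `random.Random` for reproducibility; the function names and parameters are illustrative and not taken from a specific library.

```python
import random
from typing import List


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions, n_swaps times."""
    result = list(words)
    if len(result) < 2:
        return result
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(result)), 2)
        result[i], result[j] = result[j], result[i]
    return result


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [word for word in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(42)
words = "dit boek is heel spannend".split()
swapped = random_swap(words, n_swaps=1, rng=rng)
shortened = random_deletion(words, p=0.2, rng=rng)
print(" ".join(swapped))
print(" ".join(shortened))
```

Both operations can change the meaning of short reviews, so it is worth spot-checking a few augmented samples before training on them.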




