In [15]:
from textattack.augmentation import (
    BackTranslationAugmenter,
    CharSwapAugmenter,
    EasyDataAugmenter,
    EmbeddingAugmenter,
    WordNetAugmenter,
)
import warnings
from tqdm.notebook import tqdm
import numpy as np
import pandas as pd
from typing import List
warnings.filterwarnings("ignore")

In [3]:
prompt = "I am an example, we will explore textattack's augmentation capabilities with me."

In [8]:
wn_aug = WordNetAugmenter()
emb_aug = EmbeddingAugmenter()
bt_aug = BackTranslationAugmenter()
easy_aug = EasyDataAugmenter()
char_aug = CharSwapAugmenter()

[nltk_data] Downloading package omw-1.4 to /Users/opop/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [45]:
wn_augmented_text = wn_aug.augment(prompt)
emb_augmented_text = emb_aug.augment(prompt)
easy_augmented_text = easy_aug.augment(prompt)
char_augmented_text = char_aug.augment(prompt)
bt_augmented_text = bt_aug.augment(prompt)

print(f"Original: {prompt}")
print(f"WordNetAugmenter: {wn_augmented_text[0]}")
print(f"EmbeddingAugmenter: {emb_augmented_text[0]}")
print(f"EasyDataAugmenter: {easy_augmented_text[0]}")
print(f"CharSwapAugmenter: {char_augmented_text[0]}")
print(f"BackTranslationAugmenter: {bt_augmented_text[0]}")

Original: I am an example, we will explore textattack's augmentation capabilities with me.
WordNetAugmenter: I am an representative, we will explore textattack's augmentation capabilities with me.
EmbeddingAugmenter: I am an example, we will explore textattack's heighten capabilities with me.
EasyDataAugmenter: I an example, we will explore textattack's augmentation capabilities with me.
CharSwapAugmenter: I am an exampe, we will explore textattack's augmentation capabilities with me.
BackTranslationAugmenter: I'm an example, exploring the ability to increase text attacks with me.
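
To make the character-level output above more concrete, here is a toy sketch of one kind of character perturbation (dropping a single inner character from one word). It is an illustration only and is not TextAttack's implementation; the helper name `drop_random_char` and the rule of only touching words longer than three characters are assumptions made for this example.

```python
import random


def drop_random_char(text: str, rng: random.Random) -> str:
    """Delete one inner character from one randomly chosen word (toy example)."""
    words = text.split()
    # Only consider words long enough to keep their first and last characters.
    candidates = [i for i, word in enumerate(words) if len(word) > 3]
    if not candidates:
        return text
    i = rng.choice(candidates)
    word = words[i]
    pos = rng.randrange(1, len(word) - 1)  # never the first or last character
    words[i] = word[:pos] + word[pos + 1:]
    return " ".join(words)


sentence = "I am an example, we will explore textattack's augmentation capabilities with me."
noisy = drop_random_char(sentence, random.Random(0))
print(noisy)
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible without touching the global random state.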


We will use some of these techniques to augment our dataset, picking one of them at random for each text.

In [24]:
def augmentation_pipeline(batch: List[str],
                          new_examples: int = 4) -> List[str]:
    """
    Augment a batch of texts, picking one augmenter at random for each text.

    Augmenters currently in use:
    1. EmbeddingAugmenter
    2. EasyDataAugmenter
    BackTranslationAugmenter is disabled (commented out) below.

    Args:
        batch: List of strings
        new_examples: Number of new examples to generate from each example in the batch
    Returns:
        augmented_text: Flat list of augmented strings
    """

    emb = EmbeddingAugmenter(transformations_per_example=new_examples)
    easy = EasyDataAugmenter(transformations_per_example=new_examples)
    # bt = BackTranslationAugmenter(transformations_per_example=new_examples)

    choices = [emb, easy]

    # Local generator: reproducible without resetting NumPy's global seed on every call
    rng = np.random.RandomState(42)
    augmenters = rng.choice(choices, len(batch))

    augmented_text = []
    # zip pairs texts and augmenters by position, so a pandas Series with a
    # non-default index works too (batch[i] would look up labels, not positions)
    for text, aug in tqdm(zip(batch, augmenters), desc="Augmenting text", total=len(augmenters)):
        augmented_text.extend(aug.augment(text))

    return augmented_text
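
The random per-text dispatch used by `augmentation_pipeline` can be tried out without loading any TextAttack model. The sketch below is a minimal stand-in under stated assumptions: `shout`, `reverse_words` and `toy_pipeline` are hypothetical names invented for this example, the two toy functions merely play the role of real augmenters, and the standard-library `random` module replaces NumPy. Unlike the function above, it returns each original next to its variants, which can make the results easier to inspect.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A toy "augmenter": takes a text and a count, returns that many variants.
Augment = Callable[[str, int], List[str]]


def shout(text: str, n: int) -> List[str]:
    """Hypothetical stand-in augmenter: upper-case the text."""
    return [f"{text.upper()} (v{k})" for k in range(n)]


def reverse_words(text: str, n: int) -> List[str]:
    """Hypothetical stand-in augmenter: reverse the word order."""
    flipped = " ".join(reversed(text.split()))
    return [f"{flipped} (v{k})" for k in range(n)]


def toy_pipeline(batch: Sequence[str],
                 augmenters: Sequence[Augment],
                 new_examples: int = 2,
                 seed: int = 42) -> List[Tuple[str, List[str]]]:
    """Pick one augmenter at random per text and keep (original, variants) pairs."""
    rng = random.Random(seed)  # local generator, global random state untouched
    pairs = []
    for text in batch:
        augment = rng.choice(augmenters)
        pairs.append((text, augment(text, new_examples)))
    return pairs


batch = ["first text", "second text", "third text"]
pairs = toy_pipeline(batch, [shout, reverse_words])
for original, variants in pairs:
    print(original, "->", variants)
```

Because the generator is seeded inside the function, two calls with the same arguments assign the same augmenter to each text.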

Let's test it on a sample.

In [25]:
df_train = pd.read_csv("../data/train.csv")

In [26]:
df_sample = df_train.sample(5, random_state=42)

In [27]:
print(f"Original:\n {df_sample['text'].values}")

Original:
 ['the article unmaking the face mars explains how the face mars was not made aliens but was actually made naturally and called mesa the face was not made aliens but does tremble human head the face was also natural landform and the face finally just mesa even though people still believe that was created aliens the face was not created aliens was actually just rock that was shaped like human face with nose eyes and mouth quote from the article that supports answer how know that the face was not created aliens and the was actually just rock that was shaped like human face with nose eyes and mouth few days later nasa unveiled the image for all see the caption noted huge rock formation which resembles human head formed shadows giving the illusion eyes nose and mouth the authors reasoned would good way engage the public and attract attention mars this quote supports answer because explains how nasa said that was just rock formation that resembled human head another way how know t