# Text Augmentations

## Method 1. Using the `textattack` library to replace some words with their synonyms

`textattack` is a popular library for adversarial attacks and text augmentation.

    # install the library
    pip install textattack

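**Shuffling** the order of the sentences is itself a form of augmentation, so the cell below shuffles the captions before replacing words with synonyms.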
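As a minimal sketch of the shuffling step on its own, the helper below (a hypothetical name, `shuffle_and_join`, not part of `textattack`) returns the sentences joined in a random order. It assumes you want a reproducible result and an unmodified input list, so it uses a private seeded generator and `sample` rather than the in-place `random.shuffle`.

```python
import random


def shuffle_and_join(sentences, seed=None):
    """Return the sentences joined in a random order, leaving the input list unchanged."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    shuffled = rng.sample(sentences, k=len(sentences))  # shuffled copy of the list
    return " ".join(shuffled)


captions = [
    "a girl going into a wooden building .",
    "a little girl climbing into a wooden playhouse .",
    "a little girl climbing the stairs to her playhouse .",
]
print(shuffle_and_join(captions, seed=0))
```

Passing the same `seed` gives the same order on every run, which can help when comparing augmentation settings.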

In [1]:
import random

from textattack.augmentation import WordNetAugmenter


text = [  # original text
    "a child in a pink dress is climbing up a set of stairs in an entry way .",
    "a girl going into a wooden building .",
    "a little girl climbing into a wooden playhouse .",
    "a little girl climbing the stairs to her playhouse .",
    "a little girl in a pink dress going into a wooden cabin .",
]
print("Original text:", *text, sep="\n\t", end="\n\n")

random.shuffle(text)  # shuffle strings in a list
text_shuffled = " ".join(text)  # concatenate strings in a list

# WordNetAugmenter leverages WordNet to replace some words with their synonyms
augmenter = WordNetAugmenter()  # initialize the augmenter
text_augmented1 = augmenter.augment(text_shuffled)[0]  # augment the text. Method 1

print(f"\nShuffled text:\n\t{text_shuffled}\n")
print(f"Augmented text:\n\t{text_augmented1}\n")



Original text:
	a child in a pink dress is climbing up a set of stairs in an entry way .
	a girl going into a wooden building .
	a little girl climbing into a wooden playhouse .
	a little girl climbing the stairs to her playhouse .
	a little girl in a pink dress going into a wooden cabin .






Shuffled text:
	a child in a pink dress is climbing up a set of stairs in an entry way . a little girl climbing into a wooden playhouse . a girl going into a wooden building . a little girl in a pink dress going into a wooden cabin . a little girl climbing the stairs to her playhouse .

Augmented text:
	a child in a pink preen is climbing up a fit of stairs in an entry way . a little girl climbing into a wooden playhouse . a girl blend into a wooden building . a short girl in a pink dress sound into a wooden cabin . a little girl climbing the stairs to her playhouse .
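The augmented text above contains swaps such as "dress" to "preen" and "set" to "fit", which suggests that the replacement is chosen without regard to the sense of the word in context. The toy sketch below illustrates the general idea of synonym replacement under that assumption. It is not `textattack`'s implementation: the `SYNONYMS` table, the `replace_synonyms` function, and the `swap_fraction` parameter are all hypothetical names made up for this example.

```python
import random

# Hypothetical, hand-written synonym table; WordNet provides a far larger one.
SYNONYMS = {
    "little": ["small", "tiny"],
    "girl": ["child", "kid"],
    "building": ["structure"],
    "cabin": ["hut"],
}


def replace_synonyms(sentence, swap_fraction=0.3, seed=None):
    """Replace roughly `swap_fraction` of the words with a randomly chosen synonym."""
    rng = random.Random(seed)
    words = sentence.split()
    # Only words that appear in the synonym table can be replaced.
    candidates = [i for i, word in enumerate(words) if word in SYNONYMS]
    n_swaps = min(len(candidates), max(1, round(swap_fraction * len(words))))
    for i in rng.sample(candidates, k=n_swaps):
        words[i] = rng.choice(SYNONYMS[words[i]])  # no check of the word's sense
    return " ".join(words)


print(replace_synonyms("a little girl going into a wooden cabin .", seed=0))
```

Because the lookup is by surface form only, a word with several senses can receive a synonym from the wrong sense, which is one plausible reason for the odd substitutions in the real output.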



## Method 2. Using back translation

**Back translation** means translating a sentence into another language and then back into the original language, which can introduce paraphrases and variations.

Use the pre-trained `MarianMT` models from the `transformers` library for back translation.

In [2]:
import random
import transformers as hf


def translate(text, lang1, lang2):
    """ Translate text from language1 to language2 """
    # Load model and tokenizer
    model_name = f"Helsinki-NLP/opus-mt-{lang1}-{lang2}"
    tokenizer = hf.MarianTokenizer.from_pretrained(model_name)
    model = hf.MarianMTModel.from_pretrained(model_name, use_safetensors=True)

    # Translate from language 1 to language 2
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    translated_tokens = model.generate(**inputs)
    translated = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

    return translated


def back_translate(text, lang1, lang2):
    """
    Performs back-translation on a given text.

    Args:
        text (str): The input text to back-translate.
        lang1 (str): The source language code (e.g., 'en').
        lang2 (str): The intermediate target language code (e.g., 'fr').

    Returns:
        tuple[str, str]: The back-translated text and the intermediate translation.
    """
    translated = translate(text, lang1, lang2)
    back_translated = translate(translated, lang2, lang1)
    return back_translated, translated


# Randomly translate to one of these languages. If "en", do not translate
langs = {
    "en": "English",
    "fr": "French",
    "es": "Spanish",
    "de": "German",
    "ru": "Russian",
    "zh": "Chinese",
    "ar": "Arabic",
    "ja": "Japanese",
    "nl": "Dutch",
    "hi": "Hindi",
}

# Select a random language code from the dictionary keys
language = random.choice(list(langs.keys()))
print(f"Language: {langs[language]}")

if language == "en":  # English is the source language, so skip translation
    text_augmented2, text_translated = text_shuffled, text_shuffled
else:
    text_augmented2, text_translated = back_translate(text_shuffled, "en", language)

print(f"\nTranslated to {langs[language]}:\n\t{text_translated}\n")
print(f"Back-translated text:\n\t{text_augmented2}\n")

Language: Russian




Translated to Russian:
	Ребёнок в розовом платье поднимается по лестнице, забираясь в деревянный игровой домик, девочка заходит в деревянное здание, маленькая девочка в розовом платье забирается в деревянную хижину, маленькая девочка поднимается по лестнице к своему хижине.

Back-translated text:
	A child in a pink dress climbs up the stairs, climbing into a wooden playhouse, a girl entering a wooden building, a little girl in a pink dress climbing into a wooden hut, a little girl climbing up the stairs to her hut.
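In the output above, the five captions came back as a single run-on sentence, and two of them appear to have been merged. One possible way to keep sentence boundaries is to back-translate each caption separately. The sketch below assumes a translator callable with the same `(text, lang1, lang2)` signature as `translate` above. The names `back_translate_each` and `toy_translate` are hypothetical, and `toy_translate` is only a stand-in so the example runs without downloading a model.

```python
def back_translate_each(sentences, translate_fn, src, pivot):
    """Back-translate every sentence on its own and return the results as a list."""
    results = []
    for sentence in sentences:
        pivot_text = translate_fn(sentence, src, pivot)  # source -> pivot language
        results.append(translate_fn(pivot_text, pivot, src))  # pivot -> source
    return results


def toy_translate(text, lang1, lang2):
    """Stand-in translator: upper-case means 'translated to xx', lower-case means 'back'."""
    return text.upper() if lang2 == "xx" else text.lower()


captions = [
    "a girl going into a wooden building .",
    "a little girl climbing into a wooden playhouse .",
]
print(back_translate_each(captions, toy_translate, "en", "xx"))
```

With real models, `translate` from the cell above could be passed as `translate_fn`, and the returned list joined with `" ".join(...)`. Note that `translate` reloads the model on every call, so this would be slow unless the loading is cached.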

