# **Text Data Augmentation Using NLPAUG**

**Install Libraries**

In [3]:
!pip install nlpaug


Collecting nlpaug
  Downloading nlpaug-1.1.11-py3-none-any.whl.metadata (14 kB)
Collecting gdown>=4.0.0 (from nlpaug)
  Downloading gdown-5.2.0-py3-none-any.whl.metadata (5.8 kB)
Downloading nlpaug-1.1.11-py3-none-any.whl (410 kB)
Downloading gdown-5.2.0-py3-none-any.whl (18 kB)
Installing collected packages: gdown, nlpaug
Successfully installed gdown-5.2.0 nlpaug-1.1.11


**Import Libraries**

In [2]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.flow as naf


**Keyboard**

Augmenter that simulates typing errors by replacing characters with keys that sit close to them on the keyboard.

In [3]:
test_sentence = 'I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace'

aug = nac.KeyboardAug(name='Keyboard_Aug', aug_char_min=1, aug_char_max=10, aug_char_p=0.3, aug_word_p=0.3, 
                      aug_word_min=1, aug_word_max=10, stopwords=None, tokenizer=None, reverse_tokenizer=None, 
                      include_special_char=True, include_numeric=True, include_upper_case=True, lang='en', verbose=0, 
                      stopwords_regex=None, model_path=None, min_char=4)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['I went ahopLihg Today, and my trolly was filled !(th BanSnXc. I Qls( had food at burrkr palQxe']
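To illustrate the mechanism behind keyboard augmentation, here is a minimal plain-Python sketch. It is not nlpaug's implementation: the `KEY_NEIGHBOURS` map and the `keyboard_typo` helper are hypothetical names, and the map covers only a handful of keys for illustration.

```python
import random

# Toy QWERTY neighbour map (an assumption for illustration only;
# nlpaug ships its own, much larger mapping).
KEY_NEIGHBOURS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "s": "awedxz", "t": "rfgy", "r": "edft",
}

def keyboard_typo(word, p=0.3, rng=None):
    """Replace each mapped character with a neighbouring key with probability p."""
    rng = rng or random.Random()
    out = []
    for ch in word:
        neighbours = KEY_NEIGHBOURS.get(ch.lower())
        if neighbours and rng.random() < p:
            out.append(rng.choice(neighbours))  # simulate hitting an adjacent key
        else:
            out.append(ch)  # unmapped or unselected characters stay unchanged
    return "".join(out)

print(keyboard_typo("bananas", p=0.5, rng=random.Random(0)))
```

The per-character probability `p` plays a role similar to `aug_char_p` in the cell above: higher values corrupt more characters in each word.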


**Optical Character Recognition (OCR)**

- Augmenter that applies OCR error simulation to textual input.
- For example, OCR may recognize ‘I’ as ‘1’ incorrectly, or ‘0’ as ‘o’ or ‘O’.
- Pre-defined OCR mapping is leveraged to replace a character with a possible OCR error.
- This kind of noise relates to the out-of-vocabulary (OOV) problem: OOV words are words that do not appear in the training set but do appear in the test set or in real data.
- The main problem is that a word-level model assigns zero probability to OOV words, which results in zero likelihood.
    - This is a common problem, especially when the model has been trained on a small data set.
    - To overcome this, we can use models like BERT and GPT (Generative Pre-trained Transformer models).

In [4]:
aug = nac.OcrAug(name='OCR_Aug', aug_char_min=1, aug_char_max=10, aug_char_p=0.3, aug_word_p=0.3, aug_word_min=1, 
                 aug_word_max=10, stopwords=None, tokenizer=None, reverse_tokenizer=None, verbose=0, stopwords_regex=None, 
                 min_char=1)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['1 went 8h0ppin9 Today, and my trolly wa8 filled with Bananas. I also had food at burgur palace']
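The link between OCR noise and the OOV problem can be sketched in plain Python. The mapping and helper names below (`OCR_CONFUSIONS`, `ocr_noise`, `oov_tokens`) are hypothetical and only mimic the idea of a pre-defined OCR mapping. Unlike nlpaug, which selects characters at random, this toy version replaces every mapped character.

```python
# Toy OCR confusion table (an assumption; nlpaug uses its own pre-defined mapping).
OCR_CONFUSIONS = {"I": "1", "l": "1", "O": "0", "o": "0", "S": "8", "s": "8", "g": "9"}

def ocr_noise(text, mapping=OCR_CONFUSIONS):
    """Replace every character that has an entry in the confusion table."""
    return "".join(mapping.get(ch, ch) for ch in text)

def oov_tokens(text, vocab):
    """Return the whitespace-separated tokens that are missing from the vocabulary."""
    return [tok for tok in text.split() if tok not in vocab]

vocab = {"I", "went", "Shopping"}
noisy = ocr_noise("I went Shopping")
print(noisy)                     # 1 went 8h0ppin9
print(oov_tokens(noisy, vocab))  # ['1', '8h0ppin9']
```

A model with a fixed word-level vocabulary has never seen `1` or `8h0ppin9` in this position, which is why training on such augmented text can make it more robust to OCR errors.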


**Random**

Augmenter that applies random character errors to textual input.

In [5]:
aug = nac.RandomCharAug(
    action='substitute',
    name='RandomChar_Aug',
    aug_char_min=1,
    aug_char_max=10,
    aug_char_p=0.3,
    aug_word_p=0.3,
    aug_word_min=1,
    aug_word_max=10,
    include_upper_case=True,
    include_lower_case=True,
    include_numeric=True,
    min_char=4,
    swap_mode='adjacent',
    spec_char='!@#$%^&*()_+',
    stopwords=None,
    tokenizer=None,
    reverse_tokenizer=None,
    verbose=0,
    stopwords_regex=None,
    candidates=None
)

test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['I went LhopPiSg Toamy, and my tr+lLy was filled wQSh BaSRpas. I alS# had food at burgur palace']
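The cell above uses `action='substitute'`; `RandomCharAug` also accepts `insert`, `swap`, and `delete`. The following plain-Python sketch shows what each action does to a single word. The `random_char_edit` helper is a hypothetical illustration, not part of nlpaug, and it edits exactly one position per call.

```python
import random
import string

def random_char_edit(word, action, rng=None):
    """Apply one random character-level edit (insert, substitute, swap, delete)."""
    rng = rng or random.Random()
    if len(word) < 2:
        return word  # too short to edit safely
    # Pick a position that has a right-hand neighbour so that swap is always valid.
    i = rng.randrange(len(word) - 1)
    new_char = rng.choice(string.ascii_letters + string.digits)
    if action == "insert":
        return word[:i] + new_char + word[i:]
    if action == "substitute":
        return word[:i] + new_char + word[i + 1:]
    if action == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if action == "delete":
        return word[:i] + word[i + 1:]
    raise ValueError(f"unknown action: {action}")

rng = random.Random(0)
for action in ("insert", "substitute", "swap", "delete"):
    print(action, random_char_edit("trolly", action, rng))
```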


**Word Level Augmentation**

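Besides character-level augmentation, word-level augmentation is also important. To insert or substitute similar words, we can use word2vec, GloVe, fastText, BERT, and WordNet. Word2vecAug, GloVeAug, and FasttextAug (available in recent nlpaug releases through `naw.WordEmbsAug` and its `model_type` argument) use word embeddings to replace the original word with one of its most similar words, while `SynonymAug` looks up synonyms in WordNet.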

In [6]:
aug = naw.SynonymAug(aug_src='wordnet', model_path=None, name='Synonym_Aug', aug_min=1, aug_max=10, aug_p=0.3, lang='eng', 
                     stopwords=None, tokenizer=None, reverse_tokenizer=None, stopwords_regex=None, force_reload=False, 
                     verbose=0)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['I went Shopping Today, and my trolly was filled with Bananas. Atomic number 53 besides had food at burgur palace']
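In the output above, "I" became "Atomic number 53" because WordNet lists "I" as a synonym of iodine. One way to avoid such substitutions is to protect certain words, which is the purpose of the `stopwords` argument in the cell above. The sketch below shows the idea in plain Python with a toy synonym table; `TOY_SYNONYMS` and `replace_synonyms` are hypothetical names, and the matching rules are an assumption rather than nlpaug's exact behaviour.

```python
import random

# Toy synonym table standing in for WordNet (illustrative assumption).
TOY_SYNONYMS = {
    "i": ["iodine", "atomic number 53"],
    "also": ["besides", "too"],
    "food": ["nutrient"],
}

def replace_synonyms(tokens, stopwords=(), rng=None):
    """Swap each token for a random synonym unless it is listed as a stopword."""
    rng = rng or random.Random()
    protected = {w.lower() for w in stopwords}
    out = []
    for tok in tokens:
        candidates = TOY_SYNONYMS.get(tok.lower())
        if candidates and tok.lower() not in protected:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)  # protected or unknown tokens pass through unchanged
    return out

tokens = ["I", "also", "had", "food"]
print(replace_synonyms(tokens, rng=random.Random(0)))
print(replace_synonyms(tokens, stopwords=["I"], rng=random.Random(0)))
```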


**Antonym**

Augmenter that substitutes words with their antonyms, reversing the meaning of the input text.

In [7]:
aug = naw.AntonymAug(name='Antonym_Aug', aug_min=1, aug_max=10, aug_p=0.3, lang='eng', stopwords=None, tokenizer=None, 
                     reverse_tokenizer=None, stopwords_regex=None, verbose=0)
 
test_sentence_aug = aug.augment("very beautiful")
print("very beautiful")
print(test_sentence_aug)


very beautiful
['very ugly']


**Random**

Augmenter that applies random word operation to textual input.

In [8]:
aug = naw.RandomWordAug(action='delete', name='RandomWord_Aug', aug_min=1, aug_max=10, aug_p=0.3, stopwords=None, 
                        target_words=None, tokenizer=None, reverse_tokenizer=None, stopwords_regex=None, verbose=0)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['I Today, and my was filled with. I had at burgur palace']
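The number of words an augmenter touches is governed by `aug_p`, `aug_min`, and `aug_max`. A plausible reading, stated here as an assumption rather than as nlpaug's documented rule, is that the count is `aug_p` times the number of tokens, rounded up and then clipped to the `[aug_min, aug_max]` range. That reading is consistent with the output above, where 6 words were deleted from a sentence of 20 tokens if the comma and full stop are counted as tokens. The `augment_count` helper is hypothetical.

```python
import math

def augment_count(n_tokens, aug_p=0.3, aug_min=1, aug_max=10):
    """Number of tokens to augment: ceil(aug_p * n_tokens), clipped to [aug_min, aug_max]."""
    count = math.ceil(aug_p * n_tokens)
    return max(aug_min, min(aug_max, count))

print(augment_count(20))   # 6
print(augment_count(2))    # 1
print(augment_count(100))  # 10
```

With the default `aug_max=10`, long inputs are never changed in more than ten places, so raising `aug_p` alone has no effect once that cap is reached.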


**Spelling**

Augmenter that applies spelling error simulation to textual input.

In [9]:
aug = naw.SpellingAug(dict_path=None, name='Spelling_Aug', aug_min=1, aug_max=10, aug_p=0.3, stopwords=None, 
                      tokenizer=None, reverse_tokenizer=None, include_reverse=True, stopwords_regex=None, verbose=0)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
["I'd want Sopping Today, and mmy trolly was filles with Bananas. e also had food at burgur palace"]


**Split**

Augmenter that applies a word-splitting operation to textual input.

In [10]:
aug = naw.SplitAug(name='Split_Aug', aug_min=1, aug_max=10, aug_p=0.3, min_char=4, stopwords=None, tokenizer=None, 
                   reverse_tokenizer=None, stopwords_regex=None, verbose=0)
 
test_sentence_aug = aug.augment(test_sentence)
print(test_sentence)
print(test_sentence_aug)


I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
['I went Sh opping Today, and my trolly was fi lled w ith Bananas. I al so had food at bu rgur pa lace']
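The splitting step itself is simple to sketch in plain Python. The `split_word` helper below is a hypothetical illustration, not nlpaug's code. It assumes, in line with the output above where four-letter words such as "with" were split under `min_char=4`, that words with at least `min_char` characters are eligible.

```python
import random

def split_word(word, min_char=4, rng=None):
    """Insert one space at a random interior position of a sufficiently long word."""
    rng = rng or random.Random()
    if len(word) < min_char:
        return word  # short words are left intact
    cut = rng.randrange(1, len(word))  # never cut before the first or after the last character
    return word[:cut] + " " + word[cut:]

rng = random.Random(0)
print(split_word("Shopping", rng=rng))
print(split_word("at", rng=rng))
```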


**Flow Augmentation**

Flow augmentation combines several augmenters at once. The `Sequential` and `Sometimes` pipelines in `nlpaug.flow` chain augmenters together, so a single text can be sent through several of them to yield a wider range of augmented data.
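As a rough illustration of how such pipelines compose, here is a plain-Python sketch. In nlpaug the corresponding classes are `naf.Sequential` and `naf.Sometimes`, each of which takes a list of augmenters. The `sequential` and `sometimes` helpers below are hypothetical stand-ins that operate on ordinary string functions, and the per-step probability in `sometimes` is an assumption about how random application works.

```python
import random

def sequential(steps):
    """Build a pipeline that applies every step, in order."""
    def run(text):
        for step in steps:
            text = step(text)
        return text
    return run

def sometimes(steps, p=0.5, rng=None):
    """Build a pipeline that applies each step independently with probability p."""
    rng = rng or random.Random()
    def run(text):
        for step in steps:
            if rng.random() < p:
                text = step(text)
        return text
    return run

# Toy "augmenters": plain string functions used only for illustration.
def drop_last_word(text):
    return " ".join(text.split()[:-1])

def shout(text):
    return text.upper()

always = sequential([drop_last_word, shout])
maybe = sometimes([drop_last_word, shout], p=0.5, rng=random.Random(0))

print(always("I went shopping today"))  # I WENT SHOPPING
for _ in range(3):
    print(maybe("I went shopping today"))
```

A sequential flow gives the same combination of edits every time, whereas a randomly applied flow produces more varied outputs from the same input.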



In [11]:
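# Contextual word-embedding augmenter: a DistilBERT model proposes words that fit the context.
TOPK = 20       # number of top candidate tokens to sample from (default=100)
ACT = 'insert'  # alternative: 'substitute'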
 
aug_bert = naw.ContextualWordEmbsAug(
    model_path='distilbert-base-uncased', 
    #device='cuda',
    action=ACT, top_k=TOPK)
print("Original:")
print(test_sentence)
print("Augmented Text:")
for ii in range(5):
    augmented_text = aug_bert.augment(test_sentence)
    print(augmented_text)




Original:
I went Shopping Today, and my trolly was filled with Bananas. I also had food at burgur palace
Augmented Text:
['nowadays i went shopping twice today, and inside my trolly cupboard was filled with chopped bananas. yesterday i also had food sold at burgur palace']
['luckily i went off shopping today, and my trolly closet was filled mostly with canned bananas. i had also had cooked food at burgur palace']
['thankfully i went through shopping today, and my favourite trolly was filled solely with fresh bananas. i also still had food at our burgur palace']
['i went shopping today, smiling and my trolly belly was filled with bananas. but i have also recently had fresh food at new burgur palace']
['i went about shopping today, and my trolly was constantly filled with delicious bananas. i has also had two food shops at burgur rahman palace']
