## Data Augmentation using NLPaug

This notebook demonstrates two kinds of augmenter from nlpaug: character-level and word-level. The library also offers augmenters for sentences, audio, and spectrogram inputs.

In [None]:
#Installing the nlpaug package
!pip install nlpaug==0.0.14

In [None]:
# Base text used throughout this notebook
text="The quick brown fox jumps over the lazy dog ."

In [None]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as nafc

from nlpaug.util import Action
import os
!git clone https://github.com/makcedward/nlpaug.git
os.environ["MODEL_DIR"] = 'nlpaug/model/'

### Augmentation at the Character Level


1.   OCR Augmenter: Reading text from an image requires an OCR (optical character recognition) model. The extracted text often contains recognition errors, such as '0' instead of 'o' or '2' instead of 'z'. This augmenter simulates those errors.
2.   Keyboard Augmenter: Typos are common when typing or texting. This augmenter simulates them by substituting characters in words with ones that are close by on a keyboard.
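Both augmenters in the list above boil down to looking up a character in a table of plausible replacements. The following is a minimal sketch of that idea in plain Python, not nlpaug's actual implementation: the two lookup tables, the `substitute_chars` helper, and its `prob` and `seed` parameters are all made up for illustration.

```python
import random

# Hypothetical, deliberately tiny lookup tables (NOT nlpaug's real mappings).
OCR_CONFUSIONS = {"o": ["0"], "z": ["2"], "g": ["9"], "l": ["1"], "i": ["1"]}
KEYBOARD_NEIGHBOURS = {"a": ["q", "s", "z"], "o": ["i", "p", "l"], "e": ["w", "r", "d"]}


def substitute_chars(text, table, prob=0.3, seed=None):
    """Replace each character that has an entry in `table` with probability `prob`."""
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    out = []
    for ch in text:
        candidates = table.get(ch.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


text = "The quick brown fox jumps over the lazy dog ."
ocr_like = substitute_chars(text, OCR_CONFUSIONS, seed=0)
typo_like = substitute_chars(text, KEYBOARD_NEIGHBOURS, seed=0)
print(ocr_like)
print(typo_like)
```

Swapping the table is all it takes to move from OCR-style noise to keyboard-style noise, which is why the two augmenters are used in such a similar way in the cells that follow.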



In [None]:
# OCR augmenter (nac is nlpaug.augmenter.char, imported above)

aug = nac.OcrAug()
# n=3 requests three augmented versions of the sentence
augmented_texts = aug.augment(text, n=3)

print("Original:")
print(text)

print("Augmented Texts:")
print(augmented_texts)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Texts:
['The qoick bkown fox jumps over the lazy dog .', 'The quick brown fox jumps over the la2y dog .', 'The qoick brown fox jumps over the lazy do9 .']


In [None]:
# Keyboard augmenter (also part of nlpaug.augmenter.char, imported above as nac)
aug = nac.KeyboardAug()
# n=3 requests three augmented versions of the sentence
augmented_text = aug.augment(text, n=3)

print("Original:")
print(text)

print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Text:
['The quick brown fox jumps over the laxy dog .', 'The quiXk brown fox jumps over the ?azy dog .', 'The quick brown fox jumps oger the lazy dog .']


### Augmentation at the Word Level

Augmentation is also useful at the word level. Here we replace words with common misspellings and use word2vec embeddings to insert or substitute similar words.
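To make "a similar word" concrete: in an embedding model every word is a vector, and similarity is usually measured as the cosine of the angle between two vectors. The sketch below uses a handful of invented 3-dimensional vectors instead of the real 300-dimensional word2vec model. The `TOY_VECTORS` table and the `cosine`, `most_similar`, and `augment` helpers are hypothetical names for illustration and are not part of nlpaug.

```python
import math
import random

# Hypothetical 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.85, 0.15, 0.05],
    "hasty": [0.8, 0.2, 0.1],
    "lazy": [0.1, 0.9, 0.0],
    "idle": [0.15, 0.85, 0.1],
    "dog": [0.0, 0.1, 0.9],
    "puppy": [0.05, 0.15, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, top_n=2):
    """Return the `top_n` vocabulary words closest to `word`, best first."""
    target = TOY_VECTORS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in TOY_VECTORS.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


def augment(text, action="substitute", prob=0.5, seed=None):
    """Insert a neighbour before, or substitute it for, each known token."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        key = token.lower()
        if key in TOY_VECTORS and rng.random() < prob:
            neighbour = rng.choice(most_similar(key))
            if action == "insert":
                out.extend([neighbour, token])  # keep the original token
            else:
                out.append(neighbour)  # drop the original token
        else:
            out.append(token)
    return " ".join(out)


text = "The quick brown fox jumps over the lazy dog ."
print(most_similar("quick"))
print(augment(text, action="substitute", seed=1))
print(augment(text, action="insert", seed=1))
```

With a real pretrained model the vocabulary is far larger, so neighbours can include inflected forms, capitalised variants, or rare tokens that look odd in context.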

**Spelling augmenter**


In [None]:
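# Download the spelling dictionary. The raw URL is used because the
# github.com/.../blob/... address returns an HTML page, not the text file.
!wget -O spelling_en.txt https://raw.githubusercontent.com/makcedward/nlpaug/master/model/spelling_en.txt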

In [None]:
# Substitute words with common misspellings taken from the dictionary file
aug = naw.SpellingAug('spelling_en.txt')
augmented_texts = aug.augment(text)
print("Original:")
print(text)
print("Augmented Texts:")
print(augmented_texts)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Texts:
quick brown fox jumps lazy dog .
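The spelling augmenter is meant to swap words for common misspellings listed in a dictionary file. As a rough sketch of that lookup, the example below uses a small in-memory table instead of `spelling_en.txt`. The `MISSPELLINGS` entries and the `misspell` helper are hypothetical, and nothing here assumes the real file's layout.

```python
import random

# Hypothetical misspelling table; the real spelling_en.txt is much larger
# and its exact file layout is not assumed here.
MISSPELLINGS = {
    "quick": ["quik", "qick"],
    "brown": ["brwon"],
    "lazy": ["lazzy", "lasy"],
}


def misspell(text, table, prob=0.5, seed=None):
    """Replace each token that has an entry in `table` with probability `prob`."""
    rng = random.Random(seed)
    tokens = []
    for token in text.split():
        options = table.get(token.lower())
        if options and rng.random() < prob:
            tokens.append(rng.choice(options))
        else:
            tokens.append(token)  # unknown words pass through unchanged
    return " ".join(tokens)


text = "The quick brown fox jumps over the lazy dog ."
print(misspell(text, MISSPELLINGS, seed=0))
```

In this sketch, words without an entry are kept as they are, so the output always has the same number of tokens as the input.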


**Word embeddings augmenter**

In [None]:
#Downloading the required model
!wget -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"

In [None]:
!gunzip GoogleNews-vectors-negative300.bin.gz
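If the `gunzip` shell command is not available (for example on some Windows setups), the same decompression can be done with Python's standard library. This is an optional alternative sketch; `gunzip_file` is a hypothetical helper name. Unlike `gunzip`, it leaves the original `.gz` archive in place.

```python
import gzip
import os
import shutil


def gunzip_file(src_path, dst_path):
    """Stream-decompress a .gz file so the whole model never sits in memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


archive = "GoogleNews-vectors-negative300.bin.gz"
# Only run when the archive has actually been downloaded.
if os.path.exists(archive):
    gunzip_file(archive, archive[:-len(".gz")])
```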

In [None]:
!ls

Insert words at random positions, chosen by word embedding similarity

In [None]:
# model_type: word2vec, glove or fasttext
aug = naw.WordEmbsAug(
    model_type='word2vec', model_path='GoogleNews-vectors-negative300.bin',
    action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Text:
The dc quick La brown Ulf fox jumps over the lazy dog .


Substitute words with similar words, chosen by word2vec similarity


In [None]:
aug = naw.WordEmbsAug(
    model_type='word2vec', model_path='GoogleNews-vectors-negative300.bin',
    action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Text:
The hasty brown foxes jumps Within the lazy dog .


nlpaug offers many more features. See the GitHub repo and documentation for further details.