## Example of Textual Augmenter Usage<a class="anchor" id="home"></a>
* [Character Augmenter](#chara_aug)
    * [OCR](#ocr_aug)
    * [Keyboard](#keyboard_aug)
    * [Random](#random_aug)
* [Word Augmenter](#word_aug)
    * [Spelling](#spelling_aug)
    * [Word Embeddings](#word_embs_aug)
    * [TF-IDF](#tfidf_aug)
    * [Contextual Word Embeddings](#context_word_embs_aug)
    * [WordNet](#word_net_aug)
    * [Random Word Augmenter](#random_word_aug)
* [Sentence Augmenter](#sent_aug)
    * [Contextual Word Embeddings for Sentence](#context_word_embs_sentence_aug)

In [2]:
import os
os.environ["MODEL_DIR"] = '../model'

# Config

In [3]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as naf

from nlpaug.util import Action

In [9]:
text = 'The quick brown fox jumps over the lazy dog .'
print(text)

The quick brown fox jumps over the lazy dog .


# Character Augmenter<a class="anchor" id="chara_aug"></a>

These augmenters operate at the character level. Possible scenarios include image-to-text and chatbots. Recognizing text in an image requires an optical character recognition (OCR) model, but OCR introduces errors such as confusing "o" with "0". `OcrAug` simulates these errors to perform data augmentation. Chatbot input still contains typos even though most applications come with word correction, so `QwertyAug` is introduced to simulate this kind of error.

### OCR Augmenter<a class="anchor" id="ocr_aug"></a>

##### Substitute character by pre-defined OCR error

In [4]:
aug = nac.OcrAug()
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The quick 6rown fux jumps over the lazy dog
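The idea behind `OcrAug` can be sketched in a few lines of plain Python: look each character up in a table of characters that OCR engines commonly confuse and, with some probability, replace it with a look-alike. The table `OCR_ERRORS`, the function `ocr_substitute` and its `aug_p` parameter are hypothetical names for this illustration; nlpaug ships its own mapping and its sampling logic may differ.

```python
import random

# Hypothetical OCR confusion table (illustrative only, not nlpaug's own mapping).
OCR_ERRORS = {
    "o": ["0"],
    "0": ["o", "O"],
    "l": ["1", "I"],
    "1": ["l", "I"],
    "b": ["6"],
    "S": ["5"],
}


def ocr_substitute(text, aug_p=0.3, rng=None):
    """Replace each character that has a known OCR confusion with probability aug_p."""
    rng = rng if rng is not None else random.Random()
    out = []
    for ch in text:
        if ch in OCR_ERRORS and rng.random() < aug_p:
            out.append(rng.choice(OCR_ERRORS[ch]))  # pick one look-alike character
        else:
            out.append(ch)
    return "".join(out)


print(ocr_substitute("The quick brown fox jumps over the lazy dog", rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible, which is handy when comparing augmented datasets.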


### Keyboard Augmenter<a class="anchor" id="keyboard_aug"></a>

##### Substitute character by keyboard distance

In [5]:
aug = nac.QwertyAug()
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
Tbe quiSk nrown fIx jKmps ov2r tje laAy don
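Substitution by keyboard distance can be illustrated by treating the keyboard as a grid and replacing a letter with one of its neighbouring keys. This is a minimal sketch under simplifying assumptions: an unstaggered, letters-only QWERTY layout and the hypothetical helpers `keyboard_neighbors` and `keyboard_substitute`. `QwertyAug`'s own layout appears to include digits as well, judging by the `ov2r` in the output above.

```python
import random

# Simplified, unstaggered QWERTY grid (assumption: lowercase letters only, no number row).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def keyboard_neighbors(ch):
    """Return the keys one row and/or column away from `ch` on the simplified grid."""
    for r, row in enumerate(ROWS):
        c = row.find(ch)
        if c != -1:
            break
    else:
        return []  # not a letter on the grid
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < len(ROWS) and 0 <= cc < len(ROWS[rr]):
                neighbors.append(ROWS[rr][cc])
    return neighbors


def keyboard_substitute(text, aug_p=0.3, rng=None):
    """Replace each letter with a neighbouring key with probability aug_p, keeping its case."""
    rng = rng if rng is not None else random.Random()
    out = []
    for ch in text:
        candidates = keyboard_neighbors(ch.lower())
        if candidates and rng.random() < aug_p:
            repl = rng.choice(candidates)
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            out.append(ch)
    return "".join(out)


print(keyboard_substitute("The quick brown fox jumps over the lazy dog", rng=random.Random(0)))
```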


### Random Augmenter<a class="anchor" id="random_aug"></a>

##### Insert character randomly

In [6]:
aug = nac.RandomCharAug(action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
T3he quicNk @brown fEox juamps $over th6e la1zy d*og


##### Substitute character randomly

In [7]:
aug = nac.RandomCharAug(action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
ThN qDick brow0 foB jumks oveE t+e laz6 dBg


##### Swap character randomly

In [4]:
aug = nac.RandomCharAug(action="swap")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
Hte quikc borwn fxo jupms ovre teh lzay dgo


##### Delete character randomly

In [8]:
aug = nac.RandomCharAug(action="delete")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
Te quic rown fx jump ver he laz og
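The four `RandomCharAug` actions above can be approximated by one function that picks a random position inside a word and inserts, replaces, swaps or deletes the character there. `random_char_aug` is a hypothetical helper, not nlpaug's implementation; its alphabet (ASCII letters and digits) is an assumption, whereas the outputs above show nlpaug also inserting symbols such as `@` and `$`.

```python
import random
import string

ACTIONS = {"insert", "substitute", "swap", "delete"}


def random_char_aug(text, action, aug_p=0.3, rng=None):
    """Apply one character-level edit to each word with probability aug_p."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    rng = rng if rng is not None else random.Random()
    alphabet = string.ascii_letters + string.digits
    words = text.split(" ")
    for i, word in enumerate(words):
        if not word or rng.random() >= aug_p:
            continue  # leave this word untouched
        pos = rng.randrange(len(word))
        if action == "insert":
            word = word[:pos] + rng.choice(alphabet) + word[pos:]
        elif action == "substitute":
            word = word[:pos] + rng.choice(alphabet) + word[pos + 1:]
        elif action == "swap" and len(word) > 1:
            j = pos + 1 if pos + 1 < len(word) else pos - 1  # swap with a neighbouring character
            chars = list(word)
            chars[pos], chars[j] = chars[j], chars[pos]
            word = "".join(chars)
        elif action == "delete" and len(word) > 1:
            word = word[:pos] + word[pos + 1:]  # never delete a word's only character
        words[i] = word
    return " ".join(words)


sample = "The quick brown fox jumps over the lazy dog"
for action in ("insert", "substitute", "swap", "delete"):
    print(action, "->", random_char_aug(sample, action, rng=random.Random(42)))
```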


# Word Augmenter<a class="anchor" id="word_aug"></a>

Besides character augmentation, word-level augmentation is important as well. We make use of word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), fastText (Joulin et al., 2016), BERT (Devlin et al., 2018) and WordNet to insert and substitute similar words. `WordEmbsAug` (with `model_type` set to `word2vec`, `glove` or `fasttext`) uses word embeddings to find the words most similar to the original word and picks a replacement from them. `ContextualWordEmbsAug`, on the other hand, uses a language model such as BERT or XLNet to predict a likely target word from the surrounding context. `WordNetAug` looks up synonyms in the WordNet lexical database instead.
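Finding the most similar words usually means ranking the vocabulary by cosine similarity to the original word's vector. The sketch below does this with a hand-made three-dimensional `EMBEDDINGS` table (invented values; real models have far more dimensions, e.g. 300 for the GoogleNews vectors used below), and `cosine`, `most_similar` and `substitute_by_similarity` are hypothetical helpers, not nlpaug APIs.

```python
import math

# Toy embeddings invented for illustration only.
EMBEDDINGS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.85, 0.15, 0.05],
    "easy": [0.6, 0.4, 0.1],
    "lazy": [0.0, 0.2, 0.9],
    "idle": [0.05, 0.25, 0.85],
    "dog": [0.1, 0.9, 0.2],
    "puppy": [0.15, 0.85, 0.25],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, top_k=1):
    """Rank every other vocabulary word by cosine similarity to `word`."""
    target = EMBEDDINGS[word]
    scored = [(other, cosine(target, vec)) for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_k]]


def substitute_by_similarity(text):
    """Replace each in-vocabulary word with its nearest neighbour; leave other words alone."""
    return " ".join(most_similar(w)[0] if w in EMBEDDINGS else w for w in text.split())


print(substitute_by_similarity("The quick brown fox jumps over the lazy dog"))
# prints: The fast brown fox jumps over the idle puppy
```

Words outside the toy vocabulary are left unchanged; a real augmenter would typically also sample among the top few neighbours rather than always taking the first.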

### Spelling Augmenter<a class="anchor" id="spelling_aug"></a>

##### Substitute word by spelling mistake words dictionary

In [4]:
aug = naw.SpellingAug(os.path.join(os.environ["MODEL_DIR"], 'spelling_en.txt'))
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The quick borwn fox jumps over the lazy gog
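A dictionary-based spelling augmenter boils down to a lookup table from correct words to observed misspellings. In this sketch the line format `correct misspelling1 misspelling2 ...` is an assumption, not a documented description of `spelling_en.txt`; the sample entries are made up apart from `borwn` and `gog`, which echo the output above, and `load_spelling_dict` and `spelling_substitute` are hypothetical helpers.

```python
import random


def load_spelling_dict(lines):
    """Parse lines of the (assumed) form 'correct misspelling1 misspelling2 ...'."""
    table = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 2:
            table.setdefault(tokens[0], []).extend(tokens[1:])
    return table


def spelling_substitute(text, table, aug_p=0.3, rng=None):
    """Replace known words with a random misspelling with probability aug_p."""
    rng = rng if rng is not None else random.Random()
    out = []
    for word in text.split():
        if word in table and rng.random() < aug_p:
            out.append(rng.choice(table[word]))
        else:
            out.append(word)
    return " ".join(out)


SAMPLE = ["brown borwn brwon", "dog gog dgo", "quick quikc"]  # hypothetical entries
table = load_spelling_dict(SAMPLE)
print(spelling_substitute("The quick brown fox jumps over the lazy dog", table, aug_p=1.0))
```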


### Word Embeddings Augmenter<a class="anchor" id="word_embs_aug"></a>

##### Insert word randomly by word embeddings similarity

In [9]:
# model_type: word2vec, glove or fasttext
aug = naw.WordEmbsAug(
    model_type='word2vec',
    model_path=os.path.join(os.environ["MODEL_DIR"], 'GoogleNews-vectors-negative300.bin'),
    action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The quick brown fox jumps Alzeari over the lazy Superintendents dog


##### Substitute word by word2vec similarity

In [10]:
# model_type: word2vec, glove or fasttext
aug = naw.WordEmbsAug(
    model_type='word2vec',
    model_path=os.path.join(os.environ["MODEL_DIR"], 'GoogleNews-vectors-negative300.bin'),
    action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The easy brown fox jumps around the lazy dog


### TF-IDF Augmenter<a class="anchor" id="tfidf_aug"></a>

##### Insert word by TF-IDF similarity

In [7]:
aug = naw.TfIdfAug(
    model_path=os.environ.get("MODEL_DIR"),
    action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
sinks The quick brown fox jumps over the lazy Sidney dog


##### Substitute word by TF-IDF similarity

In [8]:
aug = naw.TfIdfAug(
    model_path=os.environ.get("MODEL_DIR"),
    action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The quick brown fox Baked over the polygraphy dog
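`TfIdfAug` relies on a TF-IDF model prepared beforehand (the `model_path` directory). The sketch below only shows how such scores can be computed on a tiny made-up corpus; treating the lowest-scoring word as the least informative one, and thus the safest to replace, is an assumption about the general technique rather than a description of nlpaug's internals. `idf_table` and `tfidf_scores` are hypothetical helpers.

```python
import math
from collections import Counter

# Tiny made-up corpus for illustration.
CORPUS = [
    "the quick brown fox jumps over the lazy dog",
    "the dog sleeps",
    "the fox hunts at night",
]


def idf_table(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # count each word once per document
    n = len(docs)
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def tfidf_scores(sentence, idf):
    """Score each word of `sentence` by term frequency times IDF."""
    tokens = sentence.split()
    tf = Counter(tokens)
    unseen = max(idf.values())  # assumption: unseen words are as rare as the rarest corpus word
    return {word: (count / len(tokens)) * idf.get(word, unseen) for word, count in tf.items()}


idf = idf_table(CORPUS)
scores = tfidf_scores("the dog hunts", idf)
print(min(scores, key=scores.get))  # prints: the
```

"the" appears in every document, so its IDF (and here its TF-IDF) is the lowest, while the rarer "hunts" scores highest.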


### Contextual Word Embeddings Augmenter<a class="anchor" id="context_word_embs_aug"></a>

##### Insert word by contextual word embeddings (BERT or XLNet)

In [15]:
# model_path: bert-base-uncased or xlnet-base-cased
aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased', action="insert")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
even the quick brown fox usually jumps over the lazy dog


##### Substitute word by contextual word embeddings (BERT or XLNet)

In [16]:
# model_path: bert-base-uncased or xlnet-base-cased
aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-uncased', action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
little quick brown fox jumps over the lazy dog


### WordNet Augmenter<a class="anchor" id="word_net_aug"></a>

##### Substitute word by synonym

In [17]:
aug = naw.WordNetAug()
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The straightaway brown fox jumps over the faineant dog


### Random Word Augmenter<a class="anchor" id="random_word_aug"></a>

##### Swap word randomly

In [None]:
aug = naw.RandomWordAug(action="swap")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

##### Delete word randomly

In [18]:
aug = naw.RandomWordAug(action="delete")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog
Augmented Text:
The brown jumps over the lazy dog
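Random word swap and deletion need no external model. The hypothetical `random_word_aug` below is a minimal sketch: deletion drops each word with probability `aug_p` (keeping at least one word), and swap exchanges neighbouring words. It is not nlpaug's `RandomWordAug`, whose probabilities and limits may differ.

```python
import random


def random_word_aug(text, action="delete", aug_p=0.3, rng=None):
    """Randomly delete words, or swap adjacent words, with probability aug_p per position."""
    rng = rng if rng is not None else random.Random()
    words = text.split()
    if action == "delete":
        kept = [w for w in words if rng.random() >= aug_p]
        return " ".join(kept or words[:1])  # keep at least one word
    if action == "swap":
        for i in range(len(words) - 1):
            if rng.random() < aug_p:
                words[i], words[i + 1] = words[i + 1], words[i]
        return " ".join(words)
    raise ValueError(f"unsupported action: {action!r}")


sample = "The quick brown fox jumps over the lazy dog"
print(random_word_aug(sample, "delete", rng=random.Random(7)))
print(random_word_aug(sample, "swap", rng=random.Random(7)))
```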


# Sentence Augmenter<a class="anchor" id="sent_aug"></a>

### Contextual Word Embeddings for Sentence Augmenter<a class="anchor" id="context_word_embs_sentence_aug"></a>

##### Insert sentence by contextual word embeddings (GPT2 or XLNet)

In [10]:
# model_path: xlnet-base-cased or gpt2
aug = nas.ContextualWordEmbsForSentenceAug(model_path='xlnet-base-cased')
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
The quick brown fox jumps over the lazy dog .
Augmented Text:
The quick brown fox jumps over the lazy dog . There are odd moments in a version that appears un - - and completely headed for its welcome.


# Flow Augmentation

To make use of multiple augmenters, the `Sequential` and `Sometimes` pipelines connect them: `Sequential` applies every augmenter in order, while `Sometimes` applies only some of them, chosen at random.
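Conceptually, both pipelines are ways of composing functions from string to string. The sketch below models them with plain callables; `sequential`, `sometimes` and the default `aug_p=0.5` are assumptions for illustration, not nlpaug's `naf.Sequential` and `naf.Sometimes` classes.

```python
import random
from typing import Callable, List, Optional

Augmenter = Callable[[str], str]  # any function mapping a string to an augmented string


def sequential(augmenters: List[Augmenter]) -> Augmenter:
    """Chain augmenters so each one receives the previous one's output."""
    def run(text: str) -> str:
        for aug in augmenters:
            text = aug(text)
        return text
    return run


def sometimes(augmenters: List[Augmenter], aug_p: float = 0.5,
              rng: Optional[random.Random] = None) -> Augmenter:
    """Apply each augmenter, in order, only with probability aug_p."""
    rng = rng if rng is not None else random.Random()

    def run(text: str) -> str:
        for aug in augmenters:
            if rng.random() < aug_p:
                text = aug(text)
        return text
    return run


def reverse_words(text: str) -> str:
    return " ".join(reversed(text.split()))


pipeline = sequential([str.upper, reverse_words])
print(pipeline("the lazy dog"))  # prints: DOG LAZY THE
```

Because `sometimes` can skip every augmenter, returning the input unchanged is a legitimate outcome of a random pipeline.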

## Apply different augmenters sequentially

In [19]:
aug = naf.Sequential([
    nac.RandomCharAug(action="insert"),
    naw.RandomWordAug()
])

aug.augment(text)

'&The b0rown jum@ps ovear %the 1lazy gdog'

## Apply some augmenters randomly

In [20]:
aug = naf.Sometimes([
    nac.RandomCharAug(action="delete"),
    nac.RandomCharAug(action="insert"),
    naw.RandomWordAug()
])

aug.augment(text)

'The quick brown fox jumps over the lazy dog'