#*nlpaug character-based augmentation*
* nlpaug is a library for textual augmentation in machine learning experiments. Its goal is to improve deep learning model performance by generating additional textual data.

In [None]:
!pip install "torch>=1.6.0" "transformers>=4.0.0" sentencepiece
!pip install numpy requests nlpaug

In [2]:
import nlpaug.augmenter.char as nac
import nlpaug.flow as nafc
from nlpaug.util import Action

In [3]:
text = 'I spilled coffee on my computer'

#*OCR*
* A type of augmentation that injects OCR errors into the given text.

What is OCR?
 * Optical character recognition (OCR) turns scanned images into text, so paper-based documents become editable, searchable digital documents. OCR errors are typically caused by poor image quality. For example, recognizing the letter O as the digit 0 is an OCR error.


In [12]:
aug = nac.OcrAug()
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I spilled coffee on my cumpotek
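
To make the idea concrete, here is a minimal pure-Python sketch of OCR-style noise. It is not nlpaug's implementation: the `OCR_CONFUSIONS` table, the `ocr_noise` function, and the `char_p` parameter are hypothetical names chosen for illustration, and the look-alike pairs are only assumed examples of characters that a scanner might confuse.

```python
import random

# Hypothetical confusion table (illustrative only, not nlpaug's mapping):
# each key lists characters it might be misread as.
OCR_CONFUSIONS = {
    "o": ["0"],
    "O": ["0"],
    "l": ["1", "I"],
    "I": ["1", "l"],
    "s": ["5"],
    "S": ["5"],
    "b": ["6"],
}


def ocr_noise(text, char_p=0.3, seed=None):
    """Replace each confusable character with a look-alike with probability char_p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < char_p:
            out.append(rng.choice(OCR_CONFUSIONS[ch]))
        else:
            out.append(ch)
    return "".join(out)


print(ocr_noise("I spilled coffee on my computer", seed=0))
```

Because every replacement is a single character, the augmented sentence always has the same length as the input; only the confusable characters can change.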


#*Keyboard*
* An augmenter that simulates typing errors. For example, people may mistakenly type o instead of i because the keys are adjacent. Each affected character is replaced with a key that lies within one keyboard distance of it.

In [11]:
aug = nac.KeyboardAug()
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I eoiller c*Dfee on my computer
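
The "one keyboard distance" idea can be sketched in plain Python as follows. This is an assumption-laden illustration rather than nlpaug's code: `KEY_NEIGHBORS` is a hypothetical, partial QWERTY neighbor map covering only a few letters, and `keyboard_typo` and `char_p` are names invented for this sketch.

```python
import random

# Hypothetical partial QWERTY neighbor map: letters one key away.
# Only a handful of keys are listed to keep the sketch short.
KEY_NEIGHBORS = {
    "i": "uojk",
    "o": "ipkl",
    "e": "wrsd",
    "c": "xvdf",
    "p": "ol",
    "m": "njk",
    "t": "ryfg",
    "s": "awedzx",
}


def keyboard_typo(text, char_p=0.3, seed=None):
    """Replace mapped letters with an adjacent key with probability char_p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        neighbors = KEY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < char_p:
            typo = rng.choice(neighbors)
            # Keep the original letter's case.
            out.append(typo.upper() if ch.isupper() else typo)
        else:
            out.append(ch)
    return "".join(out)


print(keyboard_typo("I spilled coffee on my computer", seed=0))
```

Unlike this sketch, the nlpaug output above also contains special characters and case changes, so the library evidently draws from a wider set of neighboring keys.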


#*Random insert*
* Augmentation that generates character errors by inserting random characters

In [14]:
aug = nac.RandomCharAug(action="insert")
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I spilled cofgfeTe on my complust#er


#*Random substitute*
* Augmentation that generates character errors by substituting random characters

In [15]:
aug = nac.RandomCharAug(action="substitute")
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I spilled coC)ee on my comMuze4


#*Random swap*
* Augmentation based on randomly swapping characters

In [16]:
aug = nac.RandomCharAug(action="swap")
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I spllied coffee on my omcupter


#*Random delete*
* Augmentation based on randomly deleting characters

In [17]:
aug = nac.RandomCharAug(action="delete")
print("sentence: ",text)
print("augmented: ",aug.augment(text))

sentence:  I spilled coffee on my computer
augmented:  I spilled ofee on my ompue
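
The four random actions (insert, substitute, swap, delete) differ only in the edit applied at a randomly chosen position. The sketch below shows that shared structure in plain Python. It is a simplified illustration, not how nlpaug is implemented: `random_char_edit`, `augment_text`, `word_p`, and the candidate pool of letters and digits are all assumptions made for this example, and each selected word receives exactly one edit.

```python
import random
import string

# Assumed candidate pool for inserted or substituted characters.
POOL = string.ascii_letters + string.digits


def random_char_edit(word, action, rng):
    """Apply a single random character-level edit to one word."""
    if not word:
        return word
    if action == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(POOL) + word[i:]
    if action == "substitute":
        i = rng.randrange(len(word))
        return word[:i] + rng.choice(POOL) + word[i + 1:]
    if action == "delete":
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if action == "swap":
        if len(word) < 2:
            return word  # nothing to swap
        i = rng.randrange(len(word) - 1)
        # Exchange the characters at positions i and i + 1.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    raise ValueError(f"unknown action: {action!r}")


def augment_text(text, action, word_p=0.3, seed=None):
    """Edit each space-separated word with probability word_p."""
    rng = random.Random(seed)
    return " ".join(
        random_char_edit(w, action, rng) if rng.random() < word_p else w
        for w in text.split(" ")
    )


for action in ("insert", "substitute", "swap", "delete"):
    print(action, "->", augment_text("I spilled coffee on my computer", action, seed=0))
```

Insert lengthens a word by one character, delete shortens it by one, and substitute and swap keep its length, which matches the pattern visible in the nlpaug outputs above.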


#*Resources*
* https://nlpaug.readthedocs.io/en/latest/index.html
* OCR Error Correction Using Character Correction and Feature-Based Word Classification, Ido Kissos, Nachum Dershowitz
* https://www.konicaminolta.com.au/news-insight/blog/how-optical-character-recognition-works

