# TextAugment
TextAugment is a Python 3 library for augmenting text for natural language processing applications. TextAugment stands on the giant shoulders of NLTK, Gensim, and TextBlob and plays nicely with them.

In [None]:
! pip install numpy nltk gensim textblob googletrans

! pip install textaugment

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')


### Simple text augmentation

In [2]:
from textaugment import Wordnet

t = Wordnet()

TEST_TEXT = "The quick brown fox jumps over the lazy dog."
t.augment(TEST_TEXT)

'the quick brown fox jump over the lazy dog.'

In [8]:
from textaugment import Wordnet
v = True  # Enable verb augmentation. Default: True.
n = False  # Enable noun augmentation. Default: False.
runs = 1  # Number of times to augment a sentence. Default: 1.
p = 0.5  # Probability of success of an individual trial (0.1 < p < 1.0). Default: 0.5. Used by a geometric distribution to select words from a sentence.

# The values above are the documented defaults; the call below overrides v, n, and p.
t_with_args = Wordnet(v=False, n=True, p=0.9)
t_with_args.augment(TEST_TEXT)

'the quick brownness fox jumps over the lazy dog.'
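
The comment on `p` mentions a geometric distribution. As a rough illustration of what that distribution does, the sketch below counts Bernoulli trials until the first success using only the standard library. The helper `geometric_draw` is hypothetical and is not part of TextAugment; how the library maps such draws onto word positions is not shown here.

```python
import random


def geometric_draw(p, rng):
    """Count Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # a failure happens with probability 1 - p
        trials += 1
    return trials


rng = random.Random(42)  # fixed seed so the comparison is repeatable
for p in (0.5, 0.9):
    draws = [geometric_draw(p, rng) for _ in range(10_000)]
    mean = sum(draws) / len(draws)
    print(f"p={p}: mean draw = {mean:.2f} (theory: {1 / p:.2f})")
```

A larger `p` concentrates the draws near 1, while a smaller `p` spreads them over larger values.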

### EDA: Easy data augmentation techniques
1. Synonym Replacement
2. Random Deletion
3. Random Swap
4. Random Insertion

### Synonym Replacement
Randomly choose n words from the sentence that are not stop words. Replace each of these words with one of its synonyms chosen at random.

In [4]:
from textaugment import EDA

eda_augment = EDA()

TEST_TEXT = "The quick brown fox jumps over the lazy dog."
syn_augmented = eda_augment.synonym_replacement(TEST_TEXT)
print("ORIGINAL Text :",TEST_TEXT)
print("AUGMENTED TEXT:", syn_augmented)

ORIGINAL Text : The quick brown fox jumps over the lazy dog.
AUGMENTED TEXT: The ready brown fox jumps over the lazy dog.
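
To make the mechanics concrete, here is a minimal standalone sketch of the synonym replacement idea. It is not the library's implementation: `STOP_WORDS`, `SYNONYMS`, and `synonym_replacement_sketch` are hypothetical names, and the tiny hand-written synonym table stands in for a real lexical resource such as WordNet.

```python
import random

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"the", "over", "a", "an"}
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "lazy": ["idle", "sluggish"],
    "dog": ["hound"],
}


def synonym_replacement_sketch(sentence, n=1, rng=None):
    """Replace up to n non-stop words with a randomly chosen synonym."""
    rng = rng or random.Random()
    words = sentence.split()
    # Candidate positions: not a stop word and present in the toy table.
    candidates = [
        i for i, w in enumerate(words)
        if w.lower() not in STOP_WORDS and w.lower() in SYNONYMS
    ]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


TEXT = "The quick brown fox jumps over the lazy dog"
replaced = synonym_replacement_sketch(TEXT, n=2, rng=random.Random(0))
print(replaced)
```

The sketch splits on whitespace only, so the example sentence omits the final period to keep `dog` matchable.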


### Random Deletion
Randomly remove each word in the sentence with probability p.

In [5]:
from textaugment import EDA

eda_augment = EDA()

TEST_TEXT = "The quick brown fox jumps over the lazy dog."
random_del_augmented = eda_augment.random_deletion(TEST_TEXT, p=0.2)
print("ORIGINAL Text :",TEST_TEXT)
print("AUGMENTED TEXT:", random_del_augmented)

ORIGINAL Text : The quick brown fox jumps over the lazy dog.
AUGMENTED TEXT: quick brown fox jumps over the lazy
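
The rule above can be sketched in a few lines of plain Python. This is an illustrative version, not the library's code: `random_deletion_sketch` is a hypothetical helper, and the fallback that keeps one word when everything would be deleted is an assumption.

```python
import random


def random_deletion_sketch(sentence, p=0.2, rng=None):
    """Drop each word independently with probability p."""
    rng = rng or random.Random()
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    kept = [w for w in words if rng.random() >= p]
    # Assumption: never return an empty sentence; keep one random word instead.
    if not kept:
        kept = [rng.choice(words)]
    return " ".join(kept)


TEXT = "The quick brown fox jumps over the lazy dog."
deleted = random_deletion_sketch(TEXT, p=0.2, rng=random.Random(1))
print(deleted)
```

Because each word is tested independently, the number of removed words varies from run to run; on average about `p * len(words)` words are dropped.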


### Random Swap
Randomly choose two words in the sentence and swap their positions. Do this n times.

In [6]:
from textaugment import EDA

eda_augment = EDA()

TEST_TEXT = "The quick brown fox jumps over the lazy dog."
random_swap_augmented = eda_augment.random_swap(TEST_TEXT)
print("ORIGINAL Text :",TEST_TEXT)
print("AUGMENTED TEXT:", random_swap_augmented)

ORIGINAL Text : The quick brown fox jumps over the lazy dog.
AUGMENTED TEXT: The brown quick fox jumps over the lazy dog.
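
A minimal sketch of the swap step, assuming whitespace tokenization, might look as follows. The function name `random_swap_sketch` is hypothetical and the code is not taken from the library.

```python
import random


def random_swap_sketch(sentence, n=1, rng=None):
    """Swap two randomly chosen, distinct word positions, n times."""
    rng = rng or random.Random()
    words = sentence.split()
    if len(words) < 2:
        return sentence
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)  # two distinct indices
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


TEXT = "The quick brown fox jumps over the lazy dog."
swapped = random_swap_sketch(TEXT, n=1, rng=random.Random(7))
print(swapped)
```

Swapping only reorders words, so the output always contains exactly the same words as the input.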


### Random Insertion
Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times.

In [7]:
from textaugment import EDA

eda_augment = EDA()

TEST_TEXT = "The quick brown fox jumps over the lazy dog."
rnd_insert_augmented = eda_augment.random_insertion(TEST_TEXT)
print("ORIGINAL Text :",TEST_TEXT)
print("AUGMENTED TEXT:", rnd_insert_augmented)

ORIGINAL Text : The quick brown fox jumps over the lazy dog.
AUGMENTED TEXT: The agile quick brown fox jumps over the lazy dog.
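
The same idea can be sketched without any external resources. As with the other sketches, this is an assumption-laden illustration rather than the library's implementation: `STOP_WORDS`, `SYNONYMS`, and `random_insertion_sketch` are hypothetical, and the toy synonym table replaces a real lexical database.

```python
import random

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"the", "over", "a", "an"}
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "lazy": ["idle", "sluggish"],
    "dog": ["hound"],
}


def random_insertion_sketch(sentence, n=1, rng=None):
    """Insert a synonym of a random non-stop word at a random position, n times."""
    rng = rng or random.Random()
    words = sentence.split()
    for _ in range(n):
        candidates = [
            w for w in words
            if w.lower() not in STOP_WORDS and w.lower() in SYNONYMS
        ]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        synonym = rng.choice(SYNONYMS[rng.choice(candidates).lower()])
        # randint is inclusive, so the synonym may also land at the very end.
        words.insert(rng.randint(0, len(words)), synonym)
    return " ".join(words)


TEXT = "The quick brown fox jumps over the lazy dog"
inserted = random_insertion_sketch(TEXT, n=2, rng=random.Random(3))
print(inserted)
```

Unlike synonym replacement, every original word is preserved, so the sentence grows by one word per iteration.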


#### Cite the paper
```
@article{marivate2019improving,
  title={Improving short text classification through global augmentation methods},
  author={Marivate, Vukosi and Sefara, Tshephisho},
  journal={arXiv preprint arXiv:1907.03752},
  year={2019}
}
```