## Example of Textual Augmenter Usage<a class="anchor" id="home"></a>

* [Word Augmenter](#word_aug)
    * [Word Embeddings](#word_embs_aug)
    * [Contextual Word Embeddings](#context_word_embs_aug)
    * [Synonym](#synonym_aug)
    * [Antonym](#antonym_aug)
    * [Random Word](#random_word_aug)
    * [Split](#split_aug)
    * [Back Translation](#back_translation_aug)
* [Sentence Augmenter](#sent_aug)
    * [Contextual Word Embeddings for Sentence]

## Installing transformers and nlpaug

In [1]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.10.2-py3-none-any.whl (2.8 MB)
Collecting huggingface-hub>=0.0.12
  Downloading huggingface_hub-0.0.17-py3-none-any.whl (52 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
Installing collected packages: tokenizers, sacremoses, pyyaml, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: Py

In [2]:
!pip install nlpaug

Collecting nlpaug
  Downloading nlpaug-1.1.7-py3-none-any.whl (405 kB)

# Config

In [3]:
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
import nlpaug.flow as nafc

from nlpaug.util import Action

In [4]:
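import os

# Clone the nlpaug repository so its bundled resources are available locally.
!git clone https://github.com/makcedward/nlpaug.git
# Point MODEL_DIR at the model directory inside the cloned repository.
os.environ["MODEL_DIR"] = 'nlpaug/model/'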

Cloning into 'nlpaug'...
remote: Enumerating objects: 5194, done.
remote: Counting objects: 100% (721/721), done.
remote: Compressing objects: 100% (463/463), done.
remote: Total 5194 (delta 508), reused 425 (delta 253), pack-reused 4473
Receiving objects: 100% (5194/5194), 3.19 MiB | 15.32 MiB/s, done.
Resolving deltas: 100% (3670/3670), done.


In [5]:
text = 'Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling'
print(text)

Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling


# Word Augmenter<a class="anchor" id="word_aug"></a>

Besides character-level augmentation, word-level augmentation is important as well. We make use of word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), fasttext (Joulin et al., 2016), BERT (Devlin et al., 2018) and WordNet to insert and substitute similar words. `Word2vecAug`, `GloVeAug` and `FasttextAug` use word embeddings to find the group of most similar words to replace the original word. `BertAug`, on the other hand, uses a language model to predict a plausible target word. `WordNetAug` uses the WordNet lexical database to find a group of similar words. In the nlpaug version used in this notebook, these roles are filled by `WordEmbsAug`, `ContextualWordEmbsAug` and `SynonymAug`, as the cells below show.
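To make the embedding-based idea concrete, here is a minimal pure-Python sketch of nearest-neighbour substitution. It is not nlpaug's implementation: the vectors in `TOY_VECTORS` are invented for illustration, and the helper names (`cosine`, `most_similar`, `substitute`) and the `aug_p` parameter as used here are assumptions of this sketch.

```python
import math
import random

# Toy 3-dimensional "embeddings", invented purely for illustration.
TOY_VECTORS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.15, 0.05],
    "nice": [0.8, 0.2, 0.1],
    "coffee": [0.0, 0.9, 0.3],
    "tea": [0.05, 0.85, 0.35],
    "basket": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word, vectors, top_k=2):
    """Return the top_k vocabulary words closest to `word`, excluding itself."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_k]]


def substitute(text, vectors, aug_p=0.3, seed=0):
    """Replace each in-vocabulary token with a near neighbour with probability aug_p."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        key = token.lower()
        if key in vectors and rng.random() < aug_p:
            out.append(rng.choice(most_similar(key, vectors)))
        else:
            out.append(token)  # out-of-vocabulary or not selected: keep as-is
    return " ".join(out)


print(most_similar("coffee", TOY_VECTORS, top_k=1))
print(substitute("good coffee in a basket", TOY_VECTORS, aug_p=1.0))
```

Because the neighbours come from vector similarity alone, a substitution can change the meaning of the sentence, which is consistent with the loosely related replacements seen in the GloVe output later in this notebook.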

### Word Embeddings Augmenter<a class="anchor" id="word_embs_aug"></a>

In [6]:
!wget -c "https://nlp.stanford.edu/data/glove.6B.zip"

--2021-09-18 09:07:15--  https://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2021-09-18 09:07:15--  http://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip’


2021-09-18 09:09:56 (5.12 MB/s) - ‘glove.6B.zip’ saved [862182613/862182613]



In [7]:
!unzip glove.6B.zip -d model/ 


Archive:  glove.6B.zip
  inflating: model/glove.6B.50d.txt  
  inflating: model/glove.6B.100d.txt  
  inflating: model/glove.6B.200d.txt  
  inflating: model/glove.6B.300d.txt  
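
Each line of the extracted GloVe text files holds a word followed by its space-separated vector components. The following sketch shows how such a file could be parsed by hand; `load_glove` and its `limit` parameter are hypothetical helpers for illustration, not part of nlpaug, and the two sample lines are made up rather than taken from `glove.6B.50d.txt`.

```python
import io


def load_glove(handle, limit=None):
    """Parse GloVe-style text lines ("word v1 v2 ...") into a dict of float lists."""
    vectors = {}
    for n, line in enumerate(handle):
        if limit is not None and n >= limit:
            break  # stop early so large files can be sampled cheaply
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up sample in the same layout; a real file would be opened with
# open("model/glove.6B.50d.txt", encoding="utf-8") instead.
sample = io.StringIO("the 0.1 0.2 0.3\ncafe -0.4 0.5 0.6\n")
vectors = load_glove(sample)
print(len(vectors), len(vectors["cafe"]))
```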


In [8]:
!ls -al

total 842004
drwxr-xr-x  1 root root      4096 Sep 18 09:09 .
drwxr-xr-x  1 root root      4096 Sep 18 08:54 ..
drwxr-xr-x  4 root root      4096 Sep 16 13:39 .config
-rw-r--r--  1 root root 862182613 Oct 25  2015 glove.6B.zip
drwxr-xr-x  2 root root      4096 Sep 18 09:10 model
drwxr-xr-x 10 root root      4096 Sep 18 09:07 nlpaug
drwxr-xr-x  1 root root      4096 Sep 16 13:40 sample_data


##### Substitute word by word embedding similarity (GloVe)

In [9]:
# model_type: word2vec, glove or fasttext
aug = naw.WordEmbsAug(
    model_type='glove', model_path='model/glove.6B.50d.txt',
    action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
Visited this cafe just up fruit Nice place good travel the bluberry cold coffee was keep do Had hat of fry It was tasty and extra


### Contextual Word Embeddings Augmenter<a class="anchor" id="context_word_embs_aug"></a>

##### Substitute word by contextual word embeddings (BERT, DistilBERT, RoBERTa or XLNet)

In [10]:
aug = naw.ContextualWordEmbsAug(
    model_path='distilbert-base-uncased', action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Downloading:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/268M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
visited this cafe enjoying took enjoyment taking place good service the burger cold that was too good had pounds of fry it said tasty and filling


### Synonym Augmenter<a class="anchor" id="synonym_aug"></a>

In [14]:
import nltk
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

##### Substitute word by WordNet's synonym

In [15]:
aug = naw.SynonymAug(aug_src='wordnet')
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
Visited this cafe just for snacks Nice home good table service the bluberry cold coffee was too unspoilt Had basketful of fry It was tasty and filling


### Antonym Augmenter<a class="anchor" id="antonym_aug"></a>

##### Substitute word by antonym

In [16]:
aug = naw.AntonymAug()
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
Visited this cafe just for snacks Nice place bad service the bluberry cold coffee differ too bad Had basket of fry It differ tasteless and empty


### Random Word Augmenter<a class="anchor" id="random_word_aug"></a>

##### Swap word randomly

In [17]:
aug = naw.RandomWordAug(action="swap")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
Visited this cafe for just snacks place good Nice service the bluberry coffee cold was too Had good basket of It fry tasty was and filling


##### Delete word randomly

In [18]:
# No action is given, so the augmenter falls back to its default action, "delete".
aug = naw.RandomWordAug()
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
This cafe just for Nice place good bluberry coffee was good Had basket of fry It was tasty and


##### Delete a set of contiguous words randomly

In [19]:
aug = naw.RandomWordAug(action='crop')
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Original:
Visited this cafe just for snacks  Nice place good service the bluberry cold coffee was too good  Had basket of fry It was tasty and filling
Augmented Text:
Visited this cafe just for snacks Nice place good service the bluberry of fry It was tasty and filling
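
The three random-word actions shown above (swap, delete, crop) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not nlpaug's code: the function names and the `aug_p` and `span` parameters are hypothetical, the swap touches only one adjacent pair, and the crop always removes exactly `span` tokens, whereas the library picks how many words to change from its own settings.

```python
import random


def random_swap(tokens, rng):
    """Swap one randomly chosen token with its right-hand neighbour."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def random_delete(tokens, rng, aug_p=0.3):
    """Drop each token independently with probability aug_p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= aug_p]
    return kept if kept else [rng.choice(tokens)]


def random_crop(tokens, rng, span=3):
    """Remove one contiguous run of exactly `span` tokens."""
    if len(tokens) <= span:
        return list(tokens)
    start = rng.randrange(len(tokens) - span + 1)
    return tokens[:start] + tokens[start + span:]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
tokens = "Visited this cafe just for snacks".split()
print(" ".join(random_swap(tokens, rng)))
print(" ".join(random_delete(tokens, rng)))
print(" ".join(random_crop(tokens, rng)))
```

Swap preserves the vocabulary of the sentence and only disturbs word order, while delete and crop shorten it; crop removes neighbouring words together, so it tends to drop a whole phrase rather than scattered tokens.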


### Split Augmenter<a class="anchor" id="split_aug"></a>
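
The notebook leaves this section without a cell. As a rough illustration of what splitting means, the sketch below breaks a word into two tokens at a random position. It is plain Python written for this example, not nlpaug's own split augmenter, whose behaviour and defaults may differ; `split_words`, `aug_p`, `min_char` and `seed` are all hypothetical names.

```python
import random


def split_words(text, aug_p=0.3, min_char=4, seed=None):
    """Split tokens of at least min_char characters in two with probability aug_p."""
    rng = random.Random(seed)
    out = []
    for token in text.split():
        if len(token) >= min_char and rng.random() < aug_p:
            cut = rng.randrange(1, len(token))  # both halves stay non-empty
            out.extend([token[:cut], token[cut:]])
        else:
            out.append(token)
    return " ".join(out)


print(split_words("Visited this cafe just for snacks", aug_p=0.5, seed=1))
```

Splitting keeps every character of the original text and only adds spaces, so it mimics the kind of spacing noise found in hastily typed reviews.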

### Back Translation Augmenter<a class="anchor" id="back_translation_aug"></a>

In [20]:
import nlpaug.augmenter.word as naw

#text = 'The quick brown fox jumped over the lazy dog'
back_translation_aug = naw.BackTranslationAug(
    from_model_name='facebook/wmt19-en-de', 
    to_model_name='facebook/wmt19-de-en'
)
augmented_text = back_translation_aug.augment(text)

print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

Downloading:   0%|          | 0.00/825 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.08G [00:00<?, ?B/s]

KeyboardInterrupt: ignored