text-augmentation

The code has been adapted, with slight modifications, from this repository.

Augmentation Techniques

Synonym replacement
Random deletion
Random word swap
Random word insertion
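
The four operations above can be sketched on a tokenized sentence as follows. This is a minimal illustration, not the code in src/: the function names, the `rng` parameter, and the `TOY_SYNONYMS` table are assumptions made for the example, and the table merely stands in for a WordNet lookup.

```python
import random

# Toy synonym table: a hypothetical stand-in for a WordNet lookup.
TOY_SYNONYMS = {"film": ["movie", "picture"], "large": ["big", "huge"]}


def synonym_replacement(words, n, synonyms=TOY_SYNONYMS, rng=random):
    """Replace up to n words that have an entry in the synonym table."""
    new_words = list(words)
    candidates = [i for i, w in enumerate(new_words) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new_words[i] = rng.choice(synonyms[new_words[i]])
    return new_words


def random_deletion(words, p, rng=random):
    """Drop each word with probability p, but never return an empty list."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, n, rng=random):
    """Swap two randomly chosen positions, n times."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words
    for _ in range(n):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words


def random_insertion(words, n, synonyms=TOY_SYNONYMS, rng=random):
    """Insert a synonym of a random word at a random position, n times."""
    new_words = list(words)
    candidates = [w for w in words if w in synonyms]
    if not candidates:
        return new_words
    for _ in range(n):
        synonym = rng.choice(synonyms[rng.choice(candidates)])
        new_words.insert(rng.randint(0, len(new_words)), synonym)
    return new_words
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which helps when comparing augmented datasets across runs.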

Installation

$ git clone git@github.com:sksoumik/text-augmentation.git
$ cd text-augmentation
$ pip install -U nltk
$ python
>>> import nltk; nltk.download('wordnet')

Directory Structure of The Project

.
├── data
└── src

Place the data in the data/ directory as a .txt file with one example per line, the label first and the sentence after it:

1   neil burger here succeeded in making the mystery of four decades back the springboard for a more immediate mystery in the present 
0   it is a visual rorschach test and i must have failed 
0   the only way to tolerate this insipid brutally clueless film might be with a large dose of painkillers
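
A line in this layout can be split into its label and sentence with a few lines of Python. This is a sketch under the assumption that the label is the first whitespace-delimited token; `parse_line` is a hypothetical helper, and the script in src/ may read the file differently.

```python
def parse_line(line):
    """Split one data line into (label, sentence).

    Assumes the label is the first whitespace-delimited token,
    so it works whether the separator is a tab or several spaces.
    """
    label, sentence = line.strip().split(maxsplit=1)
    return label, sentence
```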

Run the project

Suppose that:

Your input data file is named input_data.txt.
You want to save the augmented data to a file called output.txt.
You want 15 augmented sentences for each input line.
You want to set the optional alpha parameter, which is approximately the fraction of words in each sentence that will be changed (default: 0.1, i.e. 10%).

Then run the following command:

$ python src/data_augmentation.py --input=data/input_data.txt --output=data/output.txt --num_aug=15 --alpha=0.05
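
To get a feel for what a given alpha means, the sketch below converts it into a per-sentence word count. The rounding rule (truncate, but change at least one word) is an assumption for illustration, and `words_to_change` is a hypothetical helper; the script may compute this differently.

```python
def words_to_change(sentence_length, alpha=0.1):
    """Approximate number of words affected for a sentence of the given length.

    Assumed rule: truncate alpha * length, but always change at least one word.
    """
    return max(1, int(alpha * sentence_length))
```

Under this assumption, short sentences still have one word changed even when alpha is small, so the effective change rate on short inputs can be higher than alpha suggests.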
