hannahxchen/automatic-paraphrase-dataset-augmentation

Automatic Paraphrase Dataset Augmentation

This repository contains the data and code for the paper "Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory" (Findings of EMNLP 2020).

Dependencies

You can install all the required packages by running the following command:
python -m pip install -r requirements.txt

Datasets

Quora Question Pairs
We used the QQP train/dev splits from the GLUE benchmark, which are available for download from the GLUE benchmark website.

Generating Augmented QQP Dataset

python generate_qqp_datasets.py -o OUTPUT_DIR -d [original_flipped | augmented | augmented_flipped]
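
To produce all three dataset variants in one go, the command above can be wrapped in a short driver script. The following is only a sketch under a few assumptions: it is run from the repository root, the script needs no arguments beyond -o and -d, and the one-subdirectory-per-variant layout under a hypothetical `qqp_out` directory is our own choice, not something the repository prescribes.

```python
import subprocess
import sys
from pathlib import Path

# The three values listed for the -d option in the command above.
VARIANTS = ("original_flipped", "augmented", "augmented_flipped")


def build_command(variant, output_root="qqp_out"):
    """Return the argument list for one run of generate_qqp_datasets.py."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown dataset variant: {variant}")
    # Assumed layout (hypothetical): one subdirectory per variant.
    out_dir = Path(output_root) / variant
    return [sys.executable, "generate_qqp_datasets.py",
            "-o", str(out_dir), "-d", variant]


def generate_all(output_root="qqp_out"):
    """Run the generation script once per variant, stopping on the first failure."""
    for variant in VARIANTS:
        cmd = build_command(variant, output_root)
        # Create the output directory up front; harmless if the script
        # also creates it itself.
        Path(output_root, variant).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Calling `generate_all()` should then leave one output directory per variant. Passing `check=True` makes a failed run raise an exception instead of silently continuing to the next variant.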

Bibtex

@inproceedings{chen-etal-2020-finding,
    title = "Finding {F}riends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory",
    author = "Chen, Hannah  and
      Ji, Yangfeng  and
      Evans, David",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.426",
    doi = "10.18653/v1/2020.findings-emnlp.426",
    pages = "4741--4751"
}
