ParaQuality Dataset

A dataset for detecting paraphrasing errors. The dataset contains 6000 paraphrases for 40 canonical sentences. Each paraphrase is annotated with one of the following labels:

Correct
Misspelling
Linguistic Error
Cheating
Answering
Semantic Error
Translation
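
As a quick sanity check after loading the data, it can help to tally how many paraphrases fall under each label (6000 paraphrases over 40 canonical sentences works out to 150 per sentence on average). The sketch below is only an illustration: the function name `label_distribution`, the `(paraphrase, label)` row shape, and the sample rows are assumptions, not the dataset's actual file layout, so adapt the loading step to the files in this repository.

```python
# The seven labels listed in this README.
LABELS = [
    "Correct",
    "Misspelling",
    "Linguistic Error",
    "Cheating",
    "Answering",
    "Semantic Error",
    "Translation",
]


def label_distribution(rows):
    """Count how often each known label occurs.

    `rows` is an iterable of (paraphrase, label) pairs. This row shape is
    an assumption made for illustration, not the dataset's real schema.
    """
    counts = {label: 0 for label in LABELS}
    for _paraphrase, label in rows:
        if label not in counts:
            raise ValueError(f"unknown label: {label!r}")
        counts[label] += 1
    return counts


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    ("book me a flight to Sydney", "Correct"),
    ("bok me a flite to Sydney", "Misspelling"),
    ("reserve a flight to Sydney for me", "Correct"),
]

for label, count in label_distribution(sample_rows).items():
    print(f"{label}: {count}")
```

Labels that never occur are reported with a count of zero, which makes a missing or misspelled class easy to spot.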

Possible Use Cases

Automatic Paraphrasing Issue Detection
Cheating Detection in Crowdsourced Paraphrases
Paraphrase Generation
Semantic Textual Similarity
Sentence Embedding
Grammatical Error Detection/Correction

More information

You can contact me via m.yaghubzade (at) gmail.com

For more information, please refer to our paper. If you use the dataset in your research, please also cite the following paper:

@inproceedings{mohammadali2019,
  title={A Study of Incorrect Paraphrases in Crowdsourced User Utterances},
  author={Yaghoub-Zadeh-Fard, Mohammad-Ali and Benatallah, Boualem and Chai Barush, Moshe and Zamanirad, Shayan},
  booktitle={Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year={2019}
}
