ParaQuality Dataset

A dataset for detecting paraphrasing errors.

The dataset contains 6000 paraphrases for 40 canonical sentences. Each paraphrase is annotated with one of the following labels:

  • Correct
  • Misspelling
  • Linguistic Error
  • Cheating
  • Answering
  • Semantic Error
  • Translation
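
As a rough illustration of working with these labels, the following Python sketch counts how often each label occurs and rejects any label outside the list above. It assumes the annotations have already been loaded as (paraphrase, label) pairs; the file format is not described here, so loading is left out. The names `LABELS`, `label_distribution`, and `example_rows` are hypothetical, and the example rows are invented for illustration rather than taken from the dataset.

```python
from collections import Counter
from typing import Iterable, Tuple

# The seven labels listed in this README.
LABELS = (
    "Correct",
    "Misspelling",
    "Linguistic Error",
    "Cheating",
    "Answering",
    "Semantic Error",
    "Translation",
)


def label_distribution(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count labels over (paraphrase, label) pairs, rejecting unknown labels."""
    counts = Counter()
    for paraphrase, label in rows:
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for {paraphrase!r}")
        counts[label] += 1
    return counts


# Hypothetical rows, invented for illustration only (not from the dataset).
example_rows = [
    ("book a flight to Sydney", "Correct"),
    ("book a flite to Sydney", "Misspelling"),
    ("reserve a flight to Sydney", "Correct"),
]

print(label_distribution(example_rows))
```

A quick check of the label distribution like this can be useful before training a classifier, since the classes may not be evenly represented.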

Possible Use Cases

  • Automatic Paraphrasing Issue Detection
  • Cheating Detection in Crowdsourced Paraphrases
  • Paraphrase Generation
  • Semantic Textual Similarity
  • Sentence Embedding
  • Grammatical Error Detection/Correction

More information

You can contact me via m.yaghubzade (at) gmail.com

For more information, please refer to our paper. If you use the dataset in your research, please also cite it:

@inproceedings{mohammadali2019,
  title={A Study of Incorrect Paraphrases in Crowdsourced User Utterances},
  author={Yaghoub-Zadeh-Fard, Mohammad-Ali and Benatallah, Boualem and Chai Barush, Moshe and Zamanirad, Shayan},
  booktitle={Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year={2019}