
mysilver/ParaQuality


ParaQuality Dataset

A dataset for detecting paraphrasing errors. The dataset contains 6,000 paraphrases of 40 canonical sentences. Each paraphrase is annotated with one of the following labels:

  • Correct
  • Misspelling
  • Linguistic Error
  • Cheating
  • Answering
  • Semantic Error
  • Translation
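The label set above maps naturally onto an enumeration, which lets a loader reject any value outside the seven labels. The sketch below is a minimal illustration that assumes, hypothetically, that the annotations are stored as a CSV file with a header row and the columns `canonical`, `paraphrase` and `label`, with label strings spelled exactly as listed above. The names `Label`, `Example`, `parse_rows`, `load_examples` and `label_distribution` are illustrative and not part of the repository. Check the actual files and adjust the column names and parsing to match.

```python
import csv
from collections import Counter
from enum import Enum
from typing import Iterable, List, NamedTuple


class Label(Enum):
    """The seven annotation labels listed in the README."""
    CORRECT = "Correct"
    MISSPELLING = "Misspelling"
    LINGUISTIC_ERROR = "Linguistic Error"
    CHEATING = "Cheating"
    ANSWERING = "Answering"
    SEMANTIC_ERROR = "Semantic Error"
    TRANSLATION = "Translation"


class Example(NamedTuple):
    canonical: str   # the canonical sentence that was paraphrased
    paraphrase: str  # the crowdsourced paraphrase
    label: Label     # its annotation


def parse_rows(rows: Iterable[dict]) -> List[Example]:
    """Convert rows with the (hypothetical) columns canonical/paraphrase/label.

    Label(...) raises ValueError for any label outside the README's set.
    """
    return [
        Example(row["canonical"], row["paraphrase"], Label(row["label"].strip()))
        for row in rows
    ]


def load_examples(path: str) -> List[Example]:
    """Read a header-row CSV file; this file layout is an assumption."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_rows(csv.DictReader(f))


def label_distribution(examples: Iterable[Example]) -> Counter:
    """Count how many paraphrases carry each label."""
    return Counter(ex.label for ex in examples)
```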

Possible Use Cases

  • Automatic Paraphrasing Issue Detection
  • Cheating Detection in Crowdsourced Paraphrases
  • Paraphrase Generation
  • Semantic Textual Similarity
  • Sentence Embedding
  • Grammatical Error Detection/Correction
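For the first use case, one simple framing is to collapse the seven labels into a binary target (Correct vs. any error) and score a baseline against it. The sketch below uses a deliberately naive word-overlap (Jaccard) heuristic that flags a paraphrase when it shares few words with its canonical sentence. The heuristic, the 0.2 threshold and all function names are illustrative assumptions, not the method described in the paper.

```python
import re
from typing import Iterable, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens; a crude tokenizer for illustration only."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_incorrect(label: str) -> bool:
    """Binary target: every label other than 'Correct' counts as an issue."""
    return label != "Correct"


def flag_issue(canonical: str, paraphrase: str, threshold: float = 0.2) -> bool:
    """Hypothetical baseline: flag a paraphrase whose word overlap with its
    canonical sentence falls below `threshold` (an arbitrary, untuned value)."""
    return jaccard(canonical, paraphrase) < threshold


def precision_recall(
    triples: Iterable[Tuple[str, str, str]], threshold: float = 0.2
) -> Tuple[float, float]:
    """Score the baseline on (canonical, paraphrase, label) triples."""
    tp = fp = fn = 0
    for canonical, paraphrase, label in triples:
        predicted = flag_issue(canonical, paraphrase, threshold)
        actual = is_incorrect(label)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A misspelled paraphrase such as "book a flihgt" still shares most of its words with "book a flight", so a pure overlap baseline misses it; this is one reason the finer-grained labels are useful.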

More information

You can contact me via m.yaghubzade (at) gmail.com

For more information, please refer to our paper. If you use the dataset in your research, please cite it:

@inproceedings{mohammadali2019,
  title={A Study of Incorrect Paraphrases in Crowdsourced User Utterances},
  author={Yaghoub-Zadeh-Fard, Mohammad-Ali and Benatallah, Boualem and Chai Barukh, Moshe and Zamanirad, Shayan},
  booktitle={Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  year={2019}
}
