A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations

ASSET Corpus

ASSET is a dataset for evaluating Sentence Simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations".

The corpus is composed of 2000 validation and 359 test original sentences (dataset/asset.{valid,test}.orig), each of which was simplified 10 times by different annotators (dataset/asset.{valid,test}.simp.{0,1,2,3,4,5,6,7,8,9}). Every file contains one sample per line.
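The file layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper names `read_lines` and `load_asset_split` are hypothetical, and it assumes the files are UTF-8 encoded and that the repository has been cloned so that the `dataset/` directory is available locally.

```python
from pathlib import Path
from typing import List, Tuple

# Number of simplification files per split, as stated in this README (.simp.0 to .simp.9).
NUM_REFERENCES = 10


def read_lines(path: Path) -> List[str]:
    """Read a one-sample-per-line file, dropping only the line terminators."""
    # Assumption: the files are UTF-8 encoded.
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_asset_split(dataset_dir, split: str) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: load the originals and the 10 reference files of a split.

    Returns (originals, references), where references[i][j] is the
    simplification of originals[j] found in the file asset.<split>.simp.<i>.
    """
    if split not in ("valid", "test"):
        raise ValueError(f"unknown split: {split!r}")
    dataset_dir = Path(dataset_dir)
    originals = read_lines(dataset_dir / f"asset.{split}.orig")
    references = []
    for i in range(NUM_REFERENCES):
        simplifications = read_lines(dataset_dir / f"asset.{split}.simp.{i}")
        # Each reference file should be line-aligned with the .orig file.
        if len(simplifications) != len(originals):
            raise ValueError(f"asset.{split}.simp.{i} is not aligned with asset.{split}.orig")
        references.append(simplifications)
    return originals, references
```

Calling `load_asset_split("dataset", "test")` from the repository root should then return 359 original sentences and 10 aligned lists of simplifications, if the files match the description above.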

We recommend that you evaluate your Text Simplification (TS) system using this dataset and traditional TS metrics with the EASSE package.

Authors

If you have any questions, please contact the authors:
Fernando Alva-Manchego (f.alva@sheffield.ac.uk)
Louis Martin (louismartincs@gmail.com)

Citation

If you use our work, please cite:

@inproceedings{alvamanchego2020asset,
  title={ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations},
  author={Alva-Manchego, Fernando and Martin, Louis and Bordes, Antoine and Scarton, Carolina and Sagot, Benoît and Specia, Lucia},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}

License

See the LICENSE file for more details.
