Gold-Standard Sentence Splitting Corpus
If you use the corpus, please cite the following paper:
BLEU is Not Suitable for the Evaluation of Text Simplification. Elior Sulem, Omri Abend and Ari Rappoport. Proc. of EMNLP 2018.
This gold-standard sentence splitting corpus consists of the outputs produced by 4 annotators, who were given the complex side of the test corpus of Xu et al., 2016 and followed the sentence splitting guidelines. HSplit 1 and 2 correspond to the Set 1 guidelines. HSplit 3 and 4 correspond to the Set 2 guidelines.
Uniform tokenization and truecasing styles are obtained using the Moses toolkit (Koehn et al., 2007).
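For multi-reference evaluation, the four annotations usually need to be grouped per source sentence. The following is a minimal sketch, not part of the corpus release. It assumes that each HSplit annotation is stored in its own UTF-8 text file with one line per source sentence, in the same order as the complex side of the test corpus. The function name load_references and the file names in the usage comment are hypothetical; adjust them to the actual files.

```python
from pathlib import Path


def load_references(paths):
    """Group line-aligned reference files per source sentence.

    Assumption: line i of every file is the annotation of source sentence i.
    Returns one list per source sentence, holding one reference per file.
    """
    columns = [Path(p).read_text(encoding="utf-8").splitlines() for p in paths]

    # All files must cover the same number of source sentences.
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"reference files are not line-aligned: {sorted(lengths)}")

    return [list(refs) for refs in zip(*columns)]


# Hypothetical usage (file names are placeholders):
# references = load_references(["HSplit1", "HSplit2", "HSplit3", "HSplit4"])
# references[0] then holds the four annotations of the first source sentence.
```

Because HSplit 1-2 and HSplit 3-4 follow different guideline sets, it may be preferable to load each pair separately when the two sets should be evaluated independently.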