Gold-Standard Sentence Splitting Corpus

If you use the corpus, please cite the following paper:

  BLEU is Not Suitable for the Evaluation of Text Simplification
  Elior Sulem, Omri Abend and Ari Rappoport
  Proc. of EMNLP 2018


Gold-standard sentence splitting corpus composed of the splits produced by four annotators for the complex side of the test corpus of Xu et al. (2016), following the sentence splitting guidelines. HSplit 1 and 2 follow the Set 1 guidelines; HSplit 3 and 4 follow the Set 2 guidelines.
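Because all four HSplit references are produced for the same complex source sentences, a natural way to use them is to read them side by side. The following is a minimal sketch that assumes each reference set is a plain-text file with one sentence per line, line-aligned with the source side. The function name `load_aligned` and the file names in `GUIDELINE_SETS` are hypothetical and only illustrate the grouping described above; the actual files in this repository may be named or formatted differently.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def load_aligned(paths: Sequence[str]) -> List[Tuple[str, ...]]:
    """Read several plain-text files (ASSUMED: one sentence per line) and
    zip them line by line, so row i holds line i of every file.

    Raises ValueError if the files do not have the same number of lines,
    since line-aligned references are the assumption this sketch relies on.
    """
    columns = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        columns.append(text.splitlines())
    lengths = {len(c) for c in columns}
    if len(lengths) != 1:
        raise ValueError(f"files are not line-aligned: line counts {sorted(lengths)}")
    return list(zip(*columns))


# HYPOTHETICAL file names, grouped by guideline set as described in this README.
GUIDELINE_SETS = {
    "Set 1": ["HSplit1.txt", "HSplit2.txt"],
    "Set 2": ["HSplit3.txt", "HSplit4.txt"],
}
```

For example, `load_aligned(["complex.txt"] + GUIDELINE_SETS["Set 1"] + GUIDELINE_SETS["Set 2"])` would yield tuples of the form (source, HSplit 1, HSplit 2, HSplit 3, HSplit 4), where `complex.txt` is likewise a hypothetical name for the source side. Checking line counts up front catches misaligned files before any evaluation is run.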

Uniform tokenization and truecasing were applied using the Moses toolkit (Koehn et al., 2007).


License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).