Corpus of 75 parallel texts, simplified on two levels and annotated with RST. The RST annotations can be found in the rst/ folder, separated according to complexity level. The aligned files can be found in the alignments/ folder and the original texts in the original_texts folder.
The alignments/ folder contains three subfolders:
- or-b1: each line holds the sentence(s) from the original text aligned to the sentence on the same line of the B1 text
- or-a2: the same, but for the A2 texts
- b1-a2: each line holds the sentence(s) from the B1 text aligned to the sentence on the same line of the A2 text
If a line contains multiple sentences, those sentences were all aligned to the single sentence on the corresponding line of the simpler text. An empty line means that no alignment was found.
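To illustrate the line-based alignment format, here is a minimal Python sketch that pairs each sentence of a simpler text with its aligned line. It assumes, beyond what is stated above, that the simpler text has exactly one sentence per line and that both files have already been read into strings. The function name pair_aligned_lines is hypothetical, and because the separator between several sentences on one line is not specified here, such a line is kept as a single string.

```python
from typing import List, Optional, Tuple


def pair_aligned_lines(
    aligned_text: str, target_text: str
) -> List[Tuple[str, Optional[str]]]:
    """Pair each sentence of the simpler (target) text with its aligned line.

    Hypothetical helper, a sketch only. Assumes the target text has one
    sentence per line and that line i of the alignment file belongs to
    line i of the target text.
    """
    aligned_lines = aligned_text.splitlines()
    pairs: List[Tuple[str, Optional[str]]] = []
    for i, target_sentence in enumerate(target_text.splitlines()):
        # A blank (or missing trailing) alignment line means no alignment was found.
        source = aligned_lines[i].strip() if i < len(aligned_lines) else ""
        # A non-empty line may hold several source sentences aligned to this
        # one target sentence; they are kept together as one string.
        pairs.append((target_sentence, source or None))
    return pairs


if __name__ == "__main__":
    aligned = "Orig one. Orig two.\n\nOrig four.\n"
    target = "B1 one.\nB1 two.\nB1 three.\n"
    for target_sentence, source in pair_aligned_lines(aligned, target):
        print(target_sentence, "<-", source)
```

In the demo, the second B1 sentence has no aligned original sentence (empty line), while the first B1 sentence is aligned to two original sentences. Returning None rather than an empty string makes unaligned sentences easy to filter out.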
More information on the files can be found in our paper. If you use any of the data, please cite this paper:
Freya Hewett. APA-RST: A Text Simplification Corpus with RST Annotations. In Proceedings of the 4th Workshop on Computational Approaches to Discourse. Toronto, Canada and Online, July 2023. Association for Computational Linguistics. To appear.