Corpus of 75 parallel texts, simplified on two levels, annotated with RST

fhewett/apa-rst

APA-RST

Corpus of 75 parallel texts, simplified on two levels (B1 and A2) and annotated with RST. The RST annotations are in the rst/ folder, separated by complexity level. The aligned files are in the alignments/ folder and the original texts are in the original_texts/ folder.

Notes on the aligned files

The alignments/ folder contains three subfolders:

  • or-b1: each line holds the sentence(s) from the original text that correspond to the same line in the B1 text
  • or-a2: the same, but for the A2 texts
  • b1-a2: each line holds the sentence(s) from the B1 text that correspond to the same line in the A2 text

If a line contains multiple sentences, all of them were aligned to the single sentence on the corresponding line. If a line is empty, no alignment was found for that sentence.
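
The line-by-line convention above can be read with a few lines of Python. This is only a minimal sketch, not part of the corpus tooling: the function name `pair_aligned_lines` is hypothetical, and it assumes that an alignment file and the simpler text it points to have the same number of lines. Lines with several sentences are kept as one string, since no sentence separator is documented here.

```python
from typing import List, Optional, Tuple


def pair_aligned_lines(
    aligned_text: str, target_text: str
) -> List[Tuple[Optional[str], str]]:
    """Pair line i of an alignment file with line i of the simpler text.

    An empty alignment line means no alignment was found, so it becomes None.
    Assumption (not confirmed by the README): both texts have equally many lines.
    """
    aligned_lines = aligned_text.splitlines()
    target_lines = target_text.splitlines()
    if len(aligned_lines) != len(target_lines):
        raise ValueError(
            f"line count mismatch: {len(aligned_lines)} vs {len(target_lines)}"
        )

    pairs = []
    for source, target in zip(aligned_lines, target_lines):
        # Treat blank or whitespace-only lines as "no alignment found".
        source = source.strip()
        pairs.append((source if source else None, target.strip()))
    return pairs


# Toy strings standing in for file contents; real data would be read from
# a file in alignments/or-b1 and the matching B1 text.
example_pairs = pair_aligned_lines(
    "Orig one. Orig two.\n\nOrig three.\n",
    "Simple one.\nSimple two.\nSimple three.\n",
)
```

Here the first pair holds two original sentences aligned to one simplified sentence, and the second pair has `None` as its source because the alignment line is empty.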

More information and citation

More information on the files can be found in our paper. If you use any of the data, please cite this paper!

Freya Hewett. APA-RST: A Text Simplification Corpus with RST Annotations. In Proceedings of the 4th Workshop on Computational Approaches to Discourse. Toronto, Canada and Online, July 2023. Association for Computational Linguistics. To appear.
