scrosseye/CLEAR-Corpus

CLEAR-Corpus

Repository for the CommonLit Ease of Readability Corpus

This repository contains the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~5,000 text excerpts leveled for 3rd-12th grade readers, along with each excerpt’s year of publication, genre, and other metadata. The CLEAR corpus is meant to give researchers interested in discourse processing and reading a resource with which to develop and test readability metrics and to model text readability. It improves on previous readability corpora in three ways: size (N = ~5,000 reading excerpts); breadth, with excerpts covering over 250 years of writing in two different genres; and a unique readability criterion for each text, based on teachers’ ratings of text difficulty for student readers.

Two papers describing the corpus are listed below.

Crossley, S. A., Heintz, A., Choi, J., Batchelor, J., Karimi, M., & Malatinszky, A. (in press). A large-scaled corpus for assessing text readability. Behavior Research Methods.

Crossley2022_Article_ALarge-scaledCorpusForAssessin.pdf

Crossley, S. A., Heintz, A., Choi, J., Batchelor, J., & Karimi, M. (2021). The CommonLit Ease of Readability (CLEAR) Corpus. Proceedings of the 14th International Conference on Educational Data Mining (EDM). Paris, France.

EDM21_paper_35.pdf

The data is provided under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).
