A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification
This dataset contains word-complexity judgements from Deaf and Hard-of-Hearing adults for a lexicon of 15,000 words.
For more details about the dataset, see our paper (link coming soon).
This repository contains two files for the dataset:
- General Lexicon DHH Annotations: each line contains a word in the lexicon, its individual ratings from the 11 DHH annotators, and the average rating across those annotators. Each rating is on a scale from 1 to 6; a value of -1 indicates that the annotator did not rate the word.
- General Lexicon Linguistic Characteristics: each line contains the values of the linguistic features computed for one word, as explained in the paper.
Note that both files are in TSV format (tab-separated values).
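As a rough illustration of working with the annotations file, the sketch below reads tab-separated rows and treats -1 as a missing rating. It assumes a column layout of word, one column per annotator, then the average, with no header row; this README does not confirm that layout, so check it against the actual file. The function names and the sample rows are made up for the example (the sample uses 3 annotators instead of 11).

```python
import csv
import io
from statistics import mean

MISSING = -1  # per this README, -1 means the annotator did not rate the word


def read_annotations(handle):
    """Yield (word, ratings, average) tuples from tab-separated rows.

    Assumed layout (not confirmed by the README): word, one column per
    annotator, then the average. Unrated entries (-1) become None.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        word, *raw, avg = row
        ratings = [None if float(r) == MISSING else float(r) for r in raw]
        yield word, ratings, float(avg)


def mean_rating(ratings):
    """Mean over the ratings that were actually given, or None if there are none."""
    rated = [r for r in ratings if r is not None]
    return mean(rated) if rated else None


# Made-up sample rows for illustration only; not taken from the dataset.
sample = "cat\t1\t2\t-1\t1.5\nubiquitous\t6\t5\t4\t5.0\n"
rows = list(read_annotations(io.StringIO(sample)))
```

To read a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is wherever you saved the annotations file. If the file turns out to have a header row, skip the first row before parsing.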
If you use this dataset, please cite our paper:
@InProceedings{Alonzo-TSAR-2022,
author = "Alonzo, Oliver and Lee, Sooyeon and Maddela, Mounica and Xu, Wei and Huenerfauth, Matt",
title = "A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification",
booktitle = "Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) at EMNLP 2022",
year = "2022",
}