oliveralonzo/DHH-lexical-dataset

A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification

This dataset contains word-complexity judgements from Deaf and Hard-of-Hearing adults for a lexicon of 15,000 words.

For more details about the dataset, take a look at our paper: link coming soon.

The Dataset

This repository contains two files for the dataset:

  1. General Lexicon DHH Annotations: each line contains a word in the lexicon, its individual ratings from 11 DHH annotators, and the average rating across those 11 annotators. Each rating is on a scale from 1 to 6; a value of -1 indicates that the annotator did not rate the word.
  2. General Lexicon Linguistic Characteristics: each line contains the values of the linguistic features computed for each word, as explained in the paper.

Note that the files are in TSV format (separated by tabs).
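
As a starting point for working with the annotations file, here is a minimal Python sketch that parses tab-separated rows and averages only the ratings an annotator actually gave. It assumes each row is laid out as word, then the individual ratings, then the reported average, and that any header row is non-numeric; check the actual file before relying on either assumption. The function name `parse_annotations` and the output keys are illustrative and not part of the dataset.

```python
import csv
import io

MISSING = -1  # per the README, -1 means the annotator did not rate the word


def parse_annotations(tsv_text, n_annotators=11):
    """Parse rows assumed to be: word, n_annotators ratings, reported average."""
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < n_annotators + 1:
            continue  # skip blank or short lines
        try:
            ratings = [float(x) for x in fields[1:1 + n_annotators]]
        except ValueError:
            continue  # non-numeric rating columns: assumed to be a header row
        rated = [r for r in ratings if r != MISSING]
        rows.append({
            "word": fields[0],
            "ratings": ratings,
            # Mean over the ratings that were actually given, or None if there were none.
            "mean_of_rated": sum(rated) / len(rated) if rated else None,
        })
    return rows


# Tiny made-up sample with 3 annotators, for illustration only (not real data).
sample = "word\ta1\ta2\ta3\tavg\nexample\t1\t2\t-1\t1.5\nother\t-1\t-1\t-1\t-1\n"
parsed = parse_annotations(sample, n_annotators=3)
```

To read the real file, pass its contents as text, for example `parse_annotations(open(path, encoding="utf-8").read())`, where `path` points to your local copy of the annotations file.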

Citation

If you use this dataset, please cite our paper:

@InProceedings{Alonzo-TSAR-2022,
  author = 	"Alonzo, Oliver and Lee, Sooyoen and Maddela, Mounica and Xu, Wei and Huenerfauth, Matt",
  title = 	"A Dataset of Word-Complexity Judgements from Deaf and Hard-of-Hearing Adults for Text Simplification",
  booktitle = 	"Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) at EMNLP 2022",
  year = 	"2022",
}
