Datasets created in the paper "Improving Truthfulness of Headline Generation"

nlp-titech/headline-entailment

headline-entailment

Datasets created in the paper Improving Truthfulness of Headline Generation.

Gigaword Entailment Dataset

The gigaword directory contains the results of an annotation task we conducted on a subset of the Gigaword dataset.

giga_entail_annotation.tsv contains 1,000 annotated records, each with the id of the corresponding Gigaword article.

giga_entail_annotation_filtered.tsv is a subset of giga_entail_annotation.tsv, created by applying the same filtering procedure as Rush et al. (2015). We used this version to report the entailment ratio of the Gigaword dataset in Section 3.2 of our paper.

The meaning of each column is:

| Header | Description |
| --- | --- |
| id | The id of the article in the original English Gigaword dataset (Graff and Cieri, 2003; Napoles et al., 2012). |
| lead1_worker{1-3} | The judgment of worker {1-3} on whether the first sentence of the article entails its headline: 1 is entailment, 2 is non-entailment, and 3 is incomprehensible. |
| full_worker{1-3} | Same as lead1_worker{1-3}, but the full article is used instead of the first sentence (lead-1). |
| lead1_result | Majority vote among the three lead1_worker annotations. If all three workers gave different annotations, the result is 0. |
| full_result | Same as lead1_result, but computed from the full_worker annotations. |
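
The column layout described above can be read with Python's standard csv module. The following is a minimal sketch, not code from the paper: it assumes the TSV files have a header row using exactly the column names listed above, the sample rows and ids are made up for illustration, and `entailment_ratio` is only one possible way to define the ratio (it may differ from the calculation in Section 3.2).

```python
import csv
import io
from collections import Counter


def majority_vote(labels):
    """Return the label given by at least two workers, or 0 if all three differ."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else 0


def entailment_ratio(rows, prefix):
    """Fraction of rows whose majority vote is 1 (entailment).

    `prefix` is "lead1" or "full". This definition is an assumption,
    not necessarily the one used in the paper.
    """
    votes = [
        majority_vote([int(row[f"{prefix}_worker{i}"]) for i in (1, 2, 3)])
        for row in rows
    ]
    return sum(v == 1 for v in votes) / len(votes)


# Made-up sample rows (hypothetical ids and labels), standing in for
# gigaword/giga_entail_annotation.tsv.
SAMPLE_TSV = (
    "id\tlead1_worker1\tlead1_worker2\tlead1_worker3\t"
    "full_worker1\tfull_worker2\tfull_worker3\tlead1_result\tfull_result\n"
    "EXAMPLE_0001\t1\t1\t2\t1\t1\t1\t1\t1\n"
    "EXAMPLE_0002\t1\t2\t3\t2\t2\t1\t0\t2\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
print(entailment_ratio(rows, "lead1"), entailment_ratio(rows, "full"))
```

To run this on the real data, replace `io.StringIO(SAMPLE_TSV)` with an open file handle, for example `open("gigaword/giga_entail_annotation.tsv", newline="")`. The recomputed majority vote can also be compared against the lead1_result and full_result columns as a sanity check.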
