headline-entailment

Datasets created for the paper "Improving Truthfulness of Headline Generation".

Gigaword Entailment Dataset

The gigaword directory contains the results of the annotation we conducted on a portion of the Gigaword dataset.

giga_entail_annotation.tsv contains 1,000 annotation records, each with the id of the corresponding Gigaword article.

giga_entail_annotation_filtered.tsv is a subset of giga_entail_annotation.tsv, created with the same filtering procedure as Rush et al. (2015). We used this version to report the entailment ratio of the Gigaword dataset in Section 3.2 of our paper.

The meaning of each column is:

Header	Description
id	The id of the article in the original English Gigaword dataset (Graff and Cieri, 2003; Napoles et al., 2012).
lead1_worker{1-3}	The judgment of worker {1-3} on whether the first sentence of the article (lead-1) entails its headline: 1 is entailment, 2 is non-entailment, and 3 is incomprehensible.
full_worker{1-3}	Same as `lead1_worker{1-3}`, but the full article is used instead of lead-1.
lead1_result	Majority vote among the three `lead1_worker` annotations. If all three workers gave different annotations, the result is 0.
full_result	Same as `lead1_result`, but computed from the `full_worker` annotations.
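
The column layout above can be illustrated with a minimal Python sketch that recomputes the majority vote and the share of records judged as entailment. It assumes the TSV files start with a header row using exactly the column names listed above. The helper names (`read_rows`, `majority_vote`, `entailment_ratio`) and the sample values are hypothetical and not part of this repository. Taking the ratio over all records is also an assumption and may differ from how the ratio in the paper was computed.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Parse a tab-separated annotation file into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def majority_vote(labels):
    """Return the label given by at least two workers, or 0 if all differ."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else 0


def entailment_ratio(rows, prefix="lead1"):
    """Fraction of rows whose majority vote is 1 (entailment).

    `prefix` is "lead1" or "full". The denominator is all rows,
    which is an assumption of this sketch.
    """
    if not rows:
        return 0.0
    votes = [
        majority_vote([int(row[f"{prefix}_worker{i}"]) for i in (1, 2, 3)])
        for row in rows
    ]
    return sum(vote == 1 for vote in votes) / len(votes)


# Made-up sample with the lead1 columns only; the ids are placeholders.
SAMPLE = (
    "id\tlead1_worker1\tlead1_worker2\tlead1_worker3\n"
    "doc_a\t1\t1\t2\n"
    "doc_b\t1\t2\t3\n"
    "doc_c\t2\t2\t1\n"
    "doc_d\t1\t1\t1\n"
)

sample_rows = read_rows(io.StringIO(SAMPLE))
print(entailment_ratio(sample_rows, prefix="lead1"))
```

To run this on the real data, open the file with `open("gigaword/giga_entail_annotation.tsv", newline="", encoding="utf-8")` (path assumed from the directory description above) and pass the handle to `read_rows`.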
