Article-summary entailment annotations for agreement-oriented multidoc summarization

google-research-datasets/AgreeSum

AgreeSum Dataset

This repository contains the raw dataset used for the agreement-oriented multi-doc summarization (AgreeSum) task from "AgreeSum: Agreement-Oriented Multi-Document Summarization" to appear in the Findings of the ACL: ACL-IJCNLP 2021.

The dataset consists of summaries from the Wikipedia Current Events Portal (WCEP), along with associated news article URLs (up to four) and, for each article, a label indicating whether the article's main passage entails the summary. Entailment here is defined as "Does the article contain all the information presented in the summary?" The dataset consists of 18K cluster-summary pairs, coming from 5.5K WCEP summaries. 1K cluster-summary pairs are annotated, which corresponds to ~8K article-summary entailment annotations.

This dataset may be alternatively flattened, deduplicated, and used in a (summary, article) entailment classification setting.
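As a rough illustration of the flattening described above, the sketch below turns cluster-level records into deduplicated (summary, article URL, label) triples. The record layout is an assumption made for this example: the keys `summary`, `urls`, and `labels` are hypothetical and may not match the released files, so adapt the accessors to the actual format. Only the label semantics (1, 0, -1, -2) come from this README.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Pair(NamedTuple):
    summary: str
    url: str
    label: int  # 1 = article entails summary, 0 = it does not


def flatten(clusters: Iterable[Dict]) -> List[Pair]:
    """Flatten cluster records into unique (summary, url, label) pairs.

    Assumes each record has hypothetical keys "summary", "urls", "labels",
    where urls[i] and labels[i] describe the article at position i.
    """
    seen: Set[Tuple[str, str]] = set()
    pairs: List[Pair] = []
    for cluster in clusters:
        for url, label in zip(cluster["urls"], cluster["labels"]):
            # Skip unannotated (-1) and empty (-2) article positions.
            if label not in (0, 1):
                continue
            key = (cluster["summary"], url)
            # The same summary/article pair can appear in several clusters;
            # keep only the first occurrence.
            if key in seen:
                continue
            seen.add(key)
            pairs.append(Pair(cluster["summary"], url, label))
    return pairs


# Made-up records for illustration only; not taken from the dataset.
example = [
    {"summary": "S1", "urls": ["u1", "u2", "", ""], "labels": [1, 0, -2, -2]},
    {"summary": "S1", "urls": ["u1", "u3", "u4", ""], "labels": [1, -1, 0, -2]},
]
flat = flatten(example)
```

Keeping the first occurrence is one possible deduplication policy; if the same pair could carry conflicting labels in your copy of the data, you may prefer to detect and resolve those cases explicitly.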

Dataset Construction

For more details on how the dataset is constructed, please refer to our paper.

Important Notes

  • The summaries provided in the test set are not gold summaries, as test set articles are not guaranteed to entail the summary. However, these summaries may be used as a proxy to measure summarization informativeness.
  • The dev set is sampled from the training set, which comes from WCEP summaries before August 2019, while the test set is from August 2019 to August 2020.
  • Entailment labels have four possible values: 1 (entails), 0 (not entails), -1 (unannotated), and -2 (no article at that position).
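The four label values listed above can be given names so that downstream code does not rely on bare integers. This is a minimal sketch; the enum and helper names are hypothetical and not part of the dataset release.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable


class EntailmentLabel(IntEnum):
    ENTAILS = 1        # article entails the summary
    NOT_ENTAILS = 0    # article does not entail the summary
    UNANNOTATED = -1   # article present, but no annotation
    NO_ARTICLE = -2    # no article at that position


def count_labels(labels: Iterable[int]) -> Counter:
    """Count labels by name; raises ValueError on an unknown value."""
    return Counter(EntailmentLabel(v) for v in labels)


def is_fully_annotated(labels: Iterable[int]) -> bool:
    """True when every position that holds an article has a 0/1 label."""
    return all(v != EntailmentLabel.UNANNOTATED for v in labels)


counts = count_labels([1, 0, -1, -2])
```

Converting through `EntailmentLabel(v)` also acts as a cheap validity check, since any integer outside the four documented values raises an error instead of passing through silently.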

Citation

If you use or discuss this dataset in your work, please cite our paper:

@InProceedings{agreesum2021,
  title = {{AgreeSum: Agreement-Oriented Multi-Document Summarization}},
  author = {Richard Yuanzhe Pang and Adam D. Lelkes and Vinh Q. Tran and Cong Yu},
  booktitle = {Findings of the ACL: ACL-IJCNLP 2021},
  year = {2021}
}
