panthap2/updated-headline-generation

Updated Headline Generation: Creating Updated Summaries for Evolving News Stories

Dataset for our ACL-2022 paper "Updated Headline Generation: Creating Updated Summaries for Evolving News Stories"

Data can be found here: https://zenodo.org/record/6578378

We release ids and metadata for selected examples from the NewsEdits corpus for the task of updated headline generation. The train/valid/test splits of the Headline Revision for Evolving News (HREN) dataset are available under hren. Each example is represented as a JSON object with the following structure:

{
  id: str,
  meta_info: {
    has_headline_change: bool,
    has_nontrivial_headline_change: bool,
    has_body_change: bool,
    has_nontrivial_body_change: bool
  },
  old_headline_version_url: str,
  new_headline_version_url: str,
  old_body_version_url: str,
  new_body_version_url: str
}
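As a minimal sketch of how the `meta_info` flags might be used, the snippet below keeps only examples whose headline change is marked nontrivial. It assumes the examples have already been parsed into Python dictionaries with the structure above; the on-disk layout of the split files is not described here, so loading is left out. The function name `nontrivial_headline_changes`, the sample ids, and the `example.com` URLs are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def nontrivial_headline_changes(examples: Iterable[Dict]) -> List[Dict]:
    """Keep examples flagged as having a nontrivial headline change."""
    return [
        ex for ex in examples
        if ex["meta_info"]["has_nontrivial_headline_change"]
    ]


# Hypothetical records that follow the documented structure (not real data).
sample = json.loads("""
[
  {
    "id": "src_1:0-1-0-1:0",
    "meta_info": {
      "has_headline_change": true,
      "has_nontrivial_headline_change": true,
      "has_body_change": true,
      "has_nontrivial_body_change": false
    },
    "old_headline_version_url": "https://example.com/a/v0",
    "new_headline_version_url": "https://example.com/a/v1",
    "old_body_version_url": "https://example.com/a/v0",
    "new_body_version_url": "https://example.com/a/v1"
  },
  {
    "id": "src_2:0-1-0-1:0",
    "meta_info": {
      "has_headline_change": true,
      "has_nontrivial_headline_change": false,
      "has_body_change": false,
      "has_nontrivial_body_change": false
    },
    "old_headline_version_url": "https://example.com/b/v0",
    "new_headline_version_url": "https://example.com/b/v1",
    "old_body_version_url": "https://example.com/b/v0",
    "new_body_version_url": "https://example.com/b/v1"
  }
]
""")

kept = nontrivial_headline_changes(sample)
print([ex["id"] for ex in kept])
```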

Note that the id field is formatted as:

[DB SOURCE]_[ARTICLE ID]:[OLD HEADLINE VERSION INDEX]-[NEW HEADLINE VERSION INDEX]-[OLD BODY VERSION INDEX]-[NEW BODY VERSION INDEX]:[PAIR INDEX]

To obtain the full headline and body texts, you will need to map the ids back to the NewsEdits corpus using the following components: DB SOURCE, ARTICLE ID, OLD HEADLINE VERSION INDEX, NEW HEADLINE VERSION INDEX, OLD BODY VERSION INDEX, and NEW BODY VERSION INDEX.
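The id format can be split into its components with a few string operations. The following is only a sketch and not the logic of `map_to_newsedits.py`: it assumes that ARTICLE ID contains no underscore or colon and that the version indexes contain no hyphen, and it keeps every component as a string rather than guessing its type. The names `HrenId` and `parse_hren_id` and the sample id are hypothetical.

```python
from typing import NamedTuple


class HrenId(NamedTuple):
    db_source: str
    article_id: str
    old_headline_version: str
    new_headline_version: str
    old_body_version: str
    new_body_version: str
    pair_index: str


def parse_hren_id(example_id: str) -> HrenId:
    """Split an HREN id into its documented components (all kept as strings)."""
    pieces = example_id.rsplit(":", 2)
    if len(pieces) != 3:
        raise ValueError(f"expected two ':' separators in {example_id!r}")
    prefix, versions, pair_index = pieces

    # Assumption: the last underscore separates DB SOURCE from ARTICLE ID.
    db_source, sep, article_id = prefix.rpartition("_")
    if not sep:
        raise ValueError(f"expected '_' between source and article id in {example_id!r}")

    indexes = versions.split("-")
    if len(indexes) != 4:
        raise ValueError(f"expected four version indexes in {example_id!r}")

    return HrenId(db_source, article_id, *indexes, pair_index)


# Hypothetical id, made up to match the documented pattern.
print(parse_hren_id("src_12345:0-1-0-2:3"))
```

If the real ids use underscores inside DB SOURCE or ARTICLE ID in a different way, the `rpartition` step would need to be adjusted accordingly.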

The authors of NewsEdits have released their corpus publicly. We provide the map_to_newsedits.py script to help map the HREN ids back to the original NewsEdits corpus.

We also include ids and metadata for the examples we used to train and evaluate the classifier that filters HREN, as well as for the unfiltered updated headline generation data. These can be found in the supplementary_data folder.

Additionally, we include annotations from human evaluation under annotations. Each example is structured as follows:

{
  example_id: str,
  annotator_id: int,
  prediction: str,
  predicted_by_model: str,
  factual: int,
  important_changes: int,
  retains_information: int,
  grammatical: int,
  concise: int
}
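One way to summarize these annotations is to average each rating per model. The sketch below assumes the annotations have been loaded as a list of dictionaries with the fields above and that the five rating fields are numeric; the rating scale is not specified here, so the values in the sample are made up. The names `CRITERIA` and `mean_scores_by_model` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable

CRITERIA = (
    "factual",
    "important_changes",
    "retains_information",
    "grammatical",
    "concise",
)


def mean_scores_by_model(annotations: Iterable[Dict]) -> Dict[str, Dict[str, float]]:
    """Average each rating criterion over all annotations for each model."""
    grouped = defaultdict(lambda: defaultdict(list))
    for ann in annotations:
        for criterion in CRITERIA:
            grouped[ann["predicted_by_model"]][criterion].append(ann[criterion])
    return {
        model: {criterion: mean(values) for criterion, values in scores.items()}
        for model, scores in grouped.items()
    }


# Hypothetical annotations (not real data).
sample_annotations = [
    {"example_id": "src_1:0-1-0-1:0", "annotator_id": 1, "prediction": "Headline A",
     "predicted_by_model": "model_a", "factual": 1, "important_changes": 1,
     "retains_information": 1, "grammatical": 1, "concise": 1},
    {"example_id": "src_1:0-1-0-1:0", "annotator_id": 2, "prediction": "Headline A",
     "predicted_by_model": "model_a", "factual": 0, "important_changes": 1,
     "retains_information": 1, "grammatical": 1, "concise": 0},
]

print(mean_scores_by_model(sample_annotations))
```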

If you find this work useful, please cite our paper:

@inproceedings{PanthaplackelETAL22UpdatedHeadlineGeneration,
  author = {Panthaplackel, Sheena and Benton, Adrian and Dredze, Mark},
  title = {Updated Headline Generation: Creating Updated Summaries for Evolving News Stories},
  booktitle = {Association for Computational Linguistics},
  pages = {6438--6461},
  year = {2022},
}

Enjoy!

Image from Wikimedia Commons licensed under (CC BY-SA 2.0)
