
JsnDg/EMNLP23-LLM-headline


Dataset of Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation (EMNLP'23)

This repository contains the datasets collected for the EMNLP'23 paper "Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation".

Documentation

The repository has two datasets: 1) news_headlines.csv, 840 headlines generated under 6 conditions (2 baseline conditions x 20 articles + 4 human conditions x 20 articles x 10 participants per condition); and 2) ranked_news_headlines.csv, 2400 ratings of those 840 headlines (20 articles x 6 conditions x 20 evaluators).
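The two totals follow directly from the study design described above. As a quick sanity check, the arithmetic can be reproduced as follows (the constant names are illustrative and do not come from the repository):

```python
# Study design figures taken from the description above.
ARTICLES = 20
BASELINE_CONDITIONS = 2           # Con 0 and Con 1
HUMAN_CONDITIONS = 4              # Con 2 to Con 5
PARTICIPANTS_PER_CONDITION = 10
EVALUATORS = 20

# Each baseline condition contributes one headline per article; each human
# condition contributes one headline per article per participant.
headline_count = (
    BASELINE_CONDITIONS * ARTICLES
    + HUMAN_CONDITIONS * ARTICLES * PARTICIPANTS_PER_CONDITION
)

# Each evaluator rates one headline per condition for every article.
rating_count = ARTICLES * (BASELINE_CONDITIONS + HUMAN_CONDITIONS) * EVALUATORS

print(headline_count, rating_count)  # 840 2400
```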

Conditions

2 baseline conditions

  • Con 0: Original - Headlines used by online media.
  • Con 1: GPT-generated - Headlines generated by GPT-3.5 (text-davinci-002).

4 human conditions

  • Con 2: Manual only - The participant writes the news headline without any AI assistance.
  • Con 3: Selection - The LLM generates three headlines for each news article (generate headlines), and the user selects the most appropriate one.
  • Con 4: Guidance + Selection - The LLM extracts several potential perspectives (keywords) from each news article (extract perspectives), the user chooses one or more perspectives to emphasize in the headline, the LLM then generates three headlines for each news article based on the selected perspectives (generate headlines w/ perspectives), and finally, the user selects the best one.
  • Con 5: Perspective space + headline space + post-editing - This is similar to Guidance + Selection, but the user can further edit the selected headline (post-editing).
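When working with the `con` column, it can help to map the numeric codes back to the condition names listed above. The following is a minimal sketch; the dictionary and helper names are hypothetical and are not part of the dataset:

```python
# Labels copied from the condition list above, keyed by the value of `con`.
CONDITION_LABELS = {
    0: "Original",
    1: "GPT-generated",
    2: "Manual only",
    3: "Selection",
    4: "Guidance + Selection",
    5: "Perspective space + headline space + post-editing",
}

def is_human_condition(con: int) -> bool:
    """Return True for the four conditions that involve a participant (Con 2 to 5)."""
    return 2 <= con <= 5

print(CONDITION_LABELS[3], is_human_condition(3))  # Selection True
```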

Descriptions by columns

  1. news_headlines.csv
  • headline: the content of headline
  • global_index: index of the headline (from 0 to 839)
  • index: the source of headline (original/GPT3/participant ID)
  • con: the condition (from 0 to 5)
  • newsOrder: the index of the news article (from 0 to 19)
  • word_count: word count of headline
  2. ranked_news_headlines.csv
  • headline: the content of headline
  • rank: the rank of the news headline within each round (from 1, the top, to 6, the bottom)
  • comment: evaluator's comment on the rating
  • participant: the index of evaluator (from 1 to 20)
  • sequence: the order of news article for ranking (from 0 to 19)
  • global_index: index of the headline (from 0 to 839)
  • index: the source of headline (original/GPT3/participant ID)
  • con: the condition (from 0 to 5)
  • newsOrder: the index of the news article (from 0 to 19)
  • word_count: word count of headline
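As an illustration of how the columns fit together, the sketch below averages `rank` per `con` using only the Python standard library. It assumes that ranked_news_headlines.csv sits in the working directory, has a header row with the column names listed above, and stores `con` and `rank` as numeric text. The function name is hypothetical, and this is not the analysis used in the paper.

```python
import csv
from collections import defaultdict
from pathlib import Path

def mean_rank_by_condition(rows):
    """Average the `rank` column per `con` value.

    `rows` is any iterable of dicts keyed by column name, for example a
    csv.DictReader over ranked_news_headlines.csv.
    """
    totals = defaultdict(lambda: [0.0, 0])  # con -> [sum of ranks, count]
    for row in rows:
        con = int(float(row["con"]))  # tolerate values such as "3" or "3.0"
        totals[con][0] += float(row["rank"])
        totals[con][1] += 1
    return {con: total / count for con, (total, count) in sorted(totals.items())}

if __name__ == "__main__":
    path = Path("ranked_news_headlines.csv")  # assumed location
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            for con, mean in mean_rank_by_condition(csv.DictReader(f)).items():
                print(f"con {con}: mean rank {mean:.2f}")
```

Because rank 1 is the top position, a lower mean rank indicates that evaluators preferred that condition's headlines.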

Citation

For details on the methodology behind this dataset and its implications, please refer to our EMNLP'23 paper, available as arXiv:2310.10706. If you use this dataset, please cite the following paper:

@article{ding2023harnessing,
  title={Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation},
  author={Ding, Zijian and Smith-Renner, Alison and Zhang, Wenjuan and Tetreault, Joel R and Jaimes, Alejandro},
  journal={arXiv preprint arXiv:2310.10706},
  year={2023}
}

Contact

For inquiries or additional information, please contact Jason Ding at ding@umd.edu.
