Dataset of Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation (EMNLP'23)

This repository contains the datasets collected for the EMNLP'23 paper "Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation".

Documentation

The repository has two datasets: 1) news_headlines.csv, 840 headlines generated under 6 conditions (2 baseline conditions x 20 articles + 4 human conditions x 20 articles x 10 participants per condition); and 2) ranked_news_headlines.csv, 2400 ratings of those 840 headlines (20 articles x 6 conditions x 20 evaluators).
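The row counts quoted above follow directly from the study design. As a quick sanity check, the arithmetic can be sketched as below; the constant and function names are illustrative and do not come from the repository.

```python
# Study design numbers, taken from the dataset description above.
N_ARTICLES = 20
N_BASELINE_CONDITIONS = 2        # Con 0 and Con 1
N_HUMAN_CONDITIONS = 4           # Con 2 to Con 5
PARTICIPANTS_PER_CONDITION = 10
N_EVALUATORS = 20


def expected_headline_count() -> int:
    """Rows expected in news_headlines.csv."""
    baseline = N_BASELINE_CONDITIONS * N_ARTICLES
    human = N_HUMAN_CONDITIONS * N_ARTICLES * PARTICIPANTS_PER_CONDITION
    return baseline + human


def expected_rating_count() -> int:
    """Rows expected in ranked_news_headlines.csv."""
    n_conditions = N_BASELINE_CONDITIONS + N_HUMAN_CONDITIONS
    return N_ARTICLES * n_conditions * N_EVALUATORS


print(expected_headline_count())  # 840
print(expected_rating_count())    # 2400
```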

Conditions

2 baseline conditions

Con 0: Original - Headlines used by online media.
Con 1: GPT-generated - Headlines generated by GPT-3.5 (text-davinci-002).

4 human conditions

Con 2: Manual only - The participant writes the news headline without any AI assistance.
Con 3: Selection - The LLM generates three headlines for each news article (generate headlines), and the user selects the most appropriate one.
Con 4: Guidance + Selection - The LLM extracts several potential perspectives (keywords) from each news article (extract perspectives), the user chooses one or more perspectives to emphasize in the headline, the LLM then generates three headlines for each news article based on the selected perspectives (generate headlines w/ perspectives), and finally, the user selects the best one.
Con 5: Perspective space + headline space + post-editing - This is similar to Guidance + Selection, but the user can further edit the selected headline (post-editing).

Descriptions by columns

news_headlines.csv

headline: the text of the headline
global_index: the index of the headline (from 0 to 839)
index: the source of the headline (original/GPT3/participant ID)
con: the condition (from 0 to 5)
newsOrder: the index of the news article (from 0 to 19)
word_count: the word count of the headline
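A minimal sketch of reading this file with Python's standard csv module and counting headlines per condition. It assumes the file has a header row with the column names listed above and that the con values parse as numbers; the function name count_by_condition is hypothetical.

```python
import csv
from collections import Counter


def count_by_condition(csv_file):
    """Count rows per value of the `con` column.

    `csv_file` is any open text file (or file-like object) whose first
    line is a header containing a `con` column.
    """
    reader = csv.DictReader(csv_file)
    # int(float(...)) accepts both "3" and "3.0"; this is an assumption
    # about how the condition is stored, not something the README states.
    return Counter(int(float(row["con"])) for row in reader)
```

With the repository file in the working directory, this could be called as `with open("news_headlines.csv", newline="", encoding="utf-8") as f: print(count_by_condition(f))`; the UTF-8 encoding is also an assumption. If the file matches the description above, conditions 0 and 1 should each have 20 rows and conditions 2 to 5 should each have 200.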

ranked_news_headlines.csv

headline: the text of the headline
rank: the rank of the headline within each ranking round (from 1, top, to 6, bottom)
comment: the evaluator's comment on the rating
participant: the index of the evaluator (from 1 to 20)
sequence: the order of the news article in the ranking task (from 0 to 19)
global_index: the index of the headline (from 0 to 839)
index: the source of the headline (original/GPT3/participant ID)
con: the condition (from 0 to 5)
newsOrder: the index of the news article (from 0 to 19)
word_count: the word count of the headline
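One simple way to summarize this file is the mean rank per condition, sketched below with the standard csv module. This is only an illustrative summary, not the analysis used in the paper; it assumes a header row with the column names listed above, and the function name mean_rank_by_condition is hypothetical.

```python
import csv
from collections import defaultdict


def mean_rank_by_condition(csv_file):
    """Return {condition: mean rank} from an open ranked-headlines CSV."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Assumes `con` and `rank` are stored as numbers (e.g. "3" or "3.0").
        con = int(float(row["con"]))
        totals[con] += float(row["rank"])
        counts[con] += 1
    return {con: totals[con] / counts[con] for con in sorted(counts)}
```

Because rank 1 is the top position, a lower mean rank indicates that evaluators preferred that condition's headlines. The function can be called on the repository file with `with open("ranked_news_headlines.csv", newline="", encoding="utf-8") as f: print(mean_rank_by_condition(f))`, where the UTF-8 encoding is an assumption.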

Citation

For details on the methodology and implications of this dataset, please refer to our EMNLP'23 paper (arXiv:2310.10706). If you use this dataset, please cite the following paper:

@article{ding2023harnessing,
  title={Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation},
  author={Ding, Zijian and Smith-Renner, Alison and Zhang, Wenjuan and Tetreault, Joel R and Jaimes, Alejandro},
  journal={arXiv preprint arXiv:2310.10706},
  year={2023}
}

Contact

For any inquiries or additional information, please contact Jason Ding at ding@umd.edu
