The language people use when they interact with each other changes over the course of a conversation, as speakers dynamically adapt to one another.
Do we see a similar, systematic change in language when human users interact with a text-to-image model?
Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image. Taking inspiration from cognitive work on reference games and dialogue alignment, we analyze the dynamics of the user prompts along such iterations. We compile a dataset of iterative interactions of human users with Midjourney.
Paper link: https://aclanthology.org/2023.emnlp-main.253/
The dataset that was collected and used in this paper is available in the data folder.
The data is in CSV format, split into 9 files (threads_i.csv, with i increasing in steps of 20,000). It is also available as a Hugging Face 🤗 dataset here.
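For convenience, the shards can be loaded locally with pandas. The snippet below is a minimal sketch, not part of the repository's code; the `data/threads_*.csv` glob pattern and the previewed column names are assumptions based on the file naming and the column list below.

```python
# Minimal sketch: load all CSV shards into one dataframe.
# Assumes the files follow the data/threads_*.csv naming described above.
from glob import glob

import pandas as pd

files = sorted(glob("data/threads_*.csv"))
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

print(df.shape)
print(df[["text", "thread_id", "timestamp"]].head())
```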
Main Columns:
- 'text' - the original prompt
- 'args' - predefined parameters (such as the aspect ratio, chaos and [more][myexample])
- 'channel_id' - the discord channel
- 'userid' - an anonymous user id
- 'timestamp' - a timestamp of the prompt creation
- 'label' - True if an image generated from the prompt was upscaled, otherwise False
- 'id' - unique id of the prompt
- 'url_png' - link to the generated images (a 4-grid version)
- 'main_content' - prefix of the prompt, without trailing magic-words
- 'concreteness' - concreteness score, based on [this paper][concpaper]
- 'word_len' - the number of words
- 'repeat_words' - the occurrences of each word that appears more than once in the prompt, excluding stop words (a sketch recomputing this column appears after this list)
- 'reapeat_words_ratio' - repeat_words / word_len
- 'perplexity' - the perplexity GPT-2 assigns to each prompt.
- 'caption_0-3' - captions generated by the BLIP-2 model, one for each of the 4 generated images
- 'phase' - the train/test split used to train the image/text classifiers
- 'magic_ratio' - the percentage of words that were recognized as magic words in the prompt
- 'thread_id' - the id of the thread
- 'depth' - the max depth of a constituency parse tree of the prompt.
- 'num_sent_parser' - the number of sentences in the prompt.
- 'num_sent_parser_ratio' - num_sent_parser / word_len
- 'words_per_sent' - word_len / num_sent_parser
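To illustrate how some of the surface-level columns are defined, here is a minimal sketch that recomputes 'word_len', 'repeat_words', and 'repeat_words_ratio' for a single prompt. It is not the repository's preprocessing code; the whitespace tokenization, the NLTK stop-word list, and the exact way repeated occurrences are summed are assumptions.

```python
# Hedged sketch: recompute word_len, repeat_words and repeat_words_ratio
# for one prompt. Tokenization and the stop-word list (NLTK) are
# assumptions, not necessarily what the original preprocessing used.
from collections import Counter

from nltk.corpus import stopwords  # requires: nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))

def surface_features(prompt: str) -> dict:
    tokens = prompt.lower().split()
    word_len = len(tokens)

    # Count non-stop-word tokens, then sum the counts of those
    # that appear more than once in the prompt.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    repeat_words = sum(c for c in counts.values() if c > 1)

    return {
        "word_len": word_len,
        "repeat_words": repeat_words,
        "repeat_words_ratio": repeat_words / word_len if word_len else 0.0,
    }

print(surface_features("a castle on a hill, epic lighting, epic lighting, 4k"))
```

The same per-prompt logic could be applied row-wise to the 'text' column of the dataframe loaded above.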
The code for preparing the data is in the prepare folder.
If you find this work useful, please cite our paper:
@inproceedings{don-yehiya-etal-2023-human,
    title = "Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney",
    author = "Don-Yehiya, Shachar and
      Choshen, Leshem and
      Abend, Omri",
    editor = "Bouamor, Houda and
      Pino, Juan and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.253",
    pages = "4146--4161",
    abstract = "Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image. Taking inspiration from cognitive work on reference games and dialogue alignment, this paper analyzes the dynamics of the user prompts along such iterations. We compile a dataset of iterative interactions of human users with Midjourney. Our analysis then reveals that prompts predictably converge toward specific traits along these iterations. We further study whether this convergence is due to human users, realizing they missed important details, or due to adaptation to the model{'}s {``}preferences{''}, producing better images for a specific language style. We show initial evidence that both possibilities are at play. The possibility that users adapt to the model{'}s preference raises concerns about reusing user data for further training. The prompts may be biased towards the preferences of a specific model, rather than align with human intentions and natural manner of expression.",
}