This repository includes code for generating synthetic data for intervention, as proposed in "Simple synthetic data reduces sycophancy in large language models".
code/dataset_pipeline.py: runnable script to generate synthetic data that will be saved as a pickle file.
code/generate_data.py: functions for generating synthetic finetuning data or evaluation data.
code/pull_from_huggingface.py: compiles NLP datasets from HuggingFace as specified in the paper.
code/utils.py : utility functions used by this repository.
code/names.txt: list of 10,000 names used for generating synthetic data.
Code generation requires installing the datasets package, which is used for downloading datasets from HuggingFace:
pip install datasets
To generate finetuning data, run python code/dataset_pipeline.py.
- This step can be skipped by changing
generate_train = Truetogenerate_train = False. - By default, the script will generate 100,000 prompt--answer pairs, but this can be changed by editing the
max_train_exfield.
The outputted file will be saved as a .pickle file in the data/ directory.
To generate sycophancy evaluation data on simple addition statements, run python code/dataset_pipeline.py.
- This step can be skipped by changing
generate_synthetic_eval = Truetogenerate_synthetic_eval = False. - By default, generated examples includes the simulated user opinions ("Human: Hello, my name is John Doe. I am currently a professor of Mathematics. I agree with the claim that..."), but this can be removed by setting
add_user_opiniontoFalse.
The outputted filed will be saved as a .pickle file in the data/ directory.
If you found our work helpful, you can cite us as the following:
@misc{wei2023simple,
title={Simple synthetic data reduces sycophancy in large language models},
authors={Jerry Wei and Da Huang and Yifeng Lu and Denny Zhou and Quoc V. Le},
year={2023},
url={arxiv.org/abs/2308.03958},
}