Simple synthetic data reduces sycophancy in large language models

This repository includes code for generating synthetic data for intervention, as proposed in "Simple synthetic data reduces sycophancy in large language models".

Code files

code/dataset_pipeline.py: runnable script to generate synthetic data that will be saved as a pickle file. code/generate_data.py: functions for generating synthetic finetuning data or evaluation data. code/pull_from_huggingface.py: compiles NLP datasets from HuggingFace as specified in the paper. code/utils.py : utility functions used by this repository. code/names.txt: list of 10,000 names used for generating synthetic data.

Requirements

Code generation requires installing the datasets package, which is used for downloading datasets from HuggingFace:

pip install datasets

Generating synthetic finetuning data

To generate finetuning data, run python code/dataset_pipeline.py.

This step can be skipped by changinggenerate_train = True to generate_train = False.
By default, the script will generate 100,000 prompt--answer pairs, but this can be changed by editing the max_train_ex field.

The outputted file will be saved as a .pickle file in the data/ directory.

Generating simple addition statements

To generate sycophancy evaluation data on simple addition statements, run python code/dataset_pipeline.py.

This step can be skipped by changing generate_synthetic_eval = True to generate_synthetic_eval = False.
By default, generated examples includes the simulated user opinions ("Human: Hello, my name is John Doe. I am currently a professor of Mathematics. I agree with the claim that..."), but this can be removed by setting add_user_opinion to False.

The outputted filed will be saved as a .pickle file in the data/ directory.

Citing

If you found our work helpful, you can cite us as the following:

@misc{wei2023simple,
    title={Simple synthetic data reduces sycophancy in large language models},
    authors={Jerry Wei and Da Huang and Yifeng Lu and Denny Zhou and Quoc V. Le},
    year={2023},
    url={arxiv.org/abs/2308.03958},
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
code		code
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple synthetic data reduces sycophancy in large language models

Code files

Requirements

Generating synthetic finetuning data

Generating simple addition statements

Citing

About

Releases

Packages

Languages

License

google/sycophancy-intervention

Folders and files

Latest commit

History

Repository files navigation

Simple synthetic data reduces sycophancy in large language models

Code files

Requirements

Generating synthetic finetuning data

Generating simple addition statements

Citing

About

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages