📃 [Paper](https://proceedings.mlr.press/v235/xu24b.html)
Repo for "Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling".
Authors: Weijia Xu, Andrzej Banburski, Nebojsa Jojic
We introduce Reprompting, an iterative sampling algorithm that automatically learns Chain-of-Thought (CoT) recipes for a given task without human intervention. Through Gibbs sampling, Reprompting infers CoT recipes that work consistently well across a set of training samples by iteratively sampling new recipes, using previously sampled recipes as parent prompts, to solve other training problems. We conduct extensive experiments on 20 challenging reasoning tasks. Results show that Reprompting outperforms human-written CoT prompts substantially, by +9.4 points on average, and consistently outperforms state-of-the-art prompt optimization and decoding algorithms.
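For intuition, here is a rough Python sketch of that sampling loop. The pool initialization, parent selection, and acceptance check below are illustrative simplifications, and `sample_response` is a hypothetical wrapper around your LLM; this is not the paper's exact Gibbs-sampling procedure:

```python
import random

def reprompt(train_set, sample_response, n_iterations=20, pool_size=5):
    """Sketch of the iterative recipe-sampling loop described above.

    train_set is a list of (problem, answer) string pairs; sample_response
    wraps an LLM call: prompt string in, response string out.
    """
    # Seed the recipe pool with zero-shot CoT solutions to random training problems.
    pool = []
    for problem, answer in random.sample(train_set, pool_size):
        recipe = sample_response(f"Q: {problem}\nA: Let's think step by step.")
        pool.append((problem, recipe, answer))

    for _ in range(n_iterations):
        # Sample a new training problem, and a previously sampled recipe
        # to serve as the parent prompt (demonstration).
        problem, answer = random.choice(train_set)
        parent_problem, parent_recipe, _ = random.choice(pool)
        prompt = f"Q: {parent_problem}\nA: {parent_recipe}\n\nQ: {problem}\nA:"
        recipe = sample_response(prompt)
        # Keep the new recipe only if it reaches the correct answer, so the
        # pool gradually drifts toward recipes that generalize across problems.
        if answer in recipe:
            pool[random.randrange(pool_size)] = (problem, recipe, answer)
    return pool
```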
Requirements:
- Access to an LLM (e.g., GPT-4, ChatGPT, Qwen2.5, Claude 3.7)
`code/run_reprompting.py` is the Python script for running Reprompting.
- Before running it, you need to edit the script to:
  - change hyperparameters (e.g., `N_ITERATIONS`)
  - specify the path to your output log file
  - edit the instruction for generating a solution for your task
  - define the `loadData` function to load the training and test data
  - define the `sampleResponse` function to sample a response text from your choice of LLM (see the sketch after this list)
- Run it with `python run_reprompting.py <task_name> <initial_model_name> <iterative_model_name>`, e.g. `python run_reprompting.py logical_deduction gpt-3.5-turbo gpt-3.5-turbo`.
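For illustration, minimal implementations of the two functions might look like the following. The data paths and JSON layout in `loadData` are hypothetical, and `sampleResponse` assumes the OpenAI Python SDK with an `OPENAI_API_KEY` set in the environment; the signatures are illustrative, so adapt both to your task format and LLM provider:

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; swap in your own client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def loadData(task_name):
    """Hypothetical loader: expects train/test JSON files under data/<task_name>/."""
    with open(f"data/{task_name}/train.json") as f:
        train = json.load(f)
    with open(f"data/{task_name}/test.json") as f:
        test = json.load(f)
    return train, test

def sampleResponse(prompt, model_name="gpt-3.5-turbo", temperature=1.0):
    """Sample one response text from the chosen chat model."""
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content
```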
Reprompting was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios.

Reprompting was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.

Outputs generated by AI may include factual errors, fabrication, or speculation. Users are responsible for assessing the accuracy of generated content. All decisions leveraging outputs of the system should be made with human oversight and not be based solely on system outputs.

Reprompting inherits any biases, errors, or omissions produced by its inference model. Developers are advised to choose an appropriate base LLM carefully, depending on the intended use case. Our evaluations found that CoT recipes that work well on one model may work poorly on another, even when the latter can approach its best performance using prompts optimized for itself. These findings emphasize the need to optimize the prompt for each model for fair comparisons.

There has not been a systematic effort to ensure that systems using Reprompting are protected from security vulnerabilities such as indirect prompt injection attacks. Any systems using it should take proactive measures to harden themselves as appropriate.
Better performance can be achieved by LLMs with strong reasoning capabilities. We strongly encourage users to use LLMs/MLLMs that support robust Responsible AI mitigations, such as Azure OpenAI (AOAI) services. Such services continually update their safety and RAI mitigations with the latest industry standards for responsible use. For more on AOAI's best practices when employing foundation models for scripts and applications, see:
- Blog post on responsible AI features in AOAI that were presented at Ignite 2023
- [Overview of Responsible AI practices for Azure OpenAI models](https://learn.microsoft.com/en-us/legal/cognitive-services/openai/overview)
- Azure OpenAI Transparency Note
- OpenAI’s Usage policies
- Azure OpenAI’s Code of Conduct
Users are responsible for sourcing their datasets legally and ethically. This could include securing appropriate copyrights, ensuring consent for the use of audio/images, and/or anonymizing data prior to use in research.
Users are reminded to be mindful of data privacy concerns and are encouraged to review the privacy policies associated with any models and data storage solutions interfacing with Reprompting. It is the user’s responsibility to ensure that the use of Reprompting complies with relevant data protection regulations and organizational guidelines.
If you find this repo useful for your research, please consider citing the paper:
```bibtex
@InProceedings{pmlr-v235-xu24b,
  title     = {Reprompting: Automated Chain-of-Thought Prompt Inference Through {G}ibbs Sampling},
  author    = {Xu, Weijia and Banburski, Andrzej and Jojic, Nebojsa},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {54852--54865},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24b/xu24b.pdf},
  url       = {https://proceedings.mlr.press/v235/xu24b.html}
}
```