Reprompting

📃 Paper: https://proceedings.mlr.press/v235/xu24b.html

Repo for "Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling".

Authors: Weijia Xu, Andrzej Banburski, Nebojsa Jojic

Contents

  • Introduction
  • Prerequisite
  • Usage
  • Limitations
  • Best Practices
  • Citation

Introduction

We introduce Reprompting, an iterative sampling algorithm that automatically learns the Chain-of-Thought (CoT) recipes for a given task without human intervention. Through Gibbs sampling, Reprompting infers the CoT recipes that work consistently well for a set of training samples by iteratively sampling new recipes using previously sampled recipes as parent prompts to solve other training problems. We conduct extensive experiments on 20 challenging reasoning tasks. Results show that Reprompting outperforms human-written CoT prompts substantially by +9.4 points on average. It also achieves consistently better performance than the state-of-the-art prompt optimization and decoding algorithms.
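At a high level, the procedure can be sketched as follows. This is a minimal illustration of the Gibbs-sampling-style loop described above, not the exact implementation in code/run_reprompting.py; sample_recipe and is_correct are hypothetical placeholders for an LLM call and an answer check.

import random

def reprompting_sketch(train_set, n_iterations, k_shot, sample_recipe, is_correct):
    # train_set: list of (question, gold_answer) pairs
    # sample_recipe(demos, question): hypothetical LLM call that returns a
    #   chain-of-thought "recipe" (reasoning steps plus a final answer)
    # is_correct(recipe, gold_answer): hypothetical check of the final answer

    # Initialization: sample a recipe for each training problem with no demonstrations
    recipes = [sample_recipe([], q) for q, _ in train_set]

    for _ in range(n_iterations):
        # Revisit each training problem in turn, Gibbs-style
        for i, (question, gold) in enumerate(train_set):
            # Use recipes of *other* training problems as parent prompts (few-shot demos)
            others = [j for j in range(len(train_set)) if j != i]
            demo_ids = random.sample(others, min(k_shot, len(others)))
            demos = [(train_set[j][0], recipes[j]) for j in demo_ids]

            # Re-sample a recipe for this problem conditioned on those demos
            candidate = sample_recipe(demos, question)

            # Keep candidates that solve the problem, so recipes that transfer
            # across training problems survive over iterations
            if is_correct(candidate, gold):
                recipes[i] = candidate

    return recipes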

Prerequisite

  • Access to an LLM (e.g., GPT-4, ChatGPT, Qwen2.5, Claude 3.7)

Usage

code/run_reprompting.py is the Python script that runs Reprompting.

  • Before running it, you need to edit the script to:
    • change hyperparameters (e.g., N_ITERATIONS)
    • specify the path to your output log file
    • edit the instruction for generating a solution for your task
    • define the loadData function to load the training and test data
    • define the sampleResponse function to sample a response text from your choice of LLM (see the sketch after this list)
  • Run it with python run_reprompting.py <task_name> <initial_model_name> <iterative_model_name>, e.g. python run_reprompting.py logical_deduction gpt-3.5-turbo gpt-3.5-turbo.
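A minimal pair of these two functions might look like the sketch below. It assumes the openai Python client (v1+) and a simple JSONL data layout; the file paths, field names, and exact function signatures are illustrative and may differ from what the script expects.

import json
from openai import OpenAI  # assumes the `openai` package (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def loadData(task_name):
    # Assumes one JSON object per line with "question" and "answer" fields;
    # adapt the paths and field names to your own dataset.
    def read_jsonl(path):
        with open(path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    train = read_jsonl(f"data/{task_name}_train.jsonl")  # hypothetical path
    test = read_jsonl(f"data/{task_name}_test.jsonl")    # hypothetical path
    return train, test

def sampleResponse(prompt, model_name, temperature=1.0, max_tokens=512):
    # Sample a single response text from the chosen LLM.
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

Any other provider or local model can be substituted here, as long as sampleResponse returns the raw response text for a given prompt.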

Limitations

Reprompting was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios.

Reprompting was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.

Outputs generated by AI may include factual errors, fabrication, or speculation. Users are responsible for assessing the accuracy of generated content. All decisions leveraging outputs of the system should be made with human oversight and not be based solely on system outputs.

Reprompting inherits any biases, errors, or omissions produced by its inference model. Developers are advised to choose an appropriate base LLM carefully, depending on the intended use case. Our evaluations found that CoT recipes that work well on one model may work poorly on another, even when the latter can approach the best performance using prompts optimized for itself. These findings emphasize the need to optimize the prompt for each model for fair comparisons.

There has not been a systematic effort to ensure that systems using Reprompting are protected from security vulnerabilities such as indirect prompt injection attacks. Any system using it should take proactive hardening measures as appropriate.

Best Practices

Better performance can be achieved with LLMs that have strong reasoning capabilities. We strongly encourage users to use LLMs/MLLMs that support robust Responsible AI mitigations, such as Azure OpenAI (AOAI) services. Such services continually update their safety and RAI mitigations with the latest industry standards for responsible use. For more information, refer to AOAI's best practices for employing foundation models in scripts and applications.

Citation

If you find this repo useful for your research, please consider citing the paper:

@InProceedings{pmlr-v235-xu24b,
  title = 	 {Reprompting: Automated Chain-of-Thought Prompt Inference Through {G}ibbs Sampling},
  author =       {Xu, Weijia and Banburski, Andrzej and Jojic, Nebojsa},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {54852--54865},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24b/xu24b.pdf},
  url = 	 {https://proceedings.mlr.press/v235/xu24b.html}
}
