Reprompting

📃 Paper: https://proceedings.mlr.press/v235/xu24b.html

Repo for "Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling".

Authors: Weijia Xu, Andrzej Banburski, Nebojsa Jojic

Contents

  • Introduction
  • Prerequisite
  • Usage
  • Limitations
  • Best Practices
  • Citation

Introduction

We introduce Reprompting, an iterative sampling algorithm that automatically learns the Chain-of-Thought (CoT) recipes for a given task without human intervention. Through Gibbs sampling, Reprompting infers the CoT recipes that work consistently well for a set of training samples by iteratively sampling new recipes using previously sampled recipes as parent prompts to solve other training problems. We conduct extensive experiments on 20 challenging reasoning tasks. Results show that Reprompting outperforms human-written CoT prompts substantially by +9.4 points on average. It also achieves consistently better performance than the state-of-the-art prompt optimization and decoding algorithms.
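At a high level, the procedure can be sketched as follows. This is a minimal illustration of the Gibbs-sampling-style loop described above, not the exact implementation in code/run_reprompting.py; sample_recipe and is_correct are hypothetical placeholders for an LLM call and an answer check.

import random

def reprompting_sketch(train_set, n_iterations, k_shot, sample_recipe, is_correct):
    # train_set: list of (question, gold_answer) pairs
    # sample_recipe(demos, question): hypothetical LLM call that returns a
    #   chain-of-thought "recipe" (reasoning steps plus a final answer)
    # is_correct(recipe, gold_answer): hypothetical check of the final answer

    # Initialization: sample a recipe for each training problem with no demonstrations
    recipes = [sample_recipe([], q) for q, _ in train_set]

    for _ in range(n_iterations):
        # Revisit each training problem in turn, Gibbs-style
        for i, (question, gold) in enumerate(train_set):
            # Use recipes of *other* training problems as parent prompts (few-shot demos)
            others = [j for j in range(len(train_set)) if j != i]
            demo_ids = random.sample(others, min(k_shot, len(others)))
            demos = [(train_set[j][0], recipes[j]) for j in demo_ids]

            # Re-sample a recipe for this problem conditioned on those demos
            candidate = sample_recipe(demos, question)

            # Keep candidates that solve the problem, so recipes that transfer
            # across training problems survive over iterations
            if is_correct(candidate, gold):
                recipes[i] = candidate

    return recipes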

Prerequisite

  • Access to an LLM (e.g., GPT-4, ChatGPT, Qwen2.5, Claude 3.7)

Usage

code/run_reprompting.py is the Python script that runs Reprompting.

  • Before running it, you need to edit the script to:
    • change hyperparameters (e.g., N_ITERATIONS)
    • specify the path to your output log file
    • edit the instruction for generating a solution for your task
    • define the loadData function to load the training and test data
    • define the sampleResponse function to sample a response text from your choice of LLM (see the sketch after this list)
  • Run it with python run_reprompting.py <task_name> <initial_model_name> <iterative_model_name>, e.g. python run_reprompting.py logical_deduction gpt-3.5-turbo gpt-3.5-turbo.
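A minimal pair of these two functions might look like the sketch below. It assumes the openai Python client (v1+) and a simple JSONL data layout; the file paths, field names, and exact function signatures are illustrative and may differ from what the script expects.

import json
from openai import OpenAI  # assumes the `openai` package (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def loadData(task_name):
    # Assumes one JSON object per line with "question" and "answer" fields;
    # adapt the paths and field names to your own dataset.
    def read_jsonl(path):
        with open(path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    train = read_jsonl(f"data/{task_name}_train.jsonl")  # hypothetical path
    test = read_jsonl(f"data/{task_name}_test.jsonl")    # hypothetical path
    return train, test

def sampleResponse(prompt, model_name, temperature=1.0, max_tokens=512):
    # Sample a single response text from the chosen LLM.
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

Any other provider or local model can be substituted here, as long as sampleResponse returns the raw response text for a given prompt.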

Limitations

Reprompting was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios.

Reprompting was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.

Outputs generated by AI may include factual errors, fabrication, or speculation. Users are responsible for assessing the accuracy of generated content. All decisions leveraging outputs of the system should be made with human oversight and not be based solely on system outputs.

Reprompting inherits any biases, errors, or omissions produced by its inference model. Developers are advised to choose an appropriate base LLM carefully, depending on the intended use case. Our evaluations found that CoT recipes that work well on one model may work poorly on another, even when the latter can approach the best performance using prompts optimized for itself. These findings emphasize the need to optimize the prompt for each model for fair comparisons.

There has not been a systematic effort to ensure that systems using Reprompting are protected from security vulnerabilities such as indirect prompt injection attacks. Any system using it should take proactive hardening measures as appropriate.

Best Practices

Better performance can be achieved with LLMs that have strong reasoning capabilities. We strongly encourage users to use LLMs/MLLMs that support robust Responsible AI mitigations, such as Azure OpenAI (AOAI) services. Such services continually update their safety and RAI mitigations with the latest industry standards for responsible use. For more information, refer to AOAI's best practices for employing foundation models in scripts and applications.

Citation

If you find this repo useful for your research, please consider citing the paper:

@InProceedings{pmlr-v235-xu24b,
  title = 	 {Reprompting: Automated Chain-of-Thought Prompt Inference Through {G}ibbs Sampling},
  author =       {Xu, Weijia and Banburski, Andrzej and Jojic, Nebojsa},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {54852--54865},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xu24b/xu24b.pdf},
  url = 	 {https://proceedings.mlr.press/v235/xu24b.html}
}
