Welcome to the official GitHub repository for TPO (Tree Preference Optimization)!
This is the official code for the paper *TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees*.

📢 News: this work has been accepted at ICLR 2025!
If you find our project interesting or helpful, we would appreciate a star! Your support is a tremendous encouragement to us.
- 1+ A100 (80 GB)
We use conda to manage the environment. Follow the steps below to set it up:
```shell
conda create -n TPO python=3.11 -y
conda activate TPO
pip install -r requirements.txt
```

- Step 1: Download data from the official Step-DPO repository
- Step 2: Use GPT-4 to generate tree data and score each response
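As a rough illustration of what Step 2 produces, the sketch below shows one way to represent a multi-branch, multi-step preference tree and mine preference pairs from it. Note that the `Node` fields, the judge scores, and the best-vs-sibling pairing rule are all hypothetical assumptions for illustration, not the repository's actual data schema or pairing strategy.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One reasoning step; children are alternative next steps (branches)."""
    text: str
    score: float = 0.0          # hypothetical GPT-4 judge score
    children: list["Node"] = field(default_factory=list)

def preference_pairs(root: Node) -> list[tuple[str, str]]:
    """At every branching point, pair the best-scored child (preferred)
    with each lower-scored sibling (dispreferred)."""
    pairs: list[tuple[str, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if len(node.children) > 1:
            ranked = sorted(node.children, key=lambda n: n.score, reverse=True)
            best = ranked[0]
            pairs.extend((best.text, worse.text) for worse in ranked[1:])
        stack.extend(node.children)
    return pairs

# Toy tree: the question branches into two steps, and step A branches again.
root = Node("Q", children=[
    Node("step A", 0.9, children=[Node("A1", 0.8), Node("A2", 0.3)]),
    Node("step B", 0.2),
])
print(preference_pairs(root))  # → [('step A', 'step B'), ('A1', 'A2')]
```

Each (preferred, dispreferred) pair at a branch point is what a DPO-style objective consumes; a multi-step tree yields such pairs at every depth, not only at the root.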
Prepare the data:

```shell
python data_prepare.py
```

Run TPO training:

```shell
python tpo_train.py
```

For evaluation, see `eval/run.sh`.
```bibtex
@inproceedings{liao2024tpo,
  title={TPO: Aligning Large Language Models with Multi-branch \& Multi-step Preference Trees},
  author={Liao, Weibin and Chu, Xu and Wang, Yasha},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```