SuperHF

This project is a research prototype for the paper "Supervised Iterative Fine-Tuning From Human Preferences".

The goal of this project is to demonstrate an alternative to RLHF that trains without PPO and achieves better alignment with human preferences.

Directory structure

src/superhf/ contains the SuperHFTrainer class and the rest of the SuperHF code.
src/reward_modeling/ contains the code for training the reward model.
experiments/ contains the code for calling the trainers and running the experiments reported in the paper.
experiments/superhf/superhf_iterative_v1 contains the code for the SuperHF experiments that were used.
experiments/rlhf/rlhf_v1 contains the code for running the RLHF experiments.
experiments/evaluations/ contains the code for evaluating the trained models.

Installation

Python package

Install the library with pip:

pip install superhf
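
After installing, a quick check that the package is visible to the Python import system can save some debugging. The snippet below is a minimal sketch using only the standard library. It assumes the import name is `superhf`, matching the pip package name and the src/superhf/ directory; the helper name `is_installed` is hypothetical and not part of this project.

```python
import importlib.util


def is_installed(package_name: str = "superhf") -> bool:
    """Return True if the import system can locate a top-level package.

    The default name "superhf" is an assumption based on the pip package
    name; adjust it if the import name differs.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True only when the package is installed in the active environment.
    print("superhf importable:", is_installed())
```

If this prints False right after a successful install, the likely cause is that pip and the Python interpreter belong to different environments.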
