Uni-RLHF Platform

Project Website · Paper · Datasets · Clean Offline RLHF

This is the Uni-RLHF platform implementation of the paper Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback by Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, and Yan Zheng. Uni-RLHF aims to provide a complete workflow for learning from real human feedback, fostering progress in the development of RLHF in the decision-making domain. We develop a user-friendly annotation interface tailored to various feedback types and compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline of crowdsourced annotations, resulting in a large-scale annotated dataset (≈15 million steps). We also provide offline RLHF baselines trained on the collected feedback datasets with various design choices in Clean Offline RLHF.



Table of Contents
  1. Getting Started
  2. Usage
  3. Roadmap
  4. Contributing
  5. License
  6. Contact
  7. Acknowledgments

🛠️ Getting Started

The Uni-RLHF platform consists of a Vue front-end and a Flask back-end. It also supports a wide range of mainstream RL environments for annotation.

Installation

Platform

  1. Clone the repo
    git clone https://github.com/TJU-DRL-LAB/Uni-RLHF.git
    cd Uni-RLHF
  2. Create and activate a conda environment, then install the Python dependencies
    conda create -n rlhf python==3.9
    conda activate rlhf
    pip install -r requirements.txt
  3. Install NPM packages
    npm install --prefix ./uni_rlhf/vue_part
  4. Configure a MySQL database (a sketch for creating the database is shown below)
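
As a minimal sketch of step 4 (an assumption, not part of the project's scripts: it presumes a local MySQL server and the pymysql driver, and the credentials and database name are placeholders), the database could be created like this:

    # Sketch: create the MySQL database used by the platform.
    # Assumes a local MySQL server and `pip install pymysql`; the credentials and
    # database name are placeholders and must match what you later configure in
    # uni_rlhf/config.py.
    import pymysql

    conn = pymysql.connect(host="localhost", user="root", password="your_password")
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE DATABASE IF NOT EXISTS uni_rlhf CHARACTER SET utf8mb4")
        conn.commit()
    finally:
        conn.close()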

Datasets

Uni-RLHF supports the classic datasets listed below; a full list of all tasks is available here. Uni-RLHF also supports uploading custom datasets, as long as the dataset contains the observations and terminals keys.

  • Install D4RL dependencies. Note that we made some small changes to the camera view for better visualisations.

    cd d4rl
    pip install -e .
  • Install Atari dependencies.

    pip install git+https://github.com/takuseno/d4rl-atari
  • Install V-D4RL dependencies. Note that V-D4RL provides image-based datasets; the full datasets can be found on GoogleDrive and must be downloaded before running the code. The expected file structure is:

     uni_rlhf
     ├───datasets
     │   └───dataset_resource
     │       ├───vd4rl
     │       │   ├───cheetah
     │       │   │   ├───cheetah_run_medium
     │       │   │   └───cheetah_run_medium_expert
     │       │   ├───humanoid
     │       │   │   ├───humanoid_walk_medium
     │       │   │   └───humanoid_walk_medium_expert
     │       │   └───walker
     │       │       ├───walker_walk_medium
     │       │       └───walker_walk_medium_expert
     │       └───smarts
     │           ├───cruise
     │           ├───curin
     │           └───left_c
     ├───vue_part
     │   ...
     └───controllers
         ...
  • Install MiniGrid dependencies. These are the same dependencies as for the D4RL datasets.

  • Install SMARTS dependencies. We employed online reinforcement learning algorithms to train two agents for dataset collection, each designed for its respective scenario. The first agent demonstrates medium driving proficiency, achieving a success rate between 40% and 80% in its designated scenario, while the second exhibits expert-level performance, attaining a success rate of 95% or higher in the same scenario. For dataset construction, 800 driving trajectories were collected with the medium agent and an additional 200 with the expert agent; combining the two yields a mixed dataset of 1,000 driving trajectories. We upload the full datasets, containing image data (for rendering) and vector data (for training), on GoogleDrive. These must be downloaded before running the code, and the expected file structure is the same as for the V-D4RL datasets.

  • Upload custom datasets. Custom datasets must be in HDF5 format and contain the observations and terminals keys (see the sketch below):

    observations: An N by observation_dim array of observations.
    terminals: An N-dimensional array of episode termination flags.
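
As an illustrative sketch (the file name, array shapes, and random contents are placeholders; only the two keys above are required), such a dataset file could be created with h5py:

    # Minimal sketch: write a custom dataset file containing the required keys.
    # Shapes and contents are illustrative; only `observations` and `terminals`
    # are required by the platform.
    import h5py
    import numpy as np

    num_steps, obs_dim = 1000, 17          # e.g. 1000 transitions of a 17-dim state
    observations = np.random.randn(num_steps, obs_dim).astype(np.float32)
    terminals = np.zeros(num_steps, dtype=np.bool_)
    terminals[-1] = True                   # mark the end of the single episode

    with h5py.File("my_custom_dataset.hdf5", "w") as f:
        f.create_dataset("observations", data=observations)
        f.create_dataset("terminals", data=terminals)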

(back to top)

Setup

To run the platform, configure SQLALCHEMY_DATABASE in uni_rlhf/config.py (a configuration sketch is shown below), then run:

python run.py

App is running at:

http://localhost:5001
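
For reference, here is a minimal sketch of the database configuration in uni_rlhf/config.py. The variable name and connection-string format follow standard Flask-SQLAlchemy conventions and are assumptions; check the shipped config.py for the canonical field names, and substitute your own credentials and database name.

    # Sketch of the database setting in uni_rlhf/config.py (field names assumed;
    # credentials and database name are placeholders).
    SQLALCHEMY_DATABASE_URI = "mysql+pymysql://rlhf_user:rlhf_password@localhost:3306/uni_rlhf"
    SQLALCHEMY_TRACK_MODIFICATIONS = False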

You can kill all related processes with:

python scripts/kill_process.py

💻 Usage

Overview



  • Specially tailored pipelines and tasks for reinforcement learning and decision-making problems.
  • A clean pipeline designed for employer-annotator coordination.
  • Supports multi-user synchronized labeling and conflict-free export.
  • Supports a large number of mainstream decision-making datasets, and makes it easy to customize and upload your own.
  • Supports several mainstream feedback types for decision-making problems, and provides configurable label formats that let you combine new ways of giving feedback.

Supported Tasks

We support several built-in environments and datasets. See the config for the expected name formatting of all supported domains and tasks.

Supported Feedbacks Format



We support five common feedback types and propose a standardized feedback encoding format that specifies how annotators interact with each type and how the resulting feedback is encoded. In the Uni-RLHF paper, we also briefly outline potential forms and applications of reinforcement learning that integrate these various forms of human feedback.

Offline RLHF Datasets and Benchmark

Thanks to Uni-RLHF, we established a systematic pipeline of crowdsourced annotations, resulting in an open-source, reusable, large-scale annotated dataset (≈15 million steps). We then train offline RLHF baselines on the collected feedback datasets; see the sister repository for these offline RLHF baselines. We hope to provide valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions for decision making based on realistic human feedback.

For more examples, please refer to the Documentation.

(back to top)

🧭 Roadmap

  • Support an automated reward model training process
  • Fix the online training bug
  • Adapt the sampler to the new code framework

See the open issues for a full list of proposed features (and known issues).

(back to top)

🙏 Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

🏷️ License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

✉️ Contact

For any questions, please feel free to email yuanyf@tju.edu.cn.

(back to top)

📝 Citation

If you find our work useful, please consider citing:

@inproceedings{anonymous2023unirlhf,
    title={Uni-{RLHF}: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback},
    author={Yuan, Yifu and Hao, Jianye and Ma, Yi and Dong, Zibin and Liang, Hebin and Liu, Jinyi and Feng, Zhixin and Zhao, Kai and Zheng, Yan},
    booktitle={The Twelfth International Conference on Learning Representations, ICLR},
    year={2024},
    url={https://openreview.net/forum?id=WesY0H9ghM},
}

(back to top)
