DARLING

This is the official implementation of the paper Jointly Reinforcing Diversity and Quality in Language Model Generations.

Getting Started for Training

Create a conda environment:

conda create -n verlenv python=3.10
conda activate verlenv

Install PyTorch (we have only tested with CUDA 12.8):

pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128

Install other dependencies:

cd verl
pip install -e ./
# This code only uses FSDP. To use Megatron, remove USE_MEGATRON=0.
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
# vllm 0.8.3
pip install vllm==0.8.3
# flash-attn
pip3 install flash-attn --no-build-isolation
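
After installation, it can help to confirm that the pinned versions actually landed in the active environment. The sketch below is not part of the repository: `version_matches`, `installed_version`, and `check_pins` are hypothetical helper names, and it assumes that `python` on the PATH is the interpreter of the `verlenv` environment. It ignores a local build suffix such as `+cu128` when comparing versions.

```shell
# Hypothetical helpers (not part of the repository) for checking pinned versions.

# Returns 0 if the installed version equals the pinned one,
# ignoring a local build suffix such as "+cu128".
version_matches() {
    local installed="$1" pinned="$2"
    [ "${installed%%+*}" = "$pinned" ]
}

# Prints the installed version of a Python package, or nothing if it is missing.
installed_version() {
    python -c "import importlib.metadata as m, sys; print(m.version(sys.argv[1]))" "$1" 2>/dev/null
}

# Compares each package pinned in the install steps with what is installed.
check_pins() {
    local spec pkg pinned found status=0
    for spec in torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 vllm==0.8.3; do
        pkg="${spec%%==*}"
        pinned="${spec##*==}"
        found="$(installed_version "$pkg" || true)"
        if version_matches "$found" "$pinned"; then
            echo "ok       $pkg $found"
        else
            echo "MISMATCH $pkg (found '${found:-none}', expected $pinned)"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_pins` prints one line per package and returns a non-zero status if any version differs from the pin.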

To log to Weights & Biases (wandb), you also need to set your API key:

export WANDB_API_KEY=<your_api_key>

Training Scripts

Training scripts for non-verifiable tasks can be found in verl/wildchat_scripts. Training scripts for verifiable tasks can be found in verl/math_scripts.

Running Darling

First, serve the partition classifier (an HF checkpoint) from a server:

bash serve_classifier.sh <PATH_TO_CLASSIFIER_HF>

Next, point the reward function at that server, either by editing the hostname in verl/verl/utils/reward_score/partition_reward_vllm_serve.py or by setting the environment variable:

export VLLM_SERVER_HOSTNAME=<your hostname>
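
If you take the environment-variable route, a small preflight check can catch a missing value before a training job starts. This is a minimal sketch and not part of the repository: `require_env` and `preflight` are hypothetical names, and treating `WANDB_API_KEY` as optional is an assumption.

```shell
# Hypothetical preflight helpers (not part of the repository).

# Fails with a message if the named environment variable is unset or empty.
require_env() {
    local name="$1" value
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "error: $name is not set" >&2
        return 1
    fi
}

# Checks the variables mentioned in this README before launching a job.
preflight() {
    # Needed unless the hostname was edited directly in the reward script.
    require_env VLLM_SERVER_HOSTNAME || return 1
    # Assumption: only warn here, since wandb logging may be unused.
    if ! require_env WANDB_API_KEY 2>/dev/null; then
        echo "warning: WANDB_API_KEY is not set; wandb logging may not work" >&2
    fi
    return 0
}
```

Running `preflight` in the shell that launches training returns a non-zero status when `VLLM_SERVER_HOSTNAME` is missing, so it can gate the launch command.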

Then you can launch math_scripts/darling.batch to train Qwen-4B-Base on math tasks, or wildchat_scripts/darling.batch to train Llama-3.1-8B-Instruct on WildChat.

Evaluation

Evaluation code for the NoveltyBench, EQ-Bench, and math tasks can be found in evals.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find DARLING useful, please consider citing:

@article{tianjian2025jointlyreinforcingdiversityquality,
	title        = {Jointly Reinforcing Diversity and Quality in Language Model Generations},
	author       = {Tianjian Li and Yiming Zhang and Ping Yu and Swarnadeep Saha and Daniel Khashabi and Jason Weston and Jack Lanchantin and Tianlu Wang},
	year         = 2025,
	journal      = {arXiv preprint arXiv:2509.02534},
	url          = {https://arxiv.org/abs/2509.02534},
	eprint       = {2509.02534},
	archiveprefix = {arXiv},
	primaryclass = {cs.CL},
}
