jaylee2000/rsm

Reward Score Matching

  Website

TL;DR: We unify reward-based fine-tuning algorithms for diffusion and flow generative models. This unified view lets us separate the fundamental design choices from the incidental ones.

Jeongjae Lee*, Jinho Chang*, Jeongsol Kim†, Jong Chul Ye†.

KAIST

🔥 News

  • [2026.05.21] Code released on GitHub!
  • [2026.05.07] Preprint updated on arXiv!
  • [2026.04.19] Preprint released on arXiv!

Repository Layout

| Directory | Purpose | Model family |
| --- | --- | --- |
| sd35_zeroth_order/ | Zeroth-order experiments against TempFlow-GRPO baseline for Figure 4(a). Most users should start here. | Stable Diffusion 3.5 Medium |
| sd15_zeroth_order/ | Zeroth-order experiments against PCPO baseline for Figure 4(b, c). | Stable Diffusion 1.5 |
| sd35_first_order/ | First-order experiments against VGG-Flow baseline for Figure 5(a, b). | Stable Diffusion 3.5 Medium |
| sd15_first_order/ | First-order experiments against Nabla-GFlowNet baseline for Figure 5(c, d). | Stable Diffusion 1.5 |

Each directory is an independent component with its own setup notes, configs, and launch scripts.

Setup Notes

Install dependencies inside the component you want to run. Model access, reward-model setup, and component-specific packages are described in each subdirectory README and config files.

Weights & Biases logging is expected by the current code. Set WANDB_API_KEY or replace the placeholder fields in component configs before running.
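A small preflight check can catch a missing key before a long run starts. The sketch below only inspects the environment variable named above. The helper `wandb_key_configured` is hypothetical and not part of this repository, and it does not look at the placeholder fields in the component configs.

```python
import os
from typing import Mapping, Optional


def wandb_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if WANDB_API_KEY is set to a non-empty value.

    Hypothetical helper for illustration; not part of this repository.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


# Report the status without aborting, so the check is safe to run anywhere.
status = "found" if wandb_key_configured() else "missing"
print(f"WANDB_API_KEY: {status}")
```

If the key is reported as missing, either export it in the shell that launches training or edit the placeholder fields in the relevant component config.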

Review cache, checkpoint, and output paths before launching experiments. Some configs/scripts have hardcoded paths and may need local path edits.
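One way to review paths quickly is to list every absolute path in a config file before launching. The following is a minimal sketch that assumes POSIX-style paths. The helper `find_absolute_paths` and the sample config keys are hypothetical, and the regular expression is a heuristic that will miss relative or Windows paths.

```python
import re
from typing import List, Tuple

# Heuristic for POSIX-style absolute paths such as /data/checkpoints/run1.
# The lookbehind skips URLs (https://...) and relative paths (./out).
_ABS_PATH = re.compile(r"(?<![\w.:/])/(?:[\w.\-]+/)*[\w.\-]+")


def find_absolute_paths(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, path) pairs for absolute paths found in text.

    Hypothetical helper for illustration; not part of this repository.
    """
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _ABS_PATH.finditer(line):
            results.append((lineno, match.group(0)))
    return results


# Sample config text with made-up keys, used only to demonstrate the scan.
sample = "cache_dir: /home/user/cache\nlr: 1e-4\nsave_dir: ./out\n"
for lineno, path in find_absolute_paths(sample):
    print(f"line {lineno}: {path}")
```

To scan a real file, read it with `pathlib.Path(...).read_text()` and pass the result to the helper. Any hit is a candidate for a local path edit.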

Hardware Notes

SD3.5-M experiments were tested on CUDA 12.8 with 4 x H200 GPUs. These runs can nearly saturate the 140GB of H200 VRAM when the GenEval server is hosted at the same time. They can be adapted to lower-VRAM GPUs (down to 1 x 24GB GPU) by increasing gradient accumulation. BatchSampler is flexible, so you can still train with the same effective batch size.
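The trade between GPU count and gradient accumulation is simple arithmetic. The sketch below assumes the usual data-parallel relation: effective batch size = per-GPU batch size x number of GPUs x accumulation steps. The function name and the batch sizes are illustrative assumptions, not values taken from this repository's configs.

```python
def accumulation_steps(effective_batch: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Return the gradient accumulation steps that reach effective_batch.

    Hypothetical helper for illustration; not part of this repository.
    """
    samples_per_step = per_gpu_batch * num_gpus
    if effective_batch <= 0 or samples_per_step <= 0:
        raise ValueError("batch sizes and GPU count must be positive")
    if effective_batch % samples_per_step != 0:
        raise ValueError("effective_batch must be divisible by per_gpu_batch * num_gpus")
    return effective_batch // samples_per_step


# Illustrative numbers only: the same effective batch of 64 on two setups.
print(accumulation_steps(64, per_gpu_batch=16, num_gpus=4))  # 1
print(accumulation_steps(64, per_gpu_batch=2, num_gpus=1))   # 32
```

Fewer or smaller GPUs mean more accumulation steps per optimizer update, so wall-clock time per update grows even though the effective batch size stays the same.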

SD1.5 experiments were tested on CUDA 12.x with RTX 4090 GPUs (24GB VRAM).

Package versions may need adjustment for other CUDA versions (for example CUDA 13.x), PyTorch wheels, xformers builds, or GPU microarchitectures.
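Before adjusting package versions, it helps to record what the local environment provides. The following is a minimal sketch using standard PyTorch calls. The helper `describe_environment` is hypothetical, and it does not check xformers or any component-specific package.

```python
import platform
from typing import Dict


def describe_environment() -> Dict[str, str]:
    """Collect Python, PyTorch, CUDA build, and GPU details as strings.

    Hypothetical helper for illustration; not part of this repository.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = str(torch.__version__)
    # torch.version.cuda is None for CPU-only builds.
    info["cuda_build"] = str(torch.version.cuda)
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = f"{major}.{minor}"
    else:
        info["gpu"] = "no CUDA device visible"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Comparing this output against the tested setups above (CUDA 12.8 with H200, CUDA 12.x with RTX 4090) shows which packages are likely to need a different build.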

Reproducing Paper Runs

Figure 4(a), Ours

cd sd35_zeroth_order
bash scripts/single_node/run_sd3.sh --profile lowsnr2 --sampler branch --reward geneval --loss matching --reweight fairclip2 --num-processes 4

Figure 4(b), Ours

cd sd15_zeroth_order
accelerate launch train_grpo_pr.py --config configs/train/base_grpo_pr_uwsigma_lora10.yaml

Figure 4(b, c), Baseline

cd sd15_zeroth_order
accelerate launch train_grpo.py --config configs/train/hpsv2_1_lora_grpo.yaml

Figure 5(a, b), Ours

cd sd35_first_order
torchrun --standalone --nproc_per_node=4 train_vggflow.py \
    --config=config/hpsv2_geneval_ours.py \
    --exp_name=OURS

Figure 5(a, b), Pruned Baseline

cd sd35_first_order
torchrun --standalone --nproc_per_node=4 train_vggflow.py \
    --config=config/hpsv2_geneval.py \
    --exp_name=PRUNED_BASELINE

Figure 5(c, d), Ours

cd sd15_first_order
torchrun --nproc_per_node=4 --master_port=29501 simple_res-nabladb.py --config config/simple_res-nabladb_sd_hps_usez0_basesnr.yaml

Figure 5(c, d), Pruned Baseline

cd sd15_first_order
torchrun --nproc_per_node=4 --master_port=29501 simple_res-nabladb.py --config config/simple_res-nabladb_sd_hps.yaml

Citation

If you find this repository useful, please cite:

@misc{lee2026rewardscorematchingunifying,
      title={Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models}, 
      author={Jeongjae Lee and Jinho Chang and Jeongsol Kim and Jong Chul Ye},
      year={2026},
      eprint={2604.17415},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2604.17415}, 
}

Acknowledgements

This repo is based on ddpo-pytorch, flow_grpo, pcpo, nabla-gfn, vggflow, and TempFlow-GRPO.

License

This project is released under the MIT License. See LICENSE.

About

An official implementation of Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models
