Hyojun Go¹ · Hyungjin Chung · Prune Truong² · Goutam Bhat² · Li Mi¹ · Zhaochong An³ · Zixiang Zhao¹ · Dominik Narnhofer¹ · Serge Belongie³ · Federico Tombari² · Konrad Schindler¹

¹ETH Zürich · ²Google · ³University of Copenhagen
StitchVM turns any pretrained pixel-space reward model (CLIP, HPSv2, Aesthetic Predictor, …) into a value model that scores noisy diffusion latents directly, at a small compute cost. Drop the resulting value model into any diffusion-alignment recipe (DPS, FK steering, DRaFT, DiffusionNFT, …), and the recipe becomes cheaper and often better as well.
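Since the code is not yet released, the following is only a rough sketch of what "scoring noisy latents directly" can look like inside a particle-based recipe. It shows one value-weighted resampling step, loosely in the spirit of steering methods such as FK steering. Everything here is an assumption made for illustration: `toy_value_model`, `resample_by_value`, and the `temperature` parameter are hypothetical names, the value function is a toy stand-in, and none of this reflects the StitchVM API or the exact algorithms of the recipes named above.

```python
import math
import random


def toy_value_model(latent, t):
    """Hypothetical stand-in for a value model (NOT the StitchVM model).

    Takes a noisy latent and a timestep and returns a scalar score.
    This toy version ignores t and prefers latents close to the origin.
    """
    return -sum(x * x for x in latent)


def resample_by_value(latents, t, value_fn, temperature=1.0, rng=random):
    """Resample a set of noisy latents in proportion to exp(value / temperature).

    The latents are scored as-is, without decoding them to pixels first.
    Returns the resampled latents and the raw scores.
    """
    scores = [value_fn(z, t) for z in latents]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    resampled = rng.choices(latents, weights=weights, k=len(latents))
    return resampled, scores


# Usage: eight random 4-dimensional "latents" at a hypothetical timestep.
rng = random.Random(0)
particles = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
resampled, scores = resample_by_value(particles, t=500, value_fn=toy_value_model, rng=rng)
```

In this sketch, a lower `temperature` concentrates the resampled set on the highest-scoring latents, while a higher one keeps it closer to the original set.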
See the project page for the full story, figures, and results.
The code release is in preparation. Star or watch this repository to be notified.
@article{go2026stitchvm,
title = {Stitched Value Model for Diffusion Alignment},
author = {Go, Hyojun and Chung, Hyungjin and Truong, Prune and Bhat, Goutam
and Mi, Li and An, Zhaochong and Zhao, Zixiang and Narnhofer, Dominik
and Belongie, Serge and Tombari, Federico and Schindler, Konrad},
journal = {arXiv preprint arXiv:2605.19804},
year = {2026}
}

For questions about the method or paper, please open a GitHub issue or contact Hyojun Go.
