# Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation (AAAI 2025)

Official PyTorch implementation.
<sup>1</sup>VinAI Research&nbsp;&nbsp;<sup>2</sup>Rutgers University&nbsp;&nbsp;<sup>3</sup>Cornell University
[Paper] [HuggingFace]
*Equal contribution †Work done while at VinAI Research
TLDR: We introduce a self-corrected flow distillation method that integrates consistency models and adversarial training within the flow-matching framework, enabling consistent generation quality in both one-step and few-step sampling.
Abstract: Flow matching has emerged as a promising framework for training generative models, demonstrating impressive empirical performance while being comparatively easier to train than diffusion-based models. However, it still requires numerous function evaluations during sampling. To address this limitation, we introduce a self-corrected flow distillation method that effectively integrates consistency models and adversarial training within the flow-matching framework. This work is the first to achieve consistent generation quality in both few-step and one-step sampling. Our extensive experiments validate the effectiveness of our method, yielding superior results both quantitatively and qualitatively on CelebA-HQ and on zero-shot COCO benchmarks.
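To make the idea concrete, here is a toy, heavily simplified sketch — NOT this repository's actual training code — of a flow-matching distillation step: a student velocity network is regressed toward a frozen teacher along the linear (rectified-flow) interpolation between noise and data. All network and variable names here are hypothetical, and the paper's self-corrected consistency and adversarial terms are omitted.

```python
# Illustrative sketch only; see the paper for the full objective.
import torch
import torch.nn as nn

dim = 8
student = nn.Linear(dim + 1, dim)   # hypothetical student velocity net v(x, t)
teacher = nn.Linear(dim + 1, dim)   # hypothetical frozen teacher velocity net
teacher.requires_grad_(False)

def velocity(net, x, t):
    # Concatenate the scalar time t to the input as a stand-in for
    # proper time conditioning.
    return net(torch.cat([x, t.expand(x.size(0), 1)], dim=1))

x1 = torch.randn(4, dim)            # "data" sample (toy)
x0 = torch.randn(4, dim)            # noise sample
t = torch.rand(1)

# Linear (rectified-flow style) interpolation between noise and data.
xt = (1 - t) * x0 + t * x1

# Distillation signal: the student's velocity at (xt, t) should match the
# teacher's, so one-step and few-step trajectories stay consistent.
loss_distill = (velocity(student, xt, t) - velocity(teacher, xt, t)).pow(2).mean()
```

In the paper this distillation signal is combined with consistency and adversarial losses; the sketch shows only the velocity-matching term.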
Details of the model architecture and experimental results can be found in our paper.
```bibtex
@inproceedings{dao2025scflow,
  title     = {Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation},
  author    = {Quan Dao and Hao Phung and Trung Dao and Dimitris Metaxas and Anh Tran},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025}
}
```

Please CITE our paper and give us a ⭐ whenever this repository is used to help produce published results or incorporated into other software.
TODO:
- [ ] Add our pretrained checkpoints
## Installation

Tested on Linux with Python 3.10 and PyTorch 2.1.0.

```shell
conda create -n scflow python=3.10
conda activate scflow
pip install -r requirements.txt
```

## Datasets

CelebA-HQ 256: Download our preprocessed LMDB as follows:
```shell
mkdir data/
cd data/
gdown --fuzzy https://drive.google.com/file/d/12NJYQv9lOZoVFcQgUeBaA0noYdewQ3lE/view?usp=sharing -O celeba-lmdb.zip
unzip celeba-lmdb.zip
cd ../
```

LAION: We use a 2M-sample subset of LAION with an aesthetic score > 6.25. Alternatively, laion/aesthetics_v2_6_5plus is a publicly available subset of ~3M images with an aesthetic score >= 6.5.
To precompute latents for a dataset for faster training:
```shell
pip install dagshub
python sd_distill/precomputed_data.py \
    --datadir laion/aesthetics_v2_6_5plus \
    --cache_dir ./data/laion_aesthetics_v2_6_5plus \
    --save_path ./data/laion_aesthetics_v2_6_5plus/latent_laion_aes/ \
    --num_samples 2_000_000 \
    --batch_size 64
```

Download pre-computed dataset stats for the CelebA-HQ dataset here and place them in pytorch_fid/.
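During training, the precomputed latents can then be read directly instead of re-encoding images through the VAE each step. Below is a hypothetical sketch of such a loader; the actual file layout written by sd_distill/precomputed_data.py may differ.

```python
# Hypothetical latent loader; the real save format of
# sd_distill/precomputed_data.py may differ (e.g. lmdb or sharded files).
import os
import torch
from torch.utils.data import Dataset

class LatentDataset(Dataset):
    """Loads one precomputed latent tensor per .pt file in a directory."""

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root) if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        # Each file is assumed to hold one latent tensor.
        return torch.load(self.paths[i])
```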
## Pretrained Checkpoints

To download pretrained teacher checkpoints:

```shell
bash tools/download_teacher.sh
```

To download our distilled checkpoints:

```shell
bash tools/download_distilled.sh
```

All checkpoints are saved under the pretrained/ folder.
## Testing

```shell
bash tools/test.sh
```

For InstaFlow inference:

```shell
bash tools/infer_instaflow.sh
```

To evaluate on CelebA-HQ, add the following flags to tools/test_adm.sh to enable FID computation:

```shell
--compute_fid --output_log ${EXP}_${EPOCH_ID}_${METHOD}${STEPS}.log
```

Multi-GPU sampling with 8 GPUs is supported by default for faster evaluation.
## Training

```shell
bash tools/train.sh
```

For InstaFlow training:

```shell
bash tools/train_instaflow.sh
```

## Acknowledgments

This codebase builds upon Flow Matching in Latent Space (LFM). We also thank the authors of LCM, Rectified Flow, and InstaFlow for their great work and publicly available codebases that facilitated this research.
## Contacts

If you have any problems, please open an issue in this repository or send an email to tienhaophung@gmail.com.
