Jiaqi Han*
We propose Spectrum, a training-free spectral diffusion feature forecaster that enables global, long-range feature reuse with tightly controlled error. We view the latent features of the denoiser as functions of time and approximate them with Chebyshev polynomials. Specifically, we fit the coefficient of each basis via ridge regression and use the fitted expansion to forecast features at multiple future diffusion steps. We theoretically show that our approach admits more favorable long-horizon behavior, with an error bound that does not compound with the step size.
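As a minimal, self-contained sketch of this idea (our illustration, not the repository's implementation; `fit_chebyshev_ridge`, `forecast`, and all shapes are assumptions), one can fit the Chebyshev coefficients of cached features by ridge regression and evaluate the expansion at future steps:

```python
import numpy as np

def fit_chebyshev_ridge(ts, feats, m=4, lam=0.1):
    """Fit m Chebyshev bases to cached features via ridge regression.

    ts: (k,) past diffusion times rescaled to [-1, 1]
    feats: (k, d) cached latent features at those times
    Returns the coefficient matrix C with shape (m, d).
    """
    # Design matrix: column j holds T_j evaluated at each cached time
    Phi = np.polynomial.chebyshev.chebvander(ts, m - 1)  # (k, m)
    # Closed-form ridge solution: C = (Phi^T Phi + lam I)^{-1} Phi^T F
    A = Phi.T @ Phi + lam * np.eye(m)
    return np.linalg.solve(A, Phi.T @ feats)  # (m, d)

def forecast(ts_future, C):
    """Evaluate the fitted expansion at future times in [-1, 1]."""
    Phi = np.polynomial.chebyshev.chebvander(ts_future, C.shape[0] - 1)
    return Phi @ C  # (len(ts_future), d)
```

Because Chebyshev polynomials remain bounded on [-1, 1], extrapolating the fitted expansion forward behaves gracefully over long horizons, which matches the long-horizon behavior described above.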
Extensive experiments on various state-of-the-art image and video diffusion models consistently verify the superiority of our approach. Notably, we achieve up to
Please give us a star ⭐ if you find our work interesting!
🚀 Also check out our previous work CHORDS on multi-core diffusion sampling acceleration, accepted at ICCV 2025!
Our code relies on the following core packages:
torch
transformers
diffusers
hydra-core
imageio
imageio-ffmpeg
For the specific versions of these packages that have been verified, as well as some optional dependencies, please refer to requirements.txt. We recommend creating a new virtual environment via the following procedure:
conda create -n spectrum python=3.10
conda activate spectrum
pip install -r requirements.txt

Prior to running the inference pipeline, please make sure that the models have been downloaded from 🤗 Hugging Face. We provide a download script for some example models for image and video generation in download.py.
We use hydra to organize different hyperparameters for the image/video diffusion model as well as the sampling algorithm. The default configurations can be found under configs folder. The entries to launch the sampling for image and video generation are src/text_to_image.py and src/text_to_video.py, respectively. For SDXL, please refer to src/text_to_image_sdxl.py.
The command below is an example of performing image generation on Flux using Spectrum on a single GPU.
CUDA_VISIBLE_DEVICES=0 \
python src/text_to_image.py \
model=flux \
algo=spectrum \
algo.w=0.5 \
algo.lam=0.1 \
algo.m=4 \
window_size=2 \
flex_window=0.75 \
exp_name=temp \
ngpu=1 \
total_prompt_num=1000 \
output_base_path=output_samples_image \
prompt_file=prompts/DrawBench200.txt

For model we currently support:
flux: Flux
sd3-5: Stable Diffusion 3.5-Large
sdxl: SDXL (please use python src/text_to_image_sdxl.py to launch)
algo.w is set to 1.0 by default, which recovers our pure Chebyshev predictor. Post-publication, we also found that a convex mixture of our spectral predictor with linear interpolation slightly enhances robustness across a wider range of acceleration ratios. We recommend setting algo.w between 0.5 and 1.0.
algo.lam refers to the regularization strength of the ridge regression.
algo.m refers to the number of Chebyshev bases (4 by default).
window_size refers to the initial window size.
flex_window refers to the flexible-window hyperparameter.
ngpu corresponds to the number of GPUs to use in parallel. We split all prompts evenly across GPUs to speed up benchmarking for all methods. Note that it should match CUDA_VISIBLE_DEVICES.
output_base_path is the directory to save the generated samples.
prompt_file stores the list of prompts, one per line, that will be used sequentially to generate each image.
For full functionality of the script, please refer to the arguments and their default values (such as the number of inference steps, the resolution of the image, etc.) under the configs folder, which is parsed by hydra.
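For reference, the even prompt split across GPU workers described for ngpu can be done with simple striding. This is only an illustrative sketch (the repository may implement the split differently):

```python
def split_prompts(prompts, ngpu):
    """Stride the prompt list so that each of the ngpu workers gets an
    (almost) equal share; worker shares differ in size by at most one."""
    return [prompts[i::ngpu] for i in range(ngpu)]
```

For example, splitting five prompts across two workers assigns three prompts to worker 0 and two to worker 1.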
Remark: window_size=2 and flex_window=0.75 are the settings used in the example commands above.
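The algo.w convex mixture mentioned above amounts to blending the spectral forecast with a linear extrapolation from the two most recent cached features. Below is a hedged sketch with our own names and shapes (not the repository's code):

```python
import numpy as np

def mixed_forecast(t_next, ts, feats, cheb_pred, w=0.5):
    """Convex mix of the spectral forecast with linear extrapolation.

    ts: (k,) past times; feats: (k, d) cached features at those times
    cheb_pred: (d,) the Chebyshev forecast at t_next; w in [0, 1],
    where w = 1.0 recovers the pure Chebyshev predictor.
    """
    # Linear extrapolation from the two most recent cached features
    slope = (feats[-1] - feats[-2]) / (ts[-1] - ts[-2])
    linear_pred = feats[-1] + slope * (t_next - ts[-1])
    return w * cheb_pred + (1.0 - w) * linear_pred
```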
We also provide a boilerplate script to launch the inference:
# For Flux and Stable Diffusion 3.5-Large
bash scripts/run_mp_image.sh
# For SDXL
bash scripts/run_mp_image_sdxl.sh

Similarly, the following command can be used for video generation with Spectrum:
CUDA_VISIBLE_DEVICES=0 \
python src/text_to_video.py \
model=hunyuan \
algo=spectrum \
algo.w=0.5 \
algo.lam=0.1 \
algo.m=4 \
window_size=2 \
flex_window=0.75 \
exp_name=temp \
ngpu=1 \
total_prompt_num=1000 \
output_base_path=output_samples_video \
prompt_file=prompts/video_demo.txt

For model we currently support:
hunyuan: HunyuanVideo
wan14b: Wan2.1-14B
We also provide a boilerplate script to launch the inference:
# For HunyuanVideo and Wan2.1-14B
bash scripts/run_mp_video.sh

Remark: For high-resolution video generation, change model.width, model.height, and model.num_frames to your specific choice. For example, we use the 1080x720x129f setting with HunyuanVideo for the qualitative examples.
Please consider citing our work if you find it useful:
@article{han2026adaptive,
title={Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration},
author={Han, Jiaqi and Shi, Juntong and Li, Puheng and Ye, Haotian and Guo, Qiushan and Ermon, Stefano},
journal={arXiv preprint arXiv:2603.01623},
year={2026}
}
Part of the code was inspired by TaylorSeer. We thank the authors for open-sourcing the codebase.
If you have any questions, feel free to contact me at:
Jiaqi Han: jiaqihan@stanford.edu
🔥 We warmly welcome community contributions, e.g., support for more models! Please open an issue or submit a PR if you are interested!

