rajabi2001/sega

SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers

Project Page   Paper

SEGA teaser

Official inference code for SEGA, a training-free method that, at each denoising step, dynamically rescales attention across RoPE components based on the latent's spatial-frequency content. SEGA improves high-resolution synthesis without retraining, new weights, or architecture changes. Implementations are provided for FLUX (flux_sega/) and Qwen-Image (qwen_sega/).
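To make "spatial-frequency content" concrete, the toy sketch below measures how much of a small 2D grid's spectral energy sits above a cutoff frequency. It is only an illustration of the kind of quantity involved, not SEGA's attention-rescaling rule, which is defined in the paper. The function names, the 8×8 grids, and the 0.25 cutoff are all assumptions made for this example.

```python
import cmath
import math


def dft2(grid):
    """Naive 2D discrete Fourier transform of a square list-of-lists."""
    n = len(grid)
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2 * math.pi * (u * y + v * x) / n
                    total += grid[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def high_frequency_fraction(grid, cutoff=0.25):
    """Share of non-DC spectral energy above `cutoff` cycles per sample."""
    n = len(grid)
    spectrum = dft2(grid)
    low = high = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC (mean) term
            # Fold indices so each frequency magnitude lies in [0, 0.5].
            fu = min(u, n - u) / n
            fv = min(v, n - v) / n
            energy = abs(spectrum[u][v]) ** 2
            if math.hypot(fu, fv) > cutoff:
                high += energy
            else:
                low += energy
    total = low + high
    return high / total if total else 0.0


n = 8
# One slow cosine cycle across the grid: all energy at a low frequency.
smooth = [[math.cos(2 * math.pi * x / n) for x in range(n)] for y in range(n)]
# Checkerboard: all energy at the highest representable frequency.
checker = [[(-1) ** (x + y) for x in range(n)] for y in range(n)]

print(high_frequency_fraction(smooth))   # close to 0
print(high_frequency_fraction(checker))  # close to 1
```

A real latent would be a multi-channel tensor and would use an FFT library rather than this naive transform. The point is only that a per-step scalar (or per-band profile) of this kind can be computed from the latent itself.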

Installation

git clone https://github.com/rajabi2001/sega.git
cd sega

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

Model weights are fetched from Hugging Face on first run.

Usage

FLUX.1:

cd flux_sega
python run_flux.py --prompt "Your prompt here." --height 4096 --width 4096

Qwen-Image:

cd qwen_sega
python run_qwen.py --prompt "Your prompt here." --height 4096 --width 4096

Outputs are saved under outputs/ in each subdirectory.
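To find the most recent results after a run, a small helper along these lines may be convenient. It is a sketch that assumes only the outputs/ location stated above. The file names and formats the scripts write are not documented here, so it simply lists every file, and `newest_outputs` is a hypothetical name.

```python
from pathlib import Path


def newest_outputs(root, limit=5):
    """Return up to `limit` files under `root`, most recently modified first."""
    root = Path(root)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


# Run from the repository root; prints nothing if no outputs exist yet.
for path in newest_outputs("flux_sega/outputs"):
    print(path)
```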

Multi-GPU inference

Generating ultra-high-resolution images can exceed the memory of a single GPU. Both run_flux.py and run_qwen.py accept a --multi_gpu flag that distributes the transformer blocks across all visible CUDA devices (CLIP and VAE stay on cuda:0; for Qwen the text encoder is offloaded to CPU). At least 2 GPUs must be visible for this flag to take effect.
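The idea of distributing transformer blocks across devices can be sketched as below. This is one common scheme (contiguous, near-equal chunks), shown as an assumption: how run_flux.py and run_qwen.py actually partition the blocks is not specified in this README, and `assign_blocks` and the block count of 10 are hypothetical.

```python
def assign_blocks(num_blocks, devices):
    """Map each block index to a device, in contiguous and near-equal chunks."""
    if not devices:
        raise ValueError("need at least one device")
    base, extra = divmod(num_blocks, len(devices))
    placement = {}
    block = 0
    for i, device in enumerate(devices):
        # The first `extra` devices take one additional block.
        count = base + (1 if i < extra else 0)
        for _ in range(count):
            placement[block] = device
            block += 1
    return placement


placement = assign_blocks(num_blocks=10, devices=["cuda:0", "cuda:1", "cuda:2"])
for index, device in placement.items():
    print(index, device)
```

Keeping each device's blocks contiguous means activations cross a device boundary only once per boundary in a forward pass, which is why this layout is a frequent default for this kind of model splitting.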

As a rule of thumb, pass --multi_gpu (with two or more GPUs visible) when a single GPU does not have enough VRAM for the run, which is most likely in these cases:

  • Qwen-Image at 4096×4096 or higher.
  • FLUX at 6144×6144 or higher.

If a single GPU has enough memory, omit --multi_gpu and run on one device. If you hit an out-of-memory (OOM) error, add --multi_gpu and make sure CUDA_VISIBLE_DEVICES exposes two or more GPUs.
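A rough token count shows why memory pressure rises so quickly with resolution. The sketch below assumes an 8× VAE downsampling factor and 2×2 patching, which is typical of latent diffusion transformers but should be checked against each model's configuration. `latent_tokens` is a hypothetical helper, not part of this repository.

```python
def latent_tokens(height, width, vae_factor=8, patch=2):
    """Number of image tokens, assuming VAE downsampling then square patching."""
    step = vae_factor * patch
    if height % step or width % step:
        raise ValueError(f"height and width should be multiples of {step}")
    return (height // step) * (width // step)


for side in (1024, 4096, 6144):
    tokens = latent_tokens(side, side)
    print(f"{side}x{side}: {tokens} tokens")
# 1024x1024: 4096 tokens
# 4096x4096: 65536 tokens
# 6144x6144: 147456 tokens
```

Under these assumptions, 4096×4096 has 16 times as many tokens as 1024×1024 and 6144×6144 has 36 times as many. Self-attention compute grows with the square of the token count, so the cost per step rises much faster than the pixel count.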

Example — FLUX at 6144×6144:

cd flux_sega
CUDA_VISIBLE_DEVICES=0,1 python run_flux.py \
    --prompt "Your prompt here." \
    --height 6144 --width 6144 \
    --multi_gpu

Example — Qwen-Image at 4096×4096:

cd qwen_sega
CUDA_VISIBLE_DEVICES=0,1 python run_qwen.py \
    --prompt "Your prompt here." \
    --height 4096 --width 4096 \
    --multi_gpu
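For scripted or batch runs, the same commands can be assembled programmatically. The sketch below uses only the flags and rule-of-thumb thresholds stated in this README. The helper names are hypothetical, and as a simplification it adds --multi_gpu for every run at or above the threshold, even though a sufficiently large single GPU may not need it.

```python
import os

# Rule-of-thumb resolution thresholds quoted in this README.
MULTI_GPU_THRESHOLD = {"flux": 6144, "qwen": 4096}
SCRIPT = {"flux": "run_flux.py", "qwen": "run_qwen.py"}


def listed_gpu_count(env=None):
    """Count devices explicitly listed in CUDA_VISIBLE_DEVICES.

    Returns 0 when the variable is unset or empty. Note that an unset
    variable normally means all GPUs are visible; this helper does not
    query the hardware.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in value.split(",") if d.strip()])


def build_command(model, prompt, height, width, gpus):
    """Build the argv list, adding --multi_gpu for large runs with 2+ GPUs."""
    cmd = ["python", SCRIPT[model], "--prompt", prompt,
           "--height", str(height), "--width", str(width)]
    large = max(height, width) >= MULTI_GPU_THRESHOLD[model]
    if large and gpus >= 2:
        cmd.append("--multi_gpu")
    return cmd


gpus = listed_gpu_count({"CUDA_VISIBLE_DEVICES": "0,1"})
print(build_command("flux", "Your prompt here.", 6144, 6144, gpus))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="flux_sega")` (or `cwd="qwen_sega"`), since the scripts are run from inside their subdirectories in the examples above.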

Citation

@article{rajabi2026sega,
  title={SEGA: Spectral-Energy Guided Attention for Resolution Extrapolation in Diffusion Transformers},
  author={Rajabi, Javad and Shaban, Kimia and Roohi, Koorosh and Lindell, David B and Taati, Babak},
  journal={arXiv preprint arXiv:2605.22668},
  year={2026}
}

Acknowledgments

This repository adapts the inference layout and scripts from DyPE.
