Latent diffusion models have established a new state of the art in high-resolution visual generation. Integrating Vision Foundation Model (VFM) priors improves generative efficiency, yet existing latent designs remain largely heuristic and often struggle to unify semantic discriminability, reconstruction fidelity, and latent compactness. In this paper, we propose the Geometric Autoencoder (GAE), a principled framework that systematically addresses these challenges. By analyzing various alignment paradigms, GAE constructs an optimized low-dimensional semantic supervision target from VFMs to guide the autoencoder. Furthermore, we replace the restrictive KL-divergence regularization of standard VAEs with latent normalization, yielding a more stable latent manifold optimized for diffusion learning. To ensure robust reconstruction under high-intensity noise, GAE incorporates a dynamic noise sampling mechanism. Empirically, GAE achieves compelling performance on the ImageNet-1K generation benchmark.
- [2026.03.12]: Core code released! Includes DiT training and inference based on GAE latent space.
- [2026.03.12]: Pre-trained weights for GAE-AE and DiT are available on Hugging Face.
Geometric Autoencoder (GAE) is a principled framework designed to systematically address the heuristic nature of latent space design in Latent Diffusion Models (LDMs). GAE significantly enhances semantic discriminability and latent compactness without compromising reconstruction fidelity, through three core innovations:
- Latent Normalization: Replaces the restrictive KL-divergence of standard VAEs with RMSNorm regularization. By projecting features onto a unit hypersphere, GAE prevents training collapse and provides a stable, scalable latent manifold optimized for diffusion learning.
- Latent Alignment: Leverages Vision Foundation Models (VFMs, e.g., DINOv2) as semantic teachers. Through a carefully designed semantic downsampler, the low-dimensional latent vectors directly inherit strong discriminative semantic priors.
- Dynamic Noise Sampling: Specifically addresses the high-intensity noise typical in diffusion processes, ensuring robust reconstruction performance even under extreme noise levels.
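To make the first two components concrete, here is a minimal, framework-agnostic sketch in NumPy. It is an illustration under our own assumptions, not the repository's actual PyTorch implementation; the function names `rms_norm` and `alignment_loss` are hypothetical. RMS normalization rescales each latent vector to unit RMS (equivalently, projecting it onto a hypersphere of radius sqrt(d)), and the alignment loss pulls latents toward VFM teacher features via cosine similarity.

```python
import numpy as np

def rms_norm(z, eps=1e-6):
    """Project latent vectors onto a scaled hypersphere via RMS normalization.

    z: array of shape (..., d). Each vector is divided by its root-mean-square,
    so the normalized vector has unit RMS (L2 norm of sqrt(d)).
    """
    rms = np.sqrt(np.mean(z ** 2, axis=-1, keepdims=True) + eps)
    return z / rms

def alignment_loss(z, teacher):
    """Mean (1 - cosine similarity) between latents and VFM teacher features.

    Zero when the latent directions match the teacher exactly; at most 2.
    """
    z_n = z / np.linalg.norm(z, axis=-1, keepdims=True)
    t_n = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return 1.0 - np.mean(np.sum(z_n * t_n, axis=-1))
```

Because the normalized latents live on a sphere of fixed radius, their scale cannot drift during training, which is the stability property the latent-normalization bullet above refers to.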
GAE achieves state-of-the-art performance on the ImageNet-1K generation benchmark.
Some pre-trained weights are hosted on Hugging Face.
| Model | Epochs | Latent Dim | gFID (w/o CFG) | Weights |
|---|---|---|---|---|
| GAE-LightningDiT-XL | 80 | 32 | 1.82 | 🔗 HF Link |
| GAE-LightningDiT-XL | 800 | 32 | 1.31 | 🔗 HF Link |
| Model | Epochs | Latent Dim | Weights |
|---|---|---|---|
| GAE | 200 | 32 | 🔗 HF Link |
We use LightningDiT for the DiT implementation.
```bash
git clone https://github.com/sii-research/GAE.git
cd GAE
conda create -n gae python=3.10.12
conda activate gae
pip install -r requirements.txt
```

Download the pre-trained weights from Hugging Face and place them in the `checkpoints/` folder. Make sure to update the paths in the `configs/` folder to match your local setup.
Extract latents:

```bash
bash extract_gae.sh $DIT_CONFIG $VAE_CONFIG
```

Train the DiT:

```bash
bash train_gae.sh $DIT_CONFIG $VAE_CONFIG
```

For class-uniform sampling:

```bash
bash inference_gae.sh $DIT_CONFIG $VAE_CONFIG
```

For class-random sampling, change `from inference_sample import` to `from inference import` in `inference_gae.py`.

Our project is built upon the excellent foundations of the following open-source projects:
- LightningDiT: For the PyTorch Lightning based DiT implementation.
- RAE: For the timeshift and class-uniform sampling implementation.
- ADM: For the evaluation suite to score generated samples.
We express our sincere gratitude to the authors for their valuable contributions to the community.
If you find this work useful, please consider citing:
@misc{liu2026geometricautoencoderdiffusionmodels,
title={Geometric Autoencoder for Diffusion Models},
author={Hangyu Liu and Jianyong Wang and Yutao Sun},
year={2026},
eprint={2603.10365},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.10365},
}


