TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

Teng Li^1,2,*, Ziyuan Huang^1,*,✉, Cong Chen^1,3,*, Yangfu Li^1,4, Yuanhuiyi Lyu^1,5,
Dandan Zheng^1, Chunhua Shen^3, Jun Zhang^2,✉
^1 Inclusion AI, Ant Group, ^2 HKUST, ^3 ZJU, ^4 ECNU, ^5 HKUST (GZ)
^* Equal contribution, ✉ Corresponding authors

News

[2026/04/09] Research paper, code, and models are released for TC-AE!

Introduction

TC-AE is a novel Vision Transformer (ViT)-based tokenizer for deep image compression and visual generation. Traditional deep compression methods typically increase channel dimensions to maintain reconstruction quality at high compression ratios, but this often leads to representation collapse that degrades generative performance. TC-AE addresses this challenge from a new perspective: optimizing the token space, the critical bridge between pixels and latent representations. By scaling the number of tokens and enhancing their semantic structure, TC-AE achieves superior reconstruction and generation quality.

Key Innovations:

Token Space Optimization: The first approach to address representation collapse through token space optimization
Staged Token Compression: Decomposes the token-to-latent mapping into two stages, reducing structural information loss in the bottleneck
Semantic Enhancement: Incorporates self-supervised learning to produce latents that are better suited to generation

🚀 In this codebase, we release:

Pre-trained TC-AE tokenizer weights and evaluation code
Diffusion model training and evaluation code

Environment Setup

To set up the environment for TC-AE, follow these steps:

conda create -n tcae python=3.9
conda activate tcae
pip install -r requirements.txt

Download Checkpoints

Download the pre-trained TC-AE weights and place them in the results/ directory (the commands below expect the checkpoint at results/tcae.pt):

Tokenizer	Compression Ratio	rFID	LPIPS	Pretrained Weights
TC-AE-SL	f32d128	0.35	0.060
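
To make the f32d128 entry concrete, the sketch below works out the latent grid size for a 256x256 input. It assumes the common naming convention in which "f32" is a 32x spatial downsampling factor and "d128" is a latent dimension of 128; this README does not spell the convention out, and the helper names latent_shape and compression_ratio are hypothetical, not part of the TC-AE codebase.

```python
def latent_shape(image_size: int, downsample: int, channels: int):
    """Return (height, width, channels) of the latent grid for a square image."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample
    return side, side, channels


def compression_ratio(image_size: int, downsample: int, channels: int) -> float:
    """Ratio of input values (RGB pixels) to latent values."""
    height, width, depth = latent_shape(image_size, downsample, channels)
    return (image_size * image_size * 3) / (height * width * depth)


# Assumed reading of "f32d128": downsample by 32, latent dimension 128.
shape = latent_shape(256, 32, 128)
ratio = compression_ratio(256, 32, 128)
print(shape, ratio)  # (8, 8, 128) 24.0
```

Under this reading, a 256x256 RGB image is represented by an 8x8 grid of 128-dimensional latents, i.e. 24 times fewer values than the input.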

Reconstruction Evaluation

Image Reconstruction Demo

python tcae/script/demo_recon.py \
    --img_folder /path/to/your/images \
    --output_folder /path/to/output \
    --ckpt_path results/tcae.pt \
    --config configs/TC-AE-SL.yaml \
    --rank 0

ImageNet Evaluation

Evaluate reconstruction quality on ImageNet validation set:

python tcae/script/eval_recon.py \
    --ckpt_path results/tcae.pt \
    --dataset_root /path/to/imagenet_val \
    --config configs/TC-AE-SL.yaml \
    --rank 0
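
The rFID value in the checkpoint table is presumably a Fréchet Inception Distance computed between real and reconstructed validation images; the README does not state exactly what eval_recon.py reports, so treat this as background rather than a description of the script. For reference, the standard Fréchet distance between two Gaussians fitted to Inception features is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here \mu_r, \Sigma_r are the mean and covariance of features from the reference images and \mu_g, \Sigma_g those of the reconstructed (or generated) images. Lower is better.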

Generation Evaluation

Our DiT architecture and training pipeline are based on RAE and VA-VAE.

Prepare ImageNet Latents for Training

Extract and cache latent representations from ImageNet training set:

accelerate launch \
    --mixed_precision bf16 \
    diffusion/script/extract_features.py \
    --data_path /path/to/imagenet_train \
    --batch_size 50 \
    --tokenizer_cfg_path configs/TC-AE-SL.yaml \
    --tokenizer_ckpt_path results/tcae.pt

This will cache latents to results/cached_latents/imagenet_train_256/.
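
Before starting a long training run, it can help to confirm that the extraction step actually wrote files. The snippet below is a minimal sketch that counts files under the cache directory by extension. The on-disk format of the cached latents is not documented here, so the snippet makes no assumption about it, and summarize_cache is a hypothetical helper, not part of the repository.

```python
from collections import Counter
from pathlib import Path


def summarize_cache(cache_dir):
    """Count files under cache_dir (recursively), grouped by file extension."""
    root = Path(cache_dir)
    if not root.is_dir():
        # Nothing cached yet (or wrong path): report an empty summary.
        return Counter()
    return Counter(
        path.suffix or "<none>" for path in root.rglob("*") if path.is_file()
    )


# Path taken from the sentence above; adjust if you changed the output location.
counts = summarize_cache("results/cached_latents/imagenet_train_256")
if not counts:
    print("No cached latents found; re-run the extraction step.")
else:
    for suffix, number in sorted(counts.items()):
        print(f"{suffix}: {number} files")
```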

Training

Train a DiT-XL model on the extracted latents:

mkdir -p results/dit
torchrun --standalone --nproc_per_node=8 \
    diffusion/script/train_dit.py \
    --config configs/DiT-XL.yaml \
    --data-path results/cached_latents/imagenet_train_256 \
    --results-dir results/dit \
    --image-size 256 \
    --precision bf16

Sampling

Generate images using the trained diffusion model:

mkdir -p results/dit/samples
torchrun --standalone --nnodes=1 --nproc_per_node=8 \
    diffusion/script/sample_ddp_dit.py \
    --config configs/DiT-XL.yaml \
    --sample-dir results/dit/samples \
    --precision bf16 \
    --label-sampling equal \
    --tokenizer_cfg_path configs/TC-AE-SL.yaml \
    --tokenizer_ckpt_path results/tcae.pt
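
The --label-sampling equal flag suggests that class labels are drawn so that every class receives the same number of samples. The sketch below illustrates that idea only; it is an assumption about the flag's meaning, not the logic of sample_ddp_dit.py. The function name equal_labels and the 50,000-sample total are hypothetical choices for illustration (ImageNet-1k has 1,000 classes).

```python
def equal_labels(num_samples: int, num_classes: int):
    """Return a list of class labels in which every class appears equally often."""
    if num_samples % num_classes != 0:
        raise ValueError("num_samples must be a multiple of num_classes")
    per_class = num_samples // num_classes
    # Class 0 repeated per_class times, then class 1, and so on.
    return [label for label in range(num_classes) for _ in range(per_class)]


# Hypothetical totals: 50,000 samples spread over 1,000 classes.
labels = equal_labels(50_000, 1000)
print(len(labels), labels.count(0), labels.count(999))  # 50000 50 50
```

Balanced labels keep the generated set's class distribution fixed, which removes one source of variance when comparing evaluation runs.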

Evaluation

Download the ImageNet reference statistics file, adm_in256_stats.npz, and place it in results/. Then evaluate the generated samples:

python diffusion/script/eval_dit.py \
    --generated_dir results/dit/samples/DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16 \
    --reference_npz results/adm_in256_stats.npz \
    --batch-size 512 \
    --num-workers 8
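
The sample directory name in the command above encodes the sampling settings. When several runs accumulate, a small parser can recover those settings for bookkeeping. The sketch below is based solely on the single example name shown here; the field meanings (training step, CFG scale, batch size, sampler mode, number of steps, solver, precision) are assumptions, and parse_sample_dir is a hypothetical helper, not part of the repository.

```python
import re

# Pattern inferred from the one example directory name in this README.
SAMPLE_DIR_PATTERN = re.compile(
    r"^(?P<model>[^-]+)-(?P<train_step>\d+)-cfg-(?P<cfg_scale>[\d.]+)"
    r"-bs(?P<batch_size>\d+)-(?P<mode>[^-]+)-(?P<num_steps>\d+)"
    r"-(?P<solver>[^-]+)-(?P<precision>[^-]+)$"
)


def parse_sample_dir(name: str) -> dict:
    """Split a sample directory name into its (assumed) settings."""
    match = SAMPLE_DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized sample directory name: {name!r}")
    fields = match.groupdict()
    # Convert the numeric fields from strings.
    for key in ("train_step", "batch_size", "num_steps"):
        fields[key] = int(fields[key])
    fields["cfg_scale"] = float(fields["cfg_scale"])
    return fields


info = parse_sample_dir("DiT-0100000-cfg-1.00-bs100-ODE-50-euler-bf16")
print(info["train_step"], info["cfg_scale"], info["solver"])  # 100000 1.0 euler
```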

Acknowledgements

The codebase is built on HieraTok, RAE, VA-VAE, and iBOT. Thanks for their efforts!

License

This project is licensed under the MIT License; see the LICENSE file for details.

Citation

@article{li2026tcae,
  title={TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders},
  author={Li, Teng and Huang, Ziyuan and Chen, Cong and Li, Yangfu and Lyu, Yuanhuiyi and Zheng, Dandan and Shen, Chunhua and Zhang, Jun},
  journal={arXiv preprint arXiv:2604.07340},
  year={2026}
}
