TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens
Jiawei Ren*, Michal Tyszkiewicz*, Jiahui Huang, Zan Gojcic
* indicates equal contribution
Paper · Project Page · HuggingFace
TokenGS predicts 3D Gaussians with a self-supervised rendering objective. It uses an encoder–decoder architecture with learnable Gaussian tokens, so the number of primitives is not tied to image resolution or view count.
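To make the decoupling concrete, the sketch below compares primitive counts for two kinds of predictor. It assumes a pixel-aligned baseline that emits one Gaussian per input pixel, and a token-based predictor whose output size is set only by a token budget. The function names, the token count, and the Gaussians-per-token value are illustrative placeholders, not TokenGS's actual configuration.

```python
def pixel_aligned_count(height: int, width: int, n_views: int, per_pixel: int = 1) -> int:
    """Primitives for a pixel-aligned predictor: grows with resolution and view count."""
    return height * width * n_views * per_pixel


def token_based_count(n_tokens: int, gaussians_per_token: int) -> int:
    """Primitives for a token-based predictor: fixed by the token budget alone."""
    return n_tokens * gaussians_per_token


if __name__ == "__main__":
    # Hypothetical token budget, chosen only for illustration.
    n_tokens, per_token = 4096, 16
    for views in (2, 4, 6):
        for res in (256, 512):
            print(
                f"views={views} res={res} "
                f"pixel_aligned={pixel_aligned_count(res, res, views)} "
                f"token_based={token_based_count(n_tokens, per_token)}"
            )
```

The pixel-aligned count changes on every row, while the token-based count stays constant across resolutions and view counts.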
Install the package in editable mode (dependencies include PyTorch, gsplat, and fused-ssim via pyproject.toml):
uv pip install -e .
Environment: Python 3.11, CUDA 12.6+ (see pyproject.toml for pinned versions).
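After installing, a quick sanity check can confirm the interpreter version and that the main dependencies are visible. This is a minimal sketch, not part of the TokenGS package: `check_environment` is a hypothetical helper, and the import names `torch`, `gsplat`, and `fused_ssim` are assumed to match the installed distributions.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "gsplat", "fused_ssim")):
    """Report the Python version and whether each module can be located."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

The check only verifies that the modules can be found; it does not validate CUDA or the versions pinned in pyproject.toml.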
Data: DL3DV layout, symlinks, and dataset_kwargs are described in data/DATA.md.
Place weights under checkpoints/ (or pass any path to --resume). Metrics are written to <workspace>/metrics.txt; the workspace directory is created automatically.
Example (6-view preset):
accelerate launch --config_file acc_configs/gpu1.yaml \
-m tokengs.evaluate eval_dl3dv_6view \
--workspace results/dl3dv_eval/6view \
--resume checkpoints/dl3dv_6v.safetensors \
--use_ttt_for_eval \
--eval_n_media_dumps 20
Presets eval_dl3dv_2view and eval_dl3dv_4view select the matching evaluation JSONs. Remove --use_ttt_for_eval to turn off test-time token tuning.
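When evaluating several presets in a row, it can help to assemble the command programmatically. The sketch below builds the argument list from the flags shown in the example above; `build_eval_command` is a hypothetical helper, not part of TokenGS, and the checkpoint path for any preset other than the 6-view one must be supplied by the caller.

```python
import shlex


def build_eval_command(n_views, checkpoint, workspace, use_ttt=True,
                       n_media_dumps=0, accel_config="acc_configs/gpu1.yaml"):
    """Build the accelerate launch argument list for an evaluation preset."""
    if n_views not in (2, 4, 6):
        raise ValueError("evaluation presets exist only for 2, 4, and 6 views")
    cmd = [
        "accelerate", "launch", "--config_file", accel_config,
        "-m", "tokengs.evaluate", f"eval_dl3dv_{n_views}view",
        "--workspace", workspace,
        "--resume", checkpoint,
    ]
    if use_ttt:
        # Omit this flag to turn off test-time token tuning.
        cmd.append("--use_ttt_for_eval")
    if n_media_dumps > 0:
        cmd += ["--eval_n_media_dumps", str(n_media_dumps)]
    return cmd


if __name__ == "__main__":
    cmd = build_eval_command(
        6, "checkpoints/dl3dv_6v.safetensors", "results/dl3dv_eval/6view",
        n_media_dumps=20,
    )
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```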
Media dumps: --eval_n_media_dumps N writes PNGs, MP4s, depth vis, and PLY for the first N dataloader batches under <workspace>/{images,videos,depths,gaussians}/ (default 0 = metrics only).
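Once a run finishes, the outputs described above can be inspected with a short script. This sketch only relies on the directory names and the metrics.txt location stated in this README; `summarize_workspace` is a hypothetical helper, and it makes no assumption about file names inside each folder or about the format of metrics.txt.

```python
from pathlib import Path

# Subdirectories named in the media-dump description.
MEDIA_SUBDIRS = ("images", "videos", "depths", "gaussians")


def summarize_workspace(workspace):
    """Report whether metrics.txt exists and count files in each media folder."""
    root = Path(workspace)
    summary = {"metrics_exists": (root / "metrics.txt").is_file()}
    for name in MEDIA_SUBDIRS:
        folder = root / name
        if folder.is_dir():
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            # Folder is absent, e.g. when --eval_n_media_dumps was left at 0.
            summary[name] = 0
    return summary


if __name__ == "__main__":
    print(summarize_workspace("results/dl3dv_eval/6view"))
```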
1. Base run (train_dl3dv_base preset):
accelerate launch --config_file acc_configs/gpu8.yaml \
-m tokengs.train train_dl3dv_base \
--workspace workspace/dl3dv_base \
--experiment_name dl3dv_base
2. Finetune from a checkpoint (presets finetune_dl3dv_2view, finetune_dl3dv_4view, finetune_dl3dv_6view):
accelerate launch --config_file acc_configs/gpu8.yaml \
-m tokengs.train finetune_dl3dv_2view \
--workspace workspace/dl3dv_2view \
--experiment_name dl3dv_2view \
--resume workspace/dl3dv_base/model.safetensors
Swap the subcommand for 4- or 6-view finetune presets as needed.
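Swapping the preset by hand is easy to get wrong when the workspace and experiment names should change with it. The sketch below generates the finetune command for each view count. `build_finetune_command` is a hypothetical helper, and it assumes the 4- and 6-view runs follow the same workspace and experiment naming pattern as the 2-view example.

```python
import shlex


def build_finetune_command(n_views,
                           base_checkpoint="workspace/dl3dv_base/model.safetensors",
                           accel_config="acc_configs/gpu8.yaml"):
    """Build the accelerate launch argument list for a finetune preset."""
    if n_views not in (2, 4, 6):
        raise ValueError("finetune presets exist only for 2, 4, and 6 views")
    # Assumed naming pattern, extrapolated from the 2-view example.
    name = f"dl3dv_{n_views}view"
    return [
        "accelerate", "launch", "--config_file", accel_config,
        "-m", "tokengs.train", f"finetune_{name}",
        "--workspace", f"workspace/{name}",
        "--experiment_name", name,
        "--resume", base_checkpoint,
    ]


if __name__ == "__main__":
    for views in (2, 4, 6):
        print(shlex.join(build_finetune_command(views)))
```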
TokenGS is released under the Apache License 2.0. See CONTRIBUTING.md for contribution guidelines.
If you use TokenGS in your research, please cite:
@article{tokengs2026,
title={TokenGS: Decoupling 3D Gaussian Prediction from Pixels with Learnable Tokens},
author={Jiawei Ren and Michal Tyszkiewicz and Jiahui Huang and Zan Gojcic},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}