Jiawei Lian1,2,*, Haoyi Sun2,*, Yang Wu1,*, Lifu Mu2,*, Siyuan Wang1,2, Le Hui3,4,†, Ning Mao2,†,‡, Tao Wei2, Pan Zhou2, Kun Zhan2, Jian Yang1,†
1🎓 Nanjing University of Science and Technology
2🏢 Li Auto Inc.
3🎓 Northwestern Polytechnical University
4🎓 Department of Computing, The Hong Kong Polytechnic University
*Equal Contribution †✉️ Corresponding Author ‡Project Leader
- [✓] Paper Release
- [✓] LiAuto-GeoX Weight
- [✓] Inference Instructions
- [ ] GeoX-Large
- [ ] Data Processing
- [ ] Training Pipeline
Before using the models, please request access to the checkpoints once they are released.
All released models will be evaluated under the same protocol as reported in the paper.
| Model | Parameters | Input Setting | Download |
|---|---|---|---|
| LiAuto-GeoX | 0.15B | Surround-view / Video | 🤗 Hugging Face |
| LiAuto-GeoX-Teacher | 1.1B | Surround-view | - |
Install the required dependencies:

    pip install -r requirements.txt

Single Frame Example - Basic inference with RGB images:

    CUDA_VISIBLE_DEVICES=2 python inference.py \
        --image_folder /path/to/your/images \
        --port 8082

RGB + Sky Mask Example - Filter out sky regions for cleaner reconstruction:

    CUDA_VISIBLE_DEVICES=2 python inference.py \
        --image_folder /path/to/your/images \
        --port 8082 \
        --mask_sky

RGB + Pose Example - Use ground truth camera poses for better accuracy:

    CUDA_VISIBLE_DEVICES=2 python inference.py \
        --image_folder /path/to/your/images \
        --camera_folder /path/to/your/cameras \
        --port 8083

After running inference, open your browser and navigate to http://localhost:PORT (replace PORT with your specified port) to visualize the 3D reconstruction results interactively.
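
When inference is launched from a script, it can help to wait until the viewer port is reachable before opening the browser. The following is a minimal sketch, assuming the viewer accepts plain TCP connections on the port passed to `--port`. The helpers `viewer_url` and `wait_for_viewer` are hypothetical and are not part of this repository.

```python
import socket
import time


def viewer_url(port: int, host: str = "localhost") -> str:
    """Build the viewer address for a given port (hypothetical helper)."""
    return f"http://{host}:{port}"


def wait_for_viewer(
    port: int,
    host: str = "localhost",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False if the
    timeout elapses first. Assumption: an open port means the viewer
    is ready; the repository does not document a readiness signal.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Port not open yet (or connection refused); retry shortly.
            time.sleep(interval)
    return False
```

For example, after starting `inference.py` with `--port 8082`, calling `wait_for_viewer(8082)` and then printing `viewer_url(8082)` gives the address to open once the port responds.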
Additional Options:
- `--conf_threshold`: Adjust the confidence threshold (default: 10.0) to filter low-confidence points. Lower values show more points; higher values show fewer but more confident points.
- `--mask_black_bg`: Filter out black background pixels
- `--mask_white_bg`: Filter out white background pixels
- `--save_glb`: Export the reconstruction as a GLB file
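
The options above can be combined in a single call. As a rough sketch, the hypothetical helper below (`build_inference_command`, not part of this repository) assembles an argument list from the flags documented in this README. It assumes that `--mask_sky`, `--mask_black_bg`, `--mask_white_bg`, and `--save_glb` are value-less switches and that `--conf_threshold` takes a number. Only the `--mask_sky` form is confirmed by the examples above.

```python
import shlex
from typing import List, Optional


def build_inference_command(
    image_folder: str,
    port: int = 8082,
    camera_folder: Optional[str] = None,
    conf_threshold: Optional[float] = None,
    mask_sky: bool = False,
    mask_black_bg: bool = False,
    mask_white_bg: bool = False,
    save_glb: bool = False,
) -> List[str]:
    """Assemble an argv list for inference.py (hypothetical helper)."""
    cmd = [
        "python", "inference.py",
        "--image_folder", image_folder,
        "--port", str(port),
    ]
    # Optional ground-truth camera poses.
    if camera_folder is not None:
        cmd += ["--camera_folder", camera_folder]
    # Only pass the threshold when overriding the default (10.0 per the README).
    if conf_threshold is not None:
        cmd += ["--conf_threshold", str(conf_threshold)]
    # Assumed to be value-less switches.
    switches = [
        (mask_sky, "--mask_sky"),
        (mask_black_bg, "--mask_black_bg"),
        (mask_white_bg, "--mask_white_bg"),
        (save_glb, "--save_glb"),
    ]
    cmd += [flag for enabled, flag in switches if enabled]
    return cmd


def to_shell(cmd: List[str]) -> str:
    """Render the argv list as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

The returned list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set through its `env` argument, or printed with `to_shell` for manual use.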
Thanks to these great repositories: DINOv2, CUT3R, VGGT, DA3, PI3, DVGT, OmniVGGT, FastVGGT, LiteVGGT, SparseWorld-TC, and many other inspiring works in the community.
If you find LiAuto-GeoX useful for your work, please cite:
@article{lian2026geox,
author = {Lian, Jiawei and Sun, Haoyi and Wu, Yang and Mu, Lifu and Wang, Siyuan and Wei, Tao and Hui, Le and Mao, Ning and Zhou, Pan and Zhan, Kun and Yang, Jian},
title = {LiAuto-GeoX: Efficient Grounded Driving Transformer},
journal = {arXiv:2606.05774},
year = {2026},
}