MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving
📑 arXiv link: https://arxiv.org/pdf/2409.07267
We are preparing the code for open-source release.
To cite our work, please use the following BibTeX entry:
@article{zhang2024minidrive,
  title={MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving},
  author={Zhang, Enming and Dai, Xingyuan and Lv, Yisheng and Miao, Qinghai},
  journal={arXiv preprint arXiv:2409.07267},
  year={2024}
}