This is the official code repository for the paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI". We present mPnP-LLM, which enables fully elastic modality adaptation for LLMs via trainable latent connections. We evaluate mPnP-LLM on a mini split of the nuScenes-QA dataset with two sensory modalities: RGB camera views and LiDAR point clouds.
Install PyTorch first, then install nuscenes-devkit:

```bash
pip install nuscenes-devkit
```

Install all other requirements with:

```bash
pip install -r requirements.txt
```
Some dependencies may be missing from the requirements file; if you hit an import error when running our code, please install the missing package as indicated in the error log.
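As a quick sanity check (a minimal sketch), you can verify that the core dependencies are importable:

```python
# Verify that PyTorch and nuscenes-devkit are installed correctly.
import torch
from nuscenes.nuscenes import NuScenes  # provided by nuscenes-devkit

print("torch:", torch.__version__)
```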
The dataset we used in our experiments is adapted from the nuScenes-QA dataset v1.0. To create the train and validation splits for day and night scenes:
- Download the nuScenes-mini split from the nuScenes website.
- Navigate to the `nuqamini` folder and create the path `nuqamini/dataset/`, then move the extracted nuScenes-mini split into it. The correct path of the dataset should look like `nuqamini/dataset/v1.0-mini/data/sets/nuscenes/`. Then create the path `nuqamini/dataset/v1.0-mini/data/sets/range_projection_outputs/` (see the sketch after this list).
- Navigate to the `nuqamini` folder and run `mini_lidar_dataset_creator.py` to generate range projections of the LiDAR point clouds.
- Navigate to the `nuqamini` folder and run `nuqamini_dataset_create.ipynb`. Four data splits will be created in Arrow format in the directories `day/train/`, `day/validation/`, `night_80dimgaussian7/train/`, and `night_80dimgaussian7/validation/`.
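As referenced in the second step above, here is a minimal sketch of creating the expected directory layout, assuming it is run from the repository root (the parent of `nuqamini`):

```python
# Create the dataset directory layout described above; the extracted
# nuScenes-mini split then goes under .../data/sets/nuscenes/.
from pathlib import Path

sets_dir = Path("nuqamini/dataset/v1.0-mini/data/sets")
(sets_dir / "nuscenes").mkdir(parents=True, exist_ok=True)
(sets_dir / "range_projection_outputs").mkdir(parents=True, exist_ok=True)
```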
Alternatively, you can download our processed dataset from Hugging Face; check the dataset page here.
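Each split is stored in Arrow format. Assuming the splits were saved with the Hugging Face `datasets` library's `save_to_disk` (the Arrow layout suggests so, but treat this as an assumption), a minimal loading sketch looks like:

```python
# Load one of the generated splits; the relative path assumes you are
# inside the nuqamini folder where the splits were created.
from datasets import load_from_disk

day_train = load_from_disk("day/train")
print(day_train)
```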
We use ViT-small for RGB camera views, which is downloaded automatically when you run our training code. We also need a pre-trained RangeViT to perceive the LiDAR inputs: please download the pre-trained RangeViT here and put the downloaded model file under `model/`.
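For example, a minimal sketch of placing the checkpoint (the file name `rangevit_checkpoint.pth` is hypothetical; use the actual name of the downloaded file):

```python
# Move the downloaded RangeViT checkpoint under model/.
# NOTE: "rangevit_checkpoint.pth" is a hypothetical file name.
from pathlib import Path
import shutil

Path("model").mkdir(exist_ok=True)
shutil.move("rangevit_checkpoint.pth", "model/")
```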
Navigate to `example/mpnp_llm/`. We first perform offline training with the RGB modality on the day-train split and evaluate on both the day-validation and night-validation splits:
```bash
python offline_train.py
```
Due to the low accuracy on the night split, we then switch to the LiDAR modality for better perception:
```bash
python switch_lidar.py
```
Alternatively, we can include both RGB and LiDAR modalities:
```bash
python add_lidar.py
```
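Putting the steps together, here is a sketch of the full workflow (run from `example/mpnp_llm/`; note that `switch_lidar.py` and `add_lidar.py` are alternatives, not sequential steps):

```python
# A sketch of the elastic adaptation workflow: offline training with RGB,
# followed by either switching the modality to LiDAR or adding LiDAR on top.
import subprocess

subprocess.run(["python", "offline_train.py"], check=True)

# Pick exactly one of the two adaptation paths:
subprocess.run(["python", "switch_lidar.py"], check=True)  # replace RGB with LiDAR
# subprocess.run(["python", "add_lidar.py"], check=True)   # keep RGB and add LiDAR
```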
Since the generated training and validation sets are relatively small, the obtained accuracy may vary slightly across runs due to randomness.
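If you need tighter run-to-run reproducibility, a common mitigation is to fix all random seeds at the top of the training scripts. A minimal sketch (our scripts may not expose a seed flag, so this would need to be added manually):

```python
# Fix the random seeds to reduce run-to-run variation.
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU and, in recent PyTorch, all CUDA devices
    torch.cuda.manual_seed_all(seed)  # explicit, for older PyTorch versions


set_seed(42)
```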
If you find our work useful, please cite our paper:

```bibtex
@article{huang2023modality,
  title={Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI},
  author={Huang, Kai and Yang, Boyuan and Gao, Wei},
  journal={arXiv preprint arXiv:2312.07886},
  year={2023}
}
```