CMTCoop - Cross Modal Transformers for Cooperative perception

This work builds on "Cross Modal Transformer: Towards Fast and Robust 3D Object Detection".

Introduction

CMT is a robust transformer-based detector for end-to-end multi-modal 3D object detection. CMTCoop extends this model to cooperative perception, performing deep multi-modal, multi-view feature fusion for 3D object detection. Extensive studies show that the proposed model achieves an mAP of 97.3% with multi-modal cooperative fusion (a 6.2% increase over vehicular perception) and 96.7% with LiDAR-only cooperative perception (CMTCoop-L), which runs at near real-time FPS. It also achieves a 2.1% performance gain over the current SoTA, BEVFusionCoop.

Preparation

Docker installation

Docker provides an easy way to deal with package dependencies. Use the Dockerfile provided to build the image.

docker build . -t cmt-coop

Then run the image with the following command:

nvidia-docker run -it --rm \
    --ipc=host --gpus all \
    -v <Path_to_datasets>:/mnt/datasets \
    -v <Path_to_pretrained_models>:/home/pretrained \
    --name cmt-coop \
    cmt-coop bash

Manual Installation

Create a new environment with Anaconda or venv if required:

conda create -n cmt-coop
conda activate cmt-coop

Install the following packages

Python == 3.8
CUDA == 11.1
pytorch == 1.9.1
mmcv-full == 1.6.2
mmdet == 2.28.2
mmsegmentation == 0.30.0
mmdet3d == 1.0.0rc6
spconv-cu111 == 2.1.21
flash-attn == 0.2.2
pypcd
open3d

Note that the repository was tested on the above versions, but may also work with later versions.
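
To confirm that an environment matches the tested versions, a small check like the following can help. This is a minimal sketch and not part of the repository: `TESTED` copies a subset of the versions listed above, the distribution names are assumed to be the ones used on PyPI (for example `torch` for pytorch), and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Subset of the versions listed above; the distribution names are assumptions.
TESTED = {
    "torch": "1.9.1",
    "mmcv-full": "1.6.2",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "mmdet3d": "1.0.0rc6",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu111" are ignored when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(TESTED).items():
        print(f"{name}: tested with {wanted}, found {installed or 'not installed'}")
```

A mismatch is not necessarily an error, since later versions may also work; the output only points to where an environment differs from the tested one.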

Dataset

Follow the mmdet3d instructions to process the nuScenes dataset. This is only required to reproduce the tests on the CMT model.

The dataset links will be released soon.

Download the TUMTraf Dataset Development Kit and follow its instructions to split the TUMTraf intersection dataset into train and val sets. The TUMTraf cooperative dataset is already split into train and val sets.

${Root}
└── datasets
    ├── tumtraf_intersection_dataset
    │    ├── train
    │    └── val
    └── tumtraf_cooperative_dataset
         ├── train
         └── val

Finally, ensure that the dataset folder is soft-linked to the CMTCoop/data folder.

ln -s /path_to_data_folder CMTCoop/data
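
Before running data preparation, it can be worth checking that the linked folder has the layout shown above. The following is a minimal sketch under the assumption that the folder names match the tree exactly; `missing_dirs` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# Layout taken from the tree above; adjust if your folders are named differently.
EXPECTED = {
    "tumtraf_intersection_dataset": ("train", "val"),
    "tumtraf_cooperative_dataset": ("train", "val"),
}


def missing_dirs(data_root, expected=EXPECTED):
    """List the expected split folders that do not exist under data_root."""
    root = Path(data_root)
    return [
        str(Path(dataset) / split)
        for dataset, splits in expected.items()
        for split in splits
        if not (root / dataset / split).is_dir()
    ]


if __name__ == "__main__":
    # "CMTCoop/data" is the symlink created above; is_dir() follows symlinks.
    problems = missing_dirs("CMTCoop/data")
    print("layout OK" if not problems else f"missing: {problems}")
```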

Data preparation

The TUMTraf dataset must be converted from the OpenLABEL format to be compatible with the mmdet3d framework.

TUMTraf Intersection Dataset

Run this script for data preparation:

python ./tools/create_data.py a9_nusc \
    --root-path /home/CMTCoop/data/tumtraf_intersection_dataset \
    --out-dir /home/CMTCoop/data/tumtraf_intersection_processed \
    --splits training,validation

After data preparation, you will be able to see the following directory structure:

├── data
│   ├── tumtraf_intersection_dataset
│   │   ├── train
│   │   ├── val
│   ├── tumtraf_intersection_processed
│   │   ├── a9_nusc_gt_database
│   │   ├── train
│   │   ├── val
│   │   ├── a9_nusc_infos_train.pkl
│   │   ├── a9_nusc_infos_val.pkl
│   │   ├── a9_nusc_dbinfos_train.pkl

TraffiX Cooperative Dataset

Run this script for data preparation:

python ./tools/create_data.py a9coop_nusc \
    --root-path /home/CMTCoop/data/tumtraf_cooperative_dataset \
    --out-dir /home/CMTCoop/data/tumtraf_cooperative_processed \
    --splits training,validation

After data preparation, you will be able to see the following directory structure:

├── data
│   ├── tumtraf_cooperative_dataset
│   │   ├── train
│   │   ├── val
│   ├── tumtraf_cooperative_processed
│   │   ├── a9_nusc_coop_gt_database
│   │   ├── train
│   │   ├── val
│   │   ├── a9_nusc_coop_infos_train.pkl
│   │   ├── a9_nusc_coop_infos_val.pkl
│   │   ├── a9_nusc_coop_dbinfos_train.pkl
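
To sanity-check the generated info files, it may help to look at their top-level structure. The sketch below makes no assumption about what the converter writes: it only reports the type of the unpickled object and its keys or length. `summarize_pickle` is a hypothetical helper, and the path in the usage line is the cooperative output location used above. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.

```python
import pickle


def summarize_pickle(path):
    """Return the top-level type and, where available, the keys or length of a pickle file."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = sorted(map(str, obj))
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
    return summary


if __name__ == "__main__":
    import sys

    # Example: python summarize.py \
    #   /home/CMTCoop/data/tumtraf_cooperative_processed/a9_nusc_coop_infos_train.pkl
    for path in sys.argv[1:]:
        print(path, summarize_pickle(path))
```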

Train & inference

# train
bash tools/dist_train.sh /path_to_your_config 8
# inference
bash tools/dist_test.sh /path_to_your_config /path_to_your_pth 8 --eval bbox

Main Results

Results on the TUMTraf cooperative validation set. The FPS is evaluated on a single RTX3080 GPU.

Evaluation Results of CMTCoop model on TUMTraf Cooperative Dataset Test Set

Domain	Modality	mAP_BEV	mAP_3D Easy	mAP_3D Mod.	mAP_3D Hard	mAP_3D Avg.
Vehicle	Camera	69.76	68.76	79.85	66.44	69.30
Vehicle	LiDAR	88.17	87.94	88.53	71.99	84.72
Vehicle	Camera + LiDAR	91.65	84.83	91.32	72.18	85.57
Infra.	Camera	71.89	70.86	80.38	58.72	71.66
Infra.	LiDAR	94.42	91.28	95.60	77.48	91.89
Infra.	Camera + LiDAR	96.09	91.94	95.15	82.35	92.16
Coop.	Camera	84.07	81.03	90.05	77.94	83.43
Coop.	LiDAR	96.68	92.18	96.77	82.20	93.43
Coop.	Camera + LiDAR	97.31	93.70	96.65	79.84	94.10
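
The gain of cooperative over vehicle-only perception can be read from the table in two ways: as an absolute difference in percentage points or as a relative increase. The short sketch below computes both for the Camera + LiDAR mAP_BEV values above; `relative_gain` is a hypothetical helper introduced only for illustration. The relative figure rounds to 6.2%, which is consistent with the increase quoted in the Introduction if that number is read as a relative gain.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# mAP_BEV for Camera + LiDAR, copied from the table above.
coop_map_bev = 97.31
vehicle_map_bev = 91.65

absolute = coop_map_bev - vehicle_map_bev                # percentage points
relative = relative_gain(coop_map_bev, vehicle_map_bev)  # percent

print(f"absolute gain: {absolute:.2f} points")
print(f"relative gain: {relative:.2f} %")
```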

Evaluation Results of Infrastructure-only models on TUMTraf Intersection Dataset Test Set

Model	FOV	Modality	mAP_3D Easy	mAP_3D Mod.	mAP_3D Hard	mAP_3D Avg.
InfraDet3D	South 1	LiDAR	75.81	47.66	42.16	55.21
BEVFusionCoop	South 1	LiDAR	76.24	48.23	35.19	69.47
CMTCoop	South 1	LiDAR	80.62	64.46	50.41	72.68
InfraDet3D	South 2	LiDAR	38.92	46.60	43.86	43.13
BEVFusionCoop	South 2	LiDAR	74.97	55.55	39.96	69.94
CMTCoop	South 2	LiDAR	79.34	60.81	45.53	70.31
InfraDet3D	South 1	Camera + LiDAR	67.08	31.38	35.17	44.55
BEVFusionCoop	South 1	Camera + LiDAR	75.68	45.63	45.63	66.75
CMTCoop	South 1	Camera + LiDAR	80.86	61.37	45.32	70.65
InfraDet3D	South 2	Camera + LiDAR	58.38	19.73	33.08	37.06
BEVFusionCoop	South 2	Camera + LiDAR	74.73	53.46	41.96	66.89
CMTCoop	South 2	Camera + LiDAR	78.92	52.67	39.76	67.21

Visualization

Performance of Vehicular only model (CMT) from infrastructure perspective (left) and vehicular perspective (right)

Performance of Cooperative model (CMTCoop - left) vs. Vehicular only model (CMT - right) from infrastructure perspective.

Resource

Refer to the following links for other resources related to this project:

Citation

Please consider citing the original work on CMT if you find this work helpful.
