This is the official code for our paper "MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding", accepted at ICCV 2025.
<video width="100%" controls autoplay muted loop>
<source src="https://github.com/SixCorePeach/MCAM/raw/main/poster/the%20visual%20result%20of%20MCAM.mp4" type="video/mp4">
</video>

This work builds on Video Swin Transformer and ADAPT, and the CAM is inspired by LLCP. We thank the authors for their excellent work; the citations are as follows.
```bibtex
@article{liu2021video,
  title   = {Video Swin Transformer},
  author  = {Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal = {arXiv preprint arXiv:2106.13230},
  year    = {2021}
}

@article{jin2023adapt,
  title   = {ADAPT: Action-aware Driving Caption Transformer},
  author  = {Jin, Bu and Liu, Xinyu and Zheng, Yupeng and Li, Pengfei and Zhao, Hao and Zhang, Tong and Zheng, Yuhang and Zhou, Guyue and Liu, Jingjing},
  journal = {arXiv preprint arXiv:2302.00673},
  year    = {2023}
}

@inproceedings{chen2024llcp,
  title     = {LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answer},
  author    = {Chen, Guangyi and Li, Yuke and Liu, Xiao and Li, Zijian and Al Suradi, Eman and Wei, Donglai and Zhang, Kun},
  booktitle = {ICLR},
  year      = {2024}
}
```
Our environment setup is as follows. First, install Anaconda and create the environment:

```shell
conda create --name MCAM python=3.8
conda activate MCAM
```

Install PyTorch. The torch version can be adjusted to your own device, as long as it satisfies PyTorch's own architecture requirements:

```shell
pip install torch==1.13.1+cu117 torchaudio==0.13.1+cu117 torchvision==0.14.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
```

Install apex. Alternatively, you can manually download the apex zip archive and extract it into a folder of your choice:

```shell
#git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
```

Install mpi4py and the other required packages:

```shell
conda install -c conda-forge mpi4py openmpi
```

Install the remaining dependencies. If any dependency is still missing at runtime, install it with `pip install <package-name>`:

```shell
pip install -r requirements.txt
```

The directory layout is as follows:
```
${REPO_DIR}
|-- checkpoints
|-- datasets
|  |-- BDDX
|  |  |-- frame_tsv
|  |  |-- captions_BDDX.json
|  |  |-- training_32frames_caption_coco_format.json
|  |  |-- training_32frames.yaml
|  |  |-- training.caption.lineidx
|  |  |-- training.caption.lineidx.8b
|  |  |-- training.caption.linelist.tsv
|  |  |-- training.caption.tsv
|  |  |-- training.img.lineidx
|  |  |-- training.img.lineidx.8b
|  |  |-- training.img.tsv
|  |  |-- training.label.lineidx
|  |  |-- training.label.lineidx.8b
|  |  |-- training.label.tsv
|  |  |-- training.linelist.lineidx
|  |  |-- training.linelist.lineidx.8b
|  |  |-- training.linelist.tsv
|  |  |-- validation...
|  |  |-- ...
|  |  |-- validation...
|  |  |-- testing...
|  |  |-- ...
|  |  |-- testing...
|-- datasets_part
|-- docs
|-- models
|  |-- basemodel
|  |-- captioning
|  |-- video_swin_transformer
|-- scripts
|-- src
|-- README.md
|-- ...
|-- ...
```

Since the project also uses the CoVLA dataset, its file layout is given below as well; it is largely the same:
```
|-- dataset
|  |-- frame_tsv
|  |-- captions_BDDX.json
|  |-- training_32frames_caption_coco_format.json
|  |-- training_32frames.yaml
|  |-- training.caption.lineidx
|  |-- training.caption.lineidx.8b
|  |-- training.caption.linelist.tsv
|  |-- training.caption.tsv
|  |-- training.img.lineidx
|  |-- training.img.lineidx.8b
|  |-- training.img.tsv
|  |-- training.label.lineidx
|  |-- training.label.lineidx.8b
|  |-- training.label.tsv
|  |-- training.linelist.lineidx
|  |-- training.linelist.lineidx.8b
|  |-- training.linelist.tsv
|  |-- validation...
|  |-- ...
|  |-- validation...
|  |-- testing...
|  |-- ...
|  |-- testing...
|-- models
|-- output
|-- readme.md
|-- scripts
|  |-- CoVLA_adapt_caption.sh CoVLA_only_caption.sh other_scripts
|-- src
|  |-- configs
|  |-- datasets
|  |-- evalcap
|  |-- layers
|  |-- modeling
|  |-- prepro
|  |-- pytorch_grad_cam
|  |-- solver
|  |-- tags
|  |-- tasks
|  |-- timm
|  |-- utils
```
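The paired `.tsv`/`.lineidx` files above follow the convention used by SwinBERT/ADAPT-style data loaders: each line of a `.lineidx` file stores the starting byte offset of the corresponding row in its `.tsv` file, so a single row can be read with one seek instead of scanning the whole file. A minimal sketch of this pattern (helper names are ours, and the exact column layout of each `.tsv` in this repo may differ):

```python
def build_lineidx(tsv_path, lineidx_path):
    # Record the starting byte offset of every row in the .tsv file.
    offsets, pos = [], 0
    with open(tsv_path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    with open(lineidx_path, "w") as f:
        f.writelines(f"{o}\n" for o in offsets)

def load_lineidx(lineidx_path):
    # Each line holds one byte offset into the paired .tsv file.
    with open(lineidx_path) as f:
        return [int(line.strip()) for line in f if line.strip()]

def read_tsv_row(tsv_path, offsets, idx):
    # Seek straight to the requested row and split it into columns.
    with open(tsv_path, "rb") as f:
        f.seek(offsets[idx])
        row = f.readline().decode("utf-8").rstrip("\n")
    return row.split("\t")
```

This random-access layout is what lets the training loop sample arbitrary video clips from a multi-gigabyte `training.img.tsv` without loading it into memory.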
After preparing the files above, configure the contents of script/xxx.bash and run:

```shell
bash xxx.bash
```

The contents of xxx.bash can look like the following:
```shell
#CUDA_VISIBLE_DEVICES=4,5,6,7 \
#NCCL_P2P_DISABLE=1 \
#OMPI_COMM_WORLD_SIZE="4" \
#python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 --master_port=45978 src/tasks/run_adapt.py \
CUDA_VISIBLE_DEVICES=0 \
python src/tasks/run_MCAM.py \
    --config src/configs/VidSwinBert/BDDX_multi_default.json \
    --train_yaml BDDX/training_32frames.yaml \
    --val_yaml BDDX/testing_32frames.yaml \
    --per_gpu_train_batch_size 16 \
    --per_gpu_eval_batch_size 16 \
    --num_train_epochs 40 \
    --learning_rate 0.0002 \
    --max_num_frames 32 \
    --pretrained_2d 0 \
    --backbone_coef_lr 0.05 \
    --mask_prob 0.5 \
    --max_masked_token 45 \
    --zero_opt_stage 1 \
    --mixed_precision_method deepspeed \
    --deepspeed_fp16 \
    --gradient_accumulation_steps 4 \
    --learn_mask_enabled \
    --loss_sparse_w 0.1 \
    --use_sep_cap \
    --multitask \
    --signal_types course speed \
    --loss_sensor_w 0.05 \
    --max_grad_norm 1 \
    --output_dir ./output/multitask/sensor_course_speed/MCAM/
```
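Note that with gradient accumulation, the effective batch size seen by each optimizer update is `per_gpu_train_batch_size × number_of_GPUs × gradient_accumulation_steps`. A quick sanity check for the commands above (the helper function is ours, not part of the repo):

```python
def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps):
    # Samples contributing to each optimizer update.
    return per_gpu_batch * num_gpus * grad_accum_steps

# Single-GPU command above: batch 16, 1 GPU, 4 accumulation steps.
print(effective_batch_size(16, 1, 4))  # -> 64

# The commented-out 4-GPU distributed launch would quadruple it.
print(effective_batch_size(16, 4, 4))  # -> 256
```

Keep this in mind when changing `--per_gpu_train_batch_size` or the GPU count, since the learning rate was tuned for a fixed effective batch size.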
We will upload our code piece by piece, since the files cannot be moved and adjusted online all at once. If this work is helpful to you, please cite:
```bibtex
@InProceedings{Cheng_2025_ICCV,
  author    = {Cheng, Tongtong and Li, Rongzhen and Xiong, Yixin and Zhang, Tao and Wang, Jing and Liu, Kai},
  title     = {MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {5479-5489}
}
```