MCAM

This repository contains the code for our paper *MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding*.

🚀 The paper has been accepted at ICCV 2025.

<video width="100%" controls autoplay muted loop>
  <source src="https://github.com/SixCorePeach/MCAM/raw/main/poster/the%20visual%20result%20of%20MCAM.mp4" type="video/mp4">
</video>

This work builds on Video Swin Transformer and ADAPT, and the CAM module is inspired by LLCP. We thank the authors for their excellent work; the corresponding citations are:

@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}
@article{jin2023adapt,
  title={ADAPT: Action-aware Driving Caption Transformer},
  author={Jin, Bu and Liu, Xinyu and Zheng, Yupeng and Li, Pengfei and Zhao, Hao and Zhang, Tong and Zheng, Yuhang and Zhou, Guyue and Liu, Jingjing},
  journal={arXiv preprint arXiv:2302.00673},
  year={2023}
}
@inproceedings{chen2024llcp,
  title={LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answer},
  author={Chen, Guangyi and Li, Yuke and Liu, Xiao and Li, Zijian and Al Suradi, Eman and Wei, Donglai and Zhang, Kun},
  booktitle={ICLR},
  year={2024}
}

Our environment setup is as follows.

First, install Anaconda, then create and activate the conda environment:

conda create --name MCAM python=3.8
conda activate MCAM

Install PyTorch. The torch version can be adjusted to your own device, as long as it satisfies PyTorch's own architecture requirements:

pip install torch==1.13.1+cu117 torchaudio==0.13.1+cu117 torchvision==0.14.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
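After installing, a quick check can confirm that the environment sees PyTorch and the GPU. This is a sketch we added, not part of the original instructions; it assumes the MCAM conda env is active, and it does not fail if torch is not importable yet:

```shell
# Optional sanity check (a sketch, not from the original instructions):
# print the Python version, then report torch/CUDA status if torch is
# already importable, without failing when it is not.
python -c "import sys; print('python', sys.version.split()[0])"
python -c "import torch; print('torch', torch.__version__, 'cuda:', torch.cuda.is_available())" \
  || echo "torch not importable yet - check the install step above"
```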

Install apex. Alternatively, you can download the apex zip archive manually and extract it into a directory of your choice:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./

Install mpi4py and the other required packages:

conda install -c conda-forge mpi4py openmpi

Install the remaining dependencies. If any pinned version does not fit your setup, simply install whatever is reported missing at runtime with pip install <package-name>:

pip install -r requirements.txt

The expected directory layout is as follows:

${REPO_DIR}
|-- checkpoints
|-- datasets  
|   |-- BDDX
|   |   |-- frame_tsv
|   |   |-- captions_BDDX.json
|   |   |-- training_32frames_caption_coco_format.json
|   |   |-- training_32frames.yaml
|   |   |-- training.caption.lineidx
|   |   |-- training.caption.lineidx.8b
|   |   |-- training.caption.linelist.tsv
|   |   |-- training.caption.tsv
|   |   |-- training.img.lineidx
|   |   |-- training.img.lineidx.8b
|   |   |-- training.img.tsv
|   |   |-- training.label.lineidx
|   |   |-- training.label.lineidx.8b
|   |   |-- training.label.tsv
|   |   |-- training.linelist.lineidx
|   |   |-- training.linelist.lineidx.8b
|   |   |-- training.linelist.tsv
|   |   |-- validation...
|   |   |-- ...
|   |   |-- validation...
|   |   |-- testing...
|   |   |-- ...
|   |   |-- testing...
|-- datasets_part
|-- docs
|-- models
|   |-- basemodel
|   |-- captioning
|   |-- video_swin_transformer
|-- scripts 
|-- src
|-- README.md 
|-- ... 
|-- ... 
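As a convenience, the top-level skeleton above can be pre-created with a short script. This is a sketch we added, where REPO_DIR is a placeholder for your local checkout; the dataset files themselves (the frame_tsv contents and the *.tsv/*.yaml annotations) must still be supplied separately:

```shell
#!/usr/bin/env sh
# Sketch: create the top-level BDD-X skeleton described above.
# REPO_DIR is a placeholder for your local checkout.
REPO_DIR=${REPO_DIR:-.}
mkdir -p "$REPO_DIR/checkpoints" \
         "$REPO_DIR/datasets/BDDX/frame_tsv" \
         "$REPO_DIR/datasets_part" \
         "$REPO_DIR/docs" \
         "$REPO_DIR/models/basemodel" \
         "$REPO_DIR/models/captioning" \
         "$REPO_DIR/models/video_swin_transformer" \
         "$REPO_DIR/scripts" \
         "$REPO_DIR/src"
ls "$REPO_DIR/datasets"
```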

Since the project also uses the CoVLA dataset, its file layout is given here as well; it is largely the same:

|-- dataset
|   |-- frame_tsv
|   |-- captions_BDDX.json
|   |-- training_32frames_caption_coco_format.json
|   |-- training_32frames.yaml
|   |-- training.caption.lineidx
|   |-- training.caption.lineidx.8b
|   |-- training.caption.linelist.tsv
|   |-- training.caption.tsv
|   |-- training.img.lineidx
|   |-- training.img.lineidx.8b
|   |-- training.img.tsv
|   |-- training.label.lineidx
|   |-- training.label.lineidx.8b
|   |-- training.label.tsv
|   |-- training.linelist.lineidx
|   |-- training.linelist.lineidx.8b
|   |-- training.linelist.tsv
|   |-- validation...
|   |-- ...
|   |-- validation...
|   |-- testing...
|   |-- ...
|   |-- testing...
|-- models
|-- output
|-- readme.md
|-- scripts
|   |-- CoVLA_adapt_caption.sh  CoVLA_only_caption.sh  other_scripts
|-- src
|   |-- configs
|   |-- datasets
|   |-- evalcap
|   |-- layers
|   |-- modeling
|   |-- prepro
|   |-- pytorch_grad_cam
|   |-- solver
|   |-- tags
|   |-- tasks
|   |-- timm
|   |-- utils

After preparing the files above, configure the contents of scripts/xxx.bash and run:

bash xxx.bash

The contents of xxx.bash can look like the following:

#CUDA_VISIBLE_DEVICES=4,5,6,7 \
#NCCL_P2P_DISABLE=1 \
#OMPI_COMM_WORLD_SIZE="4" \
#python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 --master_port=45978 src/tasks/run_adapt.py \
CUDA_VISIBLE_DEVICES=0 \
python src/tasks/run_MCAM.py \
        --config src/configs/VidSwinBert/BDDX_multi_default.json \
        --train_yaml BDDX/training_32frames.yaml \
        --val_yaml BDDX/testing_32frames.yaml \
        --per_gpu_train_batch_size 16 \
        --per_gpu_eval_batch_size 16 \
        --num_train_epochs 40 \
        --learning_rate 0.0002 \
        --max_num_frames 32 \
        --pretrained_2d 0 \
        --backbone_coef_lr 0.05 \
        --mask_prob 0.5 \
        --max_masked_token 45 \
        --zero_opt_stage 1 \
        --mixed_precision_method deepspeed \
        --deepspeed_fp16 \
        --gradient_accumulation_steps 4 \
        --learn_mask_enabled \
        --loss_sparse_w 0.1 \
        --use_sep_cap \
        --multitask \
        --signal_types course speed \
        --loss_sensor_w 0.05 \
        --max_grad_norm 1 \
        --output_dir ./output/multitask/sensor_course_speed/MCAM/

We will upload our code incrementally, since the files cannot all be moved and adjusted online at once. If this work is helpful to you, please cite:

@InProceedings{Cheng_2025_ICCV,
    author    = {Cheng, Tongtong and Li, Rongzhen and Xiong, Yixin and Zhang, Tao and Wang, Jing and Liu, Kai},
    title     = {MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {5479-5489}
}
