
Code for Enhancing Multimodal Cooperation via Sample-level Modality Valuation

This is the official PyTorch implementation of "Enhancing Multimodal Cooperation via Sample-level Modality Valuation", which aims to balance the uni-modal contributions during joint multimodal training via a re-sampling strategy. Please refer to our CVPR 2024 paper for more details.

Paper Title: "Enhancing Multimodal Cooperation via Sample-level Modality Valuation"

Authors: Yake Wei, Ruoxuan Feng, Zihe Wang and Di Hu

Accepted by: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Challenge of sample-level modality discrepancy

The imbalanced multimodal learning problem, where most existing models cannot jointly utilize all modalities well, has attracted much attention. However, previous methods only consider the global modality discrepancy at the dataset level, and achieve improvement on commonly used curated datasets (such as the Kinetics-Sounds dataset in Figure 1).

But under realistic scenarios, the modality discrepancy can vary across samples. For example, Figure 2 (a) and (b) show two audio-visual samples of the motorcycling category. The motorcycle in Sample 1 is hard to observe, while the wheel of the motorcycle in Sample 2 is quite clear. As a result, the audio and visual modalities contribute more to the final prediction for Sample 1 and Sample 2, respectively. This fine-grained modality discrepancy is hard for existing methods to perceive. Hence, how to reasonably observe and improve multimodal cooperation at the sample level remains unresolved. To highlight this sample-level modality discrepancy, we propose the globally balanced MM-Debiased dataset, where the dataset-level modality discrepancy is no longer significant (as shown in Figure 2 (d)). Not surprisingly, existing imbalanced multimodal learning methods, which only consider dataset-level discrepancy, fail on the MM-Debiased dataset, as shown in Figure 1.

In this paper, we introduce a sample-level modality valuation metric to observe the contribution of each modality to the prediction for each sample. Based on it, we propose a fine-grained and effective sample-level re-sampling method as well as a coarse but efficient modality-level re-sampling method; a conceptual sketch follows below. As shown in Figure 1, our methods, which consider the sample-level modality discrepancy, achieve considerable improvement on both existing curated datasets and the globally balanced dataset.
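For intuition only, here is a minimal sketch of the sample-level idea, assuming a fusion model that takes (audio, visual) tensors with batch size 1. The function names, the zero-masking valuation, and the threshold are illustrative assumptions, not this repo's API; the actual implementation (including the Shapley-style valuation from the paper) lives in code/sample_level.py.

```python
import torch

def modality_contribution(model, audio, visual, label):
    # Rough per-sample valuation: drop in the correct-class probability
    # when one modality is masked out (a simplification of the paper's
    # Shapley-style valuation).
    with torch.no_grad():
        p_full = model(audio, visual).softmax(-1)[0, label]
        p_wo_audio = model(torch.zeros_like(audio), visual).softmax(-1)[0, label]
        p_wo_visual = model(audio, torch.zeros_like(visual)).softmax(-1)[0, label]
    contrib_audio = (p_full - p_wo_audio).item()
    contrib_visual = (p_full - p_wo_visual).item()
    return contrib_audio, contrib_visual

def resample_weight(contrib_audio, contrib_visual, gap_threshold=0.1):
    # Samples whose two modalities contribute very unevenly are revisited
    # more often, so the low-contributing modality gets extra training signal.
    gap = abs(contrib_audio - contrib_visual)
    return 1.0 + gap / gap_threshold if gap > gap_threshold else 1.0
```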

Code instructions

Data Preparation

Public datasets

The original datasets we used can be found in: Kinetics-Sounds, UCF101.

MM-Debiased dataset

For the proposed MM-Debiased dataset, the JSON files of the data samples are provided here.

Samples of MM-Debiased dataset are selected from VGG-Sound and Kinetics-400 datasets.

This is an example data sample from the VGGSound dataset:


"RTWs-Y_usjs_000017_000027": {"subset": "validation", "label": "motorcycling"}
# "RTWs-Y_usjs_000017_000027": sample id of VGGSound

This is an example data sample from the Kinetics-400 dataset:


"ZUJ5LJGX9oc_20": {"subset": "validation", "label": "lawn mowing"}
# "ZUJ5LJGX9oc_20": sample id of Kinetics-400

Run

You can simply run the code using:


python code/baseline.py  # Joint training baseline

python code/sample_level.py  # Sample-level method

python code/modality_level.py  # Modality-level method

Citation

If you find this work useful, please consider citing it.


@inproceedings{wei2024enhancing,
  title={Enhancing Multimodal Cooperation via Sample-level Modality Valuation},
  author={Wei, Yake and Feng, Ruoxuan and Wang, Zihe and Hu, Di},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

