marmot-xy/CMBS

Cross Modal Background Suppression for Audio-Visual Event Localization

This is a PyTorch implementation of the CVPR 2022 paper "Cross Modal Background Suppression for Audio-Visual Event Localization".

Introduction

We address the problem of audio-visual event localization (AVE), which requires a model to recognize the event category and localize the event boundaries when the event is both audible and visible at the same time.

Unlike previous methods, we approach audio-visual event localization from the viewpoint of cross-modal background suppression. We first define the "background" category from two aspects: 1) If the audio and visual information in a short video segment do not represent the same event, the segment is labeled as background. 2) If an event occurs in only one modality and has a low probability in the other, that event category is labeled as background for the video, e.g., an off-screen voice.

Hence, this paper proposes a novel cross-modal background suppression method that operates at two levels, time-level and event-level, which allows the audio and visual modalities to serve as complementary supervisory signals for the AVE task.
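
The two background definitions given above can be illustrated with a small, self-contained sketch. This is not the model from the paper: the function names, the `"background"` label string, and the `threshold=0.5` cut-off are all assumptions made for illustration only.

```python
BACKGROUND = "background"  # hypothetical label name, chosen for this sketch


def segment_labels(audio_events, visual_events):
    """Time-level rule: a segment keeps its event only when both modalities agree.

    Each input holds one event name (or None) per video segment.
    """
    labels = []
    for audio, visual in zip(audio_events, visual_events):
        if audio is not None and audio == visual:
            labels.append(audio)
        else:
            labels.append(BACKGROUND)
    return labels


def video_level_events(audio_probs, visual_probs, threshold=0.5):
    """Event-level rule: keep a category only if both modalities support it.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for event in set(audio_probs) | set(visual_probs):
        in_audio = audio_probs.get(event, 0.0) >= threshold
        in_visual = visual_probs.get(event, 0.0) >= threshold
        if in_audio and in_visual:
            kept.append(event)
    return sorted(kept)
```

For example, an off-screen voice would give "speech" a high audio probability and a low visual probability, so `video_level_events` drops it for that video.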

Prerequisites

This package has the following requirements:

  • Python 3.7.6
  • PyTorch 1.10.2
  • CUDA 11.4
  • h5py 2.10.0
  • numpy 1.21.5
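
To compare a local environment against the versions listed above, a small check like the following may help. It is a sketch, not part of the repository: the helper names are hypothetical, and it assumes each package exposes a `__version__` attribute (which `torch`, `h5py`, and `numpy` do).

```python
import importlib

# Versions copied from the prerequisites list above.
REQUIRED = {"torch": "1.10.2", "h5py": "2.10.0", "numpy": "1.21.5"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(required, lookup=installed_version):
    """Map each package name to (required, found, matches)."""
    report = {}
    for name, wanted in required.items():
        found = lookup(name)
        # torch may report a build suffix such as "1.10.2+cu113",
        # so only the part before "+" is compared.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(REQUIRED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: required {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones stated in this README.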

Data preparation

The VGG visual features can be downloaded from Visual_feature.

The VGG-like audio features can be downloaded from Audio_feature.

The noisy visual features used for the weakly-supervised setting can be downloaded from Noisy_visual_feature.

After downloading the features, please place them into the data folder.
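
After placing the files, it can be useful to confirm what is actually in the data folder. The sketch below assumes the features are HDF5 files with a `.h5` or `.hdf5` extension (an assumption based on the h5py requirement; the README does not state the file names or format), and both helper names are hypothetical.

```python
from pathlib import Path


def list_feature_files(data_dir="data", suffixes=(".h5", ".hdf5")):
    """Return the sorted HDF5-looking files found directly under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix in suffixes
    )


def describe_h5(path):
    """Return {dataset_name: shape} for the top-level datasets of one HDF5 file."""
    import h5py  # imported lazily so the listing step works without h5py

    with h5py.File(path, "r") as handle:
        return {
            name: item.shape
            for name, item in handle.items()
            if isinstance(item, h5py.Dataset)
        }
```

Calling `describe_h5(path)` for each path returned by `list_feature_files()` prints nothing by itself; wrap it in a loop with `print` to see the dataset names and shapes of each downloaded file.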

If you are interested in the AVE raw videos, please refer to this repo and download the AVE dataset.

Training and Evaluating CMBS

Fully-Supervised Setting

The file configs/main.json contains the main hyper-parameters used for fully-supervised training.
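
To see which hyper-parameters the config defines before training, the JSON file can be loaded and flattened. This is a generic sketch that makes no assumption about the key names in the config; `load_config` and `flatten` are hypothetical helpers, not functions from this repository.

```python
import json


def load_config(path):
    """Read a JSON config file into a Python dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def flatten(config, prefix=""):
    """Yield (dotted_key, value) pairs so nested settings are easy to scan."""
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            yield from flatten(value, prefix=f"{name}.")
        else:
            yield name, value
```

For instance, `dict(flatten(load_config("configs/main.json")))` gives a flat mapping of every setting in the file.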

Training

bash supv_train.sh

Evaluating

bash supv_test.sh

Weakly-Supervised Setting

The file configs/weak.json contains the main hyper-parameters used for weakly-supervised training.

Training

bash weak_train.sh

Evaluating

bash weak_test.sh

Pretrained model

The pretrained models can be downloaded from Supervised model and WeaklySupervised model.

After downloading the pretrained models, please place them into the Exps folder.

If you want to retrain the model, you can try different hyper-parameters or random seeds; the results may be better.
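
When experimenting with random seeds, fixing every random number generator in one place makes runs easier to compare. The helper below is a common pattern rather than code from this repository; `set_seed` is a hypothetical name, and how the training scripts set their seed is not described in this README.

```python
import random


def set_seed(seed):
    """Seed Python, and NumPy / PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Trade some speed for more repeatable cuDNN behaviour.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Even with fixed seeds, some GPU operations are not fully deterministic, so small differences between runs can remain.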

Acknowledgement

Part of our code is borrowed from the following repositories.

We thank the authors for releasing their code. Please also consider citing their work.
