roger-tseng/av-superb

AV-SUPERB

Paper, Submission leaderboard

ICASSP 2024

Plug and play pretrained audio-visual models

See extract_feats.py for feature extraction examples. We currently support the following models:

We also include handcrafted features to serve as baselines. Pull requests are welcome for adding more models.

Model Evaluation

1. Using our toolkit:

Installation:

conda create -n av python=3.9 -y
conda activate av
pip install -r requirements.txt

Downstream Task Evaluation:

python run_downstream.py -m train \
  -u <upstream model name> \
  -d <downstream task name> \
  -s <feature type> \
  --pooled_features_path <path to save features>
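When running several evaluations, it can help to assemble the command above programmatically. The following is a minimal sketch, not part of the toolkit: the helper name `build_train_command` is hypothetical, and only the flags shown in the command above are used. The argument values are placeholders to be replaced with names the toolkit actually accepts.

```python
import shlex
import sys
from typing import List


def build_train_command(
    upstream: str,
    downstream: str,
    feature_type: str,
    pooled_features_path: str,
) -> List[str]:
    """Build the argument list for the training command shown above.

    Hypothetical helper: it only mirrors the documented flags
    (-m, -u, -d, -s, --pooled_features_path) and validates nothing.
    """
    return [
        sys.executable, "run_downstream.py",
        "-m", "train",
        "-u", upstream,
        "-d", downstream,
        "-s", feature_type,
        "--pooled_features_path", pooled_features_path,
    ]


def format_command(cmd: List[str]) -> str:
    """Render the argument list as a shell-quoted string for logging."""
    return shlex.join(cmd)
```

To launch a run, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root, so that `run_downstream.py` resolves correctly.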

2. Using our submission platform:

Researchers can also submit model code and weights to our submission platform to easily evaluate on the AV-SUPERB benchmark.

A submission consists of two Python files: expert.py, which implements the model forward pass and the preprocessing functions for each of the two modalities (audio and video), and hubconf.py, which downloads the model weights.

Please refer to this example model and the submission platform for more details.
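As a rough illustration of the division of labor described above, the sketch below shows one way an expert.py-style class could separate per-modality preprocessing from the forward pass. Everything in it is an assumption made for illustration: the class name `ToyExpert`, the method names and signatures, the returned dictionary keys, and the use of plain Python lists in place of tensors. The interface actually required is defined by the example model and the submission platform.

```python
from typing import Dict, List, Sequence

Audio = List[float]        # mono waveform samples (stand-in for a tensor)
Video = List[List[float]]  # one flattened frame per entry (stand-in for a tensor)


class ToyExpert:
    """Hypothetical skeleton; follow the example model for the real interface."""

    def preprocess_audio(self, audio: Sequence[float], sample_rate: int) -> Audio:
        # Peak-normalize so values fall in [-1, 1]; sample_rate is unused here,
        # but a real model would typically resample to its expected rate.
        peak = max((abs(x) for x in audio), default=0.0)
        return [x / peak for x in audio] if peak > 0 else list(audio)

    def preprocess_video(self, frames: Sequence[Sequence[float]], frame_rate: float) -> Video:
        # Scale 8-bit pixel values to [0, 1]; frame_rate is unused in this toy.
        return [[p / 255.0 for p in frame] for frame in frames]

    def forward(self, audio: Audio, video: Video) -> Dict[str, List[float]]:
        # Toy "features": mean absolute amplitude and mean pixel intensity.
        audio_feat = sum(abs(x) for x in audio) / len(audio) if audio else 0.0
        pixels = [p for frame in video for p in frame]
        video_feat = sum(pixels) / len(pixels) if pixels else 0.0
        return {
            "audio_feats": [audio_feat],
            "video_feats": [video_feat],
            "fusion_feats": [audio_feat, video_feat],
        }
```

In this sketch, weight loading is left out entirely; per the description above, that responsibility belongs to hubconf.py.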

Citation

@article{tseng2023avsuperb,
  title={AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models},
  author={Yuan Tseng and Layne Berry and Yi-Ting Chen and I-Hsiang Chiu and Hsuan-Hao Lin and Max Liu and Puyuan Peng and Yi-Jen Shih and Hung-Yu Wang and Haibin Wu and Po-Yao Huang and Chun-Mao Lai and Shang-Wen Li and David Harwath and Yu Tsao and Shinji Watanabe and Abdelrahman Mohamed and Chi-Luen Feng and Hung-yi Lee},
  journal={arXiv preprint arXiv:2309.10787},
  year={2023}
}

License

AV-SUPERB is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).

Using files and pretrained AV-HuBERT models under the upstream_models/vhubert folder requires accepting the terms in the AV-HuBERT license agreement listed in this file.

See LICENSE-APACHE, LICENSE-MIT, COPYRIGHT for details.

Acknowledgement

Source code is based on S3PRL.