kaistmm/SSLalignment

Sound Source Localization is All About Alignment (ICCV’23)

Official PyTorch implementation of the following papers:

Sound Source Localization is All About Cross-Modal Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung (* Equal Contribution)

ICCV 2023

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Arda Senocak*, Hyeonggon Ryu*, Junsik Kim*, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung (* Equal Contribution)

arXiv 2024

Index

Overview

Pipeline

Interactive Synthetic Sound Source (IS3) Dataset

IS3 dataset is available here

The IS3 data is organized as follows:

Note that in the IS3 dataset, each annotation is saved as a separate file. For example, the sample image accordion_baby_10467 contains two annotations, one for the accordion and one for the baby. These annotations are saved as accordion_baby_10467_accordion and accordion_baby_10467_baby for straightforward use. You can always project the bounding boxes or segmentation maps onto the original image to see them all at once.

The images and audio_waw folders contain all the image and audio files, respectively.

The IS3_annotation.json file contains the ground-truth bounding box and category information for each annotation.

The gt_segmentation folder contains a segmentation map in binary image format for each annotation. You can look up the file name in IS3_annotation.json to get the semantic category of each segmentation map.
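The per-annotation naming described above can be sketched in a few lines of Python. This is only an illustration of the `{sample}_{category}` pattern taken from the accordion_baby_10467 example; the helper names `annotation_names` and `split_annotation_name` are hypothetical and not part of this repository, and no file extensions or JSON fields are assumed.

```python
def annotation_names(sample_id, categories):
    """Build one annotation name per category using '{sample_id}_{category}'."""
    return [f"{sample_id}_{category}" for category in categories]


def split_annotation_name(annotation_name, sample_id):
    """Recover the category part of an annotation name for a known sample."""
    prefix = sample_id + "_"
    if not annotation_name.startswith(prefix):
        raise ValueError(f"{annotation_name!r} does not belong to {sample_id!r}")
    return annotation_name[len(prefix):]


# The example sample from the text above has two annotated objects.
names = annotation_names("accordion_baby_10467", ["accordion", "baby"])
print(names)
print(split_annotation_name(names[1], "accordion_baby_10467"))
```

Names built this way can then be used as lookup keys when pairing an image with its segmentation maps, assuming the files on disk follow the same pattern.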

Environment

Model Checkpoints

The model checkpoints are available for the following experiments:

| Training Set | Test Set | Model Type | Performance (cIoU) | Checkpoint |
|---|---|---|---|---|
| VGGSound-144K | VGG-SS | NN w/ Sup. Pre. Enc. | 39.94 | Link |
| VGGSound-144K | VGG-SS | NN w/ Self-Sup. Pre. Enc. | 39.16 | Link |
| VGGSound-144K | VGG-SS | NN w/ Sup. Pre. Enc. Pre-trained Vision | 41.42 | Link |
| Flickr-SoundNet-144K | Flickr-SoundNet | NN w/ Sup. Pre. Enc. | 85.20 | Link |
| Flickr-SoundNet-144K | Flickr-SoundNet | NN w/ Self-Sup. Pre. Enc. | 84.80 | Link |
| Flickr-SoundNet-144K | Flickr-SoundNet | NN w/ Sup. Pre. Enc. Pre-trained Vision | 86.00 | Link |

Inference

Put checkpoint files into the 'checkpoints' directory:

inference
│   test.py
│   datasets.py
│   model.py
│
└───checkpoints
        ours_sup_previs.pth.tar
        ours_sup.pth.tar
        ours_selfsup.pth.tar

To evaluate a trained model, run:

python test.py --testset {testset_name} --pth_name {pth_name}

| Test Set | testset_name |
|---|---|
| VGG-SS | vggss |
| Flickr-SoundNet | flickr |
| IS3 | is3 |
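To evaluate one checkpoint on all three test sets, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_eval_command` and `run_eval` are hypothetical helpers, and passing the checkpoint file name `ours_sup.pth.tar` as `--pth_name` is an assumption based on the directory listing above.

```python
import subprocess

# Mapping from test set to the --testset value, copied from the table above.
TESTSET_NAMES = {"VGG-SS": "vggss", "Flickr-SoundNet": "flickr", "IS3": "is3"}


def build_eval_command(test_set, pth_name):
    """Return the argument list for one test.py evaluation run."""
    if test_set not in TESTSET_NAMES:
        raise ValueError(f"unknown test set: {test_set!r}")
    return ["python", "test.py",
            "--testset", TESTSET_NAMES[test_set],
            "--pth_name", pth_name]


def run_eval(test_set, pth_name, workdir="inference"):
    """Run one evaluation; 'inference' as the working directory is an assumption."""
    subprocess.run(build_eval_command(test_set, pth_name), cwd=workdir, check=True)


# Only print the commands here; call run_eval(...) to actually execute them.
commands = [build_eval_command(name, "ours_sup.pth.tar") for name in TESTSET_NAMES]
for command in commands:
    print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting issues if a checkpoint name contains spaces.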

Evaluate other methods

Save the checkpoint file from each method as '{method_name}_{put_your_own_message}.pth', for example 'ezvsl_flickr.pth'. We have already handled the remaining method-specific settings.

| Paper title | pth_name must contain |
|---|---|
| Localizing Visual Sounds the Hard Way (CVPR 21) [Paper] | lvs |
| Localizing Visual Sounds the Easy Way (ECCV 22) [Paper] | ezvsl |
| A Closer Look at Weakly-Supervised Audio-Visual Source Localization (NeurIPS 22) [Paper] | slavc |
| Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation (ACMMM 22) [Paper] | ssltie |
| Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning (CVPR 23) [Paper] | fnac |

Example

python test.py --testset flickr --pth_name ezvsl_flickr.pth
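Before running an evaluation, it can help to check that a checkpoint name contains exactly one of the required keywords. The sketch below illustrates such a check; it is not how test.py resolves the method (that logic is not shown here), `detect_method` is a hypothetical helper, and case-insensitive substring matching is an assumption.

```python
# Keywords copied from the "pth_name must contain" column above.
METHOD_KEYWORDS = ("lvs", "ezvsl", "slavc", "ssltie", "fnac")


def detect_method(pth_name):
    """Return the single method keyword found in pth_name, or raise ValueError."""
    lowered = pth_name.lower()
    matches = [keyword for keyword in METHOD_KEYWORDS if keyword in lowered]
    if len(matches) != 1:
        raise ValueError(
            f"{pth_name!r} should contain exactly one of {METHOD_KEYWORDS}, "
            f"found {matches}"
        )
    return matches[0]


print(detect_method("ezvsl_flickr.pth"))
```

Requiring exactly one match rejects ambiguous names such as one that contains both 'lvs' and 'ezvsl'.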

Training

Training code is coming soon!

Citation

If you find this code useful, please consider giving a star ⭐ and citing us:

@inproceedings{senocak2023sound,
  title={Sound source localization is all about cross-modal alignment},
  author={Senocak, Arda and Ryu, Hyeonggon and Kim, Junsik and Oh, Tae-Hyun and Pfister, Hanspeter and Chung, Joon Son},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7777--7787},
  year={2023}
}

If you use this dataset, please consider giving a star ⭐ and citing us:

@article{senocak2024align,
  title={Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment},
  author={Senocak, Arda and Ryu, Hyeonggon and Kim, Junsik and Oh, Tae-Hyun and Pfister, Hanspeter and Chung, Joon Son},
  journal={arXiv preprint arXiv:2407.13676},
  year={2024}
}
