The Sound Localization dataset can be downloaded from the following link:
https://drive.google.com/open?id=1P93CTiQV71YLZCmBbZA0FvdwFxreydLt
This dataset contains 5k image-sound pairs with their annotations in XML format. Each XML file contains the annotations from 3 annotators.
The test_list.txt file lists the ID of every pair that is used for testing.
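As a rough illustration of how the test list and the XML annotations might be handled, here is a minimal Python sketch. It assumes that test_list.txt holds one ID per line, which this README does not confirm. The helper names (load_test_ids, split_ids, tag_counts) are hypothetical and not part of the dataset. Because the XML schema is not documented here, the sketch only counts element tags so that the structure can be inspected before writing a real parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def load_test_ids(path):
    """Read IDs from a list file.

    Assumption (not confirmed by the README): one ID per line.
    Blank lines and surrounding whitespace are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def split_ids(all_ids, test_ids):
    """Partition IDs into (test, remaining), preserving input order."""
    test = [i for i in all_ids if i in test_ids]
    rest = [i for i in all_ids if i not in test_ids]
    return test, rest


def tag_counts(xml_text):
    """Count element tags in an XML string without assuming any schema."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())
```

Calling tag_counts on the text of one annotation file shows which elements it contains, which is a reasonable first step before extracting the per-annotator data.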
If you end up using the dataset, we ask you to cite the following paper:
@InProceedings{Senocak_2018_CVPR,
author = {Senocak, Arda and Oh, Tae-Hyun and Kim, Junsik and Yang, Ming-Hsuan and So Kweon, In},
title = {Learning to Localize Sound Source in Visual Scenes},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
The image-sound pairs were collected from the Flickr-SoundNet dataset, so please cite the Yahoo dataset and the SoundNet paper as well.
The dataset must be used for research purposes only.