VIG Dataset

This repository includes Visually Indicated sound Generation (VIG) dataset mentioned in Visually Indicated Sound Generation by Perceptually Optimized Classification (Best paper in ECCV MULA workshop 2018).

Introduction

Visually Indicated Sound Generation aims to predict visually consistent sound from the video content. Previous methods in Visually Indicated Sounds addressed this problem by creating a single generative model that ignores the distinctive characteristics of various sound categories. Nowadays, state-of-the-art sound classification networks are available to capture semantic-level information in audio modality, which can also serve for the purposeof visually indicated sound generation.

Framework

We explore generating fine-grained sound from a variety of sound classes, and leverage pre-trained sound classification networks to improve the audio generation quality. We propose a novel Perceptually Optimized Classification based Audio generation Network (POCAN), which generates sound conditioned on the sound class predicted from visual information. Additionally, a perceptual loss is calculated via a pre-trained sound classification network to align the semantic information between the generated sound and its ground truth during training. The framework of POCAN is shown below.

VIG dataset download

Data processing is based on Python 2.7

We provide the Youtube ID for each video in the file vig_dl.lst. You may use tools like youtube-dl to download these videos (A sample download script download_vig.py is provided in this repository). In file vig_dl.lst, each youtube video is mapped to a file ID in each line. The files vig_train.lst and vig_test.lst specify the training and test videos by these file IDs respectively. For annotation files, we provide start time (key name start_time), end time (key name end_time) and sound class label (key name vig_label) in the file vig_annotation.pkl. The sound class is mapped to a class ID in the annotation file. The map between class name and class ID is provided in the file vig_class_map.pkl.

Some demo video clips as well as sound waveform and spectrogram are shown in the figure below.

Performance of POCAN on VIG

We choose the recall at top K (R@K) as the metric for retrieving sound in the test set in VIG. The performance of POCAN is listed in the table below.

Model	K = 1	K = 5	K = 10
Owens et al [1]	0.0997	0.2888	0.4640
POCAN	0.1223	0.3625	0.4802

More details can be found in the paper.

Citation

If you find the repository is useful for your research, please consider citing the following work:

@inproceedings{chen2018visually,
  title={Visually Indicated Sound Generation by Perceptually Optimized Classification},
  author={Chen*, Kan and Zhang*, Chuanxi and Fang, Chen and Wang, Zhaowen and Bui, Trung and Nevatia, Ram}
  booktitle={ECCV MULA Workshop},
  year={2018}
}

Reference

[1] Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. "Visually indicated sounds." In CVPR. 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

img

img

LICENSE

LICENSE

README.md

README.md

download_vig.py

download_vig.py

vig_annotation.pkl

vig_annotation.pkl

vig_class_map.pkl

vig_class_map.pkl

vig_dl.lst

vig_dl.lst

vig_test.lst

vig_test.lst

vig_train.lst

vig_train.lst

Repository files navigation

VIG Dataset

Introduction

Framework

VIG dataset download

Performance of POCAN on VIG

Citation

Reference

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
img		img
LICENSE		LICENSE
README.md		README.md
download_vig.py		download_vig.py
vig_annotation.pkl		vig_annotation.pkl
vig_class_map.pkl		vig_class_map.pkl
vig_dl.lst		vig_dl.lst
vig_test.lst		vig_test.lst
vig_train.lst		vig_train.lst

License

kanchen-usc/VIG

Folders and files

Latest commit

History

Repository files navigation

VIG Dataset

Introduction

Framework

VIG dataset download

Performance of POCAN on VIG

Citation

Reference

About

Topics

Resources

License

Stars

Watchers

Forks

Languages