GitHub - GitHub30/2.5D-Visual-Sound: 2.5D visual sound

2.5D Visual Sound

[Project Page] [arXiv] [Video] [Dataset]

2.5D Visual Sound
Ruohan Gao¹ and Kristen Grauman²
¹UT Austin, ²Facebook AI Research
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019

If you find our code or project useful in your research, please cite:

    @inproceedings{gao2019visualsound,
      title={2.5D Visual Sound},
      author={Gao, Ruohan and Grauman, Kristen},
      booktitle={CVPR},
      year={2019}
    }

FAIR-Play Dataset

The FAIR-Play repository contains the dataset we collected and used in our paper. It contains 1,871 video clips and their corresponding binaural audio clips recorded in a music room. The code provided can be used to train mono2binaural models on this dataset.

Training and Testing

(The code has been tested under the following system environment: Ubuntu 16.04.6 LTS, CUDA 9.0, Python 2.7.15, PyTorch 1.0.0)

1. Download the FAIR-Play dataset and prepare the hdf5 split files accordingly by adding the correct root prefix.
2. [OPTIONAL] Preprocess the audio files using reEncodeAudio.py to accelerate the training process.
3. Use the following command to train the mono2binaural model:

    python train.py --hdf5FolderPath /YOUR_CODE_PATH/2.5d_visual_sound/hdf5/ --name mono2binaural --model audioVisual --checkpoints_dir /YOUR_CHECKPOINT_PATH/ --save_epoch_freq 50 --display_freq 10 --save_latest_freq 100 --batchSize 256 --learning_rate_decrease_itr 10 --niter 1000 --lr_visual 0.0001 --lr_audio 0.001 --nThreads 32 --gpu_ids 0,1,2,3,4,5,6,7 --validation_on --validation_freq 100 --validation_batches 50 --tensorboard True |& tee -a mono2binaural.log

4. Use the following command to test your trained mono2binaural model:

    python demo.py --input_audio_path /BINAURAL_AUDIO_PATH --video_frame_path /VIDEO_FRAME_PATH --weights_visual /VISUAL_MODEL_PATH --weights_audio /AUDIO_MODEL_PATH --output_dir_root /YOUR_OUTPUT_DIR/ --hop_size 0.05
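The "correct root prefix" step when preparing the hdf5 split files could look roughly like the sketch below. It is not part of this repository. It assumes that each split file stores audio file paths under a dataset named `audio` (an assumption, so inspect your files first), and the example paths and the helper names `add_root_prefix` and `rewrite_split` are hypothetical. `h5py` is a third-party package and is imported only when a file is actually rewritten.

```python
import posixpath


def add_root_prefix(paths, root, old_root=""):
    """Return the given paths re-rooted under `root`.

    If `old_root` is non-empty and a path starts with it, that leading
    part is removed before `root` is prepended.
    """
    fixed = []
    for p in paths:
        # Strings read back from hdf5 files usually arrive as bytes.
        if isinstance(p, bytes):
            p = p.decode("utf-8")
        if old_root and p.startswith(old_root):
            p = p[len(old_root):]
        # Drop leading slashes so join() does not discard `root`.
        fixed.append(posixpath.join(root, p.lstrip("/")))
    return fixed


def rewrite_split(src_h5, dst_h5, root, old_root="", key="audio"):
    """Copy one split file, re-rooting every path stored under `key`.

    The dataset name `key` is an assumption about the file layout.
    """
    import h5py  # third-party; imported lazily on purpose

    with h5py.File(src_h5, "r") as src:
        paths = src[key][:]
    fixed = add_root_prefix(paths, root, old_root)
    with h5py.File(dst_h5, "w") as dst:
        dst.create_dataset(key, data=[p.encode("utf-8") for p in fixed])
```

Writing to a new file instead of editing in place keeps the downloaded split untouched, which makes it easy to redo the step if the dataset is moved again.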

Acknowledgements

Portions of the code are adapted from the CycleGAN implementation (https://github.com/junyanz/CycleGAN) and the Sound-of-Pixels implementation (https://github.com/hangzhaomit/Sound-of-Pixels). Please also refer to the original licenses of these projects.

Licence

The code for 2.5D Visual Sound is CC BY 4.0 licensed, as found in the LICENSE file.
