2.5D Visual Sound

[Project Page] [arXiv] [Video] [Dataset]


2.5D Visual Sound
Ruohan Gao¹ and Kristen Grauman²
¹UT Austin, ²Facebook AI Research
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019


If you find our code or project useful in your research, please cite:

    @inproceedings{gao2019visualsound,
      title={2.5D Visual Sound},
      author={Gao, Ruohan and Grauman, Kristen},
      booktitle={CVPR},
      year={2019}
    }

FAIR-Play Dataset

The FAIR-Play repository contains the dataset we collected and used in our paper. It contains 1,871 video clips and their corresponding binaural audio clips recorded in a music room. The code provided can be used to train mono2binaural models on this dataset.

Training and Testing

(The code has been tested under the following system environment: Ubuntu 16.04.6 LTS, CUDA 9.0, Python 2.7.15, PyTorch 1.0.0)

  1. Download the FAIR-Play dataset and prepare the HDF5 split files by adding the correct root prefix to the paths they contain.

  2. [OPTIONAL] Preprocess the audio files using reEncodeAudio.py to accelerate the training process.

  3. Use the following command to train the mono2binaural model:

    python train.py --hdf5FolderPath /YOUR_CODE_PATH/2.5d_visual_sound/hdf5/ --name mono2binaural --model audioVisual --checkpoints_dir /YOUR_CHECKPOINT_PATH/ --save_epoch_freq 50 --display_freq 10 --save_latest_freq 100 --batchSize 256 --learning_rate_decrease_itr 10 --niter 1000 --lr_visual 0.0001 --lr_audio 0.001 --nThreads 32 --gpu_ids 0,1,2,3,4,5,6,7 --validation_on --validation_freq 100 --validation_batches 50 --tensorboard True |& tee -a mono2binaural.log

  4. Use the following command to test your trained mono2binaural model:

    python demo.py --input_audio_path /BINAURAL_AUDIO_PATH --video_frame_path /VIDEO_FRAME_PATH --weights_visual /VISUAL_MODEL_PATH --weights_audio /AUDIO_MODEL_PATH --output_dir_root /YOUR_OUTPUT_DIR/ --input_audio_length 10 --hop_size 0.05
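
The root-prefix preparation in step 1 could be scripted roughly as follows. This is only a sketch under several assumptions that are not stated in this README: that each split file stores its audio paths in a dataset named `audio`, that those paths are relative to the dataset root, and that `h5py` is installed. The function names and the file names in the usage note are hypothetical, and the sketch is written for Python 3. Please check the actual layout of your split files before relying on it.

```python
import os


def add_root_prefix(paths, root):
    """Join each stored path onto `root` and return a list of str.

    Assumption: the stored paths are relative to the dataset root.
    A leading "/" is stripped so that os.path.join keeps the root.
    """
    fixed = []
    for p in paths:
        # h5py usually returns stored strings as bytes; decode before joining.
        if isinstance(p, bytes):
            p = p.decode("utf-8")
        fixed.append(os.path.join(root, p.lstrip("/")))
    return fixed


def rewrite_split_file(src_h5, dst_h5, root, key="audio"):
    """Copy one split file, prefixing every path under `key` with `root`.

    Assumption: `key="audio"` is a guess at the dataset name; adjust it to
    whatever the split files actually contain.
    """
    import h5py  # third-party; imported here so the helper above works without it

    with h5py.File(src_h5, "r") as src:
        paths = src[key][:]
    fixed = add_root_prefix(paths, root)
    with h5py.File(dst_h5, "w") as dst:
        # Store the paths back as byte strings.
        dst.create_dataset(key, data=[p.encode("utf-8") for p in fixed])
```

For example, `rewrite_split_file("train_raw.h5", "train.h5", "/data/FAIR-Play")` would write a new split file whose paths start with `/data/FAIR-Play/`. Writing to a separate output file keeps the downloaded split untouched in case the prefix needs to be changed later.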

Acknowledgements

Portions of the code are adapted from the CycleGAN implementation (https://github.com/junyanz/CycleGAN) and the Sound-of-Pixels implementation (https://github.com/hangzhaomit/Sound-of-Pixels). Please also refer to the original licenses of these projects.

License

The code for 2.5D Visual Sound is CC BY 4.0 licensed, as found in the LICENSE file.
