Skip to content

GitHub30/2.5D-Visual-Sound

 
 

Repository files navigation

2.5D Visual Sound

[Project Page] [arXiv] [Video] [Dataset]


2.5D Visual Sound
Ruohan Gao1 and Kristen Grauman2
1UT Austin, 2Facebook AI Research
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019


If you find our code or project useful in your research, please cite:

    @inproceedings{gao2019visualsound,
      title={2.5D Visual Sound},
      author={Gao, Ruohan and Grauman, Kristen},
      booktitle={CVPR},
      year={2019}
    }

FAIR-Play Dataset

The FAIR-Play repository contains the dataset we collected and used in our paper. It contains 1,871 video clips and their corresponding binaural audio clips recorded in a music room. The code provided can be used to train mon2binaural models on this dataset.

Training and Testing

(The code has beed tested under the following system environment: Ubuntu 16.04.6 LTS, CUDA 9.0, Python 2.7.15, PyTorch 1.0.0)

  1. Download the FAIR-Play dataset and prepare the hdf5 splits files accordingly by adding the correct root prefix.

  2. [OPTIONAL] Preprocess the audio files using reEncodeAudio.py to accelerate the training process.

  3. Use the following command to train the mono2binaural model:

python train.py --hdf5FolderPath /YOUR_CODE_PATH/2.5d_visual_sound/hdf5/ --name mono2binaural --model audioVisual --checkpoints_dir /YOUR_CHECKPOINT_PATH/ --save_epoch_freq 50 --display_freq 10 --save_latest_freq 100 --batchSize 256 --learning_rate_decrease_itr 10 --niter 1000 --lr_visual 0.0001 --lr_audio 0.001 --nThreads 32 --gpu_ids 0,1,2,3,4,5,6,7 --validation_on --validation_freq 100 --validation_batches 50 --tensorboard True |& tee -a mono2binaural.log
  1. Use the following command to test your trained mono2binaural model:
python demo.py --input_audio_path /BINAURAL_AUDIO_PATH --video_frame_path /VIDEO_FRAME_PATH --weights_visual /VISUAL_MODEL_PATH --weights_audio /AUDIO_MODEL_PATH --output_dir_root /YOUT_OUTPUT_DIR/ --hop_size 0.05

Acknowlegements

Portions of the code are adapted from the CycleGAN implementation (https://github.com/junyanz/CycleGAN) and the Sound-of-Pixels implementation (https://github.com/hangzhaomit/Sound-of-Pixels). Please also refer to the original License of these projects.

Licence

The code for 2.5D Visual Sound is CC BY 4.0 licensed, as found in the LICENSE file.

About

2.5D visual sound

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%