jin-woo-lee/nfs-binaural

Neural Fourier Shift for Binaural Speech Rendering

Prepare dataset

Download the binaural speech dataset and unzip it. See the WarpNet repo for more details about the data. Once it is unzipped, create a symbolic link to the directory that contains the 'testset' and 'trainset' subdirectories, as follows.

rm dataset/benchmark   # remove the existing symbolic link
ls /path/to/binaural_dataset
# testset, trainset
ln -s /path/to/binaural_dataset dataset/benchmark
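
Before training or evaluating, it can help to confirm that the link resolves to a directory with the expected layout. The following is a minimal sketch, not part of the repository: the helper name `missing_subdirs` is hypothetical, and the only thing it assumes is the 'testset' and 'trainset' subdirectory names mentioned above.

```python
from pathlib import Path


def missing_subdirs(root, expected=("testset", "trainset")):
    """Return the expected subdirectory names that are absent under `root`.

    `Path.is_dir` follows symbolic links, so this also works when `root`
    is a link such as dataset/benchmark.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_subdirs("dataset/benchmark")` from the repository root should return an empty list when the link points at the right directory. A non-empty list names the subdirectories that could not be found.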

Install requirements

Install the third-party dependencies. This project was built and tested on an RTX 2080 with CUDA 11.2.

pip install -r requirements.txt

Off-the-shelf Rendering

If you just want to render your own audio into binaural, run render.py. Specify the path to the directory that contains the mono .wav files. You can either provide a position file in .txt format, or let the code generate a sample position trajectory that moves around the listener in a circle. We also release the pre-trained NFS model described in the paper.

python render.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir path/to/load/dir --save_dir path/to/save/dir
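
To illustrate what a circular source trajectory could look like, here is a rough sketch that is independent of render.py. Everything about the format is an assumption, not something this README states: one row per frame at 120 Hz, with seven values per row (x, y, z position followed by a quaternion qx, qy, qz, qw), the source moving on a horizontal circle around a listener at the origin. The function names and default values are hypothetical; check the repository code for the format that render.py actually reads.

```python
import math


def circular_positions(duration_s, rate_hz=120, radius_m=1.0, period_s=5.0):
    """Return rows of (x, y, z, qx, qy, qz, qw) for a source circling the origin.

    ASSUMPTION: the 7-column layout and the 120 Hz frame rate are guesses
    about the position file format, not values taken from this repository.
    """
    n_frames = int(round(duration_s * rate_hz))
    rows = []
    for i in range(n_frames):
        t = i / rate_hz                           # time of this frame in seconds
        angle = 2.0 * math.pi * t / period_s      # one full turn every period_s
        x = radius_m * math.cos(angle)
        y = radius_m * math.sin(angle)
        # Height is fixed at 0 and orientation is the identity quaternion.
        rows.append((x, y, 0.0, 0.0, 0.0, 0.0, 1.0))
    return rows


def write_positions(path, rows):
    """Write one whitespace-separated row per line (assumed text format)."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(f"{v:.6f}" for v in row) + "\n")
```

For example, `write_positions("positions.txt", circular_positions(10.0))` would write 1200 rows describing two full turns at a radius of one meter under these assumptions.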

Training

To train NFS, run main.py with the --train argument, as shown below. You can also run the NFS variants that appear in the ablation study by passing additional arguments.

python main.py --gpus 0 --train

Inference

You can synthesize binaural audio by running inference.py. Specify the path to the directory that contains the mono .wav files and the position .txt file for each of them. To run on the benchmark test set, pass --is_eval_set, as in the second command.

python inference.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir path/to/load/dir --save_dir path/to/save/dir
python inference.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir dataset/benchmark/testset  --save_dir ./benchmark_eval --is_eval_set

Demo

The make_demo.py script creates a video that shows what NFS is doing over time.

python make_demo.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir dataset/benchmark/testset --is_eval_set --save_dir ./demo

Citation

If you find our work helpful, please cite it as below.

@inproceedings{lee2023neural,
  title={Neural fourier shift for binaural speech rendering},
  author={Lee, Jin Woo and Lee, Kyogu},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
