- This repository provides the overall framework for training and evaluating Neural Fourier Shift (NFS), proposed in the paper Neural Fourier Shift for Binaural Speech Rendering (ICASSP 2023).
Download the binaural speech dataset and unzip it. See the WarpNet repository for more details about the data. Once it is unzipped, create a symbolic link to the directory that contains the 'testset' and 'trainset' subdirectories as follows:
rm dataset/benchmark # remove the existing symbolic link
ls /path/to/binaural_dataset
# testset, trainset
ln -s /path/to/binaural_dataset dataset/benchmark

Install the third-party dependencies. This project was built and tested on an RTX 2080 GPU with CUDA 11.2.
pip install -r requirements.txt

If you just want to render your own audio binaurally, run render.py. Specify the directory that contains the mono .wav files. You can either provide a position file in .txt format, or let the script generate a sample trajectory in which the source moves in a circle around the listener. We also release the pre-trained NFS model described in the paper.
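For intuition about the default circular movement, the sketch below generates positions for a source that circles a listener placed at the origin. It is an illustration only: the helper name `circular_trajectory`, the 120 Hz sample rate, the 1 m radius, the 8 s period, and the (x, y, z) tuple layout are all assumptions, not render.py's actual defaults or the format of its position .txt files.

```python
import math


def circular_trajectory(duration_s, rate_hz=120.0, radius_m=1.0,
                        period_s=8.0, height_m=0.0):
    """Return (x, y, z) source positions that circle a listener at the origin.

    Hypothetical helper: the sample rate, radius, period and tuple layout are
    assumptions for illustration, not render.py's defaults or file format.
    """
    n = int(round(duration_s * rate_hz))        # number of position samples
    positions = []
    for i in range(n):
        t = i / rate_hz                         # time of this sample in seconds
        angle = 2.0 * math.pi * t / period_s    # azimuth grows linearly with time
        positions.append((radius_m * math.cos(angle),
                          radius_m * math.sin(angle),
                          height_m))            # constant elevation
    return positions


positions = circular_trajectory(duration_s=4.0)
print(len(positions))  # 480 samples: 4 s at the assumed 120 Hz rate
```

Every position stays at the same distance from the listener, so only the azimuth changes over time; changing `radius_m` or `height_m` moves the circle closer, farther away, or above the listener.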
python render.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir path/to/load/dir --save_dir path/to/save/dir

To train NFS, run main.py with the --train argument. You can also run the variants of NFS that appear in the ablation study by passing additional arguments.
python main.py --gpus 0 --train

You can synthesize binaural audio by running inference.py. Specify the directory that contains the mono .wav files along with a position .txt file for each of them.
python inference.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir path/to/load/dir --save_dir path/to/save/dir
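Before running inference.py on your own data, it can help to check that every mono .wav file has a position file. The sketch below assumes, hypothetically, that the positions for `speech.wav` are stored in `speech.txt` in the same directory; the helper `pair_inputs` is illustrative only, so check inference.py for the naming convention it actually expects.

```python
from pathlib import Path


def pair_inputs(root_dir):
    """Pair each mono .wav in root_dir with a same-stem position .txt file.

    Hypothetical helper: the `<name>.wav` / `<name>.txt` convention is an
    assumption, not necessarily what inference.py expects.
    Returns (pairs, missing): matched (wav, txt) paths and wavs with no .txt.
    """
    root = Path(root_dir)
    pairs, missing = [], []
    for wav in sorted(root.glob("*.wav")):
        txt = wav.with_suffix(".txt")   # e.g. speech.wav -> speech.txt
        if txt.is_file():
            pairs.append((wav, txt))
        else:
            missing.append(wav)
    return pairs, missing
```

For example, `pairs, missing = pair_inputs("path/to/load/dir")` returns the matched (wav, txt) paths and lists every .wav file that has no matching .txt file.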
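To run inference on the benchmark test set, point --root_dir at dataset/benchmark/testset and add the --is_eval_set flag:

python inference.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir dataset/benchmark/testset --save_dir ./benchmark_eval --is_eval_set

The make_demo.py script creates a video that visualizes what NFS is doing over time: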
python make_demo.py --gpu 0 --ckpt path/to/ckpt/file.pt --root_dir dataset/benchmark/testset --is_eval_set --save_dir ./demo

If you find our work helpful, please cite it as follows:
@inproceedings{lee2023neural,
title={Neural fourier shift for binaural speech rendering},
author={Lee, Jin Woo and Lee, Kyogu},
booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}