We present a model that can generate accurate 3D sound fields of human bodies from headset microphones and body pose as inputs.


SoundingBodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio

Xudong Xu · Dejan Marković · Jacob Sandakly · Todd Keebler · Steven Krenn · Alexander Richard

Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

Paper PDF

Supplemental Video


Data

The Sounding Bodies dataset is hosted on AWS S3. We recommend using the AWS command line interface (see AWS CLI installation instructions).

To download the dataset run:

aws s3 cp --recursive --no-sign-request s3://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15/SoundingBodies/ SoundingBodies/

or use sync to avoid transferring existing files:

aws s3 sync --no-sign-request s3://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15/SoundingBodies/ SoundingBodies/
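To preview the bucket contents before downloading (optional), you can list them with the same CLI:

aws s3 ls --no-sign-request s3://fb-baas-f32eacb9-8abb-11eb-b2b8-4857dd089e15/SoundingBodies/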

The dataset takes around 680GB of disk space. If necessary, adjust data_dir and mic_loc_file in configs/config_main.py to point to your download location.
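For reference, the relevant entries in configs/config_main.py would then look roughly like this (a sketch with placeholder paths; the actual config contains many more settings):

data_dir = '/path/to/SoundingBodies'    # root of the downloaded dataset
mic_loc_file = '/path/to/mic_loc_file'  # microphone-location file (placeholder path)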

NOTE: The published dataset does not include speech data from subject7 and has no data from subject8. Compared to the data used in the paper, this reduces the total capture time from 4.4 hours to 3.6 hours. Below we provide a pretrained model and updated evaluation numbers for the published dataset.

Code

Third-party dependencies (an install example follows the list):

  • tqdm
  • numpy
  • gitpython
  • mmcv
  • torch
  • torchaudio
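These can be installed with pip, for example (a suggested one-liner; the repository does not pin versions):

pip install tqdm numpy gitpython mmcv torch torchaudio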

To train the network, run:

python train.py --config configs/config_main.py

To evaluate the model, set test_info_file in configs/config_main.py to the desired test set: ./data_info/test/nonspeech_data.json for non-speech data or ./data_info/test/speech_data.json for speech data. Then run:

python evaluate.py --config configs/config_main.py --test_epoch best-accumulated_loss --out_name test 
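For reference, the test_info_file entry in configs/config_main.py would then look roughly like this (only this one line shown):

test_info_file = './data_info/test/nonspeech_data.json'  # or './data_info/test/speech_data.json'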

To save the output .wav files, add the --save option, for example:

python evaluate.py --config configs/config_main.py --test_epoch epoch-100 --out_name test --save
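The saved files can then be inspected with torchaudio, which is already a dependency. The path below is illustrative; actual file names depend on the configured output directory and --out_name:

import torchaudio

wav, sr = torchaudio.load('path/to/saved_output.wav')  # illustrative path
print(wav.shape, sr)  # (channels, samples) and the sample rate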

Pretrained model

We provide the model trained on the published training set in ./checkpoint/neurips/pretrained/. To evaluate the model, run:

python evaluate.py --config configs/config_pretrained.py --test_epoch best-accumulated_loss --out_name neurips_evaluation 

The updated evaluation metrics are:

              SDR     amplitude (x10^3)   phase
NON-SPEECH    3.052   0.832               0.314
SPEECH        9.635   0.701               0.464

NOTE: For the speech metrics reported in the paper, the speech audio data was erroneously amplified by a factor of 10. As a result, the amplitude error was multiplied by 10, and the phase error was higher because more silence/noise segments passed the energy threshold.
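As a toy illustration of the amplitude effect (this is not the repo's metric code, just a sketch of why a constant gain scales an L1 amplitude error linearly):

import torch

pred, target = torch.randn(1000), torch.randn(1000)
err = (pred.abs() - target.abs()).abs().mean()                    # L1 amplitude error
err_10x = ((10 * pred).abs() - (10 * target).abs()).abs().mean()  # same signals, 10x gain
print(float(err_10x / err))  # ~10.0: the gain scales the error linearly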

Citation

If you use this code or the dataset, please cite:

@inproceedings{xu2023soundingbodies,
  title={Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio},
  author={Xu, Xudong and Markovic, Dejan and Sandakly, Jacob and Keebler, Todd and Krenn, Steven and Richard, Alexander},
  booktitle={Conference on Neural Information Processing Systems},
  year={2023}
}

License

The code and dataset are released under the CC-BY-NC 4.0 license.
