nttcslab/apwd-dataset

Audio Pair with Difference dataset (APwD dataset)

The APwD dataset consists of pairs of similar sounds that differ in some respect, together with text describing the differences. It was prepared by Daiki Takeuchi and members of NTT CS lab. The dataset is designed for research that introduces auxiliary textual information into content-based audio retrieval. Similar sound pairs are synthesized from two existing audio tagging datasets, FSD50K and ESC-50, and the differences are described according to the synthesis method. For details, please refer to the paper [1]. If you use the APwD dataset in your work, please cite this paper, where it was introduced.

[1] Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, and Kunio Kashino, "Introducing auxiliary text query-modifier to content-based audio retrieval," in Proc. INTERSPEECH, 2022.

Usage

  1. Preparing FSD50K and ESC-50
    Download FSD50K and ESC-50 from the following URLs:
    FSD50K: https://zenodo.org/record/4060432
    ESC-50: https://github.com/karolpiczak/ESC-50
    After downloading, make a note of the directory where the wav files of each dataset are saved (it will be used in the next step).

  2. Modifying settings
    In utils.py, set the two variables FSD50K and ESC50 to match your environment. Both are defined at the beginning of the file and hold the directories of FSD50K and ESC-50, respectively. Enter the directories you noted in the previous step.

  3. Synthesizing the dataset
    Run synthesize_dataset.sh.
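
As an illustration of step 2, the sketch below shows what the two directory variables might look like and how to check them before running the synthesis script. Only the variable names FSD50K and ESC50 come from this README. The paths are placeholders, and the count_wavs helper is hypothetical and not part of this repository. The actual layout of utils.py may differ.

```python
from pathlib import Path

# Placeholder paths (assumption): replace with the directories noted in step 1.
FSD50K = "/path/to/FSD50K"
ESC50 = "/path/to/ESC-50"


def count_wavs(directory):
    """Count .wav files under `directory` (recursively).

    Hypothetical helper for illustration only. Returns None if the
    directory does not exist. Matching is case-sensitive on most
    Linux file systems, so files ending in .WAV are not counted there.
    """
    root = Path(directory)
    if not root.is_dir():
        return None
    return sum(1 for _ in root.rglob("*.wav"))


if __name__ == "__main__":
    for name, directory in (("FSD50K", FSD50K), ("ESC50", ESC50)):
        n = count_wavs(directory)
        if n is None:
            print(f"{name}: directory not found: {directory}")
        else:
            print(f"{name}: found {n} wav files under {directory}")
```

If either directory is reported as not found or contains no wav files, correct the path in utils.py before running synthesize_dataset.sh.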

License

See the file named LICENSE.

Authors

Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
Kunio Kashino
