
AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection
(ICASSP 2023)

This is a PyTorch implementation of AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection. We share the overall framework used to train and evaluate models/formats on the DCASE 2020-2022 Task 3 (SELD) datasets.

AD-YOLO tackles the SELD problem under unknown polyphony environments. Taking the notion of angular distance, we adapt the approach of the You Only Look Once (YOLO) algorithm to SELD. Experimental results demonstrate the potential of AD-YOLO to outperform the existing formats and show its robustness in handling class-homogeneous polyphony.

The figure below depicts an example of how AD-YOLO designates the responsible predictions for each ground-truth target at a single time frame.
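
For intuition, the responsibility assignment can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the 20° threshold, and the simple thresholding rule are assumptions made for clarity, not the exact training logic of AD-YOLO.

import numpy as np

def angular_distance(azi1, ele1, azi2, ele2):
    """Great-circle (angular) distance in degrees between two DOAs,
    each given as (azimuth, elevation) in degrees."""
    a1, e1, a2, e2 = np.deg2rad([azi1, ele1, azi2, ele2])
    # Clip to guard against floating-point values slightly outside [-1, 1].
    cos_dist = np.clip(
        np.sin(e1) * np.sin(e2) + np.cos(e1) * np.cos(e2) * np.cos(a1 - a2),
        -1.0, 1.0,
    )
    return np.rad2deg(np.arccos(cos_dist))

def assign_responsible(predictions, targets, threshold=20.0):
    """For each ground-truth DOA, collect the indices of predictions whose
    angular distance falls within `threshold` degrees; those predictions
    are treated as 'responsible' for that target at this time frame.
    `predictions` and `targets` are lists of (azimuth, elevation) pairs."""
    responsibility = []
    for t_azi, t_ele in targets:
        responsible = [
            i for i, (p_azi, p_ele) in enumerate(predictions)
            if angular_distance(p_azi, p_ele, t_azi, t_ele) <= threshold
        ]
        responsibility.append(responsible)
    return responsibility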

Environment Support & Python Requirements


Use requirements.txt to install the remaining Python dependencies.
The Ubuntu soundfile and conda ffmpeg packages are also required; you can install them as shown below.

$ pip install -r requirements.txt
$ apt-get install python3-soundfile
$ conda install -c conda-forge ffmpeg

Usage

1. Prepare Datasets

The datasets can be downloaded from the following sources:

  • TAU-NIGENS Spatial Sound Events 2020 (DOI)

  • TAU-NIGENS Spatial Sound Events 2021 (DOI)

  • [DCASE2022 Task 3] Synthetic SELD mixtures for baseline training (DOI)

  • STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset (DOI)

For detailed information on file hierarchies and structures, please see:

  AD-YOLO/data/DCASE2020_SELD ; DCASE2021_SELD ; DCASE2022_SELD

2. Preprocess Train Data

The first Python command below slices the audio/labels of the training data into uniform time chunks. You can specify a particular annual dataset as an argument, such as "DCASE2020", "DCASE2021", or "DCASE2022".

If you give scaler as the action, the script computes and saves the statistics (mean and standard deviation) of the acoustic features from the training data.

Hyperparameters stated in the data configurations (e.g. hyp_data_DCASE2022.yaml) are involved in this procedure.

$ python src/preprocess.py chunking --dataset all
$ python src/preprocess.py scaler --dataset all
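
For intuition only, the sketch below illustrates what the two actions do conceptually: chunking slices each recording into fixed-length segments, and scaler accumulates per-bin statistics over the training features. The chunk length, feature shape, and function names here are assumptions; the actual procedure lives in src/preprocess.py and follows the data configuration files.

import numpy as np
import soundfile as sf

def chunk_audio(wav_path, chunk_sec=5.0):
    """Slice a recording into uniform, non-overlapping time chunks.
    The 5-second length is illustrative; the real value comes from the
    data configuration (e.g. hyp_data_DCASE2022.yaml)."""
    audio, sr = sf.read(wav_path)            # audio: (n_samples, n_channels)
    hop = int(chunk_sec * sr)
    return [audio[s:s + hop] for s in range(0, len(audio) - hop + 1, hop)]

def feature_scaler(feature_list):
    """Compute the per-bin mean and standard deviation over all training
    features, e.g. spectrogram frames of shape (n_frames, n_bins)."""
    stacked = np.concatenate(feature_list, axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0)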

3-1. Initiate Model Training Pipeline

If you want to launch the pipeline directly, use the example below:

$ cd src
$ python main.py train --encoder se-resnet34 --loss adyolo --dataset DCASE2021 --device cuda:0

Alternatively, you can manage the experiment more easily using run.sh.

$ sh run.sh

The pipeline first creates a result folder to save the setups, predictions, model weights, and checkpoint of the experiment. You can find it under src/results/.

If you have an account at neptune.ai, you can give the --logger argument on the command to record the training procedure. (Go to src/configs/logging_meta_config.yaml and configure your neptune_project & neptune_api_token first.)

  • With --logger, the experiment ID created in your neptune.ai project becomes the name and ID of the output folder.

  • Otherwise, without the --logger argument, the pipeline automatically creates the output folder and assigns its ID as local-YYYYMMDD-HHmmss, as sketched below.
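
As a small illustration of that naming convention (not the repository's exact code), such an ID can be derived from the current local time:

from datetime import datetime

# Hypothetical sketch: build an output-folder ID of the form local-YYYYMMDD-HHmmss.
experiment_id = "local-" + datetime.now().strftime("%Y%m%d-%H%M%S")
print(experiment_id)  # e.g. local-20230601-142530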

You can find a more detailed description of the command arguments in src/main.py (see also src/configs/ for hyperparameters).

$ python main.py -h

3-2. Resume Interrupted Experiment

This restarts (resumes) the pipeline from the checkpoint, given the name (ID; e.g. local-YYYYMMDD-HHmmss) of the experiment folder.

  • Give the experiment ID/name to --resume_pth.
$ cd src
$ python main.py train --resume_pth local-YYYYMMDD-HHmmss --device cuda:0

3-3. Evaluate Experimental Results

You can also use the ID to evaluate the best-validated model.

  • The ID/name of the experiment is required for --eval_pth.
$ cd src
$ python main.py test --eval_pth local-YYYYMMDD-HHmmss --device cuda:0

You can check the validation set score by giving val as the action.

$ python main.py val --eval_pth local-YYYYMMDD-HHmmss --device cuda:0

3-4. Make Inferences

Give the infer action and configure the --eval_pth & --infer_pth arguments to make inferences on .wav audio files.

  • --infer_pth is a folder containing the audio files on which you want to make inferences.
$ cd src
$ python main.py infer --eval_pth local-YYYYMMDD-HHmmss --infer_pth ~/folder-somewhere/audiofile-exists/ --device cuda:0
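
If you would rather script inference outside the CLI, the sketch below shows one plausible outline: reading .wav files from an --infer_pth-style folder with soundfile and passing them to a trained PyTorch model. The checkpoint filename, the way the model is loaded, and the omitted feature-extraction step are assumptions; the authoritative implementation is the infer action in src/main.py.

import glob
import os

import soundfile as sf
import torch

device = "cuda:0"
# Hypothetical checkpoint path and model object; the repository's actual
# loading logic may differ.
model = torch.load("results/local-YYYYMMDD-HHmmss/best_model.pt", map_location=device)
model.eval()

infer_pth = os.path.expanduser("~/folder-somewhere/audiofile-exists/")
for wav in sorted(glob.glob(os.path.join(infer_pth, "*.wav"))):
    audio, sr = sf.read(wav)                     # audio: (n_samples, n_channels)
    x = torch.from_numpy(audio).float().T[None]  # (1, n_channels, n_samples)
    # Acoustic feature extraction and scaling would happen here,
    # mirroring the preprocessing step, before the forward pass.
    with torch.no_grad():
        preds = model(x.to(device))              # frame-wise SELD predictions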

Citation

@article{kim2023ad,
  title={AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection},
  author={Kim, Jin Sob and Park, Hyun Joon and Shin, Wooseok and Han, Sung Won},
  journal={arXiv preprint arXiv:2303.15703},
  year={2023}
}
@inproceedings{kim2023adyolo,
  title={AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection},
  author={Kim, Jin Sob and Park, Hyun Joon and Shin, Wooseok and Han, Sung Won},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  organization={IEEE}
}

License

This repository is released under the MIT license.

The file src/utils/seld_metrics.py was adapted from sharathadavanne/seld-dcase2022, released under the MIT license. We modified some parts to fit the repository structure and added some classes & functions for evaluation exclusively under polyphonic conditions.