BC learning for sounds

Implementation of Learning from Between-class Examples for Deep Sound Recognition by Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada (ICLR 2018).

This repository also contains training code for EnvNet: Learning Environmental Sounds with End-to-end Convolutional Neural Network (Yuji Tokozume and Tatsuya Harada, ICASSP 2017).

News

(2018/02/16) Added support for the latest ESC datasets
(2018/01/29) Our paper was accepted by ICLR 2018

Contents

- Between-class (BC) learning
  - We generate between-class examples by mixing two training examples belonging to different classes with a random ratio.
  - We then input the mixed data to the model and train the model to output the mixing ratio.
- Training of EnvNet and EnvNet-v2 on ESC-50, ESC-10 [1], and UrbanSound8K [2] datasets
  - EnvNet-v2: a deeper version of EnvNet. With BC learning, its error rate on ESC-50 (18.2%) is lower than the human error rate (18.7%).
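
As a rough illustration of the two BC learning steps listed above, the sketch below mixes two examples with a random ratio and builds the soft target that the model is trained to output. It is a simplified, hypothetical example and not the code in this repository: the function name `mix_between_class` and its arguments are made up for illustration, waveforms are plain Python lists, and the mix is a plain linear blend. The mixing described in the paper additionally accounts for the sound pressure level of each clip, which is omitted here.

```python
import random


def mix_between_class(x1, y1, x2, y2, num_classes, rng=random):
    """Mix two equal-length examples from different classes (hypothetical helper).

    x1, x2: waveforms as lists of floats.
    y1, y2: integer class labels, which must differ.
    Returns the mixed waveform, the soft target, and the ratio r.
    """
    if y1 == y2:
        raise ValueError("between-class examples need two different classes")
    if len(x1) != len(x2):
        raise ValueError("both examples must have the same length")

    # Random mixing ratio drawn uniformly from [0, 1).
    r = rng.random()

    # Simplified linear mix (assumption: no sound-pressure correction).
    mixed = [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]

    # Soft target: the model should output the mixing ratio,
    # i.e. r for class y1 and (1 - r) for class y2.
    target = [0.0] * num_classes
    target[y1] = r
    target[y2] = 1.0 - r
    return mixed, target, r


if __name__ == "__main__":
    mixed, target, r = mix_between_class([1.0, 1.0], 0, [0.0, 0.0], 2, 3)
    print(r, mixed, target)
```

Because the target is a distribution over two classes rather than a one-hot label, a loss that compares distributions (for example, KL divergence) is a natural fit for training on such examples.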

Setup

Install Chainer v1.24 on a machine with a CUDA-capable GPU.
Prepare datasets following this page.

Training

Template:

  python main.py --dataset [esc50, esc10, or urbansound8k] --netType [envnet or envnetv2] --data path/to/dataset/directory/ (--BC) (--strongAugment)

Recipes:

Standard learning of EnvNet on ESC-50 (around 29% error):

  python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/

BC learning of EnvNet on ESC-50 (around 24% error):

  python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/ --BC

BC learning of EnvNet-v2 on ESC-50 with strong data augmentation (around 15% error, the best performance):

  python main.py --dataset esc50 --netType envnetv2 --data path/to/dataset/directory/ --BC --strongAugment

Notes:
- Validation accuracy is calculated using 10-crop testing.
- By default, training performs K-fold cross-validation using each dataset's original fold settings. To run on a single split, use the --split option.
- Please check opts.py for other command line arguments.
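
For readers unfamiliar with 10-crop testing, the sketch below shows one common way to do it: take 10 evenly spaced windows from a test clip, run the model on each, and average the predicted probabilities. This is an illustration under assumptions, not the code in this repository: the helper names `crop_starts` and `multi_crop_predict` are hypothetical, `predict` stands for any function that maps a crop to a list of class probabilities, and details such as padding may differ in the actual implementation.

```python
def crop_starts(signal_length, crop_length, n_crops=10):
    """Start indices of n_crops windows spaced evenly over the signal."""
    if crop_length > signal_length:
        raise ValueError("crop is longer than the signal")
    if n_crops < 2:
        return [0]
    span = signal_length - crop_length
    # Integer arithmetic keeps the last crop ending exactly at the signal end.
    return [i * span // (n_crops - 1) for i in range(n_crops)]


def multi_crop_predict(predict, signal, crop_length, n_crops=10):
    """Average the class probabilities predicted for each crop."""
    starts = crop_starts(len(signal), crop_length, n_crops)
    probs = [predict(signal[s:s + crop_length]) for s in starts]
    return [sum(column) / len(probs) for column in zip(*probs)]


if __name__ == "__main__":
    # Dummy stand-in for a trained model (assumption for illustration only).
    def dummy_predict(crop):
        return [1.0, 0.0] if crop[0] < 50 else [0.0, 1.0]

    print(multi_crop_predict(dummy_predict, list(range(100)), 10))
```

The predicted class for the clip is then the index of the largest averaged probability.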

Results

Error rate in % (standard learning → BC learning)

Model	ESC-50	ESC-10	UrbanSound8K
EnvNet	29.2 → 24.1	12.8 → 11.3	33.7 → 28.9
EnvNet-v2	25.6 → 18.2	14.2 → 10.6	30.9 → 23.4
EnvNet-v2 + strong augment	21.2 → 15.1	10.9 → 8.6	24.9 → 21.7
Humans [1]	18.7	4.3	-
