This repo is modified from https://github.com/qiuqiangkong/panns_transfer_to_gtzan
If you want view all pretrains, please go to https://github.com/qiuqiangkong/audioset_tagging_cnn

Tomofun 狗音辨識 AI 百萬挑戰賽 classification finetuned on pretrained audio neural networks (PANNs)

Audio tagging is a task to classify audio clips into classes. Tomofun 狗音辨識 AI 百萬挑戰賽 is a competetion containing 1200 5-second audio clips with 6 classes ['Barking', 'Howling', 'Crying', 'COSmoke', 'GlassBreaking','Other']. In this codebase, we fine-tune PANNs [1] to build audio classification systems.

TODO

Add confusion matrix
Plot Curve Notebooks
Grad Cam

Dataset

The dataset can be downloaded from https://tbrain.trendmicro.com.tw/Competitions/Details/15

Run the code

0. Prepare data

Download and upzip data, the data looks like:

meta_train.csv

train
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav

public_test
├── public_00001.wav
├── public_00002.wav
├── ...
└── public_10000.wav

private_test
├── private_00001.wav
├── private_00002.wav
├── ...
└── private_20000.wav

where meta_train.csv should in this format:

Filename	Label	Remark
train_00001	0	Barking
train_00002	0	Barking
...	...	...
train_01200	5	Dishes

1. Requirements

python 3.7.6

# in requirements.txt
matplotlib==3.4.2
dotmap==1.3.23
tensorflow==2.3.1
numpy==1.16.6
librosa==0.8.1
pandas==1.2.4
tqdm==4.61.1

pip install requirements.txt
bash download_cnn14.sh

2. Prepare Dataset

Run all code in Reorder_File.ipynb # note : the file train_01046.wav is omitted.
```
bash prepare_hdf5.sh
```

# it cost about 2.5 hour in my machine
# if you only need to training, please do not run the last 2 lines in prepare_hdf5.sh.

Note : if you want to try your own dataset, please modify following files

Reorder_File.ipynb
prepare_hdf5.sh
./utils/config.py

3. Start Training & Evaluate

bash train.sh # note : set augmentation to "none" is better in this dataset in our experiment

Run all code in UseFinetunedModelToPredict.ipynb # if you have modified the parameters, please correct them in the config section.
Check softmax_then_mean_from_panns_transfer_to_gtzan.csv

Model

A 14-layer CNN of PANNs is fine-tuned. We use 10-fold cross validation for Tomofun 狗音辨識 AI 百萬挑戰賽 classification. That is, 1080 audio clips are used for training, and 120 audio clips are used for validation.

Results

The system takes around 8 minutes to fit 600 mini-batch with a single card GeForce GTX 1080 Ti GPU card. Here is the result on 1nd fold. The results on different folds can be different.

Sun, 20 Jun 2021 19:34:01 main.py[line:69] INFO Namespace(augmentation='mixup', batch_size=32, cuda=True, dataset_dir='./train_transfered', filename='main', freeze_base=False, holdout_fold='1', learning_rate=0.0001, loss_type='clip_nll', mode='train', model_type='Transfer_Cnn14', pretrained_checkpoint_path='./Cnn14_mAP=0.431.pth', resume_iteration=0, stop_iteration=600, suffix='_train', workspace='.')
Sun, 20 Jun 2021 19:34:01 main.py[line:72] INFO Using GPU.
Sun, 20 Jun 2021 19:34:03 main.py[line:85] INFO Load pretrained model from ./Cnn14_mAP=0.431.pth
Sun, 20 Jun 2021 19:34:14 main.py[line:151] INFO ------------------------------------
Sun, 20 Jun 2021 19:34:14 main.py[line:152] INFO Iteration: 10
Sun, 20 Jun 2021 19:34:15 main.py[line:157] INFO Validate accuracy: 0.250
Sun, 20 Jun 2021 19:34:15 main.py[line:158] INFO Validate loss: 0.29280
Sun, 20 Jun 2021 19:34:15 utilities.py[line:103] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics.pickle
Sun, 20 Jun 2021 19:34:15 utilities.py[line:104] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics_2021-06-20_19-34-03.pkl
Sun, 20 Jun 2021 19:34:15 main.py[line:168] INFO Train time: 6.956 s, validate time: 1.354 s
Sun, 20 Jun 2021 19:34:21 main.py[line:151] INFO ------------------------------------

............

Sun, 20 Jun 2021 19:36:40 main.py[line:151] INFO ------------------------------------
Sun, 20 Jun 2021 19:36:40 main.py[line:152] INFO Iteration: 200
Sun, 20 Jun 2021 19:36:41 main.py[line:157] INFO Validate accuracy: 0.892
Sun, 20 Jun 2021 19:36:41 main.py[line:158] INFO Validate loss: 0.05802
Sun, 20 Jun 2021 19:36:41 utilities.py[line:103] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics.pickle
Sun, 20 Jun 2021 19:36:41 utilities.py[line:104] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics_2021-06-20_19-34-03.pkl
Sun, 20 Jun 2021 19:36:41 main.py[line:168] INFO Train time: 6.185 s, validate time: 1.518 s
Sun, 20 Jun 2021 19:36:42 main.py[line:182] INFO Model saved to ./checkpoints/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/200_iterations.pth
Sun, 20 Jun 2021 19:36:48 main.py[line:151] INFO ------------------------------------

............

Sun, 20 Jun 2021 19:39:16 main.py[line:151] INFO ------------------------------------
Sun, 20 Jun 2021 19:39:16 main.py[line:152] INFO Iteration: 400
Sun, 20 Jun 2021 19:39:17 main.py[line:157] INFO Validate accuracy: 0.925
Sun, 20 Jun 2021 19:39:17 main.py[line:158] INFO Validate loss: 0.04342
Sun, 20 Jun 2021 19:39:17 utilities.py[line:103] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics.pickle
Sun, 20 Jun 2021 19:39:17 utilities.py[line:104] INFO     Dump statistics to ./statistics/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/statistics_2021-06-20_19-34-03.pkl
Sun, 20 Jun 2021 19:39:17 main.py[line:168] INFO Train time: 6.161 s, validate time: 1.515 s
Sun, 20 Jun 2021 19:39:17 main.py[line:182] INFO Model saved to ./checkpoints/main/holdout_fold=1/Transfer_Cnn14/pretrain=True/loss_type=clip_nll/augmentation=mixup/batch_size=32/freeze_base=False/400_iterations.pth
Sun, 20 Jun 2021 19:39:24 main.py[line:151] INFO ------------------------------------

............

Citation

[1] Kong, Qiuqiang, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." arXiv preprint arXiv:1912.10211 (2019).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Tomofun 狗音辨識 AI 百萬挑戰賽 classification finetuned on pretrained audio neural networks (PANNs)

TODO

Dataset

Run the code

Model

Results

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Tomofun 狗音辨識 AI 百萬挑戰賽 classification finetuned on pretrained audio neural networks (PANNs)

TODO

Dataset

Run the code

Model

Results

Citation