This is the implementation of our submission to Task 5 of the DCASE 2023 challenge.
We invite you to take a look at the workshop paper version for more details and ablation studies.
We ranked 2nd in the challenge. More information is available on the challenge results page.
Our approach consists of two steps:
- Training a feature extractor on the training set
- Training a linear classifier on each audio file of the validation set
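The second step can be illustrated with a minimal sketch, which is not the code in this repository. It assumes the feature extractor is frozen and has already produced embeddings; here they are replaced by synthetic vectors. The function names, the embedding size of 16, and the use of plain gradient descent (rather than Adam) are all assumptions made for illustration.

```python
import numpy as np

def train_linear_classifier(features, labels, lr=0.1, epochs=200):
    """Fit a binary linear classifier with cross-entropy and gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the cross-entropy w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    """Return 1 for segments classified as events, 0 for background."""
    return (features @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
# Synthetic stand-ins for embeddings of one audio file (hypothetical data).
positives = rng.normal(loc=2.0, scale=0.5, size=(5, 16))   # the 5 annotated shots
negatives = rng.normal(loc=-2.0, scale=0.5, size=(5, 16))  # background segments
X = np.vstack([positives, negatives])
y = np.concatenate([np.ones(5), np.zeros(5)])

w, b = train_linear_classifier(X, y)
queries = rng.normal(loc=2.0, scale=0.5, size=(3, 16))  # unlabeled event-like segments
print(predict(queries, w, b))
```

In this sketch, a new classifier would be fitted from scratch for every audio file, since each file comes with its own 5 annotated events.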
First, create the spectrograms of the training set by running create_train.py with the argument --traindir set to the folder containing the training datasets.
To train the feature extractor, run train.py with the arguments --traindir (the same as above) and --device (the device to train on). Other arguments, concerning training and data augmentation hyperparameters, can be found in args.py with the default values that we used.
To validate the learned feature extractor in the 5-shot setting, run evaluate.py with the argument --valdir set to the folder containing the validation datasets. Other hyperparameters can also be found in args.py. Add the following arguments, depending on the submission:
For submission 1 :
--ft 0 --ftlr 0.01 --ftepochs 20 --method ce --adam
For submission 2 :
--ft 1 --ftlr 0.001 --ftepochs 40 --method ce --adam
For submission 3 :
--ft 2 --ftlr 0.001 --ftepochs 40 --method ce --adam
For submission 4 :
--ft 3 --ftlr 0.001 --ftepochs 40 --method ce --adam
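To avoid retyping the flags, the four configurations above can be kept in a small lookup table. The following is only a convenience sketch: the helper name build_evaluate_command, the python interpreter name, and the example path are assumptions, while the flags themselves are copied from the list above.

```python
import shlex

# Fine-tuning flags for each submission, copied from the list above.
SUBMISSION_FLAGS = {
    1: "--ft 0 --ftlr 0.01 --ftepochs 20 --method ce --adam",
    2: "--ft 1 --ftlr 0.001 --ftepochs 40 --method ce --adam",
    3: "--ft 2 --ftlr 0.001 --ftepochs 40 --method ce --adam",
    4: "--ft 3 --ftlr 0.001 --ftepochs 40 --method ce --adam",
}

def build_evaluate_command(valdir, submission):
    """Return the argument list that runs evaluate.py for one submission."""
    base = ["python", "evaluate.py", "--valdir", valdir]
    return base + shlex.split(SUBMISSION_FLAGS[submission])

cmd = build_evaluate_command("/path/to/validation", 1)  # hypothetical path
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the evaluation.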
To get the scores, run evaluation_metrics/evaluation.py with the arguments -pred_file (the predictions CSV file created by evaluate.py, located at traindir/../../outputs/eval.csv), -ref_files (the path to the validation datasets), and -save_path (the folder in which to save the scores JSON file).
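Because the predictions file is located relative to the training folder, it can help to resolve its path before calling the scoring script. The sketch below does this under stated assumptions: the helper names, the python interpreter name, and the example folders are hypothetical, and only the relative location outputs/eval.csv and the three flags come from the description above.

```python
import os

def predictions_path(traindir):
    """Resolve traindir/../../outputs/eval.csv to a normalized path."""
    return os.path.normpath(os.path.join(traindir, "..", "..", "outputs", "eval.csv"))

def build_scoring_command(traindir, valdir, save_path):
    """Return the argument list that runs the scoring script."""
    return [
        "python", "evaluation_metrics/evaluation.py",
        "-pred_file", predictions_path(traindir),
        "-ref_files", valdir,
        "-save_path", save_path,
    ]

# Hypothetical folder layout, used only to show the path resolution.
cmd = build_scoring_command("/data/dcase/dev/train", "/data/dcase/dev/val", "/data/scores")
print(" ".join(cmd))
```

Note that os.path.normpath collapses the ".." components textually and does not follow symbolic links, so the result may differ from the real location if the training folder is a symlink.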
If you have any questions or run into a problem with the code or results, do not hesitate to email me at ilyass.moummad@imt-atlantique.fr or open an issue on this repository. I am very responsive.
We are thankful for the challenge baseline code that helped us make this repository : https://github.com/ilyassmoummad/dcase-few-shot-bioacoustic/tree/main
Take a look at our newer work accepted at ICASSP 2024 where we improve the pre-training loss as well as the inference strategy (which is more stable) : https://github.com/ilyassmoummad/RCL_FS_BSED
@techreport{Moummad2023,
Author = "Moummad, Ilyass and Serizel, Romain and Farrugia, Nicolas",
title = "SUPERVISED CONTRASTIVE LEARNING FOR PRE-TRAINING BIOACOUSTIC FEW SHOT SYSTEMS",
institution = "DCASE2023 Challenge",
year = "2023",
month = "June",
}
@inproceedings{moummad,
author = "Moummad, Ilyass and Serizel, Romain and Farrugia, Nicolas",
title = "Pretraining Representations for Bioacoustic Few-Shot Detection Using Supervised Contrastive Learning",
booktitle = "Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)",
address = "Tampere, Finland",
month = "September",
year = "2023",
pages = "136--140",
}