This repository contains the code to reproduce the results of the Truchan_LUH submission to the DCASE24 Task 1 challenge, "Data-Efficient Low-Complexity Acoustic Scene Classification".
The codebase builds on the official baseline for Task 1: here
Create a conda environment
conda create -n asc python=3.10
conda activate asc
Download the dataset from this location and extract the files.
There are a total of 5 architectures:
- Isotropic
- Siren
- Adverserial
- RSC
- ASC Domain
Only Isotropic, Siren, and RSC were submitted. Each experiment folder contains a dataset folder with a dcase24.py file, in which the path to the dataset has to be specified:
dataset_dir = None
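For example, pointing it at a local copy of the development set (the path below is a placeholder, not a required location):
dataset_dir = "/path/to/TAU-urban-acoustic-scenes-2022-mobile-development"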
All experiments have a split argument, which selects the training subset: 5, 10, 25, 50, and 100 (percent) are available.
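For example, to train on the 25% split (assuming the argument is passed as --split):
python run_isotropic_training.py --split=25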
The device impulse response augmentation has shown great success in previous submissions and is also used in this submission. The device impulse responses are provided by MicIRP. All files are shared under a Creative Commons license. All credits go to MicIRP & Xaudia.com.
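A minimal sketch of how this augmentation is typically applied, assuming waveforms as 1-D numpy arrays; the function name, probability, and normalization are illustrative assumptions, not the repository's API:

```python
import numpy as np

def apply_device_ir(waveform: np.ndarray, ir: np.ndarray, p: float = 0.6) -> np.ndarray:
    """With probability p, convolve the waveform with a device impulse response."""
    if np.random.rand() > p:
        return waveform
    augmented = np.convolve(waveform, ir, mode="full")[: len(waveform)]
    # Rescale so the augmented clip keeps the original peak level.
    peak = np.max(np.abs(augmented)) + 1e-9
    return augmented / peak * np.max(np.abs(waveform))
```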
Run isotropic training
python run_isotropic_training.py
Run isotropic with MixStyle from here
python run_isotropic_training.py --model=mix
Run isotropic without activation, motivated by here
python run_isotropic_training.py --model=noact
Navigate to Isotropic_HPO and run the isotropic hyperparameter optimization.
python run_isotropic_hpo.py
Isotropic Notebook Demonstrator
Run siren training
python run_siren_training.py
Previous domain generalization techniques have relied on augmentation to improve generalization. For the next two architectures we conduct representation learning experiments with the isotropic architecture as the backbone model. Two representation learning techniques from DeepDG were chosen:
- Domain Adversarial Neural Network (here called adverserial); its core gradient reversal layer is sketched after this list
- Representation Self Challenging (RSC)
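A minimal sketch of the gradient reversal layer at the heart of DANN, assuming PyTorch; this is illustrative, and the DeepDG-based implementation in this repository may differ in details:

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha: float):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, negated (scaled) gradient on the
        # backward pass, so the feature extractor learns to confuse the
        # domain (device) discriminator.
        return -ctx.alpha * grad_output, None

def grad_reverse(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, alpha)
```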
Run adversarial training
python run_adv_training.py
Run RSC training
python run_rsc_training.py
ASC Domain combines the adversarial approach with knowledge distillation; a standard distillation loss is sketched after the list below. The training procedure and teacher models were taken from cpjku_dcase23 and EfficientAT. We train a total of 4 different architectures:
- MobileNet
- Dynamic MobileNet
- CP-ResNet
- PaSST

Each architecture is trained with different training setups and architecture versions, leading to a total of 22 teacher models. Each teacher model is trained on the 5 splits, resulting in 110 models.
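A minimal sketch of the temperature-scaled knowledge distillation loss commonly used in this kind of recipe, assuming PyTorch; the temperature, weighting, and function names are illustrative assumptions, not necessarily the exact cpjku_dcase23 settings:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, kd_weight: float = 0.5):
    # Hard-label cross-entropy on the ground-truth scene labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - kd_weight) * ce + kd_weight * kd
```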
Run teacher model training
Run MobileNet training. The width argument takes one of [0.4, 0.5, 1.0].
python run_mn_training.py --width=0.4
Run Dynamic MobileNet training. The width argument takes one of [0.4, 1.0].
python run_dymn_training.py --width=0.4
Run PaSST training
python run_passt_training.py
Run CP-ResNet training
python run_cp-resnet_training.py
Run single-teacher student training with a chosen teacher and Isotropic as the student
python run_convmixer_training.py --teacher=<teacher_name>
Example: run single-teacher student training with the PaSST teacher and Isotropic as the student
python run_convmixer_training.py --teacher=passt_dir_fms
Run ensemble teacher-student training with the teacher ensemble and Isotropic as the student
python run_convmixer_training.py --teacher=best
Run ensemble teacher-student training with the teacher ensemble and Siren as the student
python run_siren_training.py --teacher=best
Run ensemble teacher-student adversarial training with the teacher ensemble and Isotropic as the student
python run_convmixer_adv_training.py --teacher=best
Run ensemble teacher-student adversarial training with the teacher ensemble and Siren as the student
python run_siren_adv_training.py --teacher=best
The ensemble is selected by a forward stepwise selection algorithm:
1. Start with an empty ensemble.
2. Add the model that minimizes the ensemble validation loss.
3. Repeat step 2 until no improvement can be achieved.
4. Return the ensemble.
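A minimal sketch of this selection loop, assuming per-model validation probabilities and negative log-likelihood as the validation loss; names are illustrative, and the actual implementation is in ensemble_selection.ipynb:

```python
import numpy as np

def forward_stepwise_selection(model_probs: dict, labels: np.ndarray) -> list:
    """model_probs maps model name -> (N, C) array of validation probabilities."""
    ensemble, best_loss = [], np.inf
    while True:
        best_candidate = None
        for name, probs in model_probs.items():
            members = [model_probs[m] for m in ensemble] + [probs]
            avg = np.mean(members, axis=0)
            # Negative log-likelihood of the averaged predictions.
            loss = -np.mean(np.log(avg[np.arange(len(labels)), labels] + 1e-12))
            if loss < best_loss:
                best_loss, best_candidate = loss, name
        if best_candidate is None:  # no improvement -> stop
            return ensemble
        ensemble.append(best_candidate)
```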
The implementation of the ensemble selection can be seen in ensemble_selection.ipynb.