Pre-Trained Models and PaSST ensemble predictions

@fschmid56 released this 17 Nov 14:37 · 64 commits to main since this release

In this release, we provide pre-trained models as well as the ensembled PaSST logits we used for Knowledge Distillation.

  • passt_enemble_logits_mAP_495.npy: ensembled logits of 9 different PaSST models on AudioSet; the ensemble achieves a mAP of 0.495 (a loading sketch follows this list)
  • mn<width_mult>_<dataset>: width_mult is the factor used to scale the width of MobileNetV3, and dataset is the dataset the model was trained on ('as' stands for AudioSet); check out the README file for further details
  • dymn<width_mult>_<dataset>: width_mult is the factor used to scale the width of a dynamic MobileNetV3, and dataset is the dataset the model was trained on ('as' stands for AudioSet); check out the README file for further details
  • fc: the model is trained with a fully-convolutional head
  • s<num,num,num,num>: the model is trained with reduced strides; default: 2222
  • no_im_pre: no ImageNet pre-training before training on AudioSet
  • hop: the time resolution of the spectrograms the model is trained on (hop size in milliseconds)
  • mels: the number of mel bins (frequency resolution of the spectrograms) the model is trained on
  • Default: hop=10 ms, mels=128 bands
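
As a quick sanity check of the ensemble logits file, a minimal loading sketch (only the file name is taken from this release; the array shape is an assumption based on AudioSet's 527 classes, so verify it against your local copy):

```python
import numpy as np

# Load the ensembled PaSST logits used as Knowledge Distillation targets.
# Assumed shape: (num_audioset_clips, 527), one logit vector per training clip.
logits = np.load("passt_enemble_logits_mAP_495.npy")
print(logits.shape, logits.dtype)
```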

Models are downloaded automatically when the pretrained_name argument is set to one of the model names above.
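
For example, a minimal sketch (the import path and the get_model signature are assumptions based on the repository layout; check the README for the exact API):

```python
import torch
from models.mn.model import get_model  # assumed location of the model factory

# Setting pretrained_name triggers the automatic checkpoint download.
# "mn10_as" = MobileNetV3 with width_mult=1.0, trained on AudioSet.
model = get_model(width_mult=1.0, pretrained_name="mn10_as")
model.eval()

# Dummy input of shape (batch, channel, mel bands, time frames), matching the
# default hop=10 ms / mels=128 front end; the shape convention is an assumption.
with torch.no_grad():
    preds = model(torch.zeros(1, 1, 128, 1000))
```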