Pre-Trained Models and PaSST ensemble predictions

@fschmid56 released this 17 Nov 14:37 · 64 commits to main since this release

In this release, we provide pre-trained models as well as the ensembled PaSST logits we used for Knowledge Distillation.

  • passt_enemble_logits_mAP_495.npy: ensembled logits of 9 different PaSST models on AudioSet; the ensemble achieves a mAP of 0.495 (a loading sketch follows this list)
  • mn<width_mult>_<dataset>: width_mult is the factor used to scale the width of MobileNetV3, and dataset is the dataset the model was trained on ('as' stands for AudioSet); check out the README file for further details
  • dymn<width_mult>_<dataset>: width_mult is the factor used to scale the width of a dynamic MobileNetV3, and dataset is the dataset the model was trained on ('as' stands for AudioSet); check out the README file for further details
  • fc: the model is trained with a fully-convolutional head
  • s<num,num,num,num>: the model is trained with reduced strides; default: 2222
  • no_im_pre: no ImageNet pre-training before training on AudioSet
  • hop: the time resolution of the spectrograms the model is trained on (hop size in milliseconds)
  • mels: the number of mel bins (frequency resolution of the spectrograms) the model is trained on
  • Default: hop=10 ms, mels=128 bands
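
As a quick sanity check of the ensemble logits file, a minimal loading sketch (only the file name is taken from this release; the array shape is an assumption based on AudioSet's 527 classes, so verify it against your local copy):

```python
import numpy as np

# Load the ensembled PaSST logits used as Knowledge Distillation targets.
# Assumed shape: (num_audioset_clips, 527), one logit vector per training clip.
logits = np.load("passt_enemble_logits_mAP_495.npy")
print(logits.shape, logits.dtype)
```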

Models are downloaded automatically when the pretrained_name argument is set to one of the model names above.
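
For example, a minimal sketch (the import path and the get_model signature are assumptions based on the repository layout; check the README for the exact API):

```python
import torch
from models.mn.model import get_model  # assumed location of the model factory

# Setting pretrained_name triggers the automatic checkpoint download.
# "mn10_as" = MobileNetV3 with width_mult=1.0, trained on AudioSet.
model = get_model(width_mult=1.0, pretrained_name="mn10_as")
model.eval()

# Dummy input of shape (batch, channel, mel bands, time frames), matching the
# default hop=10 ms / mels=128 front end; the shape convention is an assumption.
with torch.no_grad():
    preds = model(torch.zeros(1, 1, 128, 1000))
```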