Finetuning pretrained embeddings #2061
tryharder31 started this conversation in General
Hello,
I am building a speaker/age/gender recognition system for Icelandic. My plan is to pick a pretrained embedding model, fine-tune it on my Icelandic dataset, and then use its embeddings as features for the system. As far as I understand, the closest work to what I'm trying to do is in this paper, according to which WavLM is the best performer, but there are also models like language-and-voice-lab/whisper-large-icelandic-62640-steps-967h that are trained specifically for Icelandic. So I was wondering whether anyone has experience with how these options trade off against each other, or whether it is just a matter of trying them out.
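For concreteness, this is roughly what I have in mind for the embedding-extraction part (an untested sketch using the Hugging Face transformers API; microsoft/wavlm-base-plus and the random audio are just stand-ins):

```python
# Minimal sketch: utterance-level embeddings from a pretrained WavLM.
# "microsoft/wavlm-base-plus" is just an example checkpoint; the waveform
# is a random stand-in for one second of 16 kHz mono speech.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMModel

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")
model.eval()

waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)
embedding = hidden.mean(dim=1)                  # time-pooled utterance embedding
```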
And then, once I choose a pretrained model, do I understand correctly that the common practice is to fine-tune it on the training portion of my dataset, with access to the training labels (speaker/age/gender), and then use the resulting embeddings as features for the classifier?
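In code, my understanding of that workflow would be something like this (an untested sketch, assuming the transformers audio-classification head; the dummy batch stands in for my real labelled data):

```python
# Sketch of the workflow as I understand it: (1) fine-tune the pretrained
# encoder with a classification head on my labels, (2) afterwards use the
# encoder's pooled hidden states as fixed features downstream.
import torch
from transformers import AutoModelForAudioClassification

NUM_CLASSES = 2  # e.g. gender; speaker and age would each get their own head
model = AutoModelForAudioClassification.from_pretrained(
    "microsoft/wavlm-base-plus", num_labels=NUM_CLASSES
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# (1) Fine-tune on the labelled train split (dummy batch as a placeholder).
train_batches = [(torch.randn(4, 16000), torch.randint(0, NUM_CLASSES, (4,)))]
model.train()
for input_values, labels in train_batches:
    loss = model(input_values=input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# (2) Use the fine-tuned encoder's pooled hidden states as fixed features
# for a downstream classifier (SVM, logistic regression, MLP, ...).
model.eval()
with torch.no_grad():
    hidden = model.wavlm(input_values).last_hidden_state  # (batch, frames, dim)
embeddings = hidden.mean(dim=1)                           # (batch, dim)
```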
Also, does anyone know which model card I should load to access the models from the paper I cited? The authors don't mention where they released their fine-tuned models, and I couldn't find any multilingual models in SpeechBrain's Hugging Face profile.
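For what it's worth, here is how I would load an embedding model once I know the right card name (a sketch using SpeechBrain's pretrained interface; speechbrain/spkrec-ecapa-voxceleb is just a known public card, not one of the paper's models):

```python
# Loading a SpeechBrain embedding model by its Hugging Face model-card name.
# spkrec-ecapa-voxceleb is a public English speaker-embedding card, used only
# to illustrate the mechanics; it is not one of the paper's fine-tuned models.
import torch
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
wavs = torch.randn(1, 16000)                # stand-in (batch, samples) at 16 kHz
embeddings = classifier.encode_batch(wavs)  # one speaker embedding per utterance
```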
Thanks!
Replies: 1 comment

Hey @mravanelli, could you please have a look? Thanks :)