Finetuning pretrained embeddings #2061
tryharder31 started this conversation in General
Hello,
I am building a speaker/age/gender recognition system for Icelandic. My plan is to pick a pretrained embedding model, fine-tune it on my Icelandic dataset, and then use its embeddings as features for the system. As far as I understand, the closest work to what I'm trying to do is in this paper, according to which WavLM is the best performer, but there are also models like language-and-voice-lab/whisper-large-icelandic-62640-steps-967h that are trained specifically for Icelandic. So I was wondering whether anyone has experience with how these options trade off against each other, or whether it is just a matter of trying them out.
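For concreteness, this is roughly what I have in mind for the embedding-extraction part (an untested sketch using the Hugging Face transformers API; microsoft/wavlm-base-plus and the random audio are just stand-ins):

```python
# Minimal sketch: utterance-level embeddings from a pretrained WavLM.
# "microsoft/wavlm-base-plus" is just an example checkpoint; the waveform
# is a random stand-in for one second of 16 kHz mono speech.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMModel

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")
model.eval()

waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, frames, hidden_dim)
embedding = hidden.mean(dim=1)                  # time-pooled utterance embedding
```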
And then, once I choose a pretrained model, do I understand correctly that the common practice is to fine-tune it on the training portion of my dataset, with access to the training labels (speaker/age/gender), and then use the resulting embeddings as features for the classifier?
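In code, my understanding of that workflow would be something like this (an untested sketch, assuming the transformers audio-classification head; the dummy batch stands in for my real labelled data):

```python
# Sketch of the workflow as I understand it: (1) fine-tune the pretrained
# encoder with a classification head on my labels, (2) afterwards use the
# encoder's pooled hidden states as fixed features downstream.
import torch
from transformers import AutoModelForAudioClassification

NUM_CLASSES = 2  # e.g. gender; speaker and age would each get their own head
model = AutoModelForAudioClassification.from_pretrained(
    "microsoft/wavlm-base-plus", num_labels=NUM_CLASSES
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# (1) Fine-tune on the labelled train split (dummy batch as a placeholder).
train_batches = [(torch.randn(4, 16000), torch.randint(0, NUM_CLASSES, (4,)))]
model.train()
for input_values, labels in train_batches:
    loss = model(input_values=input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# (2) Use the fine-tuned encoder's pooled hidden states as fixed features
# for a downstream classifier (SVM, logistic regression, MLP, ...).
model.eval()
with torch.no_grad():
    hidden = model.wavlm(input_values).last_hidden_state  # (batch, frames, dim)
embeddings = hidden.mean(dim=1)                           # (batch, dim)
```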
Also, does anyone know which model card I should load to access the models from the paper I cited? The authors don't mention where they released their fine-tuned models, and I couldn't find any multilingual models in SpeechBrain's Hugging Face profile.
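For what it's worth, here is how I would load an embedding model once I know the right card name (a sketch using SpeechBrain's pretrained interface; speechbrain/spkrec-ecapa-voxceleb is just a known public card, not one of the paper's models):

```python
# Loading a SpeechBrain embedding model by its Hugging Face model-card name.
# spkrec-ecapa-voxceleb is a public English speaker-embedding card, used only
# to illustrate the mechanics; it is not one of the paper's fine-tuned models.
import torch
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)
wavs = torch.randn(1, 16000)                # stand-in (batch, samples) at 16 kHz
embeddings = classifier.encode_batch(wavs)  # one speaker embedding per utterance
```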
Thanks!
Replies: 1 comment

Hey @mravanelli, could you please have a look? Thanks :)