Issue Regarding Mozilla Common Voice 5.0 Dataset Speaker Identification #2379
Unanswered
newton2149
asked this question in
Q&A
Replies: 1 comment
-
Hello @newton2149, I am not very familiar with speaker recognition, but maybe you should try splitting your audio into smaller files (for example with a VAD, or by cutting it into chunks). I suppose our model was trained on short utterances and therefore fails on very long audio. Maybe @mravanelli can be of help here.
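To make the chunking suggestion concrete, here is a minimal sketch in plain Python. It assumes the audio is already loaded as a flat sequence of samples; the function name `split_into_chunks`, the 3-second chunk length, and the 1-second minimum are illustrative assumptions, not values taken from SpeechBrain or from the model card.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=3.0, min_seconds=1.0):
    """Cut a long recording into fixed-length chunks.

    chunk_seconds and min_seconds are assumed values for illustration;
    tune them for your data. A trailing chunk shorter than min_seconds
    is dropped, since very short segments tend to give noisy embeddings.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = samples[start:start + chunk_len]
        if len(chunk) >= min_len:
            chunks.append(chunk)
    return chunks


# Toy input: 7 seconds of silence at 16 kHz.
audio = [0.0] * (16000 * 7)
chunks = split_into_chunks(audio, sample_rate=16000)
chunk_lengths = [len(c) for c in chunks]
```

Each chunk could then be embedded separately and compared on its own. A VAD-based split would replace the fixed stride with detected speech boundaries.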
-
Describe the bug
When I use the pre-trained model _speechbrain/spkrec-ecapa-voxceleb_ for speaker recognition on a sample of 30 audio files from the corpus, only 17-19 are identified correctly, as confirmed by manually verifying the audio.
Sharing the code
My approach: I convert each audio file to an embedding using the SpeechBrain speaker recognition model, check its cosine similarity against the other embeddings in file_speaker_pairs, and then assign the speaker ID.
Currently, in a sample of 30 audio files, I get the correct speaker for only 17-19 files. Is there any modification I can make to increase the accuracy?
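Since the original code was not captured here, the following is only a rough sketch of the matching step described above, not the poster's actual implementation. It assumes the embeddings have already been extracted (for instance with SpeechBrain's `EncoderClassifier.encode_batch`) and flattened to plain lists of floats; `assign_speaker` and the toy `enrolled` dictionary are hypothetical names standing in for file_speaker_pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero embedding carries no speaker information
    return dot / (norm_a * norm_b)


def assign_speaker(query, enrolled):
    """Return (speaker_id, score) of the enrolled embedding closest to query."""
    best_id, best_score = None, -1.0
    for speaker_id, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id, best_score


# Toy 3-dimensional embeddings; real ECAPA embeddings are much longer.
enrolled = {"spk_a": [1.0, 0.0, 0.0], "spk_b": [0.0, 1.0, 0.0]}
speaker, score = assign_speaker([0.9, 0.1, 0.0], enrolled)
```

With a setup like this, inspecting `score` for the misidentified files can show whether the errors are near-ties between speakers or clear mismatches.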
Expected behaviour
I need to get close to 100% accuracy.
To Reproduce
No response
Environment Details
No response
Relevant Log Output
No response
Additional Context
No response