Hi, when I visualized the voices, one sample (a female voice) appeared far away from the male speaker's utterances, which is expected.
However, when I compute the cosine similarity between the female utterance and the male ones, the value is quite high (0.88). I'm not sure whether I am computing the cosine similarity correctly.
Any help is very much appreciated!
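A minimal sketch of the similarity check, assuming the embeddings are 1-D NumPy vectors such as those returned by Resemblyzer's `VoiceEncoder.embed_utterance()`. As far as I know, Resemblyzer already L2-normalizes its embeddings, so a plain dot product would give the same value; normalizing explicitly, as below, is harmless either way. I also believe (please verify against the encoder source) that the embeddings pass through a ReLU, so all components are non-negative. If so, the cosine similarity between any two embeddings lies in [0, 1] and even unrelated vectors score fairly high, which means a value like 0.88 is best judged against same-speaker scores rather than in isolation. The helper names (`cosine_similarity`, `similarities_to_set`, `within_vs_between`) are hypothetical, not part of Resemblyzer.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarities_to_set(query, references) -> np.ndarray:
    """Cosine similarity of one embedding against each row of a 2-D array."""
    q = np.asarray(query, dtype=np.float64)
    refs = np.asarray(references, dtype=np.float64)
    q = q / np.linalg.norm(q)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return refs @ q


def within_vs_between(same_speaker, other_utterance):
    """Mean pairwise similarity inside one speaker's set (needs >= 2 rows)
    vs. the mean similarity of another speaker's utterance to that set."""
    refs = np.asarray(same_speaker, dtype=np.float64)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    sim = refs @ refs.T
    n = len(refs)
    # Average only the off-diagonal entries (skip each utterance vs. itself).
    within = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    between = float(similarities_to_set(other_utterance, refs).mean())
    return float(within), between


# Hypothetical usage with Resemblyzer (paths are placeholders):
# from resemblyzer import VoiceEncoder, preprocess_wav
# encoder = VoiceEncoder()
# male = np.array([encoder.embed_utterance(preprocess_wav(p)) for p in male_paths])
# female = encoder.embed_utterance(preprocess_wav("female.wav"))
# print(within_vs_between(male, female))

if __name__ == "__main__":
    # Synthetic illustration only (random data, not real embeddings):
    # unrelated non-negative vectors still have a high cosine similarity,
    # close to 0.75 for uniform [0, 1) entries in high dimension.
    rng = np.random.default_rng(0)
    x, y = rng.random(256), rng.random(256)
    print(round(cosine_similarity(x, y), 2))
```

If the "between" score is clearly lower than the "within" score, the embeddings are separating the speakers even when the raw number looks high.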
I just found out, by chance, that if we load the audio into a NumPy array (with librosa or scipy) before passing it to `preprocess_wav()` in the `resemblyzer.audio` module, we need to make sure the data is resampled to 16,000 Hz. Alternatively, we can simply pass the path of the audio file to `preprocess_wav()` instead.
This is trivial, but the mistake was really hard to find.
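A minimal sketch of the two safe paths described above, assuming `preprocess_wav()` treats an array as 16 kHz audio unless told otherwise. In the versions I am aware of, `preprocess_wav()` also accepts a `source_sr` argument and resamples the array itself when it is given; please check the signature in your installed version. The helper names are hypothetical, and `resample_linear()` is a deliberately crude linear-interpolation resampler (no anti-aliasing filter), included only so the length arithmetic can be checked without audio libraries; for real audio, prefer `librosa.resample()` or letting Resemblyzer resample. It also shows why the bug matters: a 44.1 kHz array misread as 16 kHz is stretched by a factor of about 2.76 and its pitch drops by the same factor (roughly 1.5 octaves), which could plausibly push a female voice toward male voices in embedding space.

```python
import numpy as np

# Assumed rate Resemblyzer's preprocessing expects (16,000 Hz, per the thread).
TARGET_SR = 16_000


def resample_linear(wav, source_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Crude resampler via linear interpolation (no anti-aliasing filter).

    Illustration only; for real audio prefer librosa.resample or
    scipy.signal.resample_poly.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if source_sr == target_sr:
        return wav
    n_out = int(round(len(wav) * target_sr / source_sr))
    t_in = np.arange(len(wav)) / source_sr   # input sample times (seconds)
    t_out = np.arange(n_out) / target_sr     # output sample times (seconds)
    return np.interp(t_out, t_in, wav).astype(np.float32)


def load_for_resemblyzer(path):
    """Option 1: let librosa resample to 16 kHz while loading."""
    import librosa  # imported lazily so this sketch runs without librosa
    from resemblyzer import preprocess_wav

    wav, _ = librosa.load(path, sr=TARGET_SR)  # sr=16000 makes librosa resample
    return preprocess_wav(wav)


def preprocess_from_path(path):
    """Option 2: hand the file path to preprocess_wav and let it handle the rate."""
    from resemblyzer import preprocess_wav

    return preprocess_wav(path)


if __name__ == "__main__":
    sr = 44_100
    t = np.arange(sr) / sr                   # 1 second of audio at 44.1 kHz
    tone = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone
    fixed = resample_linear(tone, sr)
    print(len(tone) / TARGET_SR, "s if misread as 16 kHz")
    print(len(fixed) / TARGET_SR, "s after resampling")
```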