Cosine similarity is inconsistent with the cluster #42

Closed
tranctan opened this issue Nov 13, 2020 · 1 comment

tranctan commented Nov 13, 2020

Hi, when I visualized the voices, one sample (a female voice) was far away from the male speaker's utterances, which is expected.

However, when I compute the cosine similarity between the female utterance and the male ones, the value is quite high (0.88). I don't know whether I am computing the cosine similarity correctly here:

embed_1 = encoder.embed_utterance(y1)
embed_2 = encoder.embed_utterance(y2)
cosine_sim = embed_1 @ embed_2

Any help is very much appreciated!
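
For reference, here is a minimal sketch of cosine similarity with explicit normalization. The plain dot product in the snippet above equals the cosine similarity only when both embeddings have unit length. I believe `embed_utterance` returns L2-normalized vectors, but treat that as an assumption. The helper name `cosine_similarity` and the toy vectors are hypothetical and only for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Dividing by both norms makes the result independent of vector length.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors standing in for two embeddings.
u = np.array([3.0, 4.0])
v = np.array([4.0, 3.0])
sim = cosine_similarity(u, v)  # 24 / 25
```

If the embeddings are already unit length, this returns the same value as `embed_1 @ embed_2`, so a surprisingly high score more likely comes from the input audio than from the similarity formula.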

tranctan commented Nov 19, 2020

I just figured out by chance that if we load the audio into a NumPy array (with librosa or scipy) before passing it to the preprocess_wav() function in the resemblyzer.audio module, we need to make sure the data is resampled to 16,000 Hz first. Alternatively, we can pass the path of the WAV file directly to preprocess_wav().

The fix is trivial, but the mistake is really hard to find.
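
A rough sketch of the fix described above, under a few assumptions: the function names `apparent_duration_s`, `load_resampled`, and `load_with_source_sr` are hypothetical, and the `source_sr` keyword of preprocess_wav() is taken from my recollection of the resemblyzer.audio module, so please check it against your installed version. The first helper only illustrates why a rate mismatch distorts the input.

```python
TARGET_SR = 16000  # sampling rate named in this issue


def apparent_duration_s(num_samples, assumed_sr=TARGET_SR):
    """Duration a clip appears to have when its samples are read at assumed_sr."""
    return num_samples / assumed_sr


def load_resampled(path):
    """Option 1: let librosa resample to 16 kHz while loading."""
    import librosa  # imported lazily so the helper above works without it
    from resemblyzer import preprocess_wav

    wav, _ = librosa.load(path, sr=TARGET_SR)
    return preprocess_wav(wav)


def load_with_source_sr(path):
    """Option 2: keep the native rate and tell preprocess_wav() about it."""
    import librosa
    from resemblyzer import preprocess_wav

    wav, sr = librosa.load(path, sr=None)  # sr=None keeps the file's own rate
    return preprocess_wav(wav, source_sr=sr)  # source_sr: assumed keyword


# One second recorded at 44.1 kHz looks about 2.76 s long if read as 16 kHz.
print(apparent_duration_s(44100))
```

Passing the file path straight to preprocess_wav(), as mentioned above, avoids handling the sampling rate by hand.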
