Cosine similarity is inconsistent with the cluster #42

tranctan · 2020-11-13T04:11:41Z

Hi, when I visualized the voices, one sample (a female voice) appears far away from the male speaker's utterances, which is expected.

However, when I compute the cosine similarity between the female utterance and the male ones, the value is quite high (0.88). I'm not sure whether I'm computing the cosine similarity correctly here.

embed_1 = encoder.embed_utterance(y1)
embed_2 = encoder.embed_utterance(y2)
cosine_sim = embed_1 @ embed_2

Any help is very much appreciated!
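
For reference, a plain dot product such as `embed_1 @ embed_2` equals the cosine similarity only when both vectors have unit length. Resemblyzer's embeddings are, to my understanding, L2-normalized, so the snippet above should already be valid; the sketch below divides by the norms explicitly so it can serve as a sanity check either way. The function name `cosine_similarity` is illustrative and not part of Resemblyzer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length 1-D vectors (lists or NumPy arrays)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    # Dividing by both norms makes the result independent of vector length.
    return dot / (norm_a * norm_b)
```

If `cosine_similarity(embed_1, embed_2)` and `embed_1 @ embed_2` agree, the similarity computation itself is not the source of the discrepancy.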

tranctan · 2020-11-19T03:37:24Z

I just figured out by chance that if we load the audio into a NumPy array (with librosa or scipy) before passing it to the preprocess_wav() function in the resemblyzer.audio module, we need to make sure the data is resampled to 16,000 Hz. Alternatively, we can just pass the audio file path to preprocess_wav() instead.

This is a trivial mistake, but it is really hard to find.
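
A minimal sketch of the fix described above, assuming librosa and Resemblyzer are installed. The names `apparent_pitch_scale` and `load_for_embedding` are hypothetical helpers written for illustration, and the 44.1 kHz figure in the comments is only an example, since the thread does not state the original sample rate.

```python
TARGET_SR = 16000  # sample rate the thread says the array must be resampled to


def apparent_pitch_scale(source_sr, assumed_sr=TARGET_SR):
    """Factor by which every frequency shifts when samples recorded at
    source_sr are interpreted as if they were recorded at assumed_sr."""
    return assumed_sr / source_sr


def load_for_embedding(path):
    """Load an audio file as a 16 kHz array, then apply Resemblyzer's preprocessing."""
    # Imported here so the helper above can be used without the audio libraries.
    import librosa
    from resemblyzer import preprocess_wav

    # librosa resamples to the requested rate while loading.
    wav, _ = librosa.load(str(path), sr=TARGET_SR)
    # The array is already at 16 kHz, so no source rate is passed along.
    return preprocess_wav(wav)
```

For example, `apparent_pitch_scale(44100)` is about 0.36: a 44.1 kHz recording treated as 16 kHz audio sounds much lower and slower, which is one plausible reason an un-resampled female voice could score close to male ones.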

tranctan closed this as completed Nov 19, 2020
