I'm curious how to extract embeddings. Is that what the compress function / command line tool outputs, and could those embeddings be used to compare, via cosine similarity, how similar two audio files are?
Good question, we actually haven't tried. We definitely believe that the model performs some "collapse" of similar audio onto the same representation, and that it eliminates some of the variability that might occur between two similar audio clips (e.g. phase differences, white noise components). Note that we have good reasons to believe the representation is mostly at the acoustic level. Thus semantically similar inputs (e.g. two pieces of music in the same genre, or two people talking about the same topic) wouldn't necessarily be close in the latent space.
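For anyone who wants to experiment, here is a minimal sketch of the comparison step only. It assumes you have already obtained per-frame latent vectors for each file as a `(frames, dims)` array; how to get those out of the model is not covered here. The function name `pooled_cosine_similarity` and the choice to mean-pool over time are assumptions made for illustration, not something we have validated, and the random arrays are stand-ins for real latents.

```python
import numpy as np


def pooled_cosine_similarity(emb_a, emb_b, eps=1e-8):
    """Cosine similarity between two time-averaged embedding sequences.

    emb_a, emb_b: array-likes of shape (frames, dims). The frame counts
    may differ, but dims must match. Mean-pooling over time is an
    assumption for this sketch, not a validated recipe.
    """
    a = np.asarray(emb_a, dtype=np.float64).mean(axis=0)
    b = np.asarray(emb_b, dtype=np.float64).mean(axis=0)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against division by zero for all-zero embeddings.
    return float(np.dot(a, b) / max(denom, eps))


# Hypothetical stand-ins for real latents: random data, not model output.
rng = np.random.default_rng(0)
clip = rng.normal(size=(150, 128))
noisy_clip = clip + 0.05 * rng.normal(size=clip.shape)  # small perturbation
other = rng.normal(size=(200, 128))                     # unrelated "clip"

sim_noisy = pooled_cosine_similarity(clip, noisy_clip)
sim_other = pooled_cosine_similarity(clip, other)
print(f"clip vs. noisy copy: {sim_noisy:.3f}")
print(f"clip vs. unrelated:  {sim_other:.3f}")
```

Given the caveat above, a high score from something like this would at best indicate acoustic similarity (near-duplicates, re-encodes, light noise), not shared genre or topic.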