How to compare features to estimate the similarity of two signals? #268
Comments
I guess you could simply normalize and compare the MFCC features using DTW (Dynamic Time Warping), instead of using all the features. Most features, like rms and energy, won't even help capture the perceptual difference between two audio signals when they are sounds of two distinct instruments.
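A minimal sketch of that suggestion, assuming Meyda in Node and mono Float32Array signals: extract one MFCC vector per 512-sample frame with Meyda.extract, then align the two sequences with plain DTW. The frame size and the omission of per-coefficient normalization are simplifications on top of the comment, not part of it.

```js
// Sketch: frame two signals with Meyda, then align their MFCC sequences with DTW.
const Meyda = require("meyda");

const FRAME_SIZE = 512; // matches Meyda's default buffer size (assumption)

// Extract one MFCC vector per non-overlapping frame of a Float32Array signal.
function mfccSequence(signal) {
  const frames = [];
  for (let i = 0; i + FRAME_SIZE <= signal.length; i += FRAME_SIZE) {
    frames.push(Meyda.extract("mfcc", signal.subarray(i, i + FRAME_SIZE)));
  }
  return frames;
}

// Euclidean distance between two MFCC vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Classic O(n*m) DTW over the two MFCC sequences.
function dtwDistance(seqA, seqB) {
  const n = seqA.length, m = seqB.length;
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = euclidean(seqA[i - 1], seqB[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
    }
  }
  // Normalize by path length so longer sounds aren't penalized for their duration.
  return cost[n][m] / (n + m);
}

// Usage: lower means "more similar" under this alignment.
// const d = dtwDistance(mfccSequence(signalA), mfccSequence(signalB));
```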
Completely okay! For sound similarity, I would usually pick the audio features I care most about (for example, I might be looking for sounds that have similar brightnesses and noisinesses, but not care about the loudness, so I would pick spectral centroid and spectral flatness). Then, I would represent each sound as a vector of those audio features and compare the vectors with a distance measure such as euclidean distance.

If you would like to get a better similarity metric, you could build a model to weight each of the dimensions of the vector. You might, for example, have a UI that plays two pairs of sounds and asks a user to pick which "similarity" number is more accurate. From that data you could build a set of weightings for each dimension of the vector.

Another approach, not using meyda, is to take your sounds and train a convolutional autoencoder on their signals directly. This will develop an embedding of the sounds, which you can use as a vector on which to measure distance with euclidean distance, but it's tuned specifically to your dataset.

I hope one of those approaches helps! I'm going to leave this issue open as a reminder to me to write a guide on this - it should really be part of Meyda's docs.
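A rough illustration of the feature-vector idea, assuming Meyda in Node: summarize each sound as its mean spectral centroid and spectral flatness over 512-sample frames, then compare two sounds with a weighted euclidean distance. The frame size and the placeholder weights are assumptions; in practice the weights might come from the user-comparison data described above.

```js
// Sketch: per-sound summary feature vector plus weighted euclidean distance.
const Meyda = require("meyda");

const FRAME_SIZE = 512;

// Mean spectral centroid and flatness over all full frames of a Float32Array.
function featureVector(signal) {
  let centroid = 0, flatness = 0, frames = 0;
  for (let i = 0; i + FRAME_SIZE <= signal.length; i += FRAME_SIZE) {
    const f = Meyda.extract(["spectralCentroid", "spectralFlatness"],
                            signal.subarray(i, i + FRAME_SIZE));
    // Skip silent frames, where these features come back NaN.
    if (Number.isNaN(f.spectralCentroid) || Number.isNaN(f.spectralFlatness)) continue;
    centroid += f.spectralCentroid;
    flatness += f.spectralFlatness;
    frames++;
  }
  return frames === 0 ? [0, 0] : [centroid / frames, flatness / frames];
}

// Weighted euclidean distance; weights are placeholders to be learned or tuned.
function weightedDistance(a, b, weights = [1, 1]) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += weights[i] * (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Usage: lower values mean "more similar" under the chosen features and weights.
// const d = weightedDistance(featureVector(pianoSignal), featureVector(snareSignal));
```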
You might also want to check out this library. They used Shazam's paper to reconstruct the audio fingerprinting algorithm Shazam uses. In other words, you can take "fingerprints" of audio at set intervals, in the form of hashes. You can then compare future hashes to see if you're listening to a clip from the same audio. It might be different from what you're looking for, but I thought I'd share anyway. A lot of the work is about finding similarities in sound clips. The Shazam paper is also a very interesting read if you want to construct something for similar use cases.

Package: https://www.npmjs.com/package/stream-audio-fingerprint
Shazam Paper:
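Producing the fingerprints is the library's and the paper's job, but the comparison step the comment describes can be sketched on its own. The function below is a hypothetical helper, not part of stream-audio-fingerprint: it takes two lists of already-computed fingerprint hashes and returns how much they overlap.

```js
// Sketch: estimate similarity of two recordings from their fingerprint hashes.
// hashesA and hashesB are arrays of hash strings/numbers produced elsewhere.
function fingerprintOverlap(hashesA, hashesB) {
  const setA = new Set(hashesA);
  const setB = new Set(hashesB);
  let shared = 0;
  for (const h of setA) if (setB.has(h)) shared++;
  // Jaccard-style ratio: 1 means identical fingerprint sets, 0 means no overlap.
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}
```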
This guide is unfinished, storing it here. fix #268
Hi, this is a question about how to apply meyda! Hope that's okay.

How might one generally use extracted features to estimate the perceptual difference between two sounds? That is, I'm trying to define an error function that returns a high value when comparing a piano note to a snare drum, a low value when comparing two different snare drums, etc.

Right now I am taking a naive, straightforward approach - I loop through in bins of ~512 samples, extracting various features (mfcc, rms, chroma, etc.), and summing up the total difference in feature values. This sort of works as a rough baseline, but obviously it's very lacking - it tends to find very large feature differences between sounds that are perceptually identical to a human.

Are there known ways of approaching this - e.g. combinations of features to use, or ways of calculating the error between two sets of extracted features?
Thanks!
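For reference, a minimal sketch of the naive baseline the question describes, assuming Meyda in Node and mono Float32Array signals: extract mfcc, rms, and chroma per 512-sample frame and sum the absolute differences. This reproduces the rough approach above rather than improving on it.

```js
// Sketch of the naive baseline: frame-by-frame feature extraction and
// a summed absolute difference over all feature values.
const Meyda = require("meyda");

const FRAME_SIZE = 512;
const FEATURES = ["mfcc", "rms", "chroma"];

// One feature object per non-overlapping frame.
function frameFeatures(signal) {
  const frames = [];
  for (let i = 0; i + FRAME_SIZE <= signal.length; i += FRAME_SIZE) {
    frames.push(Meyda.extract(FEATURES, signal.subarray(i, i + FRAME_SIZE)));
  }
  return frames;
}

// Sum absolute differences across all features for the frames both sounds share.
function naiveError(signalA, signalB) {
  const a = frameFeatures(signalA);
  const b = frameFeatures(signalB);
  const n = Math.min(a.length, b.length);
  let error = 0;
  for (let i = 0; i < n; i++) {
    error += Math.abs(a[i].rms - b[i].rms);
    for (let k = 0; k < a[i].mfcc.length; k++) error += Math.abs(a[i].mfcc[k] - b[i].mfcc[k]);
    for (let k = 0; k < a[i].chroma.length; k++) error += Math.abs(a[i].chroma[k] - b[i].chroma[k]);
  }
  return error;
}
```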