
[Idea] Compute key similarity over the log-scale Mel spectrogram #49

Open
ggerganov opened this issue Oct 2, 2022 · 0 comments
Labels
enhancement New feature or request


@ggerganov
Owner

Currently, we compute the cross-correlation between the time-domain key waveforms to determine how similar two keys are.
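For reference, here is a minimal sketch of a time-domain similarity of this kind. It assumes the metric is the peak normalized cross-correlation over a small window of lags. keytap's actual implementation may normalize or align the waveforms differently, and `xcorr_similarity` and `max_lag` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: peak normalized cross-correlation between two key
// waveforms, searched over lags in [-max_lag, max_lag]. Each lag is normalized
// by the energy of the overlapping samples, so the result lies in [-1, 1]
// (1 = same shape up to a positive scale). Returns -1 if no overlap has energy.
double xcorr_similarity(const std::vector<float>& a, const std::vector<float>& b, int max_lag) {
    assert(max_lag >= 0);
    const int n = static_cast<int>(std::min(a.size(), b.size()));
    double best = -1.0;
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double sab = 0.0, saa = 0.0, sbb = 0.0;
        for (int i = 0; i < n; ++i) {
            const int j = i + lag;
            if (j < 0 || j >= n) continue;  // only the overlapping part counts
            sab += static_cast<double>(a[i]) * b[j];
            saa += static_cast<double>(a[i]) * a[i];
            sbb += static_cast<double>(b[j]) * b[j];
        }
        if (saa > 0.0 && sbb > 0.0) {
            best = std::max(best, sab / std::sqrt(saa * sbb));
        }
    }
    return best;
}
```

Searching over lags compensates for small misalignments in where each key press was cut out of the recording. The cost is O(n · max_lag) per pair of keys.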
Instead, we can compute the similarity metric over the Mel spectrograms of the signals. The log-Mel spectrogram seems to be the go-to audio representation in modern state-of-the-art speech recognition models (Whisper among them), so why not give it a try in keytap?

Here is a sample implementation that computes the log-scaled Mel spectrogram of an audio signal, which I recently wrote for the whisper.cpp project:

https://github.com/ggerganov/whisper.cpp/blob/6d654d192a62e6cd9897d6ff683bdc97406827e9/main.cpp#L1962-L2063
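Below is a minimal sketch of the proposed alternative. It makes several assumptions: mono float samples, a Hann window, a naive DFT instead of an FFT, triangular filters on the HTK Mel scale, and `log10` with a small floor. It is not a port of the whisper.cpp code linked above, and `log_mel_spectrogram` and `mel_similarity` are hypothetical names. The similarity used here is the Pearson correlation of two equal-size log-Mel matrices, which is only one of several reasonable choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical log-Mel front end for short key clips (assumptions listed above).
static double hz_to_mel(double hz) { return 2595.0 * std::log10(1.0 + hz / 700.0); }
static double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Returns n_frames x n_mel log-Mel energies, flattened row-major.
std::vector<float> log_mel_spectrogram(const std::vector<float>& x, int sample_rate,
                                       int n_fft, int hop, int n_mel) {
    assert(sample_rate > 0 && n_fft > 0 && hop > 0 && n_mel > 0);
    const int n_bins = n_fft / 2 + 1;
    const double pi = std::acos(-1.0);

    // Triangular filterbank: n_mel + 2 edge frequencies evenly spaced in mel.
    std::vector<double> edges(n_mel + 2);
    const double mel_max = hz_to_mel(sample_rate / 2.0);
    for (int m = 0; m < n_mel + 2; ++m) edges[m] = mel_to_hz(mel_max * m / (n_mel + 1));
    std::vector<double> fb(static_cast<size_t>(n_mel) * n_bins, 0.0);
    for (int m = 0; m < n_mel; ++m) {
        for (int k = 0; k < n_bins; ++k) {
            const double f = static_cast<double>(k) * sample_rate / n_fft;
            const double up = (f - edges[m]) / (edges[m + 1] - edges[m]);
            const double down = (edges[m + 2] - f) / (edges[m + 2] - edges[m + 1]);
            fb[m * n_bins + k] = std::max(0.0, std::min(up, down));
        }
    }

    std::vector<float> out;
    std::vector<double> power(n_bins);
    for (size_t start = 0; start + n_fft <= x.size(); start += hop) {
        // Power spectrum of one Hann-windowed frame (naive DFT; use an FFT in practice).
        for (int k = 0; k < n_bins; ++k) {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n_fft; ++i) {
                const double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / n_fft);
                const double s = w * x[start + i];
                re += s * std::cos(2.0 * pi * k * i / n_fft);
                im -= s * std::sin(2.0 * pi * k * i / n_fft);
            }
            power[k] = re * re + im * im;
        }
        // Mel filterbank, then log10 compression (the floor avoids log10(0)).
        for (int m = 0; m < n_mel; ++m) {
            double e = 0.0;
            for (int k = 0; k < n_bins; ++k) e += fb[m * n_bins + k] * power[k];
            out.push_back(static_cast<float>(std::log10(std::max(e, 1e-10))));
        }
    }
    return out;
}

// Hypothetical similarity: Pearson correlation of two equal-size log-Mel matrices.
double mel_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    double ma = 0.0, mb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) { ma += a[i]; mb += b[i]; }
    ma /= a.size();
    mb /= b.size();
    double sab = 0.0, saa = 0.0, sbb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        sab += (a[i] - ma) * (b[i] - mb);
        saa += (a[i] - ma) * (a[i] - ma);
        sbb += (b[i] - mb) * (b[i] - mb);
    }
    return (saa > 0.0 && sbb > 0.0) ? sab / std::sqrt(saa * sbb) : 0.0;
}
```

The log turns a change in volume into a roughly constant additive offset, and the mean subtraction removes that offset. As a result, two recordings of the same key at different loudness should still score close to 1.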

@ggerganov ggerganov added the enhancement New feature or request label Oct 2, 2022