MFCC Feature Request: Log vs dB, and Documentation #1093

Novak3 · 2020-03-31T19:22:42Z

There appear to be two rival methods for calculating MFCCs.

One, used by librosa by default when no melspectrogram is supplied, applies dB scaling to a power melspectrogram. Per issue #573, this may be to comply with a reference implementation in Matlab, although I did not see where in the reference implementation that was happening.

The other, used in packages such as python_speech_features, uses a log-scaled melspectrogram. This technique matches my understanding of Davis and Mermelstein, although I welcome correction if I have read it wrong.

This causes considerable confusion when users compare different packages and get wildly different results. It is likely the underlying issue in issue #573. It also comes up on Stack Exchange and other forums, and it has led torchaudio to implement an argument that switches between the two behaviors.

I suggest/request the following:

1. Documentation mentioning both main approaches and making clear what is the default behavior here.
2. An example in the documentation showing how to force the existing implementation to conform with the other methodology (i.e., construct an alternate melspectrogram input and use it, rather than raw audio samples).
3. If possible, an argument similar to torchaudio's implementation which will, for raw audio only, switch between the two implementations.

bmcfee · 2020-04-02T21:23:19Z

> There appear to be two rival methods for calculating MFCCs.

I think this might be severely under-estimating the degree of variability in MFCC implementations. 😁 Since there isn't really a single canonical reference implementation, the best we can do is provide a flexible API and defaults which behave sanely and correspond to a well-known reference.

> Per issue #573, this may be to comply with a reference implementation in Matlab, although I did not see where in the reference implementation that was happening.

That's done in https://labrosa.ee.columbia.edu/matlab/rastamat/ (see: melfcc, powspec, audspec), where the starting point is a power spectrum (rather than magnitude spectrum).

> This causes considerable confusion when users compare different packages and get wildly different results

This one difference shouldn't cause too much divergence in the results, since after log-scaling, the change between magnitude and power becomes a scaling factor of 2. It's been a while since I looked into it, but I would expect much larger sources of variation to come from differences in how the log is actually computed (e.g., bias stabilization), the definition of the mel scale itself, and how the filter banks are normalized, not to mention other parameters involving the STFT windows, pre-emphasis, liftering, etc.
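To make the scaling argument concrete, here is a minimal pure-Python sketch. The band values are made up for illustration, and `dct2` is a hypothetical helper (an unnormalized DCT-II), not a librosa function. It assumes the square is taken on the same band values; in a real pipeline the mel filter bank sits between the spectrum and the log, so the factor of 2 is only approximate there. It also ignores any clipping or reference level that a dB conversion may apply.

```python
import math


def dct2(x):
    """Unnormalized DCT-II of a list of floats (hypothetical helper)."""
    n = len(x)
    return [
        sum(x[k] * math.cos(math.pi * (k + 0.5) * i / n) for k in range(n))
        for i in range(n)
    ]


# Made-up mel-band magnitudes, for illustration only.
magnitudes = [0.5, 1.2, 3.0, 0.8, 0.1]

log_mag = [math.log(m) for m in magnitudes]             # natural log of magnitude
log_pow = [math.log(m ** 2) for m in magnitudes]        # natural log of power
db_pow = [10 * math.log10(m ** 2) for m in magnitudes]  # power in dB

c_mag = dct2(log_mag)
c_pow = dct2(log_pow)
c_db = dct2(db_pow)

# The DCT is linear, so a constant factor on the log-scaled input
# carries through to every cepstral coefficient.
db_factor = 10 / math.log(10)  # 10*log10(x) == db_factor * ln(x)
for a, b, c in zip(c_mag, c_pow, c_db):
    print(f"log|X|: {a: .4f}   log|X|^2: {b: .4f}   dB: {c: .4f}")
```

Under these assumptions, the power column is exactly twice the magnitude column, and the dB column is the power column times `10 / ln(10)`.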

> 1. Documentation mentioning both main approaches and making clear what is the default behavior here

I hesitate to go down this route because there are so many parameters to explore that documenting "both main approaches" is almost surely going to be inadequate and lead to more confusion.

> 2. An example in the documentation showing how to force the existing implementation to conform with the other methodology (i.e., construct an alternate melspectrogram input and use it, rather than raw audio samples)

This is a great idea, and could be seen as building off the previous issue #804. I think the best way to go about this is to provide an "advanced example" notebook that demonstrates how to exactly replicate the behavior of one or two well-known implementations (e.g., HTK).

> 3. If possible, an argument similar to torchaudio's implementation which will, for raw audio only, switch between the two implementations.

This is already implicit via pass-through parameters from `mfcc` to `melspectrogram`. You can call `mfcc` with `power=1` to get log amplitude (instead of dB) behavior for audio input.

bmcfee added the "discussion" label (Open-ended discussion for developers and users) Apr 2, 2020

bmcfee closed this as completed Oct 28, 2021

bmcfee mentioned this issue Nov 4, 2022

Sign of first MFCC #1598

Closed

skewballfox mentioned this issue Jul 1, 2023

migrate mfcc to depend on mel_spectrogram secretsauceai/mfcc-rust#26

Open
