Adding normalized=True options to MelScale and MelSpectrogram modules #426

jongwook · 2020-02-06T01:15:38Z

This allows area normalization for Mel spectrogram (See #287 ), where the filterbank

In the same issue I also mentioned adding the Slaney mel frequencies in addition to HTK ones, but as the Slaney scale is quite cumbersome (manually connect the linear and logarithmic pieces), I'm not sure if the maintainers thinks it's worth adding it at the moment. So adding the area normalization only for now.

One caveat might be that the namining might be confusing since normalized is also used in the Spectrogram module, but I couldn't come up with better choice. Other sensible choices for naming include normalization=None | "area" or area_normalize=False | True, and I am happy to change the naming to either.

For the test, I've added one more case in the librosa consistency checks. CR appreciated!

to allow area normalization

jongwook · 2020-02-06T02:50:09Z

Fix incoming!

cpuhrsch · 2020-02-06T16:32:11Z

Is there value in pulling out this normalization approach and making it a separate transform?

jongwook · 2020-02-06T16:37:01Z

IMHO it wouldn't be necessary; it's still Mel spectrogram (and MFCC) conceptually, and people typically use whichever is the default in the toolkit. Many papers don't bother to report the Mel scale or normalization.

Still, It'd be nicer to have numerical interoperability as discussed in #287, hence this PR!

vincentqb · 2020-02-06T17:46:23Z

test/test_transforms.py

@@ -229,10 +229,10 @@ def _test_librosa_consistency_helper(n_fft, hop_length, power, n_mels, n_mfcc, s
            # test mel spectrogram
            melspect_transform = torchaudio.transforms.MelSpectrogram(
                sample_rate=sample_rate, window_fn=torch.hann_window,
-                hop_length=hop_length, n_mels=n_mels, n_fft=n_fft)
+                hop_length=hop_length, n_mels=n_mels, n_fft=n_fft, normalized=mel_norm == "slaney")


From comment, there seems to be more than one way to normalize, is that the case? If so, wouldn't it be cleaner to just say normalized="slaney"?

However, if we go this direction, I second @cpuhrsch, it may be valuable to have a SlaneyScale transform like the MelScale transform that torchaudio has instead. Thoughts?

Sorry for the confusion; the comments ("htk" vs "slaney") was about the mel-hz conversion methods, while this PR addresses an orthogonal option where the height of the triangular filters can be adjusted to have the same area. (figure)

This (and librosa's API) is somewhat confusing because there are the Mel scale from Slaney and Slaney normalization, both referring to the auditory toolbox by Malcolm Slaney.

So for this specific issue I'd suggest using normalized or area_normalize for the names.

NOT related to the PR but regarding the Mel scale options:

We can later consider supporting other Mel scales (which is good to have for better librosa compliance but not my top priority).

Even in that case, having SlaneyScale and MelScale would be quite confusing; both are Mel scales, and there is no "standard" Mel scale used in literature (which is quite unfortunate).

vincentqb · 2020-02-06T19:57:03Z

torchaudio/functional.py

@@ -406,6 +406,7 @@ def create_fb_matrix(n_freqs, f_min, f_max, n_mels, sample_rate):
        f_max (float): Maximum frequency (Hz)
        n_mels (int): Number of mel filterbanks
        sample_rate (int): Sample rate of the audio waveform
+        normalized (bool, optional): area-normalize the FB, so that each filter has equal area


nit: "If True, normalize each filter to have equal area. If False, normalize each filter to have equal height. (Default: False)"

Thoughts?

I love it! But before commiting these changes, I just learned about the upcoming chainge in librosa.filters.mel where the area normalizatioin has two variants, so people may choose from:

norm=None: same as torchaudio's default MelScale

norm="slaney": the very new default (since 0.7.2, in Jan 13, 2020); same as the area normalization in this PR, dividing the filter by the width between the adjacent mel frequencies. While this makes the L1 norm of each filter slightly different, this was preferred historically because it is regression tested to be bit identical with the MATLAB counterpart.

norm=1: area normalization that has been the default historically and behaves exactly the same as norm="slaney", but will change behavior in future librosa (likely 0.8), where the components will exactly sum to 1 (i.e. L1-normalized)

norm=p: similar to the above, using Lp-norm, but there's no practical reason to use this.

Do you think we should follow their convention instead?

That's good to know! Given how the implementation is just a small division at the end of the computation on the filterbanks, I was still considering something like MelScale, SlaneyScale. However, if we add a MelPScale with a parameter for p-norm, it seems like we are back to having to provide a parameter.

Let' just use their new convention.

Sure, I'll modify the current PR to have librosa-0.8-style norm= options instead.

(Again, below not pertaining to this PR)

I still think the naming SlaneyScale on top of MelScale would be confusing as commented above, because:

The Mel scale from the MATLAB auditory toolbox (referred to as "slaney") is also a Mel scale. It might give an impression that it is not, or less legitimate than the HTK Mel scale.

What the current MelScale class represents is actually a Mel filter bank, because the term "Mel scale" simply refer to a series of Mel frequencies mapped from Hz (by a function that is roughly linear below 1kHz and roughly logarithmic above 1kHz). The MelScale class contains a set of triangular filters constructed from the Mel scale. So MelFilterBank would have been a better naming.

With MelScale encapsulating both the scale AND the filterbank, it is not quite clear in the name SlaneyScale if "slaney" is referring to the scale or the normalization method, and we too seem to be confused throughout the comments on this PR.

Nobody uses the term "slaney scale"; when googled with quotes, I see only three irrelevant results and none in Google Scholar.

Regardless of the future naming of SlaneyScale (or mel_scale="htk"|"slaney" as I'd prefer), it should only apply to the Mel scale, and it is orthogonal to the norm options this PR is addressing.

jongwook · 2020-02-07T17:11:11Z

I'll re-open the PR once the librosa 0.8+ style norm options are ready.

vincentqb · 2020-02-07T17:45:31Z

I'll re-open the PR once the librosa 0.8+ style norm options are ready.

The change will be around standardizing p=1 and p=infinity, right? This is a reasonable standard to me. I'd also be ok going ahead since we know this will happen "soon".

jongwook · 2020-02-07T19:03:35Z

Yes! I didn't mean to block on them but rather getting the code ready on my side 😄

added normalized option in Mel modules

e623700

to allow area normalization

jongwook force-pushed the mel-area-normalization branch from b56daeb to e623700 Compare February 6, 2020 01:17

jongwook added 2 commits February 5, 2020 18:52

keeping create_fb_matrix backwards-compatible

d69ee09

fixed too many blank lines

12d94f4

cpuhrsch requested a review from vincentqb February 6, 2020 16:31

vincentqb reviewed Feb 6, 2020

View reviewed changes

jongwook closed this Feb 7, 2020

This was referenced Apr 27, 2020

add slaney normalization #589

Merged

enable mel_scale option #593

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding normalized=True options to MelScale and MelSpectrogram modules #426

Adding normalized=True options to MelScale and MelSpectrogram modules #426

jongwook commented Feb 6, 2020 •

edited

jongwook commented Feb 6, 2020

cpuhrsch commented Feb 6, 2020

jongwook commented Feb 6, 2020

vincentqb Feb 6, 2020

jongwook Feb 6, 2020 •

edited

vincentqb Feb 6, 2020

jongwook Feb 7, 2020

vincentqb Feb 7, 2020

jongwook Feb 7, 2020

jongwook commented Feb 7, 2020

vincentqb commented Feb 7, 2020

jongwook commented Feb 7, 2020

Adding normalized=True options to MelScale and MelSpectrogram modules #426

Adding normalized=True options to MelScale and MelSpectrogram modules #426

Conversation

jongwook commented Feb 6, 2020 • edited

jongwook commented Feb 6, 2020

cpuhrsch commented Feb 6, 2020

jongwook commented Feb 6, 2020

vincentqb Feb 6, 2020

Choose a reason for hiding this comment

jongwook Feb 6, 2020 • edited

Choose a reason for hiding this comment

vincentqb Feb 6, 2020

Choose a reason for hiding this comment

jongwook Feb 7, 2020

Choose a reason for hiding this comment

vincentqb Feb 7, 2020

Choose a reason for hiding this comment

jongwook Feb 7, 2020

Choose a reason for hiding this comment

jongwook commented Feb 7, 2020

vincentqb commented Feb 7, 2020

jongwook commented Feb 7, 2020

jongwook commented Feb 6, 2020 •

edited

jongwook Feb 6, 2020 •

edited