Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding normalized=True options to MelScale and MelSpectrogram modules #426

Closed
wants to merge 3 commits into from

Conversation

jongwook
Copy link

@jongwook jongwook commented Feb 6, 2020

This allows area normalization for Mel spectrogram (See #287 ), where the filterbank

In the same issue I also mentioned adding the Slaney mel frequencies in addition to HTK ones, but as the Slaney scale is quite cumbersome (manually connect the linear and logarithmic pieces), I'm not sure if the maintainers thinks it's worth adding it at the moment. So adding the area normalization only for now.

One caveat might be that the namining might be confusing since normalized is also used in the Spectrogram module, but I couldn't come up with better choice. Other sensible choices for naming include normalization=None | "area" or area_normalize=False | True, and I am happy to change the naming to either.

For the test, I've added one more case in the librosa consistency checks. CR appreciated!

to allow area normalization
@jongwook
Copy link
Author

jongwook commented Feb 6, 2020

Fix incoming!

@cpuhrsch
Copy link
Contributor

cpuhrsch commented Feb 6, 2020

Is there value in pulling out this normalization approach and making it a separate transform?

@jongwook
Copy link
Author

jongwook commented Feb 6, 2020

IMHO it wouldn't be necessary; it's still Mel spectrogram (and MFCC) conceptually, and people typically use whichever is the default in the toolkit. Many papers don't bother to report the Mel scale or normalization.

Still, It'd be nicer to have numerical interoperability as discussed in #287, hence this PR!

@@ -229,10 +229,10 @@ def _test_librosa_consistency_helper(n_fft, hop_length, power, n_mels, n_mfcc, s
# test mel spectrogram
melspect_transform = torchaudio.transforms.MelSpectrogram(
sample_rate=sample_rate, window_fn=torch.hann_window,
hop_length=hop_length, n_mels=n_mels, n_fft=n_fft)
hop_length=hop_length, n_mels=n_mels, n_fft=n_fft, normalized=mel_norm == "slaney")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From comment, there seems to be more than one way to normalize, is that the case? If so, wouldn't it be cleaner to just say normalized="slaney"?

However, if we go this direction, I second @cpuhrsch, it may be valuable to have a SlaneyScale transform like the MelScale transform that torchaudio has instead. Thoughts?

Copy link
Author

@jongwook jongwook Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion; the comments ("htk" vs "slaney") was about the mel-hz conversion methods, while this PR addresses an orthogonal option where the height of the triangular filters can be adjusted to have the same area. (figure)

This (and librosa's API) is somewhat confusing because there are the Mel scale from Slaney and Slaney normalization, both referring to the auditory toolbox by Malcolm Slaney.

So for this specific issue I'd suggest using normalized or area_normalize for the names.


NOT related to the PR but regarding the Mel scale options:

We can later consider supporting other Mel scales (which is good to have for better librosa compliance but not my top priority).

Even in that case, having SlaneyScale and MelScale would be quite confusing; both are Mel scales, and there is no "standard" Mel scale used in literature (which is quite unfortunate).

@@ -406,6 +406,7 @@ def create_fb_matrix(n_freqs, f_min, f_max, n_mels, sample_rate):
f_max (float): Maximum frequency (Hz)
n_mels (int): Number of mel filterbanks
sample_rate (int): Sample rate of the audio waveform
normalized (bool, optional): area-normalize the FB, so that each filter has equal area
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "If True, normalize each filter to have equal area. If False, normalize each filter to have equal height. (Default: False)"

Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it! But before commiting these changes, I just learned about the upcoming chainge in librosa.filters.mel where the area normalizatioin has two variants, so people may choose from:

  • norm=None: same as torchaudio's default MelScale
  • norm="slaney": the very new default (since 0.7.2, in Jan 13, 2020); same as the area normalization in this PR, dividing the filter by the width between the adjacent mel frequencies. While this makes the L1 norm of each filter slightly different, this was preferred historically because it is regression tested to be bit identical with the MATLAB counterpart.
  • norm=1: area normalization that has been the default historically and behaves exactly the same as norm="slaney", but will change behavior in future librosa (likely 0.8), where the components will exactly sum to 1 (i.e. L1-normalized)
  • norm=p: similar to the above, using Lp-norm, but there's no practical reason to use this.

Do you think we should follow their convention instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know! Given how the implementation is just a small division at the end of the computation on the filterbanks, I was still considering something like MelScale, SlaneyScale. However, if we add a MelPScale with a parameter for p-norm, it seems like we are back to having to provide a parameter.

Let' just use their new convention.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll modify the current PR to have librosa-0.8-style norm= options instead.


(Again, below not pertaining to this PR)

I still think the naming SlaneyScale on top of MelScale would be confusing as commented above, because:

  • The Mel scale from the MATLAB auditory toolbox (referred to as "slaney") is also a Mel scale. It might give an impression that it is not, or less legitimate than the HTK Mel scale.
  • What the current MelScale class represents is actually a Mel filter bank, because the term "Mel scale" simply refer to a series of Mel frequencies mapped from Hz (by a function that is roughly linear below 1kHz and roughly logarithmic above 1kHz). The MelScale class contains a set of triangular filters constructed from the Mel scale. So MelFilterBank would have been a better naming.
  • With MelScale encapsulating both the scale AND the filterbank, it is not quite clear in the name SlaneyScale if "slaney" is referring to the scale or the normalization method, and we too seem to be confused throughout the comments on this PR.
  • Nobody uses the term "slaney scale"; when googled with quotes, I see only three irrelevant results and none in Google Scholar.

Regardless of the future naming of SlaneyScale (or mel_scale="htk"|"slaney" as I'd prefer), it should only apply to the Mel scale, and it is orthogonal to the norm options this PR is addressing.

@jongwook
Copy link
Author

jongwook commented Feb 7, 2020

I'll re-open the PR once the librosa 0.8+ style norm options are ready.

@jongwook jongwook closed this Feb 7, 2020
@vincentqb
Copy link
Contributor

I'll re-open the PR once the librosa 0.8+ style norm options are ready.

The change will be around standardizing p=1 and p=infinity, right? This is a reasonable standard to me. I'd also be ok going ahead since we know this will happen "soon".

@jongwook
Copy link
Author

jongwook commented Feb 7, 2020

Yes! I didn't mean to block on them but rather getting the code ready on my side 😄

This was referenced Apr 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants