InverseMelScale Implementation #366

jaeyeun97 · 2019-12-13T18:10:48Z

InverseMelScale implementation for #351
Tests are coming!

Edit: It would be great if someone could point me to the test code guidelines for torchaudio.

vincentqb · 2019-12-13T19:28:04Z

If the code is in progress, you can prefix the pull request title with [WIP] to indicate it is not yet ready for review/merging.

See also comment.

vincentqb

Thanks for working on this! This needs testing for jittability and correctness.

It would also be interesting to see how far InverseMelScale(MelScale(x)) is from x and MelScale(InverseMelScale(y)) from y.
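One way to quantify both round trips is to measure the distance in each direction. Below is a minimal NumPy sketch. It assumes a random non-negative matrix `fb_t` of shape (n_mels, n_freqs) as a stand-in for the mel filter bank, and a pseudo-inverse as a stand-in for `InverseMelScale`. Neither is the torchaudio implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freqs, n_mels, n_frames = 201, 50, 10     # e.g. n_fft=400 gives 201 frequency bins
fb_t = rng.random((n_mels, n_freqs))        # hypothetical stand-in for the mel filter bank
fb_pinv = np.linalg.pinv(fb_t)              # hypothetical stand-in inverse (pseudo-inverse)


def mel_scale(spec):
    """Linear spectrogram (n_freqs, time) -> mel spectrogram (n_mels, time)."""
    return fb_t @ spec


def inverse_mel_scale(mel):
    """Mel spectrogram (n_mels, time) -> linear spectrogram (n_freqs, time)."""
    return fb_pinv @ mel


x = rng.random((n_freqs, n_frames))              # a linear-frequency spectrogram
y = mel_scale(rng.random((n_freqs, n_frames)))   # a mel spectrogram

d_x = np.linalg.norm(inverse_mel_scale(mel_scale(x)) - x)   # Inverse(MelScale(x)) vs x
d_y = np.linalg.norm(mel_scale(inverse_mel_scale(y)) - y)   # MelScale(Inverse(y)) vs y
print(f"|Inverse(MelScale(x)) - x| = {d_x:.3f}")
print(f"|MelScale(Inverse(y)) - y| = {d_y:.2e}")
```

Because n_mels < n_freqs, MelScale discards information. With this stand-in, MelScale(Inverse(y)) recovers y almost exactly, but Inverse(MelScale(x)) generally cannot recover x. The second distance is therefore the one that reflects reconstruction quality.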

vincentqb · 2019-12-14T23:03:47Z

The automated testing here picks up flake8 errors.

./torchaudio/transforms.py:206:9: E303 too many blank lines (2)
./torchaudio/transforms.py:206:51: W291 trailing whitespace
./torchaudio/transforms.py:210:1: W293 blank line contains whitespace
./torchaudio/transforms.py:222:37: W291 trailing whitespace

jaeyeun97 · 2019-12-15T11:57:57Z

InverseMelScale code changed significantly to be JITable--it now uses SVD to compute the inverse of the Mel basis matrix.
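For reference, the SVD route amounts to the Moore-Penrose pseudo-inverse A⁺ = V Σ⁺ Uᵀ of the (n_mels × n_freqs) mel basis. Below is a minimal NumPy sketch, not the code from this branch. Here `fb_t` is a random non-negative stand-in for the mel filter bank.

```python
import numpy as np


def pinv_via_svd(a, rcond=1e-15):
    """Moore-Penrose pseudo-inverse of `a` computed from its SVD.

    Singular values below rcond * max(singular value) are treated as zero;
    the corresponding input directions cannot be recovered by the inverse.
    """
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rcond * s.max()
    s_inv[keep] = 1.0 / s[keep]
    return (vh.T * s_inv) @ u.T       # V @ diag(1/s) @ U^T


rng = np.random.default_rng(0)
fb_t = rng.random((50, 201))          # hypothetical (n_mels, n_freqs) stand-in
fb_t_inv = pinv_via_svd(fb_t)         # shape (n_freqs, n_mels) = (201, 50)
print(fb_t_inv.shape)
```

Since n_mels < n_freqs, `fb_t_inv @ mel` returns the minimum-norm solution. Nothing constrains that solution to be non-negative, which is one plausible source of the noisy regions reported later in this thread for the SVD and pseudo-inverse results.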

jaeyeun97 · 2019-12-15T16:27:55Z

No longer the case; using SGD for optimization yields best results when auditioned, along with the lowest and most stable loss values. JITability will have to suffer, I guess?

vincentqb · 2019-12-18T15:31:09Z

> No longer the case; using SGD for optimization yields best results when auditioned, along with the lowest and most stable loss values. JITability will have to suffer, I guess?

What do you mean "best results"? Can it be quantified? Can we compare the outputs with librosa, SGD and SVD? If the goal is to compute the inverse of a matrix, what other ways have you tried? inverse, pseudo-inverse? (btw this post may be relevant)

In general, I'd say jitability is important. Let's leave the code snippet with SGD in a comment here, and add a TODO note in the code with a link to here for when optimizers become jitable too :)

jaeyeun97 · 2019-12-18T20:00:17Z

Okay, I'll try to post reasonable test results between original, SVD, and SGD examples.

As a heads up, the difference is easier to see in the resulting spectrograms of the tests, which show a stark contrast: SGD produces reasonable data in the places where the missing bases result in random data with the SVD or pseudo-inverse methods.
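The sketch below assumes the SGD variant keeps its estimate non-negative, for example by clamping after each step. A plain pseudo-inverse cannot enforce that constraint. The example illustrates the idea with projected gradient descent in NumPy: a gradient step on the squared mel error, then a clamp to zero. Here `fb_t` is a random non-negative stand-in for the mel filter bank. This is an illustration, not the optimizer loop used in this branch.

```python
import numpy as np


def nonneg_inverse_pgd(fb_t, mel, n_iter=2000, seed=0):
    """Estimate S >= 0 with fb_t @ S ~= mel by projected gradient descent.

    Minimizes 0.5 * ||fb_t @ S - mel||_F^2, clamping S to >= 0 after each step.
    """
    rng = np.random.default_rng(seed)
    spec = rng.random((fb_t.shape[1], mel.shape[1]))   # random non-negative start
    step = 1.0 / np.linalg.norm(fb_t, 2) ** 2          # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = fb_t.T @ (fb_t @ spec - mel)
        spec = np.maximum(spec - step * grad, 0.0)     # gradient step, then clamp
    return spec


rng = np.random.default_rng(1)
fb_t = rng.random((50, 201))          # hypothetical (n_mels, n_freqs) stand-in
true_spec = rng.random((201, 10))     # non-negative "original" spectrogram
mel = fb_t @ true_spec

s_pgd = nonneg_inverse_pgd(fb_t, mel)
s_pinv = np.linalg.pinv(fb_t) @ mel
print("negative entries, pinv:", int((s_pinv < 0).sum()), "pgd:", int((s_pgd < 0).sum()))
print("mel residual, pgd:", np.linalg.norm(fb_t @ s_pgd - mel))
```

With a step size of 1 / ||fb_t||², each projected step cannot increase the squared error. The clamp guarantees that no negative magnitudes appear, unlike the pseudo-inverse solution.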

jaeyeun97 · 2019-12-25T17:25:46Z

Okay, I have some results here, although they aren't quantitative. They generally favor solving by SGD, although the difference between the SGD and SVD methods is much smaller if n_fft == 2^n.

n_fft == 400 and n_mels == 50:

n_fft == 512 and n_mels == 64:

n_fft == 2048 and n_mels == 256:

@vincentqb Should I still just progress with SVD methods?

vincentqb · 2019-12-26T14:41:49Z

Thanks for running those tests.

Can you provide the code for each version (e.g. using SVD and SGD)? A comment here is fine.
We need to quantify the differences, though I do agree that we'll need to go with SGD if we measure that SVD is not stable.
Do you have a reference we can compare against? Something like librosa's that we could introduce as a test.

vincentqb

This PR needs a (quantitative) test against a reference, such as librosa.

jaeyeun97 · 2019-12-30T13:57:27Z

You can find the code at https://gist.github.com/jaeyeun97/8651dff509d5b084636ac6c3a7547108, where the SGD version is the one you can currently find in this branch.
Comparing with the librosa version, I'd say SGD is the right way to go. Librosa also uses L-BFGS to solve for the minimum squared difference, so having such a process here seems natural. The only quantitative measure I can think of is the Euclidean distance (torch.dist?) between the proposed method, librosa's method, and the original spectrogram, since torch.allclose does not really work for any of the methods.
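As far as I can tell, librosa's inverse frames the problem as non-negative least squares solved with a bounded quasi-Newton method (L-BFGS-B). Below is a minimal sketch of that formulation using `scipy.optimize.minimize`. It mirrors the idea rather than librosa's actual code. The clipped least-squares warm start is an assumption, and `fb_t` is a random non-negative stand-in for the mel filter bank. The last line computes the Euclidean distance to the original, the NumPy analogue of `torch.dist` with its default p=2.

```python
import numpy as np
from scipy.optimize import minimize


def nnls_lbfgsb(fb_t, mel):
    """Solve min_S 0.5 * ||fb_t @ S - mel||_F^2 subject to S >= 0 with L-BFGS-B."""
    n_freqs, n_frames = fb_t.shape[1], mel.shape[1]
    # Warm start: least-squares solution clipped to be non-negative (an assumption).
    x0 = np.clip(np.linalg.lstsq(fb_t, mel, rcond=None)[0], 0.0, None)

    def loss_and_grad(x):
        resid = fb_t @ x.reshape(n_freqs, n_frames) - mel
        return 0.5 * np.sum(resid ** 2), (fb_t.T @ resid).ravel()

    res = minimize(loss_and_grad, x0.ravel(), jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * (n_freqs * n_frames))
    return res.x.reshape(n_freqs, n_frames)


rng = np.random.default_rng(2)
fb_t = rng.random((50, 201))      # hypothetical (n_mels, n_freqs) stand-in
spec = rng.random((201, 10))      # original non-negative spectrogram
mel = fb_t @ spec

s_lbfgs = nnls_lbfgsb(fb_t, mel)
print("distance to original (p=2):", np.linalg.norm(s_lbfgs - spec))
```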

jaeyeun97 · 2019-12-30T14:20:45Z

The following shows the results including librosa:

n_fft == 400, n_mels == 50:

n_fft == 2048, n_mels == 256:

vincentqb · 2020-01-17T16:27:04Z

Qualitatively, it's clear that the SVD implementation wouldn't do, so we'll end up with the SGD implementation even though that prevents the implementation from being jitable.

Let's look at torch.dist with p=0 and p=1 between

LibrosaInverse(melspec) and SGDInverse(melspec)
LibrosaInverse(melspec) and SVDInverse(melspec)

This will help us see the average difference and max difference. We'll need to add a test to make sure that the p=0 or p=1 distance between librosa and torchaudio is smaller than some threshold.

jaeyeun97 · 2020-01-27T18:41:16Z

Sorry, was on a long trip, will be finishing up this feature in the next few days.

jaeyeun97 · 2020-02-13T04:30:58Z

@vincentqb here are the torch.dist results:

| | SGDInverse | SVDInverse |
|---|---|---|
| p=0 | 522046. | 516637. |
| p=1 | 14831.9863 | 1549607.8750 |
| p=2 | 77.0535 | 14585.5244 |

The above was with n_fft == 400, which may be the reason for such large differences; I will test other values and see what happens.

EDIT: I have done the same with other n_fft values. While the p=2 values show a much smaller difference for case (i) (librosa vs. SGD) than for case (ii) (librosa vs. SVD), I see little difference in the p=1 or p=0 values. I am not quite sure why p=0 may be significant, since it is very unlikely (or so I presume) for any corresponding "cells" to be exactly the same after a numerical solving process.
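For reference, `torch.dist(a, b, p)` returns the p-norm of the flattened difference `a - b`:

- p=0 counts the elements that differ at all.
- p=1 sums the absolute differences; divide by the element count to get the average difference.
- p=2 is the Euclidean distance.
- p=inf gives the maximum difference.

After a numerical solve, p=0 would therefore be expected to sit near the total number of elements for both methods, which fits the doubt above. The NumPy sketch below computes the same quantities. `dist` is a hypothetical helper, not a PyTorch API.

```python
import numpy as np


def dist(a, b, p=2.0):
    """p-norm of the flattened difference a - b (the quantity torch.dist reports)."""
    diff = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).ravel()
    return float(np.linalg.norm(diff, ord=p))


a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 0.0, 5.0])     # |a - b| = [0, 2, 2]
for p in (0, 1, 2, np.inf):
    print(p, dist(a, b, p))       # 0 -> 2.0, 1 -> 4.0, 2 -> 2.828..., inf -> 2.0
print("mean abs diff:", dist(a, b, 1) / a.size)
```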

It looks like it is much more difficult to show "similarity" of audio than I had previously thought.

jaeyeun97 · 2020-02-14T05:09:49Z

The p=1 difference above should be enough to justify SGD?

Now for the test I'd say it makes more sense to compare the difference between:

torch.dist(LibrosaInverse(MelScale(spec)), spec)
torch.dist(SGDInverse(MelScale(spec)), spec)

if we are going to move ahead with SGDInverse.

vincentqb · 2020-02-14T19:09:39Z

Yes, based on these results, let's use the SGD variant. Thanks for providing a gist in comment with SVD. There's an alternative version of SVD coming up in pytorch/pytorch#29488, but we can revisit after the latter is merged.

> I am not quite sure why p=0 may be significant, since it is very unlikely (or so I presume) for any corresponding "cells" to be exactly the same after a numerical solving process.

Oops, this was a typo :) I meant p=1 and p=2. Thanks for providing them!

vincentqb · 2020-02-14T19:12:12Z

Let's make the tests green :) Can you rebase your code onto master?

jaeyeun97 · 2020-02-15T05:51:14Z

> Let's make the tests green :) Can you rebase your code onto master?

yep done!

vincentqb · 2020-02-18T16:20:21Z

Thanks! Can you do the following?

~~We put the computation part in torchaudio.functional and then wrap this in torchaudio.transforms. The documentation also needs to be updated for functionals.~~
We need tests (e.g. test/test_functional.py and test/test_transform.py) for correctness. Can you add a test in the corresponding test file checking that Inverse(specgram) is close to librosa's result?
~~We need to show consistency: (1) Inverse(MelSpecgram(waveform)) is close to waveform, and (2) MelSpecgram(Inverse(specgram)) is close to specgram.~~

mthrok · 2020-02-26T20:45:30Z

Hi 👋

I am working to wrap up this nice PR. (@jaeyeun97 let me know if you have a compelling reason that I should not.)

Following @vincentqb 's suggestion, I extracted the main part of InverseMelScale from transform module to functional module, and then I noticed that functional module does not have an equivalent of MelScale implementation there.

So I had offline discussion with @vincentqb, and we agreed that it's okay to leave both MelScale and InverseMelScale in transform module.

The following points were brought up:

MelScale creates a filter bank and applies it with matmul. Filter bank generation is already defined in the functional module, so most of the work is done there.
The consistency between MelScale and InverseMelScale is important here. (I was writing a numerical compatibility test for a functional version of InverseMelScale, but writing it without an equivalent of MelScale was awkward and the code looked ugly.)

So I will keep the implementation of the InverseMelScale transform as is and will add a numerical compatibility test to supplement this PR.
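To illustrate the first point: MelScale boils down to building a triangular filter bank of shape (n_freqs, n_mels) once and applying it with a single matmul over the frequency axis. Below is a minimal NumPy sketch. It assumes the HTK mel formula and a slope-based triangle construction similar in spirit to the filter bank helper in the functional module. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)      # HTK mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_freqs, n_mels, sample_rate, f_min=0.0, f_max=None):
    """Triangular mel filters of shape (n_freqs, n_mels)."""
    f_max = sample_rate / 2 if f_max is None else f_max
    all_freqs = np.linspace(0, sample_rate / 2, n_freqs)             # STFT bin frequencies
    f_pts = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    f_diff = np.diff(f_pts)                                          # (n_mels + 1,)
    slopes = f_pts[None, :] - all_freqs[:, None]                     # (n_freqs, n_mels + 2)
    down = -slopes[:, :-2] / f_diff[:-1]                             # rising edge of each triangle
    up = slopes[:, 2:] / f_diff[1:]                                  # falling edge of each triangle
    return np.maximum(0.0, np.minimum(down, up))


def apply_mel_scale(spec, fb):
    """(..., n_freqs, time) -> (..., n_mels, time) with one matmul."""
    return np.swapaxes(np.swapaxes(spec, -1, -2) @ fb, -1, -2)


fb = mel_filter_bank(n_freqs=201, n_mels=50, sample_rate=16000)   # n_fft=400
spec = np.random.default_rng(0).random((2, 201, 100))             # (batch, freq, time)
mel = apply_mel_scale(spec, fb)
print(fb.shape, mel.shape)   # (201, 50) (2, 50, 100)
```

Because the forward transform is just this matrix, an inverse can be written against the same `fb`. This is one reason to keep MelScale and InverseMelScale side by side.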

jaeyeun97 · 2020-02-27T06:24:27Z

@mthrok Ah sorry, I decided to seriously look for jobs now so that took priority.

> I noticed that functional module does not have an equivalent of MelScale implementation there.

I was also hesitant on moving code to functional because of this issue, thanks for sorting it out!

This PR follows up pytorch#366 and adds tests for `InverseMelScale` (and `MelScale`) for librosa compatibility.

For the `MelScale` compatibility test:

1. Generate a spectrogram.
2. Feed the spectrogram to a `torchaudio.transforms.MelScale` instance.
3. Feed the spectrogram to the `librosa.feature.melspectrogram` function.
4. Compare the results from 2 and 3 element-wise.

Element-wise numerical comparison is possible because under the hood both implementations use the same algorithm.

The `InverseMelScale` compatibility test is more elaborate:

1. Generate the original spectrogram.
2. Convert the original spectrogram to Mel scale using a `torchaudio.transforms.MelScale` instance.
3. Reconstruct the spectrogram using the torchaudio implementation.
   3.1. Feed the Mel spectrogram to a `torchaudio.transforms.InverseMelScale` instance and get the reconstructed spectrogram.
   3.2. Compute the sum of element-wise P1 distances between the original spectrogram and the one from 3.1.
4. Reconstruct the spectrogram using librosa.
   4.1. Feed the Mel spectrogram to the `librosa.feature.inverse.mel_to_stft` function and get the reconstructed spectrogram.
   4.2. Compute the sum of element-wise P1 distances between the original spectrogram and the one from 4.1. (This is the reference.)
5. Check that the resulting P1 distances are in roughly the same range.

Element-wise numerical comparison is not possible because different algorithms are used to compute the inverse, and some values in the reconstructed spectrograms can vary in magnitude. Therefore, the strategy here is to check that the P1 distance (reconstruction loss) is not very different from the value obtained using `librosa`. For this purpose, the threshold was chosen empirically:

```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
>>> p1 dist (orig <-> ta): tensor(1482.1917)
>>> p1 dist (orig <-> lr): tensor(1420.7103)
```

This value can vary based on the length and kind of the signal being processed, so it was handpicked.
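Step 5 can be sketched as a ratio test on the two P1 reconstruction losses. In this NumPy sketch, `assert_comparable_reconstruction` and its `max_ratio=1.1` default are hypothetical, not the threshold used in the actual test. The toy arrays stand in for the original and reconstructed spectrograms.

```python
import numpy as np


def p1_dist(a, b):
    """Sum of element-wise absolute differences, i.e. torch.dist(a, b, p=1)."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())


def assert_comparable_reconstruction(spec_orig, spec_ta, spec_lr, max_ratio=1.1):
    """Fail if torchaudio's P1 reconstruction loss exceeds max_ratio times librosa's."""
    loss_ta = p1_dist(spec_orig, spec_ta)   # orig <-> torchaudio reconstruction
    loss_lr = p1_dist(spec_orig, spec_lr)   # orig <-> librosa reconstruction (reference)
    assert loss_ta <= max_ratio * loss_lr, f"{loss_ta:.4f} > {max_ratio} * {loss_lr:.4f}"
    return loss_ta, loss_lr


rng = np.random.default_rng(0)
spec_orig = rng.random((201, 50))
spec_lr = spec_orig + 0.1 * rng.standard_normal(spec_orig.shape)   # toy reconstructions
spec_ta = spec_orig + 0.1 * rng.standard_normal(spec_orig.shape)
print(assert_comparable_reconstruction(spec_orig, spec_ta, spec_lr))
```

With the figures quoted above (1482.1917 vs. 1420.7103), the ratio is about 1.04. A relative check like this is less sensitive than an absolute threshold to the signal length, although the tolerance still has to be picked by hand.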

mthrok · 2020-02-28T15:21:42Z

@jaeyeun97 I see. Good luck with your job hunting!

jaeyeun97 · 2020-02-28T15:32:52Z

@mthrok Thanks! Just reviewed #448 and looks good to me.

jaeyeun97 · 2020-02-28T16:55:09Z

Some checks failed but they look like curl failures.

* Inverse Mel Scale Implementation
* Inverse Mel Scale Docs
* Better working version.
* GPU fix
* These shouldn't go on git..
* Even better one, but does not support JITability.
* Remove JITability test
* Flake8
* n_stft is a must
* minor clean up of initialization
* Add librosa consistency test
* Address review feedbacks
* Support arbitrary batch dimensions.
* Add batch test
* Use view for batch
* fix sgd
* Use negative indices and update docstring
* Update threshold

Co-authored-by: Charles J.Y. Yoon <jaeyeun97@gmail.com>

jaeyeun97 added 2 commits December 14, 2019 02:08

Inverse Mel Scale Implementation

d628f57

Inverse Mel Scale Docs

52b8de4

jaeyeun97 changed the title ~~Feature InverseMelScale~~ Inverse of MelScale Dec 13, 2019

vincentqb self-requested a review December 13, 2019 19:26

jaeyeun97 changed the title ~~Inverse of MelScale~~ [WIP] Inverse of MelScale Dec 13, 2019

vincentqb suggested changes Dec 14, 2019

jaeyeun97 added 3 commits December 15, 2019 20:37

Better working version.

c2a5ba2

GPU fix

a56e53d

Merge remote-tracking branch 'upstream/master' into feature_inverseme…

ee67105

…lscale

jaeyeun97 added 4 commits December 15, 2019 21:04

These shouldn't go on git..

b9207a1

Even better one, but does not support JITability.

f4c74e4

Remove JITability test

4aef817

Flake8

4557017

Merge remote-tracking branch 'upstream/master' into feature_inverseme…

b72864e

…lscale

jaeyeun97 requested a review from vincentqb December 24, 2019 10:40

n_stft is a must

81faf16

vincentqb suggested changes Dec 26, 2019

Merge remote-tracking branch 'upstream/master' into feature_inverseme…

1b917bd

…lscale

Merge remote-tracking branch 'upstream/master' into feature_inverseme…

12a5214

…lscale

mthrok mentioned this pull request Feb 27, 2020

Add test for InverseMelScale #448

Merged

jaeyeun97 changed the title ~~[WIP] Inverse of MelScale~~ InverseMelScale Implementation Feb 28, 2020

Support arbitrary batch dimensions.

590939e

Merge remote-tracking branch 'upstream/master' into feature_inverseme…

24afa42

…lscale

vincentqb closed this in #448 Feb 28, 2020

jaeyeun97 mentioned this pull request Jan 16, 2021

Issues with transforms.InverseMelScale #1149

Open

mthrok mentioned this pull request Oct 19, 2021

InverseMelScale does not work in inference mode #1902

Closed

mthrok mentioned this pull request Aug 23, 2022

Implement L-BFGS-B optimizer and update InverseMelScale #2643

Open
