InverseMelScale Implementation #366
If the code is in progress, you can prefix the pull request with [WIP] to indicate it is not yet ready for review/merging. See also comment.
Thanks for working on this! This needs testing for jittability and correctness.
It would also be interesting to see how far InverseMelScale(MelScale(x)) is from x and MelScale(InverseMelScale(y)) from y.
The automated testing here picks up flake8 errors.
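The round-trip question raised in this review can be measured with a small script. This is only a sketch under stated assumptions: `fb` is a random non-negative matrix standing in for a mel filterbank, and `mel_scale` / `inverse_mel_scale` are hypothetical helpers (a pseudo-inverse reconstruction), not the torchaudio transforms from this PR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a random non-negative matrix stands in for a real mel filterbank.
n_mels, n_freq, n_frames = 8, 32, 10
fb = rng.random((n_mels, n_freq))


def mel_scale(spec):
    """Project a (n_freq, time) spectrogram onto the mel bins."""
    return fb @ spec


def inverse_mel_scale(mel):
    """Minimum-norm least-squares reconstruction via the pseudo-inverse."""
    return np.linalg.pinv(fb) @ mel


x = rng.random((n_freq, n_frames))  # a spectrogram
y = rng.random((n_mels, n_frames))  # a mel spectrogram

# Relative round-trip error in each direction.
err_x = np.linalg.norm(inverse_mel_scale(mel_scale(x)) - x) / np.linalg.norm(x)
err_y = np.linalg.norm(mel_scale(inverse_mel_scale(y)) - y) / np.linalg.norm(y)
print(f"InverseMelScale(MelScale(x)) vs x: {err_x:.3e}")
print(f"MelScale(InverseMelScale(y)) vs y: {err_y:.3e}")
```

Because the filterbank has fewer rows than columns, the projection discards information, so the first error cannot be expected to vanish, while the second is limited only by floating-point precision when the filterbank has full row rank.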
No longer the case; using SGD for optimization yields the best results when auditioned, along with the lowest and most stable loss values. JITability will have to suffer, I guess?
What do you mean by "best results"? Can it be quantified? Can we compare the outputs with librosa, SGD and SVD? If the goal is to compute the inverse of a matrix, what other ways have you tried? inverse, pseudo-inverse? (btw this post may be relevant) In general, I'd say jitability is important, so I'd leave the code snippet with SGD in a comment here, and add a TODO note in the code with a link to here for when optimizers become jitable too :)
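One way to put numbers on the comparison requested above is to run a closed-form and an iterative inversion on the same input and report the same metrics for both. The sketch below rests on assumptions: `fb` is a random stand-in for the filterbank, and plain full-batch gradient descent with clamping is swapped in for the SGD optimizer discussed in this thread. It is not the PR's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mels, n_freq, n_frames = 8, 32, 10
fb = rng.random((n_mels, n_freq))      # assumption: stand-in filterbank
spec = rng.random((n_freq, n_frames))  # non-negative "ground truth" spectrogram
mel = fb @ spec


def loss(estimate):
    """Squared error between the re-projected estimate and the mel input."""
    return 0.5 * np.sum((fb @ estimate - mel) ** 2)


# Closed form: minimum-norm least-squares solution (may contain negative values).
spec_pinv = np.linalg.pinv(fb) @ mel

# Iterative: gradient descent on the same loss, clamped to stay non-negative.
step = 1.0 / np.linalg.norm(fb, 2) ** 2  # 1 / Lipschitz constant of the gradient
spec_gd = np.zeros_like(spec)
initial_loss = loss(spec_gd)
for _ in range(500):
    grad = fb.T @ (fb @ spec_gd - mel)
    spec_gd = np.maximum(spec_gd - step * grad, 0.0)

for name, est in [("pinv", spec_pinv), ("gd", spec_gd)]:
    print(name, "loss:", loss(est), "min value:", est.min(),
          "mean abs error:", np.abs(est - spec).mean())
```

Reporting the minimum value alongside the loss matters here: a magnitude spectrogram cannot be negative, and only the clamped iterative estimate is guaranteed to respect that.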
Okay, I'll try to post reasonable test results comparing the original, SVD, and SGD examples. As a heads up: it's easier to judge from the resulting spectrograms of the tests, which show a stark difference (mostly that SGD produces reasonable data where the missing bases result in random data in the case of the SVD or pseudo-inverse methods).
Okay, I have some results here, although they aren't quantitative ones. They generally favor solving by SGD, although the difference between the SGD and SVD methods is much smaller if
@vincentqb Should I still just progress with SVD methods?
Thanks for running those tests.
This PR needs a (quantitative) test against a reference, such as librosa.
Qualitatively, it's clear that the SVD implementation wouldn't do, so we'll end up with the SGD implementation even though that prevents the implementation from being jitable. Let's look at
This will help us see the average difference and max difference. We'll need to add a test to make sure that the
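The average and maximum difference mentioned here can be computed with a small helper. This is a sketch; `difference_stats` is a hypothetical name, and the two arrays are toy values rather than results from this PR.

```python
import numpy as np


def difference_stats(reference, estimate):
    """Return (mean, max) of the element-wise absolute difference."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    diff = np.abs(reference - estimate)
    return diff.mean(), diff.max()


# Toy data for illustration only.
reference = np.array([[1.0, 2.0], [3.0, 4.0]])
estimate = np.array([[1.5, 2.0], [2.0, 4.0]])
mean_diff, max_diff = difference_stats(reference, estimate)
print("mean abs diff:", mean_diff, "max abs diff:", max_diff)
```

The mean shows how far off a reconstruction is overall, while the maximum exposes isolated bins that are badly wrong even when the average looks fine.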
Sorry, was on a long trip, will be finishing up this feature in the next few days.
@vincentqb here are the
The above was with
EDIT: I have done the same with other
It looks like it is much more difficult to show "similarity" of audio than I had previously thought.
The Now for the test I'd say it makes more sense to compare the difference between:
if we are going to move ahead with SGDInverse.
Yes, based on these results, let's use the SGD variant. Thanks for providing a gist in comment with SVD. There's an alternative version of SVD coming up in pytorch/pytorch#29488, but we can revisit after the latter is merged.
Oops, this was a typo :) I meant
Let's make the tests green :) Can you rebase your code onto master?
yep done!
Thanks! Can you do the following?
Hi 👋 I am working to wrap up this nice PR. (@jaeyeun97 let me know if you have a compelling reason that I should not.) Following @vincentqb's suggestion, I extracted the main part of So I had an offline discussion with @vincentqb, and we agreed that it's okay to leave both The following points were brought up:
So I will keep the implementation of the InverseMelScale transform as is and will add a test for numerical compatibility to supplement this PR.
@mthrok Ah sorry, I decided to seriously look for jobs now so that took priority.
I was also hesitant about moving code to functional because of this issue, thanks for sorting it out!
This PR follows up pytorch#366 and adds tests for `InverseMelScale` (and `MelScale`) for librosa compatibility.

For the `MelScale` compatibility test:

1. Generate a spectrogram.
2. Feed the spectrogram to a `torchaudio.transforms.MelScale` instance.
3. Feed the spectrogram to the `librosa.feature.melspectrogram` function.
4. Compare the results from 2 and 3 element-wise.

Element-wise numerical comparison is possible because, under the hood, both implementations use the same algorithm.

The `InverseMelScale` compatibility test is more elaborate:

1. Generate the original spectrogram.
2. Convert the original spectrogram to Mel scale using a `torchaudio.transforms.MelScale` instance.
3. Reconstruct the spectrogram using the torchaudio implementation.
   - 3.1. Feed the Mel spectrogram to a `torchaudio.transforms.InverseMelScale` instance and get the reconstructed spectrogram.
   - 3.2. Compute the P1 (L1) distance, i.e. the sum of element-wise absolute differences, between the original spectrogram and the one from 3.1.
4. Reconstruct the spectrogram using librosa.
   - 4.1. Feed the Mel spectrogram to the `librosa.feature.inverse.mel_to_stft` function and get the reconstructed spectrogram.
   - 4.2. Compute the P1 distance between the original spectrogram and the one from 4.1. (This is the reference.)
5. Check that the resulting P1 distances are in roughly the same value range.

Element-wise numerical comparison is not possible here because different algorithms are used to compute the inverse, and some values of the reconstructed spectrograms can differ in magnitude. The strategy is therefore to check that the P1 distance (reconstruction loss) is not far from the value obtained using `librosa`.

For this purpose, the threshold was chosen empirically:

```
print('p1 dist (orig <-> ta):', torch.dist(spec_orig, spec_ta, p=1))
print('p1 dist (orig <-> lr):', torch.dist(spec_orig, spec_lr, p=1))
>>> p1 dist (orig <-> ta): tensor(1482.1917)
>>> p1 dist (orig <-> lr): tensor(1420.7103)
```

This value can vary based on the length and the kind of the signal being processed, so it was handpicked.
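The final check of this strategy (comparing the two reconstruction losses rather than the spectrograms themselves) could look roughly like the sketch below. The helper names `p1_dist` and `within_reference`, the relative form of the tolerance, and the value `rel_tol=0.1` are all assumptions for illustration; the description only states that a threshold was handpicked. `p1_dist` is a NumPy equivalent of `torch.dist(a, b, p=1)`.

```python
import numpy as np


def p1_dist(a, b):
    """Sum of element-wise absolute differences (the p=1 distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum())


def within_reference(dist_candidate, dist_reference, rel_tol):
    """True if the candidate loss is within rel_tol of the reference loss."""
    return abs(dist_candidate - dist_reference) <= rel_tol * dist_reference


# The two distances reported in the description above.
dist_ta, dist_lr = 1482.1917, 1420.7103

# Assumption: rel_tol=0.1 is a hypothetical tolerance, not the PR's threshold.
print(within_reference(dist_ta, dist_lr, rel_tol=0.1))
```

A relative tolerance is used in this sketch because, as noted above, the absolute distances change with the length and kind of the signal.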
@jaeyeun97 I see. Good luck with your job hunting!
Some checks failed but they look like
* Inverse Mel Scale Implementation
* Inverse Mel Scale Docs
* Better working version.
* GPU fix
* These shouldn't go on git..
* Even better one, but does not support JITability.
* Remove JITability test
* Flake8
* n_stft is a must
* minor clean up of initialization
* Add librosa consistency test
* Address review feedbacks
* Support arbitrary batch dimensions.
* Add batch test
* Use view for batch
* fix sgd
* Use negative indices and update docstring
* Update threshold

Co-authored-by: Charles J.Y. Yoon <jaeyeun97@gmail.com>
InverseMelScale implementation for #351
Test code is coming!
Edit: It would be great if someone could chime in on the test code guidelines for torchaudio.