
ISTFT #27

Open

keunwoochoi opened this issue Mar 14, 2019 · 24 comments
@keunwoochoi
Owner

https://gist.github.com/keunwoochoi/2f349e72cc941f6f10d4adf9b0d3f37e

It works, but I'm probably not going to make a PR to PyTorch (pytorch/pytorch#3775) because it's written in Python (torch.stft is written in C). Just for future reference/usage.
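In outline, it's irfft per frame plus overlap-add. A minimal sketch (not the gist's exact code; it assumes a one-sided complex spectrogram of shape (n_fft // 2 + 1, n_frames) and uses the newer torch.fft.irfft API):

import torch

def istft_overlap_add(spec, n_fft, hop_length, window):
    """irfft each frame back to the time domain, window it, overlap-add.
    No window-envelope normalization here; that comes up later in the thread."""
    n_frames = spec.shape[-1]
    frames = torch.fft.irfft(spec, n=n_fft, dim=0)  # (n_fft, n_frames), real
    signal = torch.zeros(n_fft + hop_length * (n_frames - 1))
    for t in range(n_frames):
        start = t * hop_length
        signal[start:start + n_fft] += frames[:, t] * window
    return signal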

@faroit
Collaborator

faroit commented Mar 14, 2019

Great. Did you test the reconstruction error?

@seungwonpark

I think lines 33-40 can be replaced with a transposed convolution. (I've done that in my private repository, but I also didn't make a PR to PyTorch, since it's not written in C like torch.stft.)
Perhaps I could open-source my private inverse-STFT implementation after some code refactoring. It is also based on librosa's istft implementation. Would that be helpful for us?
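For context, the transposed-convolution formulation looks roughly like this: bake the windowed inverse-DFT basis into a conv_transpose1d kernel so that the stride performs the overlap-add. A sketch under assumed conventions (real/imag stacked as channels, even n_fft), not the repo's actual code:

import math
import torch
import torch.nn.functional as F

def conv_istft(spec, n_fft, hop_length, window):
    """ISTFT as one conv_transpose1d: the kernel holds the windowed
    inverse-DFT basis and the stride does the overlap-add.
    spec: (batch, 2 * (n_fft // 2 + 1), n_frames), real/imag stacked.
    Window-envelope normalization is still needed afterwards."""
    n_bins = n_fft // 2 + 1
    k = torch.arange(n_fft, dtype=torch.float32)
    f = torch.arange(n_bins, dtype=torch.float32)
    phase = 2 * math.pi * f[:, None] * k[None, :] / n_fft
    # One-sided inverse DFT: double every bin except DC and Nyquist.
    scale = torch.full((n_bins, 1), 2.0 / n_fft)
    scale[0] = scale[-1] = 1.0 / n_fft
    # cos for the real channels, -sin for the imaginary channels.
    basis = torch.cat([torch.cos(phase), -torch.sin(phase)], dim=0)
    kernel = (basis * scale.repeat(2, 1) * window).unsqueeze(1)
    return F.conv_transpose1d(spec, kernel, stride=hop_length)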

@keunwoochoi
Owner Author

reconstruction error

@faroit Could you test it yourself? I did, and the error was 0. Not sure if it's for real..

transpose convolution

@seungwonpark Yes, though I think there are pros and cons. torch.rfft has a computational complexity of n log n, while a transposed convolution is n ** 2. But the transposed-conv layer might still be faster thanks to better parallelization -- although the current approach could also be reframed (with reshaping) to remove the loop, so that torch.rfft only executes a couple of times.

That said, we can simply time them and compare :) So yes, please share it! Even better if you can measure both.
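For the timing, a crude harness like the following would do (hypothetical helper; istft_overlap_add and conv_istft refer to the sketches above):

import time

def bench(fn, *args, n_runs=50):
    """Average wall-clock seconds per call, after one warm-up run."""
    fn(*args)  # warm-up
    t0 = time.perf_counter()
    for _ in range(n_runs):
        fn(*args)
    return (time.perf_counter() - t0) / n_runs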

@seungwonpark

@keunwoochoi Thanks for pointing out the difference! I'll let you know when I'm ready to share.

@seungwonpark

Here it is. I've also measured the time performance, and it looks like your implementation is much faster!
https://github.com/seungwonpark/istft-pytorch

By the way, I can't find the actual difference between them. The major difference is that mine uses ifft and yours uses irfft, so I tried using irfft instead of ifft in my implementation, but found no significant difference.

@faroit
Collaborator

faroit commented Mar 14, 2019

@keunwoochoi Cool, I tested yours in a small experiment where I ran the STFT of one library (librosa, scipy, tf, torch) against the ISTFT of another and measured the reconstruction RMSE.

STFT --> ISTFT RMSE
=================================
test_torch --> test_torch 0.21650636
test_torch --> test_scipy 1.08139886e-07
test_torch --> test_librosa 1.1001805855710549e-07
test_torch --> test_tf 1.9412305e-06
test_scipy --> test_torch 0.21650635
test_scipy --> test_scipy 7.398381e-08
test_scipy --> test_librosa 7.647041059862848e-08
test_scipy --> test_tf 1.9329398e-06
test_librosa --> test_torch 0.21650635
test_librosa --> test_scipy 2.0918270288343244e-16
test_librosa --> test_librosa 2.294981695912998e-16
test_librosa --> test_tf 1.9328936e-06
test_tf --> test_torch 0.21650644
test_tf --> test_scipy 4.216027e-06
test_tf --> test_librosa 4.220019682546531e-06
test_tf --> test_tf 4.694827e-06

I think this is because of the combination of the overlap (you used // 4 for the hop), the Hann window, and the missing normalization. Anyway, this is easy to fix...
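The standard fix is librosa-style: divide the overlap-added signal by the sum of the squared, overlapped windows. A minimal sketch of that envelope (same shape conventions as the sketch earlier in the thread; illustrative only):

import torch

def window_sumsquare(window, n_frames, hop_length, n_fft):
    """Envelope of squared, overlapped windows (librosa-style).
    Dividing the overlap-added signal by this removes the constant
    scaling error visible in the torch rows above."""
    env = torch.zeros(n_fft + hop_length * (n_frames - 1))
    w_sq = window ** 2
    for t in range(n_frames):
        start = t * hop_length
        env[start:start + n_fft] += w_sq
    return env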

@faroit
Collaborator

faroit commented Mar 14, 2019

@f0k By the way, do you know a good way to write a unit test for STFT/ISTFT errors due to windowing when masking was applied before the ISTFT, e.g. normalizing by the window instead of the squared window? I've always wondered how to catch these.
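One partial answer, judging from the numbers above: with a Hann window and hop = n_fft // 4, both candidate envelopes (sum of windows vs. sum of squared windows) are constant in the interior but differ (2.0 vs. 1.5), so even a plain round-trip test catches the wrong normalization as a constant scale error. A hedged sketch of such a test, reusing the hypothetical helpers from the earlier sketches:

import torch

def test_istft_roundtrip():
    n_fft, hop = 1024, 256
    window = torch.hann_window(n_fft)
    x = torch.randn(hop * 40 + n_fft)
    spec = torch.stft(x, n_fft, hop_length=hop, window=window,
                      center=False, return_complex=True)
    y = istft_overlap_add(spec, n_fft, hop, window)
    env = window_sumsquare(window, spec.shape[-1], hop, n_fft)
    y = y / env.clamp(min=1e-8)
    # Ignore the boundary half-windows (no pre-padding here).
    err = (x[n_fft:-n_fft] - y[n_fft:-n_fft]).abs().max()
    assert err < 1e-4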

@keunwoochoi
Owner Author

It's probably due to the boundary. I don't pre-pad; if you ignore the first and last half-windows in the eval, it should be much better for the moment.

@nkundiushuti

It would be interesting to compare with https://github.com/wslihgt/pyfasst/tree/master/pyfasst/tftransforms
I have used the STFT implementation of J. L. Durrieu, which uses numpy, as a baseline. I think madmom (the JKU Linz audio framework) also uses something similar.
Oh, by the way, you can find a tentative implementation of CQT using NSGT there, but it's not reliable, and I don't know how you could port it to PyTorch.

@faroit
Collaborator

faroit commented Mar 14, 2019

it would be interesting to compare with https://github.com/wslihgt/pyfasst/tree/master/pyfasst/tftransforms

Cool. There's also https://github.com/nils-werner/stft, which I used for quite a while. I think we should stick to scipy.signal and librosa since they are heavily tested. However, I'm super interested in spotting the differences and discussing which one has the best reconstruction performance... but maybe that's out of scope for this repo ;-)

@seungwonpark

Hi, we recently found out that the time-profiling code in my repo for comparing Choi's implementation (rfft) and mine (deconvolution with parallel ops) was wrong, and we've fixed it.

The new conclusion is: deconvolution is much faster!
Please check https://github.com/seungwonpark/istft-pytorch.

@faroit
Collaborator

faroit commented May 2, 2019

I think we should close this, then, and move this initiative over to pytorch/pytorch#3775.

@seungwonpark are you willing to help out over there?

@keunwoochoi
Owner Author

Yeah, we could close it, and maybe re-open it once we want to work on a wrapper for it.

@faroit faroit closed this as completed May 2, 2019
@keunwoochoi
Owner Author

@faroit Hey, so after closing this I think we sort of lost track. I don't think it's actually necessary to wait for the PyTorch ISTFT. There are still some implementation choices (e.g., deconv like @seungwonpark's vs. rfft as I've done), but maybe we can have our own tentative implementation with a less-tentative API? It seems the critical issues were resolved (faroit/stft-istft-experiments#1), and we can follow the API of STFT: https://github.com/keunwoochoi/torchaudio-contrib/blob/master/torchaudio_contrib/layers.py#L35.

@keunwoochoi keunwoochoi reopened this May 13, 2019
@keunwoochoi
Owner Author

  • Along with adding InverseSTFT, we need more tests for STFT, including a check that the STFT result is correct.

@faroit
Collaborator

faroit commented May 13, 2019

Okay, what about opening a PR with an API proposal to address ISTFT in pytorch/pytorch#3775? Then we could

  • implement ISTFT using the deconv "hack" here
  • switch to a cleaner ISTFT op later without changing the API, once available in pytorch

@keunwoochoi
Owner Author

opening up a PR

You mean a PR here? (Yes, let's do it.)
Or in PyTorch? (Would they care about accepting an API without an implementation? Also, we can easily change it later, at least in this repo.)

@faroit
Collaborator

faroit commented May 13, 2019

In PyTorch.

Yes, we could just add this ISTFT, but it's rather hacky (using conv layers), which would make it hard to keep clean and readable. And once we reached that point, someone would have to start from scratch using libtorch, so it wouldn't be particularly sustainable ;-)

@faroit faroit changed the title gist: inverse-STFT ISTFT May 13, 2019
@keunwoochoi
Owner Author

OK, I'm not really sure they'd care about an API-only proposal. Historically, they've been quite active in adopting what had already been implemented outside of PyTorch (in other words, somewhat blindly, if it seems good), at least for audio processing. So how about we do it here, maybe:

def istft(fft, hop, etc etc, mode='conv'):
    """Wrapper for ISTFT; dispatches on `mode`."""

def conv_istft(fft, hop, etc etc):
    """One of the options (the transposed-conv variant)."""
?

It wouldn't block follow-up development while we refine it.
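Fleshed out slightly (parameter names are placeholders, not a settled API), the wrapper would just dispatch, which is also what makes the later backend swap-out painless:

def istft(spec, n_fft, hop_length, window, mode='conv'):
    """Stable, STFT-mirroring API; `mode` picks a replaceable backend."""
    if mode == 'conv':
        return conv_istft(spec, n_fft, hop_length, window)
    raise ValueError('unknown istft mode: {}'.format(mode))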

@keunwoochoi
Owner Author

(continued)

...Or, BOTH?! Since we have control and flexibility here -- and the API for ISTFT would be pretty much the same as that of STFT -- let me add a tentative one (or persuade @seungwonpark to add one here :) while asking PyTorch. @faroit Could you open a PR for that?

@drscotthawley
Contributor

Minor clarification question: if the STFT coefficients are allowed to evolve and train via autograd (I assume they are, yes?), then is the idea for this ISTFT module to be either

(a) sort of a 'trainable' ISTFT which may not remain a perfect inverse of the trainable STFT, i.e., one that is independent of the STFT routine? (@seungwonpark's deconv version looks that way)
or
(b) a routine that is somehow mathematically tied to the (trainable) STFT and always guaranteed to invert it 'perfectly'?

I'm guessing (a), yes? Mainly because I have no idea how you'd do (b), I can think of plenty of uses for (a), and proper training could enforce arbitrary levels of inversion quality if you wanted it.

@keunwoochoi
Owner Author

I also guess it'd be (a) in this case, i.e., the provided ISTFT is a static, non-trained module.

@drscotthawley
Contributor

Ohhh, static & non-trained. Interesting. Thanks for the clarification.

@keunwoochoi
Owner Author

ISTFT is being developed in the main repo pytorch/audio#130
