Synthetic phase for inverse transforms #434
Comments
Sounds like a neat idea. DAn. On Fri, Nov 4, 2016 at 10:54 AM, Brian McFee notifications@github.com
|
I'm using this implementation of the Griffin-Lim algorithm, and the restored audio sounds perceptibly better than random (from-white-noise) initialization of phases.

    import numpy as np
    import librosa
    from tqdm import tqdm

    def griffinlim(spectrogram, n_iter=100, window='hann', n_fft=2048, hop_length=-1, verbose=False):
        if hop_length == -1:
            hop_length = n_fft // 4
        # start from uniformly random phases
        angles = np.exp(2j * np.pi * np.random.rand(*spectrogram.shape))
        t = tqdm(range(n_iter), ncols=100, mininterval=2.0, disable=not verbose)
        for i in t:
            # np.complex was removed from NumPy; the builtin complex is equivalent
            full = np.abs(spectrogram).astype(complex) * angles
            inverse = librosa.istft(full, hop_length=hop_length, window=window)
            rebuilt = librosa.stft(inverse, n_fft=n_fft, hop_length=hop_length, window=window)
            # keep only the phase of the re-analysed signal
            angles = np.exp(1j * np.angle(rebuilt))
            if verbose:
                diff = np.abs(spectrogram) - np.abs(rebuilt)
                t.set_postfix(loss=np.linalg.norm(diff, 'fro'))
        full = np.abs(spectrogram).astype(complex) * angles
        inverse = librosa.istft(full, hop_length=hop_length, window=window)
        return inverse

Is it something to be avoided due to the slow speed? In that case, I think @Jonathan-LeRoux's algorithm would be a nice addition, although I found it hard to write it efficiently using only Python. |
+1 for having Griffin-Lim + x (other phase reconstruction methods from magnitude spectrograms, e.g., https://ai2-s2-pdfs.s3.amazonaws.com/2042/3dffa92efd5371489e6b11b22779b0a2fc85.pdf) in |
How timely! I have been in the process of preparing my old phase reconstruction C code for release. |
Cython is the go-to these days, and it's the easiest to package and ship in my experience. |
Not necessarily -- the initial proposal here was for a generic phase reconstruction method that could apply both to i/stft and i/cqt. I'm not sure GL would work on CQT, but it's worth a shot. |
Thanks Brian, I'll give Cython a try. |
I tried the Griffin-Lim algorithm on the constant-Q transform, and the results are bad. |
Check this out for a fast-and-not-too-dirty reconstruction: |
I uploaded my phase reconstruction code on Github and Pypi: |
Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse? 😲 |
The main one I have in mind is sonifying samples from a generative model of magnitude spectra. |
Wow, a 0.6.1 milestone for phase retrieval? That's pretty bold. I like it. Demo is here: http://www-math.mit.edu/~waldspur//wavelets_phase_retrieval.html |
Yes. This is even more difficult than inversion from CQT modulus because we have no guarantee that there exists a solution in the reproducing kernel Hilbert space (RKHS) associated to the CQT operator such that the modulus of the solution will yield the expected magnitude spectra. |
It would also be very welcome if we could have inverse transform |
Hey @jongwook, can you please provide a concrete example of how to use the Griffin-Lim function? I am using the following code and it throws an inconsistent shape error when multiplying spectrogram and angles.
The error message is this
|
@deepanwayx this is not a Q&A board, but it seems that you're passing a Mel spectrogram to the function, which is designed to deal with the usual STFT spectrograms. They have different default sets of parameters, so the shapes are different. Inverting a Mel spectrogram without any prior is ill-defined in general; you may try Tacotron 2 for inverting Mel spectrograms of speech. |
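To make the shape mismatch concrete, here is a minimal NumPy-only sketch. It assumes n_fft = 2048 (so 1 + n_fft // 2 = 1025 STFT bins) and 128 Mel bands, which are the librosa defaults as far as I know; `shapes_compatible` is a hypothetical helper for illustration, not a librosa function.

```python
import numpy as np

n_fft = 2048
n_frames = 50
n_mels = 128  # assumed default number of Mel bands

# A Mel magnitude spectrogram has n_mels rows ...
mel_mag = np.random.rand(n_mels, n_frames)
# ... while phases rebuilt from an STFT have 1 + n_fft // 2 rows.
stft_angles = np.exp(2j * np.pi * np.random.rand(1 + n_fft // 2, n_frames))

def shapes_compatible(mag, angles):
    """Return True if mag * angles broadcasts, False if NumPy rejects it."""
    try:
        mag * angles
        return True
    except ValueError:
        return False

print(mel_mag.shape, stft_angles.shape)
print(shapes_compatible(mel_mag, stft_angles))
```

An STFT magnitude of shape (1025, n_frames) multiplies with the same angles without complaint, which is why the function above only works on STFT input.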
Sharing the link to support @bmcfee idea of "sonifying samples from a generative model of magnitude spectra.", or related transforms. |
I'm having this issue when doing audio source separation. The input to my neural net is the magnitude STFT features of the mixture, and the output is the magnitude STFT features of the separated source. |
@vinaykumar2491 read this message: #434 (comment) |
Has anyone implemented the "phase retrieval for wavelet transforms" in Python? |
@jeremyworger not that I know of, but it would be of immense value. |
This thread's spiraled pretty far off from the original issue, but that's okay! I've spent a decent amount of time this week hacking on various things in the space of spectrogram inversion using a slightly modified version of @jongwook's griffin-lim implementation. Despite having better options for phase retrieval out there (mentioned above in this thread), I still think it would be good to have a reference GL to rely on. With that in hand, we could easily provide (approximate) mfcc and mel inverters using pseudo-inverses; I have prototypes in local notebooks, and they sound not-perfect but not horrible either. I'll create a new issue with a narrowed down checklist of deliverables to include in the 0.7 release. |
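For readers wondering what a pseudo-inverse Mel inverter might look like, here is a rough NumPy sketch. It is not the prototype mentioned above: `toy_mel_filterbank` and `approx_mel_to_stft` are hypothetical names, the triangular filterbank is a simplified stand-in for a real Mel basis, and clipping negative values to zero is my own assumption.

```python
import numpy as np

def toy_mel_filterbank(n_mels=40, n_fft=512, sr=22050):
    """Hypothetical triangular filterbank, shape (n_mels, 1 + n_fft // 2)."""
    freqs = np.linspace(0.0, sr / 2, 1 + n_fft // 2)
    # Band edges equally spaced on the (HTK-style) Mel scale, mapped back to Hz
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    edges_mel = np.linspace(0.0, mel_max, n_mels + 2)
    edges_hz = 700.0 * (10.0 ** (edges_mel / 2595.0) - 1.0)
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m:m + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def approx_mel_to_stft(mel_spec, fb):
    """Approximate STFT magnitudes from Mel magnitudes via the pseudo-inverse."""
    inv = np.linalg.pinv(fb)            # shape (n_bins, n_mels)
    # The pseudo-inverse can produce negative values; clip them (assumption)
    return np.maximum(0.0, inv @ mel_spec)

fb = toy_mel_filterbank()
S = np.random.rand(fb.shape[1], 20)     # stand-in STFT magnitude
mel = fb @ S                            # forward Mel projection
S_hat = approx_mel_to_stft(mel, fb)     # approximate inverse
```

The result is only an approximation because the Mel projection discards information; the recovered magnitudes would then still need a phase estimate (Griffin-Lim, for example) before inversion to audio.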
The recently merged #843 includes (fast) griffin-lim, which I think covers most of the content of this issue. It does not (currently) support cqt phase retrieval, but we can add that later if there's demand. Any objection to closing this out? |
yes, let's close this one. then we can release griffin-lim in the next stable version, collect some user feedback, and eventually open a CQT-related issue around that feedback. |
The Griffin-Lim method causes unnatural artifacts in synthetic speech.
Would it be useful to have inverse transforms (istft, icqt #165) synthesize phase when the input is magnitude spectra? @dpwe's response to #424 provides an example of how to do this for inverting MFCCs by transferring the phase of the corresponding forward transform of white noise.
I've test-driven this on my icqt prototype, and it sounds pretty good; much better than a magnitude-only reconstruction.
It's a bit of a nuisance to do this by hand since the parameters and duration need to be matched to the input signal. It would be easy to do this from within the inverse transform though, since that information is all present. I'm thinking an optional (defaulting to False) parameter.
Thoughts?
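The white-noise phase transfer described above can be sketched roughly as follows. This is a NumPy-only illustration under stated assumptions: `simple_stft` is a hypothetical minimal forward transform (Hann window, no padding) standing in for the library's STFT, and `add_noise_phase` is a made-up name, not the proposed API.

```python
import numpy as np

def simple_stft(y, n_fft=512, hop_length=128):
    """Minimal STFT without padding; returns shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)

def add_noise_phase(magnitude, n_fft=512, hop_length=128, seed=0):
    """Attach the phase of a white-noise STFT to a magnitude spectrogram."""
    rng = np.random.default_rng(seed)
    n_frames = magnitude.shape[1]
    # Match the noise duration to the number of frames in the input
    noise = rng.standard_normal(n_fft + hop_length * (n_frames - 1))
    noise_stft = simple_stft(noise, n_fft, hop_length)
    phase = np.exp(1j * np.angle(noise_stft))   # unit-modulus phase factors
    return magnitude * phase

# Magnitude-only input, here derived from a test tone
tone = np.sin(2 * np.pi * 440.0 * np.arange(8192) / 22050.0)
mag = np.abs(simple_stft(tone))
D = add_noise_phase(mag)   # complex spectrogram ready for an inverse transform
```

The matching of `n_fft`, `hop_length`, and duration is exactly the bookkeeping that is a nuisance by hand and would be trivial inside the inverse transform.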