Synthetic phase for inverse transforms #434

Closed
bmcfee opened this issue Nov 4, 2016 · 26 comments
Labels: enhancement (Does this improve existing functionality?), functionality (Does this add new functionality?)

Comments

@bmcfee
Member

bmcfee commented Nov 4, 2016

Would it be useful to have inverse transforms (istft, icqt #165) synthesize phase when the input is a magnitude spectrum? @dpwe's response to #424 provides an example of how to do this for inverting MFCCs: transfer the phase of the corresponding forward transform of white noise.

I've test-driven this on my icqt prototype, and it sounds pretty good; much better than a magnitude-only reconstruction.

It's a bit of a nuisance to do this by hand, since the parameters and duration need to be matched to the input signal. It would be easy to do this from within the inverse transform, though, since all of that information is present. I'm thinking of an optional parameter (defaulting to False).

Thoughts?

@bmcfee added the enhancement, question, functionality, and good for beginners labels Nov 4, 2016
@dpwe
Contributor

dpwe commented Nov 4, 2016

Sounds like a neat idea.

DAn.


@bmcfee added this to the 0.5 milestone Nov 4, 2016
@bmcfee self-assigned this Nov 4, 2016
@bmcfee removed the good for beginners and question labels Dec 12, 2016
@bmcfee modified the milestones: 0.5, 0.5.1 Dec 23, 2016
@jongwook

jongwook commented Apr 3, 2017

I'm using this implementation of the Griffin-Lim algorithm, and the restored audio sounds noticeably better than simply using random (or white-noise-derived) phases.

import numpy as np
import librosa
from tqdm import tqdm


def griffinlim(spectrogram, n_iter=100, window='hann', n_fft=2048, hop_length=None, verbose=False):
    # Default hop: a quarter of the frame length
    if hop_length is None:
        hop_length = n_fft // 4

    # Only the magnitude is used; start from uniformly random phases
    magnitude = np.abs(spectrogram)
    angles = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))

    t = tqdm(range(n_iter), ncols=100, mininterval=2.0, disable=not verbose)
    for _ in t:
        # Impose the target magnitude, invert, and re-analyze to get a consistent phase
        full = magnitude * angles
        inverse = librosa.istft(full, hop_length=hop_length, window=window)
        rebuilt = librosa.stft(inverse, n_fft=n_fft, hop_length=hop_length, window=window)
        angles = np.exp(1j * np.angle(rebuilt))

        if verbose:
            diff = magnitude - np.abs(rebuilt)
            t.set_postfix(loss=np.linalg.norm(diff, 'fro'))

    full = magnitude * angles
    inverse = librosa.istft(full, hop_length=hop_length, window=window)

    return inverse

Is it something to be avoided due to the slow speed? In that case, I think @Jonathan-LeRoux's algorithm would be a nice addition, although I found it hard to write it efficiently only using Python.

@stefan-balke
Member

stefan-balke commented Apr 4, 2017

+1 for having Griffin-Lim + x (other phase reconstruction methods from magnitude spectrograms, e.g., https://ai2-s2-pdfs.s3.amazonaws.com/2042/3dffa92efd5371489e6b11b22779b0a2fc85.pdf) in librosa.

@Jonathan-LeRoux

How timely! I have been in the process of preparing my old phase reconstruction C code for release.
The MEX version and accompanying Matlab script had been 95% ready for a year, and I finalized them last week, but I was considering preparing a Python wrapper before releasing the whole thing. I have never used C code in Python, and I was stopped in my tracks by the number of options. Any input on whether I should use ctypes, Cython, SWIG, or Python-C-Api? I'm looking for something easy and quick to code, and easy to use by the user.

@bmcfee
Member Author

bmcfee commented Apr 4, 2017

Any input on whether I should use ctypes, Cython, SWIG, or Python-C-Api? I'm looking for something easy and quick to code, and easy to use by the user.

Cython is the go-to these days, and it's the easiest to package and ship in my experience.

@bmcfee
Member Author

bmcfee commented Apr 4, 2017

Is it something to be avoided due to the slow speed?

Not necessarily -- the initial proposal here was for a generic phase reconstruction method that could apply both to i/stft and i/cqt. I'm not sure GL would work on CQT, but it's worth a shot.

@Jonathan-LeRoux

Thanks Brian, I'll give Cython a try.

@diggerdu

I tried the Griffin-Lim algorithm with the constant-Q transform; the results were bad.

@lonce

lonce commented Apr 16, 2017

Check this out for a fast-and-not-too-dirty reconstruction:
https://github.com/lonce/SPSI_Python
Works great as an initial phase estimate for Griffin-Lim, too.

@bmcfee modified the milestones: 0.6.0, 0.5.1 Apr 21, 2017
@Jonathan-LeRoux

I uploaded my phase reconstruction code to GitHub and PyPI:
https://github.com/Jonathan-LeRoux/lws
The core is C/C++, with wrappers for both Python (via Cython) and Matlab (via Mex).

@carlthome
Contributor

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse? 😲

@bmcfee
Member Author

bmcfee commented Aug 31, 2017

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse?

The main one I have in mind is sonifying samples from a generative model of magnitude spectra.

@bmcfee removed this from the 0.6.0 milestone Sep 1, 2017
@bmcfee added this to the 0.6.1 milestone Feb 13, 2018
@lostanlen
Contributor

lostanlen commented Feb 18, 2018

Wow, a 0.6.1 milestone for phase retrieval? That's pretty bold. I like it.
FYI, the state of the art is Irène Waldspurger's 2015 paper "Phase retrieval for wavelet transforms":
https://arxiv.org/abs/1512.07024

Demo is here: http://www-math.mit.edu/~waldspur//wavelets_phase_retrieval.html
It does much, much better than Griffin-Lim, but is probably much slower in its original implementation.
The MATLAB code is here: https://www.ceremade.dauphine.fr/~waldspurger/wav_phase_retrieval.zip

@lostanlen
Contributor

lostanlen commented Feb 18, 2018

The main one I have in mind is sonifying samples from a generative model of magnitude spectra.

Yes. This is even more difficult than inversion from CQT modulus because we have no guarantee that there exists a solution in the reproducing kernel Hilbert space (RKHS) associated to the CQT operator such that the modulus of the solution will yield the expected magnitude spectra.
BTW working on generative models in the time-frequency domain that would guarantee this RKHS property is AFAICT terra incognita in ML research ... but I digress.

@bmcfee modified the milestones: 0.6.1, 0.6.2 Feb 19, 2018
@Pzoom522

It would also be very welcome to have an inverse mel transform (imel).

@deepanwayx

deepanwayx commented Apr 21, 2018

Hey @jongwook, can you please provide a concrete example of how to use the Griffin-Lim function? I am using the following code, and it throws an inconsistent-shape error when multiplying spectrogram and angles.

def griffinlim(spectrogram, n_iter = 100, window = 'hann', n_fft = 2048, hop_length = -1, verbose = False):
    
    if hop_length == -1:
        hop_length = n_fft // 4

    angles = np.exp(2j * np.pi * np.random.rand(*spectrogram.shape))

    t = tqdm(range(n_iter), ncols=100, mininterval=2.0, disable=not verbose)
    
    for i in t:
        
        print ('Iter:' , i , ', Spectrogram Shape:', spectrogram.shape)
        print ('Iter:' , i , ', Angles Shape:', angles.shape)

        full = np.abs(spectrogram).astype(np.complex) * angles
        inverse = librosa.istft(full, hop_length = hop_length, window = window)
        rebuilt = librosa.stft(inverse, n_fft = n_fft, hop_length = hop_length, window = window)
        angles = np.exp(1j * np.angle(rebuilt))

        if verbose:
            diff = np.abs(spectrogram) - np.abs(rebuilt)
            t.set_postfix(loss=np.linalg.norm(diff, 'fro'))

    full = np.abs(spectrogram).astype(np.complex) * angles
    inverse = librosa.istft(full, hop_length = hop_length, window = window)

    return inverse


y, sr = librosa.load('of6p3-xkgvy.wav')
melspec = librosa.feature.melspectrogram(y)
back = griffinlim(melspec)

The error message is this

Iter: 0 , Spectrogram Shape: (128, 1292)
Iter: 0 , Angles Shape: (128, 1292)
Iter: 1 , Spectrogram Shape: (128, 1292)
Iter: 1 , Angles Shape: (1025, 1292)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-3af6f8a9f93f> in <module>()
     31 
     32 melspec = librosa.feature.melspectrogram(y)
---> 33 back = griffinlim(melspec)

<ipython-input-30-3af6f8a9f93f> in griffinlim(spectrogram, n_iter, window, n_fft, hop_length, verbose)
     13         print ('Iter:' , i , ', Angles Shape:', angles.shape)
     14 
---> 15         full = np.abs(spectrogram).astype(np.complex) * angles
     16         inverse = librosa.istft(full, hop_length = hop_length, window = window)
     17         rebuilt = librosa.stft(inverse, n_fft = n_fft, hop_length = hop_length, window = window)

ValueError: operands could not be broadcast together with shapes (128,1292) (1025,1292) 

@jongwook

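@deepanwayx this is not a Q&A board, but it seems that you're passing a Mel spectrogram to the function, which is designed for the usual STFT magnitude spectrograms. The two have different default parameters, so their shapes differ: the Mel spectrogram has 128 bands by default, whereas the STFT computed inside the loop (n_fft=2048) has 1 + 2048 // 2 = 1025 frequency bins. That is the (128, 1292) vs. (1025, 1292) mismatch in your traceback.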

Inverting a Mel spectrogram without any prior is ill-defined in general; you may try Tacotron 2 for inverting Mel spectrograms of speech.

@bmcfee removed this from the 0.6.2 milestone Jun 22, 2018
@rafaelvalle

Sharing this link in support of @bmcfee's idea of "sonifying samples from a generative model of magnitude spectra" (or of related transforms):
https://github.com/rafaelvalle/asrgen

@imflash217

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse?

I'm having this issue when doing audio source separation. The input to my neural net is the magnitude STFT of the mixture, and the output is the magnitude STFT of the separated source.

@lostanlen
Contributor

@vinaykumar2491 read this message: #434 (comment)

@jeremydeanw

Has anyone implemented the "phase retrieval for wavelet transforms" in python?

@lostanlen
Contributor

@jeremyworger not that I know of, but it would be of immense value.

@bmcfee
Member Author

bmcfee commented Mar 8, 2019

This thread's spiraled pretty far off from the original issue, but that's okay!

I've spent a decent amount of time this week hacking on various things in the space of spectrogram inversion using a slightly modified version of @jongwook's griffin-lim implementation. Despite having better options for phase retrieval out there (mentioned above in this thread), I still think it would be good to have a reference GL to rely on. With that in hand, we could easily provide (approximate) mfcc and mel inverters using pseudo-inverses; I have prototypes in local notebooks, and they sound not-perfect but not horrible either.

I'll create a new issue with a narrowed down checklist of deliverables to include in the 0.7 release.

@bmcfee
Member Author

bmcfee commented May 5, 2019

The recently merged #843 includes (fast) griffin-lim, which I think covers most of the content of this issue. It does not (currently) support cqt phase retrieval, but we can add that later if there's demand. Any objection to closing this out?

@lostanlen
Contributor

Yes, let's close this one. Then we can release Griffin-Lim in the next stable version, collect some user feedback, and eventually open a CQT-related issue around that feedback.
amazing work @bmcfee !!!

@walidbou6

The Griffin-Lim method causes unnatural artifacts in synthetic speech.
