Synthetic phase for inverse transforms #434

bmcfee · 2016-11-04T14:54:58Z

Would it be useful to have the inverse transforms (istft, icqt #165) synthesize phase when the input is a magnitude spectrum? @dpwe's response to #424 provides an example of how to do this for inverting MFCCs by transferring the phase of the corresponding forward transform of white noise.

I've test-driven this on my icqt prototype, and it sounds pretty good; much better than a magnitude-only reconstruction.

It's a bit of a nuisance to do this by hand since the parameters and duration need to be matched to the input signal. It would be easy to do this from within the inverse transform though, since that information is all present. I'm thinking an optional (defaulting to False) parameter.

Thoughts?
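
The white-noise phase transfer described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `stft` and `istft` helpers are minimal NumPy stand-ins written for this example (Hann window, no padding), not librosa's implementations, and `noise_phase_inverse` is a hypothetical name.

```python
import numpy as np


def stft(y, n_fft=512, hop=128):
    """Minimal Hann-windowed STFT (illustrative helper, frames as columns)."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * win for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def istft(S, n_fft=512, hop=128):
    """Minimal weighted overlap-add inverse of the helper above."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = S.shape[1]
    y = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(y)
    frames = np.fft.irfft(S, n=n_fft, axis=0)
    for i in range(n_frames):
        y[i * hop : i * hop + n_fft] += frames[:, i] * win
        norm[i * hop : i * hop + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)


def noise_phase_inverse(mag, n_fft=512, hop=128, seed=0):
    """Invert a magnitude STFT by borrowing the phase of white noise."""
    rng = np.random.default_rng(seed)
    # Match the noise duration to the number of frames in the input.
    n_samples = n_fft + hop * (mag.shape[1] - 1)
    noise = rng.standard_normal(n_samples)
    # Keep only the phase of the noise's forward transform.
    phase = np.exp(1j * np.angle(stft(noise, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

The point of doing this inside the inverse transform is visible in `noise_phase_inverse`: the frame count, FFT size, and hop are all already known there, so the caller does not have to match them by hand.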

dpwe · 2016-11-04T14:59:15Z

Sounds like a neat idea.

DAn.

jongwook · 2017-04-03T20:37:44Z

I'm using this implementation of the Griffin-Lim algorithm, and the restored audio sounds perceptibly better than what I get from random or white-noise-derived phases alone.

import librosa
import numpy as np
from tqdm import tqdm

def griffinlim(spectrogram, n_iter=100, window='hann', n_fft=2048, hop_length=-1, verbose=False):
    # Expects an STFT magnitude of shape (1 + n_fft // 2, n_frames).
    if hop_length == -1:
        hop_length = n_fft // 4

    magnitude = np.abs(spectrogram)

    # Start from uniformly random phases.
    angles = np.exp(2j * np.pi * np.random.rand(*spectrogram.shape))

    t = tqdm(range(n_iter), ncols=100, mininterval=2.0, disable=not verbose)
    for i in t:
        # Impose the target magnitude, resynthesize, and re-analyze.
        # (magnitude * angles is already complex; np.complex was removed from NumPy.)
        full = magnitude * angles
        inverse = librosa.istft(full, hop_length=hop_length, window=window)
        rebuilt = librosa.stft(inverse, n_fft=n_fft, hop_length=hop_length, window=window)
        # Keep only the phase of the re-analyzed signal.
        angles = np.exp(1j * np.angle(rebuilt))

        if verbose:
            diff = magnitude - np.abs(rebuilt)
            t.set_postfix(loss=np.linalg.norm(diff, 'fro'))

    full = magnitude * angles
    inverse = librosa.istft(full, hop_length=hop_length, window=window)

    return inverse

Is it something to be avoided due to the slow speed? In that case, I think @Jonathan-LeRoux's algorithm would be a nice addition, although I found it hard to write it efficiently only using Python.

stefan-balke · 2017-04-04T08:24:06Z

+1 for having Griffin-Lim + x (other phase reconstruction methods from magnitude spectrograms, e.g., https://ai2-s2-pdfs.s3.amazonaws.com/2042/3dffa92efd5371489e6b11b22779b0a2fc85.pdf) in librosa.

Jonathan-LeRoux · 2017-04-04T13:34:07Z

How timely! I have been in the process of preparing my old phase reconstruction C code for release.
The MEX version and accompanying Matlab script had been 95% ready for a year, and I finalized them last week, but I was considering preparing a Python wrapper before releasing the whole thing. I have never used C code in Python, and I was stopped in my tracks by the number of options. Any input on whether I should use ctypes, Cython, SWIG, or Python-C-Api? I'm looking for something easy and quick to code, and easy to use by the user.

bmcfee · 2017-04-04T13:40:45Z

Any input on whether I should use ctypes, Cython, SWIG, or Python-C-Api? I'm looking for something easy and quick to code, and easy to use by the user.

Cython is the go-to these days, and it's the easiest to package and ship in my experience.

bmcfee · 2017-04-04T13:42:36Z

Is it something to be avoided due to the slow speed?

Not necessarily -- the initial proposal here was for a generic phase reconstruction method that could apply both to i/stft and i/cqt. I'm not sure GL would work on CQT, but it's worth a shot.

Jonathan-LeRoux · 2017-04-06T15:45:53Z

Thanks Brian, I'll give Cython a try.

diggerdu · 2017-04-15T13:23:15Z

I tried the Griffin-Lim algorithm on the constant-Q transform, and the results were bad.

lonce · 2017-04-16T02:26:21Z

Check this out for a fast-and-not-too-dirty reconstruction:
https://github.com/lonce/SPSI_Python
Works great as an initial phase estimate for Griffin-Lim, too.

Jonathan-LeRoux · 2017-06-26T15:16:32Z

I uploaded my phase reconstruction code to GitHub and PyPI:
https://github.com/Jonathan-LeRoux/lws
The core is C/C++, with wrappers for both Python (via Cython) and Matlab (via MEX).

carlthome · 2017-08-31T21:30:08Z

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse? 😲

bmcfee · 2017-08-31T21:33:18Z

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse?

The main one I have in mind is sonifying samples from a generative model of magnitude spectra.

lostanlen · 2018-02-18T01:19:16Z

Wow, a 0.6.1 milestone for phase retrieval? That's pretty bold. I like it.
FYI, the state of the art is Irène Waldspurger's 2015 paper "Phase retrieval for wavelet transforms":
https://arxiv.org/abs/1512.07024

Demo is here: http://www-math.mit.edu/~waldspur//wavelets_phase_retrieval.html
It does much, much better than Griffin-Lim, but is probably much slower in its original implementation.
The MATLAB code is here: https://www.ceremade.dauphine.fr/~waldspurger/wav_phase_retrieval.zip

lostanlen · 2018-02-18T01:30:47Z

The main one I have in mind is sonifying samples from a generative model of magnitude spectra.

Yes. This is even more difficult than inversion from CQT modulus because we have no guarantee that there exists a solution in the reproducing kernel Hilbert space (RKHS) associated to the CQT operator such that the modulus of the solution will yield the expected magnitude spectra.
BTW working on generative models in the time-frequency domain that would guarantee this RKHS property is AFAICT terra incognita in ML research ... but I digress.

Pzoom522 · 2018-02-21T14:21:32Z

An inverse mel transform (imel) would also be very welcome.

deepanwayx · 2018-04-21T12:35:57Z

Hey @jongwook, can you please provide a concrete example of how to use the Griffin-Lim function? I am using the following code, and it throws an inconsistent-shape error when multiplying the spectrogram and the angles.

def griffinlim(spectrogram, n_iter = 100, window = 'hann', n_fft = 2048, hop_length = -1, verbose = False):
    
    if hop_length == -1:
        hop_length = n_fft // 4

    angles = np.exp(2j * np.pi * np.random.rand(*spectrogram.shape))

    t = tqdm(range(n_iter), ncols=100, mininterval=2.0, disable=not verbose)
    
    for i in t:
        
        print ('Iter:' , i , ', Spectrogram Shape:', spectrogram.shape)
        print ('Iter:' , i , ', Angles Shape:', angles.shape)

        full = np.abs(spectrogram).astype(np.complex) * angles
        inverse = librosa.istft(full, hop_length = hop_length, window = window)
        rebuilt = librosa.stft(inverse, n_fft = n_fft, hop_length = hop_length, window = window)
        angles = np.exp(1j * np.angle(rebuilt))

        if verbose:
            diff = np.abs(spectrogram) - np.abs(rebuilt)
            t.set_postfix(loss=np.linalg.norm(diff, 'fro'))

    full = np.abs(spectrogram).astype(np.complex) * angles
    inverse = librosa.istft(full, hop_length = hop_length, window = window)

    return inverse


y, sr = librosa.load('of6p3-xkgvy.wav')
melspec = librosa.feature.melspectrogram(y)
back = griffinlim(melspec)

The error message is:

Iter: 0 , Spectrogram Shape: (128, 1292)
Iter: 0 , Angles Shape: (128, 1292)
Iter: 1 , Spectrogram Shape: (128, 1292)
Iter: 1 , Angles Shape: (1025, 1292)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-3af6f8a9f93f> in <module>()
     31 
     32 melspec = librosa.feature.melspectrogram(y)
---> 33 back = griffinlim(melspec)

<ipython-input-30-3af6f8a9f93f> in griffinlim(spectrogram, n_iter, window, n_fft, hop_length, verbose)
     13         print ('Iter:' , i , ', Angles Shape:', angles.shape)
     14 
---> 15         full = np.abs(spectrogram).astype(np.complex) * angles
     16         inverse = librosa.istft(full, hop_length = hop_length, window = window)
     17         rebuilt = librosa.stft(inverse, n_fft = n_fft, hop_length = hop_length, window = window)

ValueError: operands could not be broadcast together with shapes (128,1292) (1025,1292)

jongwook · 2018-04-24T00:54:09Z

@deepanwayx this is not a Q&A board, but it seems that you're passing a Mel spectrogram to the function, which is designed to deal with ordinary STFT spectrograms. The two have different default parameters, so the shapes are different.

Inverting a Mel spectrogram without any prior is ill-defined in general; you may try Tacotron 2 for inverting Mel spectrograms of speech.
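
To make the shape mismatch in the traceback concrete: a one-sided STFT with `n_fft=2048` has `1 + 2048 // 2 = 1025` frequency rows, while the mel spectrogram passed in has 128. A small guard like the one below would surface the problem before the first iteration. The function names are hypothetical and not part of librosa.

```python
import numpy as np


def expected_stft_bins(n_fft):
    """Number of frequency rows in a one-sided STFT: 1 + n_fft // 2."""
    return 1 + n_fft // 2


def check_stft_magnitude(spectrogram, n_fft=2048):
    """Raise early if `spectrogram` cannot be an STFT magnitude for `n_fft`."""
    spectrogram = np.asarray(spectrogram)
    n_bins = expected_stft_bins(n_fft)
    if spectrogram.ndim != 2 or spectrogram.shape[0] != n_bins:
        raise ValueError(
            f"expected shape ({n_bins}, n_frames) for n_fft={n_fft}, "
            f"got {spectrogram.shape}"
        )
    return spectrogram
```

Calling `check_stft_magnitude` on an array of shape `(128, 1292)` raises a `ValueError` that names both shapes, instead of the broadcast error that only appears on the second iteration.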

rafaelvalle · 2018-07-07T05:22:13Z

Sharing a link in support of @bmcfee's idea of "sonifying samples from a generative model of magnitude spectra" or related transforms.
https://github.com/rafaelvalle/asrgen

imflash217 · 2018-10-22T11:57:23Z

Curious, what use cases are there when the phase from the forward transform can't be kept around for doing the inverse?

I'm having this issue when doing audio source separation. The input to my neural net is the magnitude STFT of the mixture, and the output is the magnitude STFT of the separated source.
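
For the separation setting described here, one simple baseline (an assumption on my part, not something proposed in this thread) is to reuse the phase of the mixture's STFT for the estimated source, since the mixture's complex STFT is available at inference time. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def apply_mixture_phase(source_magnitude, mixture_stft):
    """Combine an estimated source magnitude with the mixture's phase.

    Both arrays must have the same (n_bins, n_frames) shape.
    """
    source_magnitude = np.asarray(source_magnitude, dtype=float)
    mixture_stft = np.asarray(mixture_stft)
    if source_magnitude.shape != mixture_stft.shape:
        raise ValueError("magnitude and mixture STFT shapes must match")
    # Unit-modulus phasors carrying only the mixture's phase.
    phase = np.exp(1j * np.angle(mixture_stft))
    return source_magnitude * phase
```

The result can be passed to an inverse STFT directly. The mixture phase is only an approximation of the source phase, so an iterative method such as Griffin-Lim may still help, possibly initialized from this estimate.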

lostanlen · 2018-10-23T19:03:54Z

@vinaykumar2491 read this message: #434 (comment)

jeremydeanw · 2019-01-07T23:11:24Z

Has anyone implemented "Phase retrieval for wavelet transforms" in Python?

lostanlen · 2019-01-08T16:59:38Z

@jeremyworger not that I know of, but it would be of immense value.

bmcfee · 2019-03-08T21:38:20Z

This thread's spiraled pretty far off from the original issue, but that's okay!

I've spent a decent amount of time this week hacking on various things in the space of spectrogram inversion using a slightly modified version of @jongwook's griffin-lim implementation. Despite having better options for phase retrieval out there (mentioned above in this thread), I still think it would be good to have a reference GL to rely on. With that in hand, we could easily provide (approximate) mfcc and mel inverters using pseudo-inverses; I have prototypes in local notebooks, and they sound not-perfect but not horrible either.
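
A rough sketch of the pseudo-inverse idea for mel spectrograms, under stated assumptions: the triangular filterbank below is a simplified stand-in built for this example (HTK-style mel formula, no normalization), not librosa's filterbank, and `mel_to_stft_approx` is a hypothetical name rather than the prototype mentioned above.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr=22050, n_fft=2048, n_mels=128):
    """Simplified triangular mel filterbank, shape (n_mels, 1 + n_fft // 2)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = mel_pts[i : i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_to_stft_approx(mel_spec, fb):
    """Approximate STFT magnitudes from a mel spectrogram via the pseudo-inverse."""
    # The pseudo-inverse can produce negative values, so clip them to zero.
    return np.maximum(0.0, np.linalg.pinv(fb) @ mel_spec)
```

The output has the STFT's `(1 + n_fft // 2, n_frames)` shape, so it can be handed to a Griffin-Lim routine for phase estimation. The inversion is lossy because the filterbank maps many frequency bins onto fewer mel bands.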

I'll create a new issue with a narrowed down checklist of deliverables to include in the 0.7 release.

bmcfee · 2019-05-05T21:28:55Z

The recently merged #843 includes (fast) griffin-lim, which I think covers most of the content of this issue. It does not (currently) support cqt phase retrieval, but we can add that later if there's demand. Any objection to closing this out?
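
The thread does not spell out what "fast" refers to here. One published variant adds a momentum term to the Griffin-Lim iteration, and the sketch below illustrates that idea only. It uses minimal NumPy `stft`/`istft` helpers written for this example rather than librosa's functions, and it is not the code merged in #843.

```python
import numpy as np


def stft(y, n_fft=512, hop=128):
    """Minimal Hann-windowed STFT (illustrative helper, frames as columns)."""
    win = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * win for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def istft(S, n_fft=512, hop=128):
    """Minimal weighted overlap-add inverse of the helper above."""
    win = np.hanning(n_fft + 1)[:-1]
    y = np.zeros(n_fft + hop * (S.shape[1] - 1))
    norm = np.zeros_like(y)
    frames = np.fft.irfft(S, n=n_fft, axis=0)
    for i in range(S.shape[1]):
        y[i * hop : i * hop + n_fft] += frames[:, i] * win
        norm[i * hop : i * hop + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)


def momentum_griffinlim(mag, n_iter=32, momentum=0.99, n_fft=512, hop=128, seed=0):
    """Griffin-Lim with a momentum term; momentum=0 gives the plain iteration."""
    rng = np.random.default_rng(seed)
    c = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    # Project the random start onto the set of consistent spectrograms.
    t_prev = stft(istft(c, n_fft, hop), n_fft, hop)
    for _ in range(n_iter):
        # Impose the target magnitude, then project onto consistent spectrograms.
        constrained = mag * np.exp(1j * np.angle(c))
        t = stft(istft(constrained, n_fft, hop), n_fft, hop)
        # Extrapolate along the direction of the last update.
        c = t + momentum * (t - t_prev)
        t_prev = t
    return istft(mag * np.exp(1j * np.angle(c)), n_fft, hop)
```

The default `momentum=0.99` is just a starting point for experimentation. Unlike the plain iteration, the accelerated version is not guaranteed to decrease the spectral error at every step.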

lostanlen · 2019-05-05T22:24:13Z

yes, let's close this one. then we can release griffin-lim in the next stable version, collect some user feedback, and eventually open a CQT-related issue around that feedback.
amazing work @bmcfee !!!

walidbou6 · 2022-03-18T16:14:41Z

The Griffin-Lim method causes unnatural artifacts in synthetic speech.

bmcfee added the enhancement, question, functionality, and good for beginners labels Nov 4, 2016

bmcfee added this to the 0.5 milestone Nov 4, 2016

bmcfee self-assigned this Nov 4, 2016

bmcfee removed the good for beginners and question labels Dec 12, 2016

bmcfee modified the milestones: 0.5, 0.5.1 Dec 23, 2016

bmcfee modified the milestones: 0.6.0, 0.5.1 Apr 21, 2017

pineking mentioned this issue Aug 15, 2017

Is it possible to convert spectrogram to wav? bpotard/idlak#14

Open

bmcfee removed this from the 0.6.0 milestone Sep 1, 2017

bmcfee added this to the 0.6.1 milestone Feb 13, 2018

bmcfee modified the milestones: 0.6.1, 0.6.2 Feb 19, 2018

bmcfee removed this from the 0.6.2 milestone Jun 22, 2018

bmcfee mentioned this issue Mar 8, 2019

Griffin-Lim, inverse mel, inverse mfcc #843

Closed

lostanlen closed this as completed May 5, 2019
