inverse CQT #165

Closed
bmcfee opened this issue Feb 5, 2015 · 21 comments
Labels: functionality (Does this add new functionality?)

Comments

@bmcfee
Member

bmcfee commented Feb 5, 2015

(cc @ebattenberg @dpwe)

Seems like we ought to be able to do this, if we preserve phase in the cqt.

The main barrier comes from variable_hop, which uses oversampling and aggregation to handle large hop-sizes with small filters. This works fine in magnitude, but I don't see a clean way to make it work out in the complex domain.

Perhaps we can add a dtype parameter to cqt: if the user requests a complex-type output, we preserve phase, and throw an exception on variable-hop aggregation. Otherwise, we let it through in the current state.
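A minimal sketch of the guard proposed above. It assumes that each of the recursive downsampling steps halves the hop, so a hop that is not divisible by 2**(n_octaves - 1) would fall back to the oversample-and-aggregate path. The function name `check_cqt_output` and this divisibility rule are illustrative assumptions, not librosa's API.

```python
import numpy as np

def check_cqt_output(dtype, hop_length, n_octaves):
    """Hypothetical guard for a `dtype` option on cqt.

    Assumption: each of the (n_octaves - 1) downsampling steps halves the hop,
    so the hop must remain an integer all the way down to avoid aggregation.
    """
    complex_out = np.issubdtype(np.dtype(dtype), np.complexfloating)
    n_halvings = max(n_octaves - 1, 0)
    needs_aggregation = hop_length % (2 ** n_halvings) != 0
    if complex_out and needs_aggregation:
        raise ValueError(
            f"hop_length={hop_length} is not divisible by 2**{n_halvings}; "
            "phase cannot be preserved through variable-hop aggregation"
        )
    # Magnitude-only output may still go through oversampling + aggregation
    return "complex" if complex_out else "magnitude"


print(check_cqt_output(np.complex64, 512, 7))  # complex
print(check_cqt_output(np.float32, 500, 7))    # magnitude
```

Raising rather than silently dropping phase keeps the magnitude path unchanged for existing callers.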

Are there other barriers to icqt that I'm not seeing at the moment?

bmcfee added the enhancement (Does this improve existing functionality?), question (Issues asking for help doing something), and functionality (Does this add new functionality?) labels Feb 5, 2015
@dpwe
Contributor

dpwe commented Feb 5, 2015

Typically you need to have a very high frame rate (close to the original
sampling rate, depending on the Q and the top frequency) to preserve all
the information in the highest bin.

What do we provide as the phase for undersampled, wide-bandwidth bins?

DAn.


@ebattenberg
Contributor

I don't have any experience doing an inverse CQT. As far as I understand it, the inverse transform is approximate (not exactly invertible). Some things to consider that would affect the quality of the approximation are:

  1. The sparsity threshold for the frequency kernels.
  2. The hop-size. I'm not sure how this would work with any type of over-sampling aggregation. I also don't think the pseudo CQT (in the hybrid-cqt) could be inverted. Other than those cases, for bands that are sampled at a small enough hop-size, would there need to be some sort of normalization to account for the fact that some bands would be oversampled (compared to the typical half-overlapping hop-size of invertible lapped transforms like the MDCT)?
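To make the effect of the sparsity threshold concrete, here is a sketch that zeroes the smallest-magnitude entries of one frequency-domain kernel, dropping at most a `quantile` fraction of its total (L1) magnitude, and reports the relative error this introduces. The kernel (a Hann-windowed complex sinusoid) and the function names are illustrative assumptions rather than librosa's basis code.

```python
import numpy as np

def sparsify(kernel, quantile):
    """Zero the smallest entries whose combined magnitude is <= quantile of the total."""
    mag = np.abs(kernel)
    order = np.argsort(mag)                        # ascending magnitude
    cum = np.cumsum(mag[order]) / mag.sum()        # mass fraction held by the smallest entries
    n_drop = np.searchsorted(cum, quantile, side="right")
    out = kernel.copy()
    out[order[:n_drop]] = 0
    return out

def hann_kernel_fft(length, freq, n_fft):
    """FFT of a Hann-windowed complex sinusoid (freq in cycles/sample), zero-padded to n_fft."""
    n = np.arange(length)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / length)
    return np.fft.fft(win * np.exp(2j * np.pi * freq * n), n_fft)

K = hann_kernel_fft(length=1000, freq=0.05, n_fft=4096)
for q in (0.0, 0.001, 0.01, 0.05):
    Ks = sparsify(K, q)
    # By Parseval, this also equals the relative error of the time-domain kernel
    err = np.linalg.norm(Ks - K) / np.linalg.norm(K)
    print(f"quantile={q}: {np.count_nonzero(Ks)} nonzeros, relative error {err:.2e}")
```

A higher threshold buys a sparser (faster) kernel at the cost of a larger per-kernel error, which then accumulates in any reconstruction.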

I'm also picturing trying to invert the CQT of an impulse, and it seems like it would be difficult to avoid extensive smearing in the lower bands of the reconstruction.

@dpwe
Copy link
Contributor

dpwe commented Feb 6, 2015

I think it's the high frequencies that are the limitation. If you have
constant Q, you have bandwidth increasing with frequency. A semitone CQ
bank might have a bandwidth of 6%, or a Q of 16. That means if you want
50% overlap of time windows, you can only afford to downsample by 8, i.e.,
for 16 kHz data, you need a frame rate of ~2 kHz (0.5 ms hop).

Of course, that's only for the top bin; you could certainly save space by
downsampling by 2 for every octave, but it's much more complicated to
describe and manage.
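The arithmetic above can be checked in a few lines. This sketch takes the "Q/2 decimation for 50% overlap" rule of thumb directly from the comment and assumes bandwidth = f/Q with bandwidth equal to the bin spacing; both are rules of thumb, not a derivation.

```python
import numpy as np

def constant_q(bins_per_octave=12):
    """Q of a constant-Q bank whose bandwidth equals the bin spacing."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

q = constant_q(12)                 # about 16.8 for a semitone bank
bandwidth_pct = 100.0 / q          # about 5.9% of each center frequency
decimation = int(q // 2)           # rule of thumb: about Q/2 for 50% window overlap
sr = 16000
frame_rate = sr / decimation       # Hz, needed for the top bin
hop_ms = 1000.0 / frame_rate
# Halving the rate for each octave below the top keeps the same per-bin overlap
octave_decimation = [decimation * 2 ** k for k in range(4)]

print(f"Q={q:.1f}, bandwidth={bandwidth_pct:.1f}%, decimate by {decimation}")
print(f"top-bin frame rate {frame_rate:.0f} Hz, hop {hop_ms} ms")
print(f"per-octave decimation: {octave_decimation}")
```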

It's not enough to be able to analyze and reconstruct. What we want is
analyze, then modify, then reconstruct. You can fake this with pseudo
CQT (spectrogram blurred into CQ bins) if you re-expand it out (pseudo
inverse), then use it to modify magnitudes in original spectrogram.

I'm not sure how many applications need to modify at a finer resolution than the simple semitone-scale / 10 ms frame level. So maybe some interface that "resynthesizes" from a modified pseudo-CQT and the original waveform would be useful.

DAn.


@vedhas

vedhas commented Sep 4, 2015

https://bmcfee.github.io/librosa/librosa.html?highlight=cqt#librosa.core.cqt says "This implementation is based on the recursive sub-sampling method described by Schoerkhuber and Klapuri, 2010.
Schoerkhuber, Christian, and Anssi Klapuri. "Constant-Q transform toolbox for music processing.” 7th Sound and Music Computing Conference, Barcelona, Spain. 2010."

If that's the case, and if https://code.soundsoftware.ac.uk/projects/constant-q-toolbox provides inverse cqt, then why doesn't librosa have a ready function? Kindly reply!

@bmcfee
Member Author

bmcfee commented Sep 4, 2015

> If that's the case, and if https://code.soundsoftware.ac.uk/projects/constant-q-toolbox provides inverse cqt, then why doesn't librosa have a ready function? Kindly reply!

Nobody's volunteered to write it (yet), but pull requests are welcome!

Our implementation is based on the algorithm described in the paper, but is not bit-for-bit equivalent to the CQT toolbox implementation for a variety of reasons. Consequently, copy-pasting their code and translating the MATLAB statements into Python probably won't work.

@bmcfee
Member Author

bmcfee commented Feb 4, 2016

later...

As mentioned in #279, I have a working inverse implementation in a notebook. However, I don't think we should integrate this into the current (0.4) release line for two reasons: 1) it sounds awful on the default parameterization (as @dpwe originally suggested, we need high sample and frame rates), and 2) getting the quality up to that reported by S&K may take some additional changes to the forward CQT implementation.

Fixing these issues would break backwards compatibility, which we should avoid within a minor revision cycle. This can be revisited in 0.5 though, where we're already planning some API changes anyway.

@bmcfee bmcfee added this to the 0.5 milestone Feb 4, 2016
@bmcfee bmcfee self-assigned this Feb 4, 2016
@ebattenberg
Contributor

What kind of changes are required to the forward CQT?

@bmcfee
Member Author

bmcfee commented Feb 4, 2016

> What kind of changes are required to the forward CQT?

I'm not sure yet, it'll take more detailed probing than I have time for at the moment. One of the things that popped up in reconstruction was the need for re-normalizing the signal (dividing by sqrt(2)) after up-sampling by 2. We don't do this in the forward direction, and it's not clear to me yet if this matters. (It shouldn't, in theory, but I could imagine it having some influence on the relative strength of each octave compared to the filter bank when doing the inverse.)
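Why a factor of sqrt(2) shows up: an amplitude-preserving 2x upsampler produces twice as many samples at the same amplitude, so the signal energy (sum of squares) doubles. This sketch uses scipy's `resample_poly` as a stand-in resampler (an assumption; librosa's resampling backend may behave differently) and shows that dividing by sqrt(2) restores the original energy.

```python
import numpy as np
from scipy.signal import resample_poly

def upsample2(x, preserve_energy=False):
    y = resample_poly(x, 2, 1)        # twice the samples, same amplitude
    if preserve_energy:
        y = y / np.sqrt(2.0)           # undo the 2x growth in sum-of-squares energy
    return y

def energy(s):
    return float(np.sum(s ** 2))

n = np.arange(4096)
x = np.sin(2 * np.pi * 16 * n / len(n))   # low-frequency tone, far below Nyquist
print(energy(upsample2(x)) / energy(x))                        # close to 2
print(energy(upsample2(x, preserve_energy=True)) / energy(x))  # close to 1
```

Whether the forward transform should apply the matching 1/sqrt(2) per octave is exactly the open question above; the sketch only shows where the factor comes from.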

Otherwise, the default parameters should probably change to have resolution=1 (or, filter_scale=1 if that's the name we go with; S&K call it the "scaling factor").
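For reference, here is a sketch of how a scaling factor enters the filter lengths. It assumes the S&K-style relation N_k = filter_scale · Q · sr / f_k with Q = 1 / (2^(1/B) − 1), so values below 1 shorten every filter (better time resolution, wider bandwidth). `cq_filter_lengths` is a hypothetical helper name.

```python
import numpy as np

def cq_filter_lengths(sr, fmin, n_bins, bins_per_octave=12, filter_scale=1.0):
    """Filter length (in samples) per bin: filter_scale * Q * sr / f_k."""
    q = filter_scale / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    return q * sr / freqs

full = cq_filter_lengths(22050, 32.70, 84)
half = cq_filter_lengths(22050, 32.70, 84, filter_scale=0.5)
print(np.round(full[[0, 12, 83]]))   # lengths halve with every octave
print(np.allclose(half, full / 2))   # True
```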

Also, it'd still be good if we could support alternative window functions in the basis construction. S&K report their best results with Blackman-Harris, which breaks the logic for detecting when the passband exceeds Nyquist. Adding this wouldn't break backwards compatibility.
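One way to make the Nyquist check window-agnostic, sketched under assumptions: measure each window's bandwidth as its equivalent noise bandwidth (ENBW, in FFT bins) and treat the upper passband edge of bin k as f_k · (1 + 0.5 · bw / Q). The helper names are hypothetical. Hann comes out at 1.5 bins and Blackman-Harris at about 2, so the same bin set can pass with one window and fail with the other.

```python
import numpy as np
from scipy.signal import get_window

def enbw(window, n=4096):
    """Equivalent noise bandwidth of a window, in FFT bins."""
    w = get_window(window, n)
    return n * np.sum(w ** 2) / np.sum(w) ** 2

def passband_exceeds_nyquist(freqs, q, sr, window="hann"):
    """True if the highest bin's passband edge lands above sr / 2."""
    top_edge = np.max(np.asarray(freqs) * (1.0 + 0.5 * enbw(window) / q))
    return bool(top_edge > sr / 2.0)

q = 1.0 / (2.0 ** (1.0 / 12) - 1.0)
print(enbw("hann"), enbw("blackmanharris"))
print(passband_exceeds_nyquist([10500.0], q, 22050, "hann"))            # False
print(passband_exceeds_nyquist([10500.0], q, 22050, "blackmanharris"))  # True
```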

@carlthome
Contributor

What about adding Holighaus' CQT? It's invertible.

A MATLAB implementation exists here for inspiration.

@bmcfee
Member Author

bmcfee commented Apr 27, 2016

> What about adding Holighaus' CQT? It's invertible.

I don't think we should have multiple CQT implementations.

As I mentioned above, we have a working implementation of inverse cqt. I'm holding off on making a PR because it requires phase information, which is currently not produced by default.

If we provide icqt functionality, I want it to work "out of the box". This will be possible in 0.5.0 when the deprecated default parameter settings are replaced, and cqt is complex-valued by default.

@carlthome
Contributor

@bmcfee, makes sense. Anyway, looking forward to seeing an inverse in librosa! It's a great package and I rely on it daily. 😄

Do you have a feeling regarding which CQT variant is more useful?

Schörkhuber & Klapuri's method is only approximately invertible, but maybe the reconstruction error isn't particularly audible in practice?

Holighaus et al.'s variant (apparently they call it CQ-NSGT, because it's built up from several Gabor transforms) promises higher audio fidelity after reconstruction, because, just like the FFT and STFT, it should at least in theory be perfectly invertible. They also claim lower computation time in both directions. But their comparison with CQT in terms of sound quality is a bit lacking, to be honest (I could only find spectrograms).

It would be interesting to compare CQT and CQ-NSGT more extensively. I'll be porting CQ-NSGT to Python 3 today, and then doing some experiments with librosa 0.5 could be fun!
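For a comparison like this, reconstruction SNR is a simple first number to report alongside listening tests. A minimal helper, assuming the reconstruction is already time-aligned and trimmed to the input length:

```python
import numpy as np

def reconstruction_snr_db(x, x_hat):
    """SNR (in dB) of a reconstruction x_hat against the reference x."""
    x = np.asarray(x, dtype=float)
    err = x - np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(err ** 2))

x = np.random.default_rng(0).standard_normal(22050)
print(reconstruction_snr_db(x, 0.9 * x))   # about 20 dB: the error is 10% of the amplitude
```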

@bmcfee
Member Author

bmcfee commented May 3, 2016

I haven't looked at Holighaus's paper in great detail, but I don't see how that differs from S&K, who also use multiple (shifted) basis functions to get high-quality reconstruction. Note, recent bugs #341 have convinced me that redundant bases are a good thing, so we'll probably add that in 0.5 along with icqt anyway.

@dpwe
Contributor

dpwe commented May 8, 2016

So, I'm reviewing the CQT code in detail for the first time, and I find there's something I want to say.

The CQT filters we use by default use Hann-windowed sinusoids at the log-spaced center frequencies. Then these complex band-pass filters are sampled at the same time intervals across all bins and octaves.

The ICQT that Brian is proposing, I think, takes the complex results of the inner products from the analysis, uses them to scale the conjugate (Hann-windowed) impulse responses, and overlap-adds at the original frame rate.

The complex scaling should take care of making the phases of overlapping filter kernels line up correctly, but the idea that we can just overlap a bunch of Hann-windowed grains and end up with a smoothly interpolated amplitude is not correct (and I think this is the assumption). Summing shifted Hann windows only gives a flat (or monotonically interpolated) envelope for n_hop = hann_len / (2^N). For values in between, you get frame-rate amplitude ripple of differing depths. See this example of overlapping 16-point windows with a hop of 7 (i.e., an overlap of 9 > win_len/2):

[Figure (screenshot): sum of overlapping 16-point Hann windows at a hop of 7 samples]

You end up with amplitude modulation ~ 10%, not good.
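The ripple described here is easy to reproduce: overlap-add a 16-point Hann window at several hops and measure the peak-to-peak variation of the steady-state sum relative to its mean. A periodic Hann window is assumed; hops of 8 and 4 give a flat sum, while a hop of 7 does not.

```python
import numpy as np

def periodic_hann(n):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def ola_envelope(win, hop, n_frames):
    """Sum of n_frames copies of win, each shifted by hop samples."""
    env = np.zeros(len(win) + hop * (n_frames - 1))
    for t in range(n_frames):
        env[t * hop : t * hop + len(win)] += win
    return env

def ripple(win, hop, n_frames=64):
    """Peak-to-peak / mean of the envelope, ignoring the ramps at either edge."""
    env = ola_envelope(win, hop, n_frames)
    steady = env[len(win) : len(env) - len(win)]
    return (steady.max() - steady.min()) / steady.mean()

w = periodic_hann(16)
for hop in (8, 7, 4):
    print(f"hop={hop}: ripple={ripple(w, hop):.3f}")
```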

Because every CQT bin has a different effective Hann len, we will get this all over the place regardless of our actual hop length.

You can compensate for this by calculating the effective Hann-overlap AM envelope and dividing it out. Ugly code, but it ought to work.
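A minimal sketch of that compensation for a single window length: accumulate the overlapped windows into an envelope alongside the windowed grains, then divide it out wherever it is non-negligible. This is analogous to the window-sum normalization in istft. In an ICQT the grains would be the scaled kernels and each bin has its own Hann length, so (as an assumption about how it would be applied) the envelope would have to be computed per bin. `ola_normalized` is a hypothetical name.

```python
import numpy as np

def periodic_hann(n):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def ola_normalized(x, win, hop, eps=1e-8):
    """Overlap-add windowed grains of x, then divide out the window-sum envelope."""
    L = len(win)
    n_frames = 1 + (len(x) - L) // hop
    y = np.zeros(len(x))
    env = np.zeros(len(x))
    for t in range(n_frames):
        seg = slice(t * hop, t * hop + L)
        y[seg] += win * x[seg]     # the grain, as in plain overlap-add
        env[seg] += win            # the AM envelope those grains impose
    covered = env > eps            # skip samples no window meaningfully reaches
    y[covered] /= env[covered]
    return y, covered

x = np.random.default_rng(1).standard_normal(1000)
y, covered = ola_normalized(x, periodic_hann(16), hop=7)
print(np.allclose(y[covered], x[covered]))   # True: the hop-7 ripple is divided out
```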

@bmcfee
Member Author

bmcfee commented May 8, 2016

Excellent points, and yeah, definitely not things I'd considered. My implementation is taken more-or-less vanilla from the method described in the paper.

It sounds like your proposed fix can be incorporated cleanly if we factor out the analogous logic from istft into its own helper function.

@dpwe
Contributor

dpwe commented May 9, 2016

How about that? You're right, it's right there. Nice.

DAn.

@ghost

ghost commented Jun 3, 2016

Hello, I'm not sure whether this is the right place to post this, but I couldn't find a better one. I love this package and use it a lot, especially the constant-Q transform. Thank you for making such a cool, useful tool for everyone.
By the way, I really need a high-quality inverse CQT.

  1. I tried the recursive icqt from the IPython notebook written by bmcfee, but as he said, it didn't sound very good.
  2. Next I tried the MATLAB CQT toolbox, and it worked really well. The problem is that it needs more information than just the CQT coefficients: the Nyquist frequency and several other values were required. Unfortunately I can't go that way because of how my research is set up.
  3. Could you suggest any ideas? Any other libraries or toolkits? Or maybe information about when you plan to release librosa 0.5, which will include icqt?

Thanks

@lostanlen
Contributor

lostanlen commented Jun 5, 2016

@superkks The Nyquist frequency is equal to half the sample rate. librosa needs this information as a keyword argument just as much as any other toolbox would. FYI, librosa sets the sample rate sr to 22050 by default, so this argument is optional, whereas it is compulsory in other toolboxes.
What is your use case?

@carlthome
Contributor

carlthome commented Jun 14, 2016

@bmcfee, any news on the CQT/ICQT? Is it ready for use? What should be fixed, if not?

I'd like to take a real stab at it and submit a pr, because I'm in a real hurry to get a good Python-based CQT going, if you don't mind.

By the way, I noticed that Holighaus's ideas were kinda merged into Schörkhuber's latest Matlab Toolbox with varying Q. See here.

@bmcfee
Member Author

bmcfee commented Jun 14, 2016

I'm waiting on #342 before doing this, because it will involve a substantial change to the basis construction interface.

> I'd like to take a real stab at it and submit a pr, because I'm in a real hurry to get a good Python-based CQT going, if you don't mind.

I probably can't get to this on my own until next month, but I'm happy to help out if you want to take a stab at #342 (though it's complicated) and then this one.

@carlthome
Contributor

Yeah, absolutely. Honestly, anything to get me out of MATLAB hell. I'll look through #342 this week and see if I manage. 👍

@bmcfee
Member Author

bmcfee commented Oct 8, 2016

I've updated the notebook to work on the latest #417 PR, which fixes a number of attenuation continuity errors in addition to normalizing filters by length.
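A quick illustration of why normalizing filters by length matters: with Hann-windowed complex sinusoids, the response to a unit tone at the bin's own center frequency grows with the filter length unless the kernel is divided by its length. The kernel construction below is a simplified assumption, not taken from #417.

```python
import numpy as np

def hann_kernel(freq, sr, length, normalize=True):
    n = np.arange(length) - length // 2              # centered time axis
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(length) / length)
    kernel = win * np.exp(2j * np.pi * freq * n / sr)
    return kernel / length if normalize else kernel

def tone_response(freq, sr, length, normalize=True):
    """|<kernel, cos>| for a unit cosine at the kernel's center frequency."""
    n = np.arange(length) - length // 2
    tone = np.cos(2 * np.pi * freq * n / sr)
    return np.abs(np.vdot(hann_kernel(freq, sr, length, normalize), tone))

sr, q = 22050, 1.0 / (2.0 ** (1.0 / 12) - 1.0)
for f in (110.0, 880.0):
    length = int(round(q * sr / f))
    print(f, length, tone_response(f, sr, length), tone_response(f, sr, length, False))
```

With the division by length, both bins respond at roughly half the Hann window's mean times the cosine's 1/2 weight, independent of how long the filter is.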

The results now look and sound good enough (IMO) to actively work toward implementation in librosa, once #417 is merged.

Here's an example of cqt(y) (left) and cqt(icqt(cqt(y))) (right):
[Figure: cqt(y) (left) and cqt(icqt(cqt(y))) (right)]

More plots and examples are included in the notebook. There are still a couple of issues to work out (eg the unscaled case), and I think there might be a small amount of delay introduced by the icqt.
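To check the suspected delay, one simple approach is to cross-correlate the input with its reconstruction and take the lag of the peak. This sketch assumes both signals share a sample rate; the 5-sample delay is synthetic, standing in for an actual icqt(cqt(y)) output.

```python
import numpy as np

def estimate_delay(x, y):
    """Lag (in samples) by which y trails x, from the cross-correlation peak."""
    c = np.correlate(y, x, mode="full")   # index i corresponds to lag i - (len(x) - 1)
    return int(np.argmax(np.abs(c))) - (len(x) - 1)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.concatenate([np.zeros(5), x])[: len(x)]   # x delayed by 5 samples
print(estimate_delay(x, y))   # 5
```

Taking the absolute value of the correlation keeps the estimate robust to a polarity flip in the reconstruction.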
