inverse CQT #165
Typically you need to have a very high frame rate (close to the original … What do we provide as the phase for undersampled wide-bandwidth bins? DAn. On Thursday, February 5, 2015, Brian McFee notifications@github.com wrote:
|
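To put rough numbers on the frame-rate requirement, here's a back-of-the-envelope sketch. It assumes the textbook constant-Q relation Q = f / bandwidth with Q = 1 / (2^(1/bins_per_octave) - 1), and parameter values that I believe match librosa's cqt defaults (sr=22050, hop_length=512, fmin≈C1, 84 bins, 12 per octave). min_frame_rate is a hypothetical helper, not a librosa function.

```python
import numpy as np


def min_frame_rate(fmin, n_bins, bins_per_octave):
    """Nominal bandwidth (Hz) of the highest CQT bin.

    A complex subband signal occupying B Hz needs roughly B frames per second
    to avoid aliasing. A Hann-windowed filter's main lobe is wider than the
    nominal f / Q, so treat this as a lower bound on the required frame rate.
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1)
    f_top = fmin * 2.0 ** ((n_bins - 1) / bins_per_octave)
    return f_top / Q


sr = 22050
rate = min_frame_rate(fmin=32.70, n_bins=84, bins_per_octave=12)  # C1 and up, 7 octaves
max_hop = int(sr // rate)  # largest hop (in samples) that still meets the bound

print(f"need at least {rate:.0f} frames/s, i.e. hop <= {max_hop} samples")
print(f"hop_length=512 gives only {sr / 512:.1f} frames/s")
```

Under these assumptions the top bin alone calls for a hop of under 100 samples, roughly five times finer than 512.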
I don't have any experience doing an inverse CQT. As far as I understand it, the inverse transform is approximate (not exactly invertible). Some things to consider that would affect the quality of the approximation are:
I'm also picturing trying to invert the CQT of an impulse, and it seems like it would be difficult to avoid extensive smearing in the lower bands of the reconstruction. |
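The smearing intuition follows from filter lengths. Under the same textbook constant-Q model (an assumption; librosa's actual basis lengths may differ by scaling factors), each bin's filter spans about Q * sr / f samples. This means the lowest bins integrate over a large fraction of a second, while the highest span only a few milliseconds. A quick numpy sketch:

```python
import numpy as np

sr, fmin, n_bins, bins_per_octave = 22050, 32.70, 84, 12

Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1)
freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)  # log-spaced centers (Hz)
lengths = np.ceil(Q * sr / freqs).astype(int)                # filter lengths (samples)

# An impulse excites every filter; in resynthesis each bin's contribution is
# spread over (roughly) its own filter length.
for k in (0, n_bins - 1):
    print(f"{freqs[k]:7.1f} Hz: {lengths[k]:5d} samples ({1000 * lengths[k] / sr:.1f} ms)")
```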
I think it's the high frequencies that are the limitation. If you have … Of course, that's only for the top bin; you could certainly save space by … It's not enough to be able to analyze and reconstruct. What we want is … I'm not sure how many applications need the ability to modify beyond the … DAn. On Fri, Feb 6, 2015 at 4:33 PM, Eric Battenberg notifications@github.com
|
The documentation at https://bmcfee.github.io/librosa/librosa.html?highlight=cqt#librosa.core.cqt says "This implementation is based on the recursive sub-sampling method described by Schoerkhuber and Klapuri, 2010." If that's the case, and if https://code.soundsoftware.ac.uk/projects/constant-q-toolbox provides an inverse CQT, then why doesn't librosa have a ready-made function? Kindly reply! |
Nobody's volunteered to write it (yet), but pull requests are welcome! Our implementation is based on the algorithm described in the paper, but it is not bit-for-bit equivalent to the CQT toolbox implementation, for a variety of reasons. Consequently, copy-pasting their code and translating the MATLAB statements into Python probably won't work. |
As mentioned in #279, I have a working inverse implementation in a notebook. However, I don't think we should integrate this into the current (0.4) release line for two reasons: 1) it sounds awful on the default parameterization (as @dpwe originally suggested, we need high sample and frame rates), and 2) getting the quality up to that reported by S&K may take some additional changes to the forward CQT implementation. Fixing these issues would break backwards compatibility, which we should avoid within a minor revision cycle. This can be revisited in 0.5 though, where we're already planning some API changes anyway. |
What kind of changes are required to the forward CQT? |
I'm not sure yet; it'll take more detailed probing than I have time for at the moment. One of the things that popped up in reconstruction was the need for re-normalizing the signal (dividing by … Otherwise, the default parameters should probably change to have … Also, it'd still be good if we could support alternative window functions in the basis construction. S&K report their best results with Blackman-Harris, which breaks the logic for detecting when the passband exceeds Nyquist. Adding this wouldn't break backwards compatibility. |
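One way to see why a Hann-specific Nyquist check breaks with other windows is that a filter's effective bandwidth depends on its window. The sketch below is hypothetical and is not librosa's logic. It measures each window's equivalent noise bandwidth (ENBW) numerically, then uses ENBW * f / (2Q) as the passband half-width for a filter of length Q * sr / f. Under that assumption, a top bin at 10.5 kHz (sr = 22050, 12 bins per octave) stays below Nyquist with Hann but crosses it with a 4-term Blackman-Harris window.

```python
import numpy as np


def periodic_window(name, n):
    """Periodic Hann or 4-term Blackman-Harris window of length n."""
    t = 2 * np.pi * np.arange(n) / n
    if name == "hann":
        return 0.5 - 0.5 * np.cos(t)
    if name == "blackmanharris":
        a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
        return a0 - a1 * np.cos(t) + a2 * np.cos(2 * t) - a3 * np.cos(3 * t)
    raise ValueError(f"unknown window: {name}")


def enbw(win):
    """Equivalent noise bandwidth in DFT bins: N * sum(w^2) / sum(w)^2."""
    return len(win) * np.sum(win ** 2) / np.sum(win) ** 2


def passband_exceeds_nyquist(f_top, sr, bins_per_octave, window, n=4096):
    """Hypothetical check: does the top filter's passband cross sr / 2?

    For a filter of length Q * sr / f_top, one DFT bin spans f_top / Q Hz,
    so the half-width is taken as enbw * f_top / (2 * Q).
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1)
    half_width = enbw(periodic_window(window, n)) * f_top / (2 * Q)
    return f_top + half_width > sr / 2


for window in ("hann", "blackmanharris"):
    print(window, round(enbw(periodic_window(window, 4096)), 3),
          passband_exceeds_nyquist(10500.0, 22050, 12, window))
```

A window-agnostic check would need some per-window bandwidth figure like this one, rather than constants tuned for Hann.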
What about adding Holighaus' CQT? It's invertible. A MATLAB implementation exists here for inspiration. |
I don't think we should have multiple CQT implementations. As I mentioned above, we have a working implementation of inverse cqt. I'm holding off on making a PR because it requires phase information, which is currently not produced by default. If we provide icqt functionality, I want it to work "out of the box". This will be possible in 0.5.0 when the deprecated default parameter settings are replaced, and … |
@bmcfee, makes sense. Anyway, looking forward to seeing an inverse in librosa! It's a great package and I rely on it daily. 😄 Do you have a feeling for which CQT variant is more useful? Schorkhuber & Klapuri's method is only invertible up to a degree, but maybe the reconstruction error at the achievable SNR isn't particularly audible in practice? Holighaus et al.'s variant (they apparently call it CQ-NSGT, because it's built from several Gabor transforms) promises higher audio fidelity after reconstruction because, just like the FFT and STFT, it should at least in theory be perfectly invertible. They also claim lower computation time in both directions. But to be honest, their sound-quality comparison against the CQT is a bit lacking (I could only find spectrograms). It would be interesting to compare CQT and CQ-NSGT more extensively. I'll be porting CQ-NSGT to Python 3 today, and then doing some experiments with librosa 0.5 could be fun! |
I haven't looked at Holighaus's paper in great detail, but I don't see how that differs from S&K, who also use multiple (shifted) basis functions to get high-quality reconstruction. Note that recent bugs (#341) have convinced me that redundant bases are a good thing, so we'll probably add that in 0.5 along with icqt anyway. |
So, I'm reviewing the CQT code in detail for the first time, and there's something I want to say. The CQT filters we use by default are Hann-windowed sinusoids at the log-spaced center frequencies. These complex band-pass filters are then sampled at the same time intervals across all bins and octaves. The ICQT that Brian is proposing, I think, takes the complex results of the inner products from the analysis, uses them to scale the conjugate (Hann-windowed) impulse responses, and overlap-adds them at the original frame rate. The complex scaling should take care of making the phases of overlapping filter kernels line up correctly, but the idea that we can just overlap a bunch of Hann-windowed grains and end up with a nice, smoothly interpolated amplitude is not correct (and I think this is the assumption). Summing shifted Hann windows only gives a flat (or monotonically interpolated) envelope for n_hop = hann_len / (2^N). For values in between, you get frame-rate amplitude ripple of differing depths. See this example of overlapping 16-point windows with a hop of 7 (i.e., an overlap of 9 > win_len/2): you end up with amplitude modulation of ~10%, which is not good. Because every CQT bin has a different effective Hann length, we will get this all over the place regardless of our actual hop length. You can compensate for this by calculating the effective Hann-overlap AM envelope and dividing it out. Ugly code, but it ought to work. |
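The ripple argument is easy to check numerically. The numpy sketch below uses helper names made up for illustration. It overlap-adds a periodic 16-point Hann window at hops of 4, 7 and 8 samples and reports the steady-state peak-to-peak ripple relative to the mean. Hops 4 and 8 come out flat to machine precision, while hop 7 shows ripple on the order of 10%.

The sketch then applies the proposed compensation in its simplest form: it overlap-adds Hann-windowed grains, computes the same envelope, and divides it out. In a real ICQT the envelope would differ per bin, since each bin has its own effective window length. This single-window toy does not model that.

```python
import numpy as np


def periodic_hann(n):
    """Periodic Hann window of length n."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def ola_envelope(win, hop, n_frames):
    """Envelope from overlap-adding n_frames copies of win, shifted by hop samples."""
    env = np.zeros(hop * (n_frames - 1) + len(win))
    for i in range(n_frames):
        env[i * hop : i * hop + len(win)] += win
    return env


def ripple(env, win_len):
    """Peak-to-peak ripple relative to the mean, skipping the ramp-up/down edges."""
    steady = env[win_len:-win_len]
    return (steady.max() - steady.min()) / steady.mean()


win_len, n_frames = 16, 50
win = periodic_hann(win_len)
for hop in (4, 7, 8):
    print(f"hop={hop}: ripple={ripple(ola_envelope(win, hop, n_frames), win_len):.3f}")

# Compensation: overlap-add Hann-windowed grains of a signal x at hop 7, then
# divide by the known envelope wherever it is not vanishingly small.
hop = 7
env = ola_envelope(win, hop, n_frames)
rng = np.random.default_rng(0)
x = rng.standard_normal(len(env))
y = np.zeros_like(x)
for i in range(n_frames):
    seg = slice(i * hop, i * hop + win_len)
    y[seg] += win * x[seg]
safe = env > 1e-3 * env.max()
x_hat = np.where(safe, y / np.where(safe, env, 1.0), 0.0)
```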
Excellent points, and yeah, definitely not things I'd considered. My implementation is taken more-or-less vanilla from the method described in the paper. It sounds like your proposed fix can be incorporated cleanly if we factor out the analogous logic from … |
How about that? You're right, it's right there. Nice. DAn. On Sun, May 8, 2016 at 5:30 PM, Brian McFee notifications@github.com wrote:
|
Hello, I'm not sure whether it's proper to write this kind of thing on this board, but I couldn't find a better place. I love this package and use it a lot, especially the constant-Q transform. Thank you for making such a cool, useful tool for everyone.
Thanks |
@superkks The Nyquist frequency is equal to half the sample rate. librosa needs this information as a keyword argument just as much as any other toolbox would. FYI, librosa sets the sample rate … |
@bmcfee, any news on the CQT/ICQT? Is it ready for use? If not, what should be fixed? I'd like to take a real stab at it and submit a PR, because I'm in a real hurry to get a good Python-based CQT going, if you don't mind. By the way, I noticed that Holighaus's ideas were kind of merged into Schörkhuber's latest MATLAB toolbox with varying Q. See here. |
I'm waiting on #342 before doing this, because it will involve a substantial change to the basis construction interface.
I probably can't get to this on my own until next month, but I'm happy to help out if you want to take a stab at #342 (though it's complicated) and then this one. |
Yeah, absolutely. Honestly, anything to get me out of MATLAB hell. I'll look through #342 this week and see if I manage. 👍 |
I've updated the notebook to work on the latest #417 PR, which fixes a number of attenuation continuity errors in addition to normalizing filters by length. The results now look and sound good enough (IMO) to actively work toward implementation in librosa, once #417 is merged. Here's an example of … More plots and examples are included in the notebook. There are still a couple of issues to work out (e.g., the unscaled case), and I think the icqt might introduce a small amount of delay. |
(cc @ebattenberg @dpwe)
Seems like we ought to be able to do this, if we preserve phase in the cqt.
The main barrier comes from variable_hop, which uses oversampling and aggregation to handle large hop-sizes with small filters. This works fine in magnitude, but I don't see a clean way to make it work out in the complex domain.
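Here is a minimal illustration of the complex-domain problem. It assumes that aggregation is a plain mean over consecutive oversampled frames; that is my assumption, and the actual variable_hop code may aggregate differently. For a steady sinusoid, an idealized bin's complex output has constant magnitude and a phase that advances by 2π·f·hop/sr per frame. Averaging magnitudes is therefore harmless, while averaging the complex values partially cancels them:

```python
import numpy as np

sr, freq = 22050, 1000.0    # sample rate (Hz), frequency of a steady sinusoid (Hz)
small_hop, factor = 64, 8   # oversampled hop and aggregation factor (64 * 8 = 512)

# Idealized complex output of one bin: unit magnitude, phase advancing by
# 2*pi*freq*small_hop/sr radians per oversampled frame.
phase_step = 2 * np.pi * freq * small_hop / sr
frames = np.exp(1j * phase_step * np.arange(factor))

mean_of_mags = np.mean(np.abs(frames))  # aggregate magnitudes: stays at 1
mag_of_mean = np.abs(np.mean(frames))   # aggregate complex values: phases cancel

print(f"mean of magnitudes:        {mean_of_mags:.3f}")
print(f"magnitude of complex mean: {mag_of_mean:.3f}")
```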
Perhaps we can add a dtype parameter to cqt: if the user requests a complex-type output, we preserve phase and throw an exception on variable-hop aggregation. Otherwise, we let it through in its current state. Are there other barriers to icqt that I'm not seeing at the moment?
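The dtype proposal could look roughly like the sketch below. aggregate_frames is a hypothetical stand-in for the variable-hop aggregation step, not librosa code, and the plain-mean aggregation is an assumption. Real-valued output aggregates magnitudes as before. A complex dtype raises an error instead of silently cancelling phase.

```python
import numpy as np


def aggregate_frames(C, factor, dtype=np.float32):
    """Hypothetical: merge groups of `factor` oversampled frames of a CQT matrix C.

    C has shape (n_bins, n_frames). Complex (phase-preserving) output is only
    allowed when no aggregation is needed.
    """
    complex_out = np.issubdtype(dtype, np.complexfloating)
    if factor == 1:
        return C.astype(dtype) if complex_out else np.abs(C).astype(dtype)
    if complex_out:
        raise ValueError("phase-preserving (complex) output is incompatible "
                         "with variable-hop aggregation")
    mag = np.abs(C)
    n_bins, n_frames = mag.shape
    n_out = n_frames // factor  # trailing partial group is dropped
    grouped = mag[:, : n_out * factor].reshape(n_bins, n_out, factor)
    return grouped.mean(axis=2).astype(dtype)


# 4 bins x 16 oversampled frames of unit-magnitude, rotating-phase data.
C = np.ones((4, 1)) * np.exp(2j * np.pi * 0.3 * np.arange(16))
print(aggregate_frames(C, 4).shape)  # (4, 4)
try:
    aggregate_frames(C, 4, dtype=np.complex64)
except ValueError as err:
    print("refused:", err)
```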