New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variable-Q transform #1018
Variable-Q transform #1018
Conversation
Tagging @cwitkowitz |
One question that we should sort out sooner rather than later: does it make sense to have an entirely separate function for vqt, or should we expand cqt to support vqt as a general case? Arguments for two functions:
Arguments for one function:
Thoughts? |
Wow! A historic day for librosa!
Yes. It's a dilemma: I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform. Another thing to note is that |
Another option is to add a new scaling mode, analogous to how
Indeed, that's why I tend to prefer VQT over NSGT, even though it's a relatively non-informative name. (Technically, a DFT has variable Q as well...) |
I am starting to think that our implementation might be cleaner if we choose instead to combine the CQT and VQT functions in constantq.py. From what I can tell, the only differences (assuming gamma=0 for CQT and gamma>0 or gamma=None for VQT) between the two functions is the generation and/or rescaling of the filters for each iteration (as opposed to reusing the same bases by simply downsampling them). I think we could probably make a few tweaks so that we have the VQT function looking similar to the way it does now, with the CQT function simply calling VQT with gamma=0. I think it would be easier to test this way as well. Thoughts? |
I raised this point earlier here. I'm still not convinced that they should be the same function, since that could confuse the API. Probably the best solution is to factor out the commonalities and put some light-weight API on top so that we still have I agree with your assessment that the only real difference is the rescaling logic for each octave. If we can generalize this to support |
I think that's the right way to move forward. What we have now seems to work well, so I wouldn't be opposed to adding some test cases and inserting it into master for now. I just feel like the functions are too similar to leave by themselves. As I mentioned in my current pull request for the VQT default gamma, maybe we should consider the implementation presented in the original VQT paper http://www.cs.tut.fi/sgn/arg/CQT/ (which we would have to port over to python), as it is my understanding that it covers all the CQT/VQT bases, and has a generalized inverse. If we want to just finish what we have now, I need to add another comment addressing the ERB constants, and then I could essentially copy the CQT test cases for the VQT function. Then, I believe we could call it a pull request (correct me if I am wrong). |
👍 but let's get everything stable and correct first, then worry about refactoring.
Indeed -- this is more or less what I had in mind for the default gamma setting. My main concern with implementing it exactly as stated is that their frequency binning (alpha, eq 11) is slightly different from ours: centered (theirs) vs left-aligned (ours, inherited from earlier work on cqt toolbox) in the frequency domain, and this leads to slightly different calculations for gamma. There are reasonable arguments to be had for both ways of specifying the bins, and I could envision supporting both modes down the road. @lostanlen and I whiteboarded this back in november, but didn't really come to a conclusion about how to proceed. I think this is ultimately a separate issue, but whatever we implement here should be consistent with how the bandwidths are set. |
(sorry, did not mean to close PR) |
Just rebased this one to current master so we can get tests running again. Aside from @cwitkowitz 's other pr on this #1045 , there are still a few points we've yet to resolve. The big one IMO has to do with centering vs left-aligning frequency ranges, since that's a huge difference between us and the reference implementation / paper. Arguments for keeping it left-aligned:
Arguments for centering:
Other minor things to work out:
|
One slight hiccup to the idea of merging The other minor barrier to merging implementations is that the vqt is going to be a little slower than the cqt, since it's building new bases for each octave. I don't think this constitutes a huge performance hit, but it's worth considering. As a quick anecdote: example track at 44.1K gives these timings on my machine: In [11]: %timeit librosa.cqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7)
1.57 s ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [14]: %timeit librosa.vqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7, gamma=0)
1.59 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) So, not appreciably different for a decently representative use case. |
d3d3008
to
ea081f6
Compare
Well that was a gnarly merge, but I think we're back in business. |
Current status:
Things left to decide:
This seems like a useful thing to have in general (and also for CQT), but maybe it doesn't need to be part of this PR? |
😆 Seriously though, I was envisioning this as a helper function like perceptual-weighting that you could use to normalize a cqt/vqt output after the fact. But I agree that it's not critical at this point / could be added later. |
Any objection to merging this? It's been through quite a bit of preliminary review already. |
Looks ready to me. I have a few questions. Is there any plan to do anything regarding the bandwidth convention in the future? From what I gather, future work so far consists only of adding an inverse VQT and griffinlim VQT. |
No plan as of now. The most likely path that I see is that we add a switch to change from left-aligned to centered (in frequency) filters. I guess it won't matter much until we get around to implementing asymmetric windows.
In terms of the filter centering, or did you mean something else?
As in, I'm fine with it.
Those are the main things, yes. GL will be easy once ivqt is in place. |
I remember considering that implementation as a possible way to combine cqt/vqt efficiently. However, we decided to avoid it for now because of the alternate bandwidth convention. I'm not sure that there is any strong benefit to do it one way or the other. I'm just wondering where you stood on that.
Yes, that's what I'm referring to. Again my reason for asking this stems from the original implementation proposed in the VQT paper. I'm just wondering whether we get any speed benefits or anything else from the original implementation. |
Ah. I benchmarked it earlier on some representative examples, and the performance hit relative to pure CQT was extremely small, so IMO it's worth it to have a simpler code base. I also have some performance improvements in mind #1144 that should make up the difference. |
Sorry, forgot to respond to that earlier. I think it's more or less independent of the cqt/vqt combination, since both can (maybe should?) be defined with centered bandwidth calculations. FWIW the original CQT toolbox paper used left-aligned, and then the implementation later moved to centered. Point is, I think both cqt and vqt can be implemented either way, and it doesn't change much, but we should provide a consistent interface. But this can come later IMO. |
Makes sense. Well, I have no objections to merging. |
I'll merge this tomorrow, unless any one objects. |
What does this implement/fix? Explain your changes.
This PR extends CQT to add support for variable-Q transforms. It's a work in progress, but the core functionality seems to work already. A new function (
vqt
) has been added that is similar to CQT, but allows for different filterbanks to be used in each downsampled octave.I'm opening this PR now to prevent duplication of effort between this and #1016 .
As I mentioned above, I don't consider this "done" yet, for a few reasons: