Variable-Q transform #1018

bmcfee · 2019-11-11T16:40:26Z

What does this implement/fix? Explain your changes.

This PR extends CQT to add support for variable-Q transforms. It's a work in progress, but the core functionality seems to work already. A new function (vqt) has been added that is similar to CQT, but allows for different filterbanks to be used in each downsampled octave.

I'm opening this PR now to prevent duplication of effort between this and #1016 .

As I mentioned above, I don't consider this "done" yet, for a few reasons:

The reference papers for CQT and VQT use slightly different conventions for determining filter bandwidths (low-frequency aligned vs centered), and we should reconcile this before merging.
There may be an issue of normalization / gain correction that we need to sort out. @lostanlen has thoughts about this (discussed offline), but we haven't had time to thoroughly work through it all. Getting this done may require introducing some new normalization or scaling parameters, or changing the behavior of cqt in a backwards-incompatible way.

bmcfee · 2019-11-11T16:41:17Z

Tagging @cwitkowitz

librosa/filters.py

bmcfee · 2019-11-12T15:56:25Z

One question that we should sort out sooner rather than later: does it make sense to have an entirely separate function for vqt, or should we expand cqt to support vqt as a general case?

Arguments for two functions:

Keeps CQT simple
Easy to discover by documentation

Arguments for one function:

Minimizes redundancy in the API
Easier to maintain

Thoughts?

lostanlen · 2019-11-12T16:26:46Z

Wow! A historic day for librosa!

There may be an issue of normalization / gain correction that we need to sort out. @lostanlen has thoughts about this (discussed offline), but we haven't had time to thoroughly work through it all. Getting this done may require introducing some new normalization or scaling parameters, or changing the behavior of cqt in a backwards-incompatible way.

Yes. It's a dilemma:
(1) if we don't renormalize, then there is no energy conservation, which means that measuring loudness (after perceptual_weighting) will be numerically wrong in ways that are difficult to interpret
(2) if we do renormalize, then the maximum amplitude of a single sine wave (or slow AM/FM additive model) will not be consistent across the CQT axis, which will cause problem for CNNs in the time--frequency domain after max-pooling

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

Another thing to note is that variable-Q looks a bit like a pun on constant-Q, and the VQT acronym is not really established. The received terminology is "Nonstationary Gabor transform" (NSGT), but the problem is that it's restrictive on the type of wavelet (Gabor means Gaussian envelope, whereas we use a Hann envelope by default), so technically it isn't right either. But that's something to keep in mind for the docs.

bmcfee · 2019-11-12T16:31:42Z

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

Another option is to add a new scaling mode, analogous to how fft has mode='ortho', in case people want that behavior. Or, we could do as suggested elsewhere (offline?) and build in the scaling to any perceptual weighting function, but that seems a little messy from the API perspective. I'm not sure there's a good solution overall here.

The received terminology is "Nonstationary Gabor transform" (NSGT), but the problem is that it's restrictive on the type of wavelet (Gabor means Gaussian envelope, whereas we use a Hann envelope by default), so technically it isn't right either.

Indeed, that's why I tend to prefer VQT over NSGT, even though it's a relatively non-informative name. (Technically, a DFT has variable Q as well...)

librosa/core/constantq.py

cwitkowitz · 2020-01-06T18:05:04Z

I am starting to think that our implementation might be cleaner if we choose instead to combine the CQT and VQT functions in constantq.py.

From what I can tell, the only differences (assuming gamma=0 for CQT and gamma>0 or gamma=None for VQT) between the two functions is the generation and/or rescaling of the filters for each iteration (as opposed to reusing the same bases by simply downsampling them). I think we could probably make a few tweaks so that we have the VQT function looking similar to the way it does now, with the CQT function simply calling VQT with gamma=0.

I think it would be easier to test this way as well.

Thoughts?

bmcfee · 2020-01-06T18:09:41Z

I am starting to think that our implementation might be cleaner if we choose instead to combine the CQT and VQT functions in constantq.py.

I raised this point earlier here. I'm still not convinced that they should be the same function, since that could confuse the API. Probably the best solution is to factor out the commonalities and put some light-weight API on top so that we still have cqt and vqt, but minimize redundant code.

I agree with your assessment that the only real difference is the rescaling logic for each octave. If we can generalize this to support gamma=0 directly, and not introduce additional branching paths (conditionals within the downsampling loop), that would be ideal.

cwitkowitz · 2020-01-06T18:35:02Z

I think that's the right way to move forward. What we have now seems to work well, so I wouldn't be opposed to adding some test cases and inserting it into master for now. I just feel like the functions are too similar to leave by themselves. As I mentioned in my current pull request for the VQT default gamma, maybe we should consider the implementation presented in the original VQT paper http://www.cs.tut.fi/sgn/arg/CQT/ (which we would have to port over to python), as it is my understanding that it covers all the CQT/VQT bases, and has a generalized inverse.

If we want to just finish what we have now, I need to add another comment addressing the ERB constants, and then I could essentially copy the CQT test cases for the VQT function. Then, I believe we could call it a pull request (correct me if I am wrong).

bmcfee · 2020-01-06T19:08:25Z

I think that's the right way to move forward.

👍 but let's get everything stable and correct first, then worry about refactoring.

As I mentioned in my current pull request for the VQT default gamma, maybe we should consider the implementation presented in the original VQT paper http://www.cs.tut.fi/sgn/arg/CQT/ (which we would have to port over to python), as it is my understanding that it covers all the CQT/VQT bases, and has a generalized inverse.

Indeed -- this is more or less what I had in mind for the default gamma setting. My main concern with implementing it exactly as stated is that their frequency binning (alpha, eq 11) is slightly different from ours: centered (theirs) vs left-aligned (ours, inherited from earlier work on cqt toolbox) in the frequency domain, and this leads to slightly different calculations for gamma. There are reasonable arguments to be had for both ways of specifying the bins, and I could envision supporting both modes down the road. @lostanlen and I whiteboarded this back in november, but didn't really come to a conclusion about how to proceed. I think this is ultimately a separate issue, but whatever we implement here should be consistent with how the bandwidths are set.

bmcfee · 2020-01-06T19:08:41Z

(sorry, did not mean to close PR)

bmcfee · 2020-05-24T20:23:14Z

Just rebased this one to current master so we can get tests running again.

Aside from @cwitkowitz 's other pr on this #1045 , there are still a few points we've yet to resolve. The big one IMO has to do with centering vs left-aligning frequency ranges, since that's a huge difference between us and the reference implementation / paper.

Arguments for keeping it left-aligned:

No changes to the code (we already do this)
Semantics of fmin are clear (does not depend on Q or other parameters)

Arguments for centering:

Consistency with literature
Arguably, consistency (in the abstract) with other parts of the library (eg frame alignment)

Other minor things to work out:

deprecate filter_scale?
Separate or same functions?
- On this, I'd be okay with having everything implemented in cqt with default gamma=0, and then making vqt a thin wrapper on cqt that sets the default gamma=None (implemented via Added default gamma #1045) and otherwise does no work. Sound good?

bmcfee · 2020-05-28T21:06:41Z

One slight hiccup to the idea of merging cqt and vqt as one function with a light wrapper: any exceptions / warnings raised could get confusing since they explicitly say CQT or VQT in the current implementation. I'd be okay with rewording the exceptions to avoid this problem.

The other minor barrier to merging implementations is that the vqt is going to be a little slower than the cqt, since it's building new bases for each octave. I don't think this constitutes a huge performance hit, but it's worth considering.

As a quick anecdote: example track at 44.1K gives these timings on my machine:

In [11]: %timeit librosa.cqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7)                                                                                                                 
1.57 s ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit librosa.vqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7, gamma=0)                                                                                                        
1.59 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, not appreciably different for a decently representative use case.

bmcfee · 2020-05-28T21:44:31Z

Well that was a gnarly merge, but I think we're back in business.

bmcfee · 2020-05-28T23:35:31Z

Current status:

consolidate CQT and VQT implementations. CQT now wraps to VQT; exceptions are updated accordingly.
inverse VQT. This will follow the same pattern as the forward transform, and I anticipate no major headaches now that we have the forward path done.
griffinlim_vqt? Or should we just add a gamma= parameter to cqt and document that it works for both?

Things left to decide:

Should we deprecate filter_scale?
Do we want to implement a normalization mode following this comment:

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

This seems like a useful thing to have in general (and also for CQT), but maybe it doesn't need to be part of this PR?

librosa/core/constantq.py

bmcfee · 2020-06-12T13:57:23Z

Clearly, deep learning practitioners they care a lot about Diracs and sine sweeps, and very little about energy conservation.
It's also less code, less docs, and less tests on our end.

😆

Seriously though, I was envisioning this as a helper function like perceptual-weighting that you could use to normalize a cqt/vqt output after the fact. But I agree that it's not critical at this point / could be added later.

bmcfee · 2020-06-18T13:03:57Z

Any objection to merging this? It's been through quite a bit of preliminary review already.

cwitkowitz · 2020-06-18T13:20:55Z

Looks ready to me. I have a few questions.

Is there any plan to do anything regarding the bandwidth convention in the future?
What about the original implementation in the VQT paper?
Are we satisfied with the current wrapped solution instead?

From what I gather, future work so far consists only of adding an inverse VQT and griffinlim VQT.

bmcfee · 2020-06-18T13:29:37Z

Is there any plan to do anything regarding the bandwidth convention in the future?

No plan as of now. The most likely path that I see is that we add a switch to change from left-aligned to centered (in frequency) filters. I guess it won't matter much until we get around to implementing asymmetric windows.

What about the original implementation in the VQT paper?

In terms of the filter centering, or did you mean something else?

Are we satisfied with the current wrapped solution instead?

As in, cqt(...) = vqt(..., gamma=0)? Or something else?

I'm fine with it.

From what I gather, future work so far consists only of adding an inverse VQT and griffinlim VQT.

Those are the main things, yes. GL will be easy once ivqt is in place.

cwitkowitz · 2020-06-18T13:47:57Z

In terms of the filter centering, or did you mean something else?

I remember considering that implementation as a possible way to combine cqt/vqt efficiently. However, we decided to avoid it for now because of the alternate bandwidth convention. I'm not sure that there is any strong benefit to do it one way or the other. I'm just wondering where you stood on that.

As in, cqt(...) = vqt(..., gamma=0)? Or something else?

I'm fine with it.

Yes, that's what I'm referring to. Again my reason for asking this stems from the original implementation proposed in the VQT paper. I'm just wondering whether we get any speed benefits or anything else from the original implementation.

bmcfee · 2020-06-18T14:08:42Z

Yes, that's what I'm referring to. Again my reason for asking this stems from the original implementation proposed in the VQT paper. I'm just wondering whether we get any speed benefits or anything else from the original implementation.

Ah. I benchmarked it earlier on some representative examples, and the performance hit relative to pure CQT was extremely small, so IMO it's worth it to have a simpler code base.

I also have some performance improvements in mind #1144 that should make up the difference.

bmcfee · 2020-06-18T14:18:47Z

I remember considering that implementation as a possible way to combine cqt/vqt efficiently. However, we decided to avoid it for now because of the alternate bandwidth convention. I'm not sure that there is any strong benefit to do it one way or the other. I'm just wondering where you stood on that.

Sorry, forgot to respond to that earlier.

I think it's more or less independent of the cqt/vqt combination, since both can (maybe should?) be defined with centered bandwidth calculations. FWIW the original CQT toolbox paper used left-aligned, and then the implementation later moved to centered. Point is, I think both cqt and vqt can be implemented either way, and it doesn't change much, but we should provide a consistent interface. But this can come later IMO.

cwitkowitz · 2020-06-18T14:20:27Z

Makes sense. Well, I have no objections to merging.

bmcfee · 2020-06-18T17:16:07Z

I'll merge this tomorrow, unless any one objects.

bmcfee added enhancement Does this improve existing functionality? functionality Does this add new functionality? labels Nov 11, 2019

cwitkowitz reviewed Nov 11, 2019

View reviewed changes

librosa/filters.py Outdated Show resolved Hide resolved

lostanlen requested changes Nov 12, 2019

View reviewed changes

librosa/core/constantq.py Show resolved Hide resolved

librosa/core/constantq.py Show resolved Hide resolved

This was referenced Dec 3, 2019

Fixed calculation of maximum filter cutoff #1020

Merged

Normalise Mel filters to constant energy of 1 #1030

Closed

bmcfee closed this Jan 6, 2020

bmcfee reopened this Jan 6, 2020

cwitkowitz mentioned this pull request Feb 17, 2020

Added default gamma #1045

Merged

bmcfee mentioned this pull request Feb 19, 2020

[Discussion] CQT, VQT, energy preservation, and frequency band alignment #1073

Closed

bmcfee force-pushed the vqt branch from 3f43ab6 to c1d0628 Compare May 24, 2020 20:10

bmcfee added this to the 0.8.0 milestone May 24, 2020

bmcfee self-assigned this May 26, 2020

bmcfee force-pushed the vqt branch from 5bcfcfd to 1b39b36 Compare May 28, 2020 20:30

bmcfee force-pushed the vqt branch 2 times, most recently from d3d3008 to ea081f6 Compare May 28, 2020 21:40

bmcfee commented May 29, 2020

View reviewed changes

librosa/core/constantq.py Show resolved Hide resolved

bmcfee and others added 16 commits June 15, 2020 14:05

added vqt implementation

7133dcb

fixed gamma scaling in vqt

db238e3

Fixed calculation of maximum filter cutoff

3367845

added vqt implementation

01e5e65

added vqt implementation

d567f42

fixed gamma scaling in vqt

986dbbc

Fixed calculation of maximum filter cutoff

1a72213

Added default gamma

ee4a647

Mirrored CQT unit tests for VQT function

d0aebd1

Fixed order of paramters in vqt tests

22e3bf4

Changed values in tests for indicating VQT regression

2ccd644

Better description of alpha and percentage consistency in tests

1621609

fixed a weird copypasta

7803fce

consolidated cqt/vqt implementations

b020506

removed redundant vqt tests

3aae4a1

updated cqt filter documentation

dc40e69

bmcfee force-pushed the vqt branch from feafbb5 to dc40e69 Compare June 16, 2020 14:13

lostanlen approved these changes Jun 18, 2020

View reviewed changes

bmcfee merged commit c2417c1 into main Jun 18, 2020

bmcfee changed the title ~~[CR] Variable-Q transform~~ Variable-Q transform Jun 18, 2020

bmcfee deleted the vqt branch May 23, 2021 20:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Variable-Q transform #1018

Variable-Q transform #1018

bmcfee commented Nov 11, 2019

bmcfee commented Nov 11, 2019

bmcfee commented Nov 12, 2019

lostanlen commented Nov 12, 2019

bmcfee commented Nov 12, 2019

cwitkowitz commented Jan 6, 2020

bmcfee commented Jan 6, 2020

cwitkowitz commented Jan 6, 2020

bmcfee commented Jan 6, 2020

bmcfee commented Jan 6, 2020

bmcfee commented May 24, 2020

bmcfee commented May 28, 2020

bmcfee commented May 28, 2020

bmcfee commented May 28, 2020 •

edited

bmcfee commented Jun 12, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

Variable-Q transform #1018

Variable-Q transform #1018

Conversation

bmcfee commented Nov 11, 2019

What does this implement/fix? Explain your changes.

bmcfee commented Nov 11, 2019

bmcfee commented Nov 12, 2019

lostanlen commented Nov 12, 2019

bmcfee commented Nov 12, 2019

cwitkowitz commented Jan 6, 2020

bmcfee commented Jan 6, 2020

cwitkowitz commented Jan 6, 2020

bmcfee commented Jan 6, 2020

bmcfee commented Jan 6, 2020

bmcfee commented May 24, 2020

bmcfee commented May 28, 2020

bmcfee commented May 28, 2020

bmcfee commented May 28, 2020 • edited

bmcfee commented Jun 12, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

bmcfee commented Jun 18, 2020

cwitkowitz commented Jun 18, 2020

bmcfee commented Jun 18, 2020

bmcfee commented May 28, 2020 •

edited