Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable-Q transform #1018

Merged
merged 16 commits into from Jun 18, 2020
Merged

Variable-Q transform #1018

merged 16 commits into from Jun 18, 2020

Conversation

bmcfee
Copy link
Member

@bmcfee bmcfee commented Nov 11, 2019

What does this implement/fix? Explain your changes.

This PR extends CQT to add support for variable-Q transforms. It's a work in progress, but the core functionality seems to work already. A new function (vqt) has been added that is similar to CQT, but allows for different filterbanks to be used in each downsampled octave.

I'm opening this PR now to prevent duplication of effort between this and #1016 .

As I mentioned above, I don't consider this "done" yet, for a few reasons:

  • The reference papers for CQT and VQT use slightly different conventions for determining filter bandwidths (low-frequency aligned vs centered), and we should reconcile this before merging.
  • There may be an issue of normalization / gain correction that we need to sort out. @lostanlen has thoughts about this (discussed offline), but we haven't had time to thoroughly work through it all. Getting this done may require introducing some new normalization or scaling parameters, or changing the behavior of cqt in a backwards-incompatible way.

@bmcfee bmcfee added enhancement Does this improve existing functionality? functionality Does this add new functionality? labels Nov 11, 2019
@bmcfee
Copy link
Member Author

bmcfee commented Nov 11, 2019

Tagging @cwitkowitz

librosa/filters.py Outdated Show resolved Hide resolved
@bmcfee
Copy link
Member Author

bmcfee commented Nov 12, 2019

One question that we should sort out sooner rather than later: does it make sense to have an entirely separate function for vqt, or should we expand cqt to support vqt as a general case?

Arguments for two functions:

  • Keeps CQT simple
  • Easy to discover by documentation

Arguments for one function:

  • Minimizes redundancy in the API
  • Easier to maintain

Thoughts?

@lostanlen
Copy link
Contributor

Wow! A historic day for librosa!

There may be an issue of normalization / gain correction that we need to sort out. @lostanlen has thoughts about this (discussed offline), but we haven't had time to thoroughly work through it all. Getting this done may require introducing some new normalization or scaling parameters, or changing the behavior of cqt in a backwards-incompatible way.

Yes. It's a dilemma:
(1) if we don't renormalize, then there is no energy conservation, which means that measuring loudness (after perceptual_weighting) will be numerically wrong in ways that are difficult to interpret
(2) if we do renormalize, then the maximum amplitude of a single sine wave (or slow AM/FM additive model) will not be consistent across the CQT axis, which will cause problem for CNNs in the time--frequency domain after max-pooling

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

Another thing to note is that variable-Q looks a bit like a pun on constant-Q, and the VQT acronym is not really established. The received terminology is "Nonstationary Gabor transform" (NSGT), but the problem is that it's restrictive on the type of wavelet (Gabor means Gaussian envelope, whereas we use a Hann envelope by default), so technically it isn't right either. But that's something to keep in mind for the docs.

@bmcfee
Copy link
Member Author

bmcfee commented Nov 12, 2019

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

Another option is to add a new scaling mode, analogous to how fft has mode='ortho', in case people want that behavior. Or, we could do as suggested elsewhere (offline?) and build in the scaling to any perceptual weighting function, but that seems a little messy from the API perspective. I'm not sure there's a good solution overall here.

The received terminology is "Nonstationary Gabor transform" (NSGT), but the problem is that it's restrictive on the type of wavelet (Gabor means Gaussian envelope, whereas we use a Hann envelope by default), so technically it isn't right either.

Indeed, that's why I tend to prefer VQT over NSGT, even though it's a relatively non-informative name. (Technically, a DFT has variable Q as well...)

librosa/core/constantq.py Show resolved Hide resolved
librosa/core/constantq.py Show resolved Hide resolved
@cwitkowitz
Copy link
Contributor

I am starting to think that our implementation might be cleaner if we choose instead to combine the CQT and VQT functions in constantq.py.

From what I can tell, the only differences (assuming gamma=0 for CQT and gamma>0 or gamma=None for VQT) between the two functions is the generation and/or rescaling of the filters for each iteration (as opposed to reusing the same bases by simply downsampling them). I think we could probably make a few tweaks so that we have the VQT function looking similar to the way it does now, with the CQT function simply calling VQT with gamma=0.

I think it would be easier to test this way as well.

Thoughts?

@bmcfee
Copy link
Member Author

bmcfee commented Jan 6, 2020

I am starting to think that our implementation might be cleaner if we choose instead to combine the CQT and VQT functions in constantq.py.

I raised this point earlier here. I'm still not convinced that they should be the same function, since that could confuse the API. Probably the best solution is to factor out the commonalities and put some light-weight API on top so that we still have cqt and vqt, but minimize redundant code.

I agree with your assessment that the only real difference is the rescaling logic for each octave. If we can generalize this to support gamma=0 directly, and not introduce additional branching paths (conditionals within the downsampling loop), that would be ideal.

@cwitkowitz
Copy link
Contributor

I think that's the right way to move forward. What we have now seems to work well, so I wouldn't be opposed to adding some test cases and inserting it into master for now. I just feel like the functions are too similar to leave by themselves. As I mentioned in my current pull request for the VQT default gamma, maybe we should consider the implementation presented in the original VQT paper http://www.cs.tut.fi/sgn/arg/CQT/ (which we would have to port over to python), as it is my understanding that it covers all the CQT/VQT bases, and has a generalized inverse.

If we want to just finish what we have now, I need to add another comment addressing the ERB constants, and then I could essentially copy the CQT test cases for the VQT function. Then, I believe we could call it a pull request (correct me if I am wrong).

@bmcfee
Copy link
Member Author

bmcfee commented Jan 6, 2020

I think that's the right way to move forward.

👍 but let's get everything stable and correct first, then worry about refactoring.

As I mentioned in my current pull request for the VQT default gamma, maybe we should consider the implementation presented in the original VQT paper http://www.cs.tut.fi/sgn/arg/CQT/ (which we would have to port over to python), as it is my understanding that it covers all the CQT/VQT bases, and has a generalized inverse.

Indeed -- this is more or less what I had in mind for the default gamma setting. My main concern with implementing it exactly as stated is that their frequency binning (alpha, eq 11) is slightly different from ours: centered (theirs) vs left-aligned (ours, inherited from earlier work on cqt toolbox) in the frequency domain, and this leads to slightly different calculations for gamma. There are reasonable arguments to be had for both ways of specifying the bins, and I could envision supporting both modes down the road. @lostanlen and I whiteboarded this back in november, but didn't really come to a conclusion about how to proceed. I think this is ultimately a separate issue, but whatever we implement here should be consistent with how the bandwidths are set.

@bmcfee bmcfee closed this Jan 6, 2020
@bmcfee bmcfee reopened this Jan 6, 2020
@bmcfee
Copy link
Member Author

bmcfee commented Jan 6, 2020

(sorry, did not mean to close PR)

@bmcfee
Copy link
Member Author

bmcfee commented May 24, 2020

Just rebased this one to current master so we can get tests running again.

Aside from @cwitkowitz 's other pr on this #1045 , there are still a few points we've yet to resolve. The big one IMO has to do with centering vs left-aligning frequency ranges, since that's a huge difference between us and the reference implementation / paper.

Arguments for keeping it left-aligned:

  • No changes to the code (we already do this)
  • Semantics of fmin are clear (does not depend on Q or other parameters)

Arguments for centering:

  • Consistency with literature
  • Arguably, consistency (in the abstract) with other parts of the library (eg frame alignment)

Other minor things to work out:

  • deprecate filter_scale?
  • Separate or same functions?
    • On this, I'd be okay with having everything implemented in cqt with default gamma=0, and then making vqt a thin wrapper on cqt that sets the default gamma=None (implemented via Added default gamma #1045) and otherwise does no work. Sound good?

@bmcfee
Copy link
Member Author

bmcfee commented May 28, 2020

One slight hiccup to the idea of merging cqt and vqt as one function with a light wrapper: any exceptions / warnings raised could get confusing since they explicitly say CQT or VQT in the current implementation. I'd be okay with rewording the exceptions to avoid this problem.

The other minor barrier to merging implementations is that the vqt is going to be a little slower than the cqt, since it's building new bases for each octave. I don't think this constitutes a huge performance hit, but it's worth considering.

As a quick anecdote: example track at 44.1K gives these timings on my machine:

In [11]: %timeit librosa.cqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7)                                                                                                                 
1.57 s ± 7.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit librosa.vqt(y=y, sr=sr, bins_per_octave=5*12, n_bins=5*12*7, gamma=0)                                                                                                        
1.59 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, not appreciably different for a decently representative use case.

@bmcfee bmcfee force-pushed the vqt branch 2 times, most recently from d3d3008 to ea081f6 Compare May 28, 2020 21:40
@bmcfee
Copy link
Member Author

bmcfee commented May 28, 2020

Well that was a gnarly merge, but I think we're back in business.

@bmcfee
Copy link
Member Author

bmcfee commented May 28, 2020

Current status:

  • consolidate CQT and VQT implementations. CQT now wraps to VQT; exceptions are updated accordingly.
  • inverse VQT. This will follow the same pattern as the forward transform, and I anticipate no major headaches now that we have the forward path done.
  • griffinlim_vqt? Or should we just add a gamma= parameter to cqt and document that it works for both?

Things left to decide:

  • Should we deprecate filter_scale?
  • Do we want to implement a normalization mode following this comment:

I'm on the side of (1), but ideally we should have a utility to recover the littlewood-paley sum so that we can estimate dB from variable-Q transform.

This seems like a useful thing to have in general (and also for CQT), but maybe it doesn't need to be part of this PR?

@bmcfee
Copy link
Member Author

bmcfee commented Jun 12, 2020

Clearly, deep learning practitioners they care a lot about Diracs and sine sweeps, and very little about energy conservation.
It's also less code, less docs, and less tests on our end.

😆

Seriously though, I was envisioning this as a helper function like perceptual-weighting that you could use to normalize a cqt/vqt output after the fact. But I agree that it's not critical at this point / could be added later.

@bmcfee
Copy link
Member Author

bmcfee commented Jun 18, 2020

Any objection to merging this? It's been through quite a bit of preliminary review already.

@cwitkowitz
Copy link
Contributor

Looks ready to me. I have a few questions.

Is there any plan to do anything regarding the bandwidth convention in the future?
What about the original implementation in the VQT paper?
Are we satisfied with the current wrapped solution instead?

From what I gather, future work so far consists only of adding an inverse VQT and griffinlim VQT.

@bmcfee
Copy link
Member Author

bmcfee commented Jun 18, 2020

Is there any plan to do anything regarding the bandwidth convention in the future?

No plan as of now. The most likely path that I see is that we add a switch to change from left-aligned to centered (in frequency) filters. I guess it won't matter much until we get around to implementing asymmetric windows.

What about the original implementation in the VQT paper?

In terms of the filter centering, or did you mean something else?

Are we satisfied with the current wrapped solution instead?

As in, cqt(...) = vqt(..., gamma=0)? Or something else?

I'm fine with it.

From what I gather, future work so far consists only of adding an inverse VQT and griffinlim VQT.

Those are the main things, yes. GL will be easy once ivqt is in place.

@cwitkowitz
Copy link
Contributor

In terms of the filter centering, or did you mean something else?

I remember considering that implementation as a possible way to combine cqt/vqt efficiently. However, we decided to avoid it for now because of the alternate bandwidth convention. I'm not sure that there is any strong benefit to do it one way or the other. I'm just wondering where you stood on that.

As in, cqt(...) = vqt(..., gamma=0)? Or something else?

I'm fine with it.

Yes, that's what I'm referring to. Again my reason for asking this stems from the original implementation proposed in the VQT paper. I'm just wondering whether we get any speed benefits or anything else from the original implementation.

@bmcfee
Copy link
Member Author

bmcfee commented Jun 18, 2020

Yes, that's what I'm referring to. Again my reason for asking this stems from the original implementation proposed in the VQT paper. I'm just wondering whether we get any speed benefits or anything else from the original implementation.

Ah. I benchmarked it earlier on some representative examples, and the performance hit relative to pure CQT was extremely small, so IMO it's worth it to have a simpler code base.

I also have some performance improvements in mind #1144 that should make up the difference.

@bmcfee
Copy link
Member Author

bmcfee commented Jun 18, 2020

I remember considering that implementation as a possible way to combine cqt/vqt efficiently. However, we decided to avoid it for now because of the alternate bandwidth convention. I'm not sure that there is any strong benefit to do it one way or the other. I'm just wondering where you stood on that.

Sorry, forgot to respond to that earlier.

I think it's more or less independent of the cqt/vqt combination, since both can (maybe should?) be defined with centered bandwidth calculations. FWIW the original CQT toolbox paper used left-aligned, and then the implementation later moved to centered. Point is, I think both cqt and vqt can be implemented either way, and it doesn't change much, but we should provide a consistent interface. But this can come later IMO.

@cwitkowitz
Copy link
Contributor

Makes sense. Well, I have no objections to merging.

@bmcfee
Copy link
Member Author

bmcfee commented Jun 18, 2020

I'll merge this tomorrow, unless any one objects.

@bmcfee bmcfee merged commit c2417c1 into main Jun 18, 2020
@bmcfee bmcfee changed the title [CR] Variable-Q transform Variable-Q transform Jun 18, 2020
@bmcfee bmcfee deleted the vqt branch May 23, 2021 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Does this improve existing functionality? functionality Does this add new functionality?
Development

Successfully merging this pull request may close these issues.

None yet

3 participants