DTW #298
That would be wonderful! I would love to see that.
(Looping in @craffel and @dpwe) We've had some previous discussion of DTW in #41, and more or less concluded that: 1) dtw is independent of the other functionality in librosa, 2) there are already many implementations available, so we'd do best to not reinvent the wheel. (I'd also throw in 3: that adding a cython dependency for one feature may unnecessarily complicate the distribution process, but that's a minor issue.) I'm happy to reconsider these points, especially if we can provide a robust implementation that has demonstrable advantages.
Agreed.
You should use numba, not Cython. It's more readable, and usually a bit faster. Here's my numba implementation:
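(The linked code isn't reproduced here; as a rough sketch of the approach - illustrative only, not the actual implementation - a jitted DTW cost computation could look like this:)

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def dtw_cost(D):
    # Accumulated-cost matrix for vanilla DTW over a precomputed
    # pairwise distance matrix D (steps: diagonal, vertical, horizontal).
    n, m = D.shape
    C = np.full((n, m), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0 and j > 0:
                best = min(best, C[i - 1, j - 1])
            if i > 0:
                best = min(best, C[i - 1, j])
            if j > 0:
                best = min(best, C[i, j - 1])
            C[i, j] = D[i, j] + best
    return C
```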
I generally agree with the readability bit, but isn't numba even worse for packaging due to the llvm dependency?
Not with conda; it has always been installable easily and immediately (including llvm dependencies) with `conda install numba`.
Sure. But since librosa isn't distributed via conda (see #202), the dependency chain is more complicated when pip installing. Not that this is a deal-breaker, but we already get enough complaints about the install process that I'd rather not complicate it any more than necessary.
Okay, sure, I get your points and agree. BUT: since beginners with Python tend to be somewhat overwhelmed by the mass of available packages, it would be nice to have a "standard" implementation available and to get things working using just librosa. Only the D matrix calculation is in a separate Cython file:
I think the pro is that it corresponds to Meinard's book so it can be referenced somehow. Don't want to start a big discussion.
Oh, I certainly wasn't advocating making numba a dependency for librosa. Just that it has largely obsoleted Cython for my purposes.
I think the point is that librosa covers common MIR-specific building blocks which were not widely available in the Python world otherwise (mfcc, beat detection, cqts, etc). DTW is not MIR-specific, and implementations are widely available. It would also add another dependency in order to run in reasonable time, be it Cython or numba or whatever (though I guess we could go through the nightmare of using scipy.weave). Just my 2c.
Too late :)
Yeah, like I said, I'm open to persuasion here.
That's a good argument... I'm convinced.
Also good, but do people actually use vanilla dtw? I imagine we'd want to bake in some options for the obvious extensions (start/end gullies, max step size, etc).
Yeah. Actually, one additional benefit of numba over cython here is that we can wrap it with a decorator that makes the dependency dynamic, e.g.:

```python
def optional_jit(f):
    # JIT-compile f with numba when it's available; otherwise
    # return the plain Python function unchanged.
    try:
        import numba
        return numba.jit(f)
    except ImportError:
        return f
```

so that the code will still run (albeit slowly) without numba installed.
My must haves would be:
That would be the core functionality, which could be easily extended with constraint regions or whatever. @craffel I haven't tried numba; I was a little bit reluctant about the "automatic" way it does things, but we can totally try this. I still have the pure python version.
No. People use different tweaks on DTW in practice. If we wanted to have a standard implementation, I would argue we'd need to include all these things.
Plus additive and multiplicative weightings for all of the step sizes, and band path constraints. One thing I have come up against is that as you add more features, the implementation gets slower and slower compared to vanilla DTW (which is why I have two different versions, for path-constrained and unconstrained). This gets really hairy as the feature set grows.
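(To make that concrete, a sketch of how per-step weights and penalties enter the recursion - parameter names here are illustrative, not from any existing implementation:)

```python
import numpy as np

def weighted_dtw_cost(D, w=(1.0, 1.0, 1.0), phi=(0.0, 0.0, 0.0)):
    # Accumulated cost with multiplicative weights w and additive
    # penalties phi for the (diagonal, vertical, horizontal) steps.
    # Every extra weight/penalty adds operations to the inner loop,
    # which is where the slowdown relative to vanilla DTW comes from.
    n, m = D.shape
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = D[i - 1, j - 1]
            C[i, j] = min(w[0] * d + phi[0] + C[i - 1, j - 1],
                          w[1] * d + phi[1] + C[i - 1, j],
                          w[2] * d + phi[2] + C[i, j - 1])
    return C[n, m]
```

With the default weights and zero penalties this reduces to vanilla DTW, so the extras are strictly opt-in.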
Yes, and another advantage is dynamically compiling different versions of the function depending on different call signatures (e.g. if someone passes an int for a float, or whatever, it won't barf, it'll just compile a new function). @stefan-balke not trying to be contrarian or argumentative here, partially just playing devil's advocate so that if something gets added it's the right thing!
I would not agree with that. A standard implementation is not necessarily cutting-edge but could also be the baseline. Okay, in summary this feature set would satisfy common needs:
With that I could already do some retrieval things, audio synchronization (maybe not opera recordings), and maybe also tracking of trajectories in spectrograms. But how far the feature set could explode totally depends on the purpose of librosa.
No worries, I appreciate that.
For the record, I just reran this notebook on the latest conda (py3.5, ubuntu, lenovo x1) and get comparable results for pairwise distance calculation:
Can we consolidate a list of commonly-used extensions/features, so as to go about this rationally? For my $2e-2, callable distance calculation and/or sparse matrix implementation would be nice to have. Couple that with bounded path deviations and we should be able to support arbitrarily long sequences.
My point is that if we have an implementation which is missing a feature that someone wants/needs, they will not use it, and in practice people seem to use DTW in quite different flavors (for example, you and I are, in the scheme of things, working on the same problem, but our DTWs look quite different).
And @stefan-balke's list looks OK to me; I would add the "gully" (@dpwe's term), i.e. tolerance to paths which do not cover the entirety of one sequence or the other.
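(Roughly, the gully relaxes where the path may end - a sketch assuming a filled accumulated-cost matrix C; the function and its defaults are illustrative:)

```python
import numpy as np

def gully_end(C, gully=0.95):
    # Instead of forcing the path to end at C[-1, -1], allow it to end
    # anywhere in the final (1 - gully) fraction of either sequence,
    # picking the cheapest candidate end point.
    n, m = C.shape
    i0 = int(gully * (n - 1))
    j0 = int(gully * (m - 1))
    col = C[i0:, m - 1]   # tail of the last column...
    row = C[n - 1, j0:]   # ...and of the last row
    if col.min() <= row.min():
        return i0 + int(np.argmin(col)), m - 1
    return n - 1, j0 + int(np.argmin(row))
```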
Yes, but then you waste a lot of time computing DTW paths through the infinite region.
Usually the pre-computed distance matrix is passed to the DTW function; the distances aren't computed on-the-fly. We don't want to reinvent the wheel there.
This sounds very hairy to me.
It's easy enough to support both by checking for a sparse input. The reason I bring this up is that once you start talking about features like subsequence and bounded windows, then it's really unnecessary to compute the entire distance matrix. Given that, I'm not sure it makes sense to use the distance matrix as the only API.
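(A sketch of such a check - `scipy.sparse.issparse` exists, but the helper and the convention that unstored entries mean "forbidden pairing" are illustrative:)

```python
import numpy as np
import scipy.sparse

def cost_lookup(D):
    # Accept a dense array or a scipy.sparse matrix; treat unstored
    # sparse entries as +inf, i.e. pairings outside the allowed band.
    if scipy.sparse.issparse(D):
        D = D.tocsr()
        def cost(i, j):
            v = D[i, j]
            return float(v) if v != 0 else np.inf
        return cost
    D = np.asarray(D)
    return lambda i, j: float(D[i, j])
```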
I mean sparse matrices.
I have always left it outside of the DTW implementation to compute distances - i.e. if someone only wants to compute part of a distance matrix, they'd do that before passing it to the DTW function.
I'm not sure how/if Cython/numba support sparse matrices.
I see how that works with rectangular blocks of D, but not for diagonal bands. Of course, if we support sparse input, that problem goes away (mostly). Right?
It works fine in numba. Cython might be much more difficult.
So, assuming we settle on the module name, I would expect the following functions:
Some display functions. It would be modular enough to plug in custom cost functions while keeping it easy to use for beginners.
It works for rectangular blocks using
Maybe? Depends on how it's implemented.
I'm not seeing any way of supplying bands/computing DTW for only a subportion of possible pairings. Or gullies.
Yeah, it was more meant to be developed from this :) I mean, a general question for the implementation is whether we first build the core vanilla dtw and then write wrapper functions for adding constraint regions or gullies.
This is all sounding great. I'm not sure if wrappers for the fancier behavior will work out though; it seems like we'll probably have to go the opposite direction, having a fancy core algorithm with simplified wrappers. I would recommend getting a stable vanilla implementation first, and then extending its functionality later. It does seem like trying to get everything exactly right from the start may send us in circles. Meanwhile, maybe it would help to have a collection of use cases that motivate the various extensions? By this I mean concrete, but simple, problem instances.
I think between @stefan-balke and me we have a decent coverage of use-cases. Ideally I'd like to be able to use it for all use-cases in my ICASSP paper (see the list in section 4), so that requires additive and multiplicative penalties, a band-path constraint, and gullies. djitw covers those use-cases. @stefan-balke I think also wants to be able to use arbitrary step sizes (i.e. not just (+1, +1), (+0, +1), and (+1, +0)), which is not something I've used myself. As a side note to think about: the more features you add to the DTW routine, the slower it gets. That's why I have separate implementations for band-masked and unmasked in djitw, and why I have a separate DTW implementation for when you only care about the score and not the path. But maybe not everyone will be as performance-crazy as me because they aren't trying to perform O(10^11) alignments.
I think most of the features we're talking about here are optimizations? Variable step sizes will obviously increase complexity, but I think most everything else shouldn't impact efficiency too much if done properly. (FWIW, I see a few places that djitw could probably be made a bit faster using numpy primitives and np.ma, but that's beside the point...) At any rate: are you two ( @stefan-balke and @craffel ) up for this? Github only lets me assign one person to an issue, so does one of you want to take the lead here?
The issue is that DTW at its core is extremely fast - just two nested for loops and a switch statement - so once you add other things to it, even if those things are meant to make it faster, they often don't help in practice (speaking from experience with audio-to-MIDI alignment, at least). For example, additive and multiplicative penalties both add at least one operation each; that slows things down. In my experience, any sort of path constraint tended to hurt speed on average, too, due to the extra conditionals/table lookups (the argument for path constraints is often better accuracy, not speed; see e.g. section 4 here). Just my experience. But it's potentially a moot point if we just want a single, reasonably fast DTW that supports most common use-cases; it just complicates things when we start talking about optimizations.
numba doesn't support np.ma (as I recall), and better use of numpy primitives/features does not always translate to faster numba code - it's designed to optimize the simplest/dumbest usage patterns (like explicitly iterating over a numpy array).
I can start by adding my djitw code. I had been meaning to add support for arbitrary step sizes and a test suite, which are the main things it'd need to be included here, but had punted it until after my thesis. It may be smart to make this collaborative in some way (e.g. a WIP PR from a fork we can all contribute to), to avoid roadblocks due to me getting pulled away by thesis obligations.
👍
Arbitrary step sizes mean that we have to deal with the edge cases a lot. We usually try to overcome this by appending infinity cols/rows. I'll try to have a look at @craffel's code at the weekend and see how mergeable our two approaches are.
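(The infinity-padding trick, sketched with illustrative names - pad by the largest step offset so the recursion can never index out of bounds:)

```python
import numpy as np

def dtw_cost_steps(D, steps=((1, 1), (1, 0), (0, 1))):
    # Accumulated cost with arbitrary step sizes. Prepending p rows and
    # columns of +inf handles the edge cases: any step that reaches
    # outside the valid region lands on +inf and is never chosen.
    n, m = D.shape
    p = max(max(s) for s in steps)
    C = np.full((n + p, m + p), np.inf)
    C[p, p] = D[0, 0]
    for i in range(p, n + p):
        for j in range(p, m + p):
            if i == p and j == p:
                continue
            C[i, j] = D[i - p, j - p] + min(C[i - di, j - dj]
                                            for di, dj in steps)
    return C[-1, -1]
```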
Maybe @bmcfee can create that one with the correct module naming.
Nah, I trust you two to do it right. 🏆
I met with @craffel in Shanghai and started to work on this. Here is the working branch:
Any suggestions for unit tests? So far, I have a really small toy example to test basic dynamic programming functionality. For subsequence DTW, I would create something like this:
The minimum of the matching function is known and would serve as ground truth. Suggestions are welcome. P.S.: @craffel it's still missing some features, but when we have a "complete" dtw, I'll provide some benchmarks as we discussed. The question is whether more conditionals influence the performance significantly.
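(A hypothetical construction along those lines - not the actual test from the branch:)

```python
import numpy as np

# Embed a short query inside a longer sequence at a known frame offset;
# the matching function of subsequence DTW should attain its minimum
# (near zero) exactly where the query was embedded.
rng = np.random.RandomState(0)
query = rng.rand(10, 2)               # short query sequence
filler = rng.rand(50, 2) + 2.0        # filler frames, far from the query
Y = np.vstack([filler[:20], query, filler[20:]])
expected_start = 20                   # ground truth for the test
```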
@bmcfee time-stretching is a global thing at the moment, right? Would be cool to hand over a warping path to the time-stretcher and warp the audio, though... Jonathan did a nice review article recently about Time-Scale Modification:
A simple toy example where you can compute the best path, for different parameter settings, by hand beforehand and hard-code it in seems smart to me.
Yes, another option would be to synthesize a cycle of a sine wave and then synthesize a 1/2 cycle at some random phase, and then try to recover that phase + a linear path.
Sweet.
Currently, yes. My original re-implementation of phase vocoding was for the dynamic case though, and used to power this hack. In principle, it shouldn't be hard to extend the api to support this directly, though I had trouble coming up with a clean interface for it. If you want to take a crack at it though, PRs are always welcome!
Can we make this simpler? How about a linear ramp, concatenated with noise of varying magnitude and duration?
That doesn't sound any simpler to me, but either way seems fine.
It's one fewer function.
Alright, will do so. |
While I am thinking of it, I have code for doing some DTW result plotting here:
Cool, I've got something similar. Will make a "BestOf" again.
Closed via #323.
oops! I missed something in the CR... @stefan-balke can you change the
Here you go...
Wait, the squashing was stupid. I'll send a patch as PR.
#393 will fix it.
Nice! Looking forward to trying this. Sorry I did not help out in any tangible capacity.
A preliminary look at performance: is there a benchmark of the various DTW implementations?

NOTE: This line in core\audio.py is causing an error; if I change the index from 2 to 0, I can import.

```python
x = np.random.randint(0, 2, (1000, 1))
```
Hey there,
I recently did a DTW implementation based on Meinard's book with some Cython speed up for the dynamic programming.
As this is not yet reflected in librosa, I wondered if we could do a module for music synchronization.
Best
Stefan