DTW #298
Closed · stefan-balke opened this issue Jan 26, 2016 · 45 comments
Labels: discussion (Open-ended discussion for developers and users), functionality (Does this add new functionality?)

@stefan-balke (Member)

Hey there,

I recently did a DTW implementation based on Meinard's book, with some Cython speed-up for the dynamic programming.
As this is not yet reflected in librosa, I wondered if we could add a module for music synchronization.

Best
Stefan

@sonovice (Contributor)

That would be wonderful! I would love to see that.

@bmcfee (Member) commented Jan 26, 2016

(Looping in @craffel and @dpwe)

We've had some previous discussion of DTW in #41, and more or less concluded that: 1) dtw is independent of the other functionality in librosa, 2) there are already many implementations available, so we'd do best to not reinvent the wheel. (I'd also throw in 3: that adding a cython dependency for one feature may unnecessarily complicate the distribution process, but that's a minor issue.)

I'm happy to reconsider these points, especially if we can provide a robust implementation that has demonstrable advantages.

@bmcfee added the functionality (Does this add new functionality?) and discussion (Open-ended discussion for developers and users) labels Jan 26, 2016
@craffel (Contributor) commented Jan 26, 2016

We've had some previous discussion of DTW in #41, and more or less concluded that: 1) dtw is independent of the other functionality in librosa, 2) there are already many implementations available, so we'd do best to not reinvent the wheel. (I'd also throw in 3: that adding a cython dependency for one feature may unnecessarily complicate the distribution process, but that's a minor issue.)

Agreed.

I recently did a DTW implementation based on Meinard's book, with some Cython speed-up for the dynamic programming.

You should use numba, not Cython. It's more readable, and usually a bit faster. Here's my numba implementation:
https://github.com/craffel/djitw/blob/master/djitw/dtw.py

@bmcfee (Member) commented Jan 26, 2016

You should use numba, not Cython. It's more readable, and usually a bit faster.

I generally agree with the readability bit, but isn't numba even worse for packaging due to the llvm dependency?

@craffel (Contributor) commented Jan 26, 2016

I generally agree with the readability bit, but isn't numba even worse for packaging due to the llvm dependency?

Not with conda; it has always been easily and immediately installable (including the LLVM dependencies) with conda install numba. Without conda, it used to be (about a year ago) that installing the right versions of llvm and llvmlite took a lot of command line bullshittery, but it's much better now.

@bmcfee (Member) commented Jan 26, 2016

Not with conda,

Sure. But since librosa isn't distributed via conda (see #202) the dependency chain is more complicated when pip installing. Not that this is a deal-breaker, but we already get enough complaints about the install process that I'd rather not complicate it any more than necessary.

@stefan-balke (Member, Author)

Okay, sure, I get your points and agree.

BUT:
In my understanding, librosa wants to cover several MIR tasks and some standard approaches to solve them. Usually no rocket science, more "tools".
DTW is clearly missing here.

Also, as Python beginners tend to be somewhat overwhelmed by the mass of available packages, it would be nice to have a "standard" implementation available and to be able to get things done using just librosa.
My implementation is based on this one here (nothing fancy):
http://nbviewer.jupyter.org/github/stefan-balke/mpa-exc/blob/master/dtw.ipynb

Only the D matrix calculation is in a separate Cython file:

import numpy as np
cimport numpy as np

def get_accumulated_cost_matrix(np.ndarray[np.float64_t, ndim=2] C,
                                np.ndarray[np.float64_t, ndim=2] D,
                                np.ndarray[np.int_t, ndim=2] step_sizes_sigma,
                                np.ndarray[np.int_t, ndim=1] step_weights,
                                int max_0, int max_1):
    cdef int n, m, cur_step_size_idx
    cdef int D_shape_0 = D.shape[0]
    cdef int D_shape_1 = D.shape[1]
    cdef np.float64_t cur_cost
    cdef int steps = step_sizes_sigma.shape[0]

    # accumulate the local costs from C into D
    # (D is expected to be pre-filled with inf, see below)
    for n in range(max_0, D_shape_0):
        for m in range(max_1, D_shape_1):
            # loop over the possible step sizes
            for cur_step_size_idx in range(steps):
                # cost of reaching (n, m) via this step...
                cur_cost = D[n-step_sizes_sigma[cur_step_size_idx, 0], m-step_sizes_sigma[cur_step_size_idx, 1]]
                # ...plus the weighted local cost of (n, m) itself
                cur_cost += step_weights[cur_step_size_idx] * C[n-max_0, m-max_1]

                # keep the cheapest predecessor
                D[n, m] = min(cur_cost, D[n, m])

    return D
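
For context, a plausible caller for this function (illustrative; D needs max_0/max_1 extra rows/columns of infinity so that the step lookups never go out of bounds, and the path is anchored at the first cell):

import numpy as np

C = np.random.rand(100, 120)                # local cost matrix
steps = np.array([[1, 1], [0, 1], [1, 0]])  # default step sizes
weights = np.array([1, 1, 1])
max_0, max_1 = steps[:, 0].max(), steps[:, 1].max()

# pad D with max_0 rows / max_1 columns of inf and anchor the path at (0, 0)
D = np.full((C.shape[0] + max_0, C.shape[1] + max_1), np.inf)
D[max_0, max_1] = C[0, 0]

D = get_accumulated_cost_matrix(C, D, steps, weights, max_0, max_1)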

I think the pro is that it corresponds to Meinard's book so it can be referenced somehow.

Don't want to start a big discussion.
Main point is that I miss it in librosa.

@craffel (Contributor) commented Jan 26, 2016

Sure. But since librosa isn't distributed via conda (see #202) the dependency chain is more complicated when pip installing. Not that this is a deal-breaker, but we already get enough complaints about the install process that I'd rather not complicate it any more than necessary.

Oh, I certainly wasn't advocating making numba a dependency for librosa. Just that it has largely obsoleted Cython for my purposes.

BUT:
In my understanding, librosa wants to cover several MIR tasks and some standard approaches to solve them. Usually no rocket science, more "tools".
DTW is clearly missing here.

I think the point is that librosa covers common MIR-specific building blocks which were not otherwise widely available in the Python world (MFCCs, beat detection, CQTs, etc.). DTW is not MIR-specific, and implementations are widely available. It would also add another dependency in order to run in reasonable time, be it Cython or numba or whatever (though I guess we could go through the nightmare of using scipy.weave). Just my 2c.

@bmcfee (Member) commented Jan 26, 2016

Don't want to start a big discussion.

Too late :)

In my understanding, librosa wants to cover several MIR tasks and some standard approaches to solve them. Usually no rocket science, more "tools".
DTW is clearly missing here.

Yeah, like I said, I'm open to persuasion here.

Also, as Python beginners tend to be somewhat overwhelmed by the mass of available packages, it would be nice to have a "standard" implementation available and to be able to get things done using just librosa.

That's a good argument... I'm convinced.

I think the pro is that it corresponds to Meinard's book so it can be referenced somehow.

Also good, but do people actually use vanilla dtw? I imagine we'd want to bake in some options for the obvious extensions (start/end gullies, max step size, etc).

Oh, I certainly wasn't advocating making numba a dependency for librosa.

Yeah. Actually, one additional benefit of numba over cython here is that we can wrap it with a decorator that makes the dependency dynamic, e.g.:

def optional_jit(f):
    # apply numba.jit when numba is importable; otherwise return f unchanged
    try:
        import numba
        return numba.jit(f)
    except ImportError:
        return f

so that the code will still run (albeit slowly) without numba installed.
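
For instance (hypothetical usage; accumulate is just an illustrative kernel):

@optional_jit
def accumulate(C, D):
    # plain nested loops: fast under numba.jit, still correct in pure Python
    for n in range(1, D.shape[0]):
        for m in range(1, D.shape[1]):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D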

@stefan-balke (Member, Author)

Also good, but do people actually use vanilla dtw?

My must-haves would be:

  • Arbitrary step sizes
  • Subsequence DTW (so it can be used for matching)

That would be the core functionality, which could easily be extended with constraint regions or whatever.

@craffel I haven't tried numba. I was a little reluctant about the "automatic" way it does things, but we can totally try it. I still have the pure Python version.

@craffel (Contributor) commented Jan 26, 2016

Also good, but do people actually use vanilla dtw?

No. People use different tweaks on DTW in practice. If we wanted to have a standard implementation, I would argue we'd need to include all these things.

My must-haves would be:
Arbitrary step sizes
Subsequence DTW (so it can be used for matching)

Plus additive and multiplicative weightings for all of the step sizes, and band path constraints. One thing I have come up against is that as you add more features, the implementation gets slower and slower compared to vanilla DTW (which is why I have two different versions for path-constrained and unconstrained). This gets really hairy as you add more features.
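
To make the two weighting types concrete, a single cell update might look roughly like this (illustrative; conventions differ on which steps the penalties apply to):

import numpy as np

def cell_update(D, C, n, m, steps, add_pen, mult_pen):
    # additive penalty add_pen[k] and multiplicative weight mult_pen[k]
    # for step k; keep the cheapest predecessor
    best = np.inf
    for k, (dn, dm) in enumerate(steps):
        best = min(best, D[n - dn, m - dm] + mult_pen[k] * C[n, m] + add_pen[k])
    return best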

Yeah. Actually, one additional benefit of numba over cython here is that we can wrap it with a decorator that makes the dependency dynamic, e.g.:

Yes, and another advantage is dynamically compiling different versions of the function depending on different call signatures (e.g. if someone passes an int for a float, or whatever, it won't barf, it'll just compile a new function).

@stefan-balke not trying to be contrarian or argumentative here, partially just playing devil's advocate so that if something gets added it's the right thing!

@stefan-balke (Member, Author)

If we wanted to have a standard implementation, I would argue we'd need to include all these things.

I would not agree with that. A standard implementation is not necessarily cutting-edge but could also be the baseline.

Okay, in summary, this feature set would satisfy common needs:

  • Arbitrary step sizes
  • Additive or multiplicative local weights for the steps
  • Subsequence DTW (so it can be used for matching)
  • Global path constraints (e.g. Sakoe-Chiba band, etc.) [btw: Isn't that easily realized by initializing D with infinite cost outside the allowed region? Rough sketch below.]
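
A minimal sketch of that idea (function name and band parametrization are illustrative): mask the local cost with infinity outside a Sakoe-Chiba band, so that no finite-cost path can leave the band.

import numpy as np

def sakoe_chiba_mask(C, radius):
    # set local cost to inf outside a band of the given radius around the
    # (length-normalized) diagonal; the DP never yields a finite path there
    n, m = C.shape
    C = C.copy()
    for i in range(n):
        j_center = i * (m - 1) // max(n - 1, 1)
        C[i, :max(0, j_center - radius)] = np.inf
        C[i, j_center + radius + 1:] = np.inf
    return C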

With that I could already do some retrieval things, audio synchronization (maybe not opera recordings) and maybe also tracking of trajectories in spectrograms.

But how far the feature set could explode totally depends on the purpose of librosa.

just playing devil's advocate so that if something gets added it's the right thing

No worries, I appreciate that.

@bmcfee (Member) commented Jan 26, 2016

For the record, I just reran this notebook on the latest conda (py3.5, Ubuntu, Lenovo X1) and get comparable results for the pairwise distance calculation.


This gets really hairy as you add more features.

Can we consolidate a list of commonly-used extensions/features, so as to go about this rationally?

For my $2e-2, a callable distance calculation and/or a sparse matrix implementation would be nice to have. Couple that with bounded path deviations, and we should be able to support arbitrarily long sequences.

@craffel (Contributor) commented Jan 26, 2016

I would not agree with that. A standard implementation is not necessarily cutting-edge but could also be the baseline.

My point is that if we have an implementation which is missing a feature that someone wants/needs, they will not use it, and in practice people seem to use DTW in quite different flavors (for example, you and I are, in the scheme of things, working on the same problem, but our DTWs look quite different).

Okay, in summary, this feature set would satisfy common needs: ...

and

Can we consolidate a list of commonly-used extensions/features, so as to go about this rationally?

@stefan-balke's list looks OK to me; I would add the "gully" (@dpwe's term), i.e. tolerance to paths which do not cover the entirety of one sequence or the other.

[btw: Isn't that easily realized by initializing D with infinite cost outside the allowed region?]

Yes, but then you waste a lot of time computing DTW paths through the infinite region.

callable distance calculation

Usually the pre-computed distance matrix is passed to the DTW function; the distances aren't computed on-the-fly. We don't want to reinvent scipy.spatial.distance.cdist.

sparse matrix implementation

This sounds very hairy to me.

@bmcfee (Member) commented Jan 26, 2016

Usually the pre-computed distance matrix is passed to the DTW function; the distances aren't computed on-the-fly. We don't want to reinvent scipy.spatial.distance.cdist.

It's easy enough to support both by checking for a callable. We wouldn't need to reinvent cdist, since it supports callable distances as well.

The reason I bring this up is that once you start talking about features like subsequence and bounded windows, then it's really unnecessary to compute the entire distance matrix. Given that, I'm not sure it makes sense to use the distance matrix as the only API.
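
A quick illustration (arrays made up): cdist takes either a metric name or a callable on a pair of vectors, so a single parameter can cover both cases.

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(100, 12)  # 100 frames of 12-dimensional features
Y = np.random.rand(80, 12)

C_named = cdist(X, Y, metric='cosine')
C_custom = cdist(X, Y, metric=lambda u, v: np.abs(u - v).max())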

sparse matrix implementation
This sounds very hairy to me.

I mean a sparse D... why should that be difficult?

@craffel (Contributor) commented Jan 26, 2016

The reason I bring this up is that once you start talking about features like subsequence and bounded windows, then it's really unnecessary to compute the entire distance matrix. Given that, I'm not sure it makes sense to use the distance matrix as the only API.

I have always left computing distances outside of the DTW implementation - i.e. if someone only wants to compute part of a distance matrix, they'd do that before passing it to the DTW function.

I mean a sparse D... why should that be difficult?

I'm not sure how/if Cython/numba support scipy.sparse, for example, and it seems a little over-the-top to try to support some native format. Maybe we're not talking about the same thing.

@bmcfee (Member) commented Jan 26, 2016

I have always left computing distances outside of the DTW implementation - i.e. if someone only wants to compute part of a distance matrix, they'd do that before passing it to the DTW function.

I see how that works with rectangular blocks of D, but not for diagonal bands. Of course, if we support sparse input, that problem goes away (mostly). Right?

I'm not sure how/if Cython/numba support scipy.sparse, for example, and it seems a little over-the-top to try to support some native format. Maybe we're not talking about the same thing.

It works fine in numba. Cython might be much more difficult.

@stefan-balke (Member, Author)

So, assuming we call the module dtw.py.

I would expect the following functions:

  • cost_matrix(feat_seq_1, feat_seq_2, distance='euclidean')
    • distance is handed over to scipy.spatial.distance.cdist
    • returns cost_matrix
  • accumulated_cost_matrix(cost_matrix, step_sizes_sigma=np.array([[1, 1], [0, 1], [1, 0]]), step_weights=np.array([1, 1, 1]), subsequence=False)
    • returns accumulated_cost_matrix
    • Supporting subsequence DTW is just a matter of initialization
    • computationally heavy
    • Optional: Store step sizes in a matrix to improve backtracking
  • warping_path(accumulated_cost_matrix, step_sizes)
    • Backtracking
    • returns index pairs

Some display functions.

This would be modular enough to plug in custom cost functions while keeping it easy to use for beginners; a rough sketch is below.
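
A rough pure-numpy sketch of what I mean (the signatures follow the list above; the bodies are simplified to the default step sizes and are only illustrative):

import numpy as np
import scipy.spatial.distance

def cost_matrix(feat_seq_1, feat_seq_2, distance='euclidean'):
    # assuming (features, time) matrices as elsewhere in librosa;
    # distance is handed straight over to cdist
    return scipy.spatial.distance.cdist(feat_seq_1.T, feat_seq_2.T, metric=distance)

def accumulated_cost_matrix(C):
    # simplified to steps (1, 1), (0, 1), (1, 0) with unit weights
    D = np.full((C.shape[0] + 1, C.shape[1] + 1), np.inf)
    D[0, 0] = 0.0
    for n in range(1, D.shape[0]):
        for m in range(1, D.shape[1]):
            D[n, m] = C[n - 1, m - 1] + min(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1])
    return D[1:, 1:]

def warping_path(D):
    # backtracking from the end of both sequences to (0, 0)
    n, m = D.shape[0] - 1, D.shape[1] - 1
    path = [(n, m)]
    while n > 0 or m > 0:
        candidates = [(i, j) for (i, j) in ((n - 1, m - 1), (n - 1, m), (n, m - 1))
                      if i >= 0 and j >= 0]
        n, m = min(candidates, key=lambda ij: D[ij])
        path.append((n, m))
    return np.array(path[::-1])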

@craffel (Contributor) commented Jan 26, 2016

I see how that works with rectangular blocks of D, but not for diagonal bands.

It works for rectangular blocks using scipy.spatial.distance.cdist; it works for other subportions of D using a hand-rolled distance matrix calculation function.

Of course, if we support sparse input, that problem goes away (mostly). Right?

Maybe? Depends on how it's implemented.

I would expect the following functions:

I'm not seeing any way of supplying bands/computing DTW for only a subportion of possible pairings. Or gullies.

@stefan-balke (Member, Author)

I'm not seeing any way of supplying bands/computing DTW for only a subportion of possible pairings.

Yeah, it was meant more as a starting point to develop from :)

I mean, a general question for the implementation is whether we first build the core vanilla dtw and then write wrapper functions that add constraint regions or gullies.

@bmcfee (Member) commented Jan 27, 2016

This is all sounding great.

I'm not sure if wrappers for the fancier behavior will work out though; it seems like we'll probably have to go the opposite direction, having a fancy core algorithm with simplified wrappers.

I would recommend getting a stable vanilla implementation first, and then extending its functionality later. It does seem like trying to get everything exactly right from the start may send us in circles.

Meanwhile, maybe it would help to have a collection of use cases that motivate the various extensions? By this I mean concrete, but simple, problem instances.

@stefan-balke (Member, Author)

  • Step sizes: regularization of the steepness of the warping path
  • Step weights: avoid degenerate paths
  • Applications:
    • Synchronization: keep the tempo of two sequences similar
  • Global path constraints: sync long audio recordings in reasonable time (operas?)

@craffel (Contributor) commented Jan 27, 2016

I think between @stefan-balke and me we have decent coverage of the use-cases. Ideally, I'd like to be able to use it for all the use-cases in my ICASSP paper (see the list in Section 4), so that requires additive and multiplicative penalties, a band path constraint, and gullies. djitw covers those use-cases. @stefan-balke, I think, also wants to be able to use arbitrary step sizes (i.e. not just (+1, +1), (+0, +1), and (+1, +0)), which is not something I've used myself.

As a side note to think about, the more features you add to the DTW routine, the slower it gets. That's why I have separate implementations for band-masked and unmasked DTW in djitw, and why I have a separate DTW implementation for when you only care about the score and not the path. But maybe not everyone will be as performance-crazy as me, because they aren't trying to perform O(10^11) alignments.

@bmcfee (Member) commented Jan 28, 2016

As a side note to think about, the more features you add to the DTW routine, the slower it gets.

I think most of the features we're talking about here are optimizations? Variable step sizes will obviously increase complexity, but I think almost everything else shouldn't impact efficiency too much if done properly. (FWIW, I see a few places where djitw could probably be made a bit faster using numpy primitives and np.ma, but that's beside the point...)

At any rate: are you two (@stefan-balke and @craffel) up for this? GitHub only lets me assign one person to an issue, so does one of you want to take the lead here?

@craffel (Contributor) commented Jan 28, 2016

I think most of the features we're talking about here are optimizations?

The issue is that DTW at its core is extremely fast - two nested for loops and a switch statement - so once you add other things to it, even if those things are meant to make it faster, they often don't in practice (speaking from experience with audio-to-MIDI alignment, at least). For example, additive and multiplicative penalties both add at least one operation each; that slows things down. In my experience, any sort of path constraint tended to hurt speed on average, too, due to the extra conditionals/table lookups (the argument for path constraints is often better accuracy, not speed; see e.g. Section 4 here). Just my experience. But it's potentially a moot point if we just want a single, reasonably fast DTW that supports the most common use-cases; it just complicates things when we start talking about optimizations.

(FWIW, I see a few places that djitw could probably be made a bit faster using numpy primitives and np.ma, but that's beside the point...)

numba doesn't support np.ma (as I recall), and better use of numpy primitives/features does not always translate to faster numba code - it's designed to optimize the simplest/dumbest usage patterns (like explicitly iterating over a numpy array).

At any rate: are you two (@stefan-balke and @craffel) up for this? GitHub only lets me assign one person to an issue, so does one of you want to take the lead here?

I can start by adding my djitw code. I had been meaning to add support for arbitrary step sizes and a test suite, which are the main things it'd need to be included here, but had punted it until after my thesis. It may be smart to make this collaborative in some way (e.g. a WIP PR from a fork we can all contribute to), to avoid roadblocks due to me getting pulled away by thesis obligations.

@bmcfee (Member) commented Jan 28, 2016

It may be smart to make this collaborative in some way (e.g. a WIP PR from a fork we can all contribute to), to avoid roadblocks due to me getting pulled away by thesis obligations.

👍

@stefan-balke (Member, Author)

Arbitrary step sizes mean that we have to deal with a lot of edge cases. We usually try to overcome this by appending infinity columns/rows. I'll try to have a look at @craffel's code over the weekend and see how mergeable our two approaches are.

WIP PR

Maybe @bmcfee can create that one with the correct module naming.

@bmcfee (Member) commented Jan 28, 2016

Maybe @bmcfee can create that one with the correct module naming.

Nah, I trust you two to do it right. 🏆

@stefan-balke (Member, Author)

I met with @craffel in Shanghai and started to work on this.

Here is the working branch:

https://github.com/stefan-balke/librosa/tree/feature-dtw

@stefan-balke (Member, Author)

Any suggestions for unit-tests?

So far, I have a really small toy example to test the basic dynamic programming functionality.

For subsequence DTW, I would create something like this:

query = librosa.cqt(y)                   # features of the query signal
noise_seq = np.random.rand(...)          # noise with the same feature dimension as the query
db = np.c_[noise_seq, query, noise_seq]  # embed the query between noise segments

The minimum of the matching function is known and would serve as ground truth.

Suggestions are welcome.

P.S. @craffel: some features are still missing, but when we have a "complete" DTW, I'll provide some benchmarks, as we discussed. The question is whether more conditionals influence the performance significantly.

@stefan-balke (Member, Author)

@bmcfee time-stretching is a global thing at the moment, right?

It would be cool to hand over a warping path to the time-stretcher and warp the audio, though...

Jonathan recently wrote a nice review article about time-scale modification:

https://www.semanticscholar.org/paper/A-Review-of-Time-Scale-Modification-of-Music-Driedger-Müller/0ac0fed7476412c7d85073e1cc2dd0923a1ce90f

@craffel (Contributor) commented Mar 27, 2016

So far, I have a really small toy example to test the basic dynamic programming functionality.

A simple toy example where you can compute the best path by hand beforehand, for different parameter settings, and hard-code it seems smart to me.

For subsequence DTW, I would create something like this:

Yes, another option would be to synthesize a cycle of a sine wave and then synthesize a 1/2 cycle at some random phase, and then try to recover that phase + a linear path.

P.S. @craffel: some features are still missing, but when we have a "complete" DTW, I'll provide some benchmarks, as we discussed. The question is whether more conditionals influence the performance significantly.

Sweet.

@bmcfee (Member) commented Mar 28, 2016

@bmcfee time-stretching is a global thing at the moment, right?

It would be cool to hand over a warping path to the time-stretcher and warp the audio, though...

Currently, yes. My original re-implementation of phase vocoding was for the dynamic case, though, and used to power this hack. In principle, it shouldn't be hard to extend the API to support this directly, though I had trouble coming up with a clean interface for it.

If you want to take a crack at it though, PRs are always welcome!

Yes, another option would be to synthesize a cycle of a sine wave and then synthesize a 1/2 cycle at some random phase, and then try to recover that phase + a linear path.

Can we make this simpler? How about a linear ramp, concatenated with noise of varying magnitude and duration?
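
Something like this, say (parameters made up):

import numpy as np

def ramp_plus_noise(n_ramp=100, n_noise=30, scale=5.0, seed=None):
    # a clean linear ramp followed by noise; aligning two such signals
    # should recover the identity path over the ramp portion
    rng = np.random.RandomState(seed)
    return np.concatenate([np.linspace(0, 1, n_ramp),
                           scale * rng.rand(n_noise)])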

@craffel (Contributor) commented Mar 28, 2016

Can we make this simpler? How about a linear ramp, concatenated with noise of varying magnitude and duration?

That doesn't sound any simpler to me, but either way seems fine.

@bmcfee (Member) commented Mar 28, 2016

That doesn't sound any simpler to me,

It's one fewer function (linspace instead of sin(tau * linspace)), and eliminates any weird off-by-one boundary effects.

@stefan-balke (Member, Author)

Alright, will do so.

@stefan-balke mentioned this issue Apr 18, 2016 (merged)
@craffel (Contributor) commented Apr 20, 2016

Some display functions.

While I am thinking of it, I have code for doing some DTW result plotting here:
https://github.com/craffel/midi-dataset/blob/e7092a1eed3921b8621c99710add031f6ef06997/alignment_analysis.py
I don't think I mentioned this elsewhere in this thread (can't find anywhere that I did). I'm going to be removing that script from that repository sometime soon, so I would be happy to have it live on in spirit in librosa.

@stefan-balke (Member, Author)

While I am thinking of it, I have code for doing some DTW result plotting here:

Cool, I've got something similar. I'll make a "BestOf" again.

@bmcfee modified the milestone: 0.4.3 Apr 25, 2016
@bmcfee modified the milestones: 0.4.3, 0.5 May 4, 2016
@stefan-balke (Member, Author)

Closed via #323.

@bmcfee (Member) commented Jul 26, 2016

Oops! I missed something in the CR...

@stefan-balke, can you change the dist parameter to metric so that it matches scipy and the rest of librosa?

@bmcfee reopened this Jul 26, 2016
@stefan-balke (Member, Author)

Here you go...
#392

@stefan-balke (Member, Author)

Wait, the squashing was stupid. I'll send a patch as PR.

@stefan-balke (Member, Author)

#393 will fix it.

@craffel (Contributor) commented Jul 26, 2016

Nice! Looking forward to trying this. Sorry I did not help out in any tangible capacity.

@apiszcz commented Aug 13, 2017

A preliminary look at performance: is there a benchmark of the various DTW implementations?

NOTE: This line in core/audio.py is causing an error; if I change the index from 2 to 0, I can import:

BW_BEST = resampy.filters.get_filter('kaiser_best')[2]
BW_FASTEST = resampy.filters.get_filter('kaiser_fast')[2]

Benchmark setup:

x = np.random.randint(0, 2, (1000, 1))
y = np.random.randint(0, 2, (1000, 1))

