[WIP] YIN #974

lostanlen · 2019-08-20T04:02:30Z

Reference Issue

Fixes #527 (eventually)

What does this implement/fix? Explain your changes.

This is a comprehensive reimplementation of the YIN algorithm (de Cheveigné and Kawahara, JASA 2002). It's a complex and long-awaited new feature, so i imagine that it's going to require meticulous rounds of CR.

To write this function, i re-read the YIN paper as well as the MATLAB code:
https://github.com/lemonzi/matlab/blob/master/yin/yin2.m

Thankfully, this paper is very well written so it was easy to link it to relevant parts of the code.

Any other comments?

the code is in a relatively mature state, but it's still a work in progress. I haven't written the tests yet. I want to create a consensus in the API before i do.
i tried my best to follow existing naming conventions in the librosa API. Hence y, sr, frame_length, hop_length, fmin, fmax, and pad_mode.
However, threshold_1 and threshold_2 could use better names. Ideas appreciated.
Likewise for the booleans cumulative and interpolate. Do we even support them as kwargs? If so, i need to document them.
Perhaps the docstring should be longer and give some historical / methodological context around YIN? Contributions appreciated.
The parabolic interpolation step could potentially be faster by allocating less intermediate arrays. But maybe that's a premature optimization.
I did not implement part 6: "best local estimate". It would considerably slow down the code; de Cheveigné and Kawahara report a relatively small effect; and i doubt many people use it anyway. Long term, we would rather use pYIN rather then part 6.
This implementation works splendidly well as long as there are ten periods or more in the window. Below that number, it's quite numerically unstable. Could it be due to the boxcar window? de Cheveigné and Kawahara write that "there is [...] little reason not to use a square window", but i doubt it.
Likewise for padding. I am using pad_mode=reflect, but really, there is little hope of recovering the instantaneous frequency at the edges.
What should the tests be? Tests on synthetic tones or tests vs the MATLAB code?

bmcfee · 2019-08-20T12:01:04Z

😮 thanks for doing this! I'll give it some cycles later on today.

librosa/core/pitch.py

bmcfee

Looks pretty good so far; a few minor API requests.

One big question: we'll want to support pyin in the future, if not immediately in this PR. How can we set up the internals of this function to make that easy to do?

bmcfee · 2019-08-20T13:57:42Z

the code is in a relatively mature state, but it's still a work in progress. I haven't written the tests yet. I want to create a consensus in the API before i do.

Notes in my first CR. Generally, I think it's better if we can make the API and default parameters internally consistent with the rest of the library. If we want to show how to recover a good approximation to the original paper / implementation, that can be done in a notebook or docstring example.

i tried my best to follow existing naming conventions in the librosa API. Hence y, sr, frame_length, hop_length, fmin, fmax, and pad_mode.

However, threshold_1 and threshold_2 could use better names. Ideas appreciated.

peak_threshold and periodicity_threshold? A bit wordy, but they get the point across.

Likewise for the booleans cumulative and interpolate. Do we even support them as kwargs? If so, i need to document them.

I don't have strong opinions on this one, but it doesn't seem to hurt to include them.

Perhaps the docstring should be longer and give some historical / methodological context around YIN? Contributions appreciated.

Let's hold off on that for now.

The parabolic interpolation step could potentially be faster by allocating less intermediate arrays. But maybe that's a premature optimization.

I'm wondering if this could be simplified to use scipy's polynomial interpolation instead? It might take a little bit of clever array shaping, but it would be preferable to having a bunch of grungy interpolation code IMO.

I did not implement part 6: "best local estimate". It would considerably slow down the code; de Cheveigné and Kawahara report a relatively small effect; and i doubt many people use it anyway. Long term, we would rather use pYIN rather then part 6.

Indeed. Our API plan should make it easy to extend to pyin.

This implementation works splendidly well as long as there are ten periods or more in the window. Below that number, it's quite numerically unstable. Could it be due to the boxcar window? de Cheveigné and Kawahara write that "there is [...] little reason not to use a square window", but i doubt it.

We should definitely support user-specified windows. Ditto for padding modes. Edge behavior is not well-defined anyway, so I'm not going to worry about it.

Likewise for padding. I am using pad_mode=reflect, but really, there is little hope of recovering the instantaneous frequency at the edges.

What should the tests be? Tests on synthetic tones or tests vs the MATLAB code?

I think synthetic tones would make sense here.

lostanlen · 2019-09-03T11:08:44Z

In the interest of releasing v0.7.1 soon, i suggest we bump this to the v0.8 milestone?

bmcfee · 2019-09-03T11:28:02Z

I'm okay with that.

lostanlen · 2019-11-13T10:34:06Z

Looking at this again in the wake of ISMIR. I will address the CR, add tests, and reconvene.

lostanlen · 2020-01-21T15:14:30Z

superseded by #1063

implement yin fundamental frequency estimator

a786019

bmcfee added the functionality Does this add new functionality? label Aug 20, 2019

bmcfee added this to the 0.7.1 milestone Aug 20, 2019

bmcfee assigned lostanlen Aug 20, 2019

bmcfee self-requested a review August 20, 2019 13:42