Measures for over- and under-segmentation of chord labels#263
Measures for over- and under-segmentation of chord labels#263craffel merged 5 commits intomir-evaluation:masterfrom
Conversation
| seg = 0. | ||
| for start, end in reference_intervals: | ||
| dur = end - start | ||
| between_start_end = est_ts[(est_ts > start) & (est_ts < end)] |
There was a problem hiding this comment.
Should only one of these comparisons be strict?
| directional hamming distance between reference intervals and | ||
| estimated intervals. | ||
| """ | ||
| est_ts = np.unique(estimated_intervals.flatten()) |
There was a problem hiding this comment.
Seems like this should have an input validation
There was a problem hiding this comment.
Something like if estimated_intervals.shape[1] != 2 or should the check go deeper and ensure e.g. that there are no overlaps in the intervals, etc.?
There was a problem hiding this comment.
Calling util.validate_intervals would probably suffice. It doesn't check for disjoint/complete segmentation (and maybe it should), but it does cover the basics of shape matching and valid interval timings.
bmcfee
left a comment
There was a problem hiding this comment.
The implementation looks good, thanks! A few minor comments about validation and edge cases.
Also, it looks like there are no unit tests for the new functions?
|
Right, I missed that - somehow I thought having numbers in the |
|
Thanks for reviewing @bmcfee , I am slammed right now so I can give it a final look-over once things are squared |
|
So, I think I addressed all concerns @bmcfee pointed out, so from my side, this PR is ready to merge. Let me know if there is anything else I can improve! |
| between_start_end = est_ts[(est_ts >= start) & (est_ts < end)] | ||
| seg_ts = np.hstack([start, between_start_end, end]) | ||
| seg += dur - np.diff(seg_ts).max() | ||
| return seg / (reference_intervals[-1, 1] - reference_intervals[0, 0]) |
There was a problem hiding this comment.
possible div-by-0 here, since validate_intervals doesn't check for non-emptiness. Maybe add an explicit check here, and document the behavior if reference intervals are empty.
There was a problem hiding this comment.
I think it does implicitly in the following check (util.py:770):
if intervals.ndim != 2 or intervals.shape[1] != 2:
raise ValueError('Intervals should be n-by-2 numpy ndarray, '
'but shape={}'.format(intervals.shape))Here, np.atleast_2d(np.array([])).shape[1] == 0, and thus the ValueError will be raised. So, at least one interval must be present, and it must have a positive duration, and thus reference_intervals[-1, 1] - reference_intervals[0, 0] must be positive.
There was a problem hiding this comment.
Ah, quite correct, thanks. Sorry I missed that before.
LGTM!
|
Great. I am about to travel to ISMIR but should get a chance to do a final pass over in the next few days. Thanks! |
craffel
left a comment
There was a problem hiding this comment.
Minor documentation question, otherwise looks good.
| ... ref_intervals, est_intervals) | ||
| >>> underseg = 1 - mir_eval.chord.directional_hamming_distance( | ||
| ... est_intervals, ref_intervals) | ||
| >>> meanseg = min(overseg, underseg) |
There was a problem hiding this comment.
I'm confused by this line --
- It's called "meanseg" but is computed as the min.
- Is this a thing people measure? If so should we have a separate metric function for it?
There was a problem hiding this comment.
I agree that the naming is weird, but I wanted to be consistent with MIREX (see http://www.music-ir.org/mirex/wiki/2017:Audio_Chord_Estimation_Results).
I'm not sure if it is meaningful to have this as a separate metric for this for each song (it is, after all, just min(overseg, underseg)). However, it might then be easier to recreate the metrics as used in the MIREX task, where "MeanSeg" means the mean of the min(os, us) for each song. In this case, I would prefer to call it just something like "Segmentation".
What do you think?
There was a problem hiding this comment.
So they are calling it "mean" because it's the mean of the min across all songs? I guess that bothers me because all of the metrics are mean across songs, correct? In terms of what I think, I honestly don't know in this case! But I will say typically it's ok to have an extra metric function if it's something that people measure, even if it's oneline. The idea being that evaluate returns everything that you might want to measure to report your results.
There was a problem hiding this comment.
Sure, let's add a function for it, then! The only negative now is that underseg() and overseg() get computed twice when calling evaluate(). If this is a problem, I can replace scores['seg'] = seg(...) with scores['seg'] = min(scores['underseg'], scores['overseg']).
There was a problem hiding this comment.
If this is a problem, I can replace scores['seg'] = seg(...) with scores['seg'] = min(scores['underseg'], scores['overseg']).
This seems like the best option. (and sorry for the delay)
There was a problem hiding this comment.
No worries! Should be done now. If there is anything I can do to further improve this PR, let me know.
| estimated intervals as defined by [#harte2010towards]_ and used for MIREX | ||
| 'OverSeg', 'UnderSeg' and 'MeanSeg' measures. | ||
|
|
||
| Examples |
There was a problem hiding this comment.
Technically, since this isn't a metric function, you don't need an "Examples" section, but it certainly doesn't hurt :)
fixed chord interval overlap check; added tests for chord interval overlap check.
|
Merged! Thanks so much! |
* implemented over- and under-segmentation measuers * added function to merge intervals; added tests * added more tests and input validation for directional hamming distance * added function for "segmentation" measure * fixed computation of 'segmentation' measure for chord evaluation; fixed chord interval overlap check; added tests for chord interval overlap check.
An attempt for addressing #262.
MeanSegmetric is not implemented, because it is the dataset average of min(overseg, underseg). I don't think it's meaningful to add another field to the scores that just returns the minimum of the two.Let me know what you think.