[MRG] Bug fix and new feature: fix implementation of average precision score and add eleven-point interpolated option (7356 rebased) #9017

Closed

GaelVaroquaux (Member) commented Jun 6, 2017

Rebased version of #7356

Fixes #4577 and #6377

What does this implement/fix? Explain your changes.

This adds an optional interpolation parameter to average_precision_score. By default the value is set to None, which replicates the existing behavior, but there is also an 'eleven_point' option that implements the strategy described in Stanford's Introduction to Information Retrieval.
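
For readers who want to see that strategy concretely, here is a minimal sketch (my own illustration, not the code in this PR; the helper name `eleven_point_ap` is made up) of the eleven-point interpolation described in the IR book:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def eleven_point_ap(y_true, y_score):
    """Eleven-point interpolated average precision (illustrative sketch only)."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # For each target recall level r in {0.0, 0.1, ..., 1.0}, take the best
    # precision among operating points whose recall is at least r, then
    # average the eleven values.
    recall_levels = np.linspace(0, 1, 11)
    return np.mean([precision[recall >= r].max() for r in recall_levels])
```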


GaelVaroquaux (Member) commented Jun 6, 2017

I need to address the pending comments of #7356 and make the example pretty again, as it was rendered a bit horrible by the merge to master.

GaelVaroquaux (Member) commented Jun 6, 2017

I reordered the example a bit (no major change though). I find it more readable in the HTML output.

GaelVaroquaux (Member) commented Jun 7, 2017

After a lot of back and forth, I have finally convinced myself that I trust this implementation of average_precision_score. I was worried about dealing with duplicates, so I added a stringent test.

This is ready for merge.

Can I have a review, @agramfort, @vene, @amueller?

On doc/whats_new.rst:

@@ -6,6 +6,35 @@ Release history
===============
Version 0.19
Version 0.18.2

amueller (Member) commented Jun 8, 2017

Rebase issue: you need to move this to the stuff below.

On examples/model_selection/plot_precision_recall.py:

possible with a recall value at least equal to the target value.
The most common choice is 'eleven point' interpolated precision, where the
desired recall values are [0, 0.1, 0.2, ..., 1.0]. This is the metric used in
`The PASCAL Visual Object Classes (VOC) Challenge <http://citeseerx.ist.psu.edu

amueller (Member) commented Jun 8, 2017

Maybe "this metric was"? Or "sometimes"? Or just remove that sentence?

On sklearn/metrics/ranking.py:

For each of the recall values, r, in {0, 0.1, 0.2, ..., 1.0},
compute the arithmetic mean of the first precision value with a
corresponding recall >= r. This is the metric used in the Pascal
Visual Objects Classes (VOC) Challenge and is as described in the

amueller (Member) commented Jun 8, 2017

Maybe not mention Pascal VOC?

GaelVaroquaux (Member) commented Jun 8, 2017

Well, they describe it in the docs, though they don't use it in the code. Maybe we should mention the docs.

amueller (Member) commented Jun 8, 2017

Yeah, I don't know... but the current text could be misleading.

On sklearn/metrics/tests/test_ranking.py:

def _interpolated_average_precision_slow(y_true, y_score):
"""A second implementation for the eleven-point interpolated average
precision used by Pascal VOC. This should produce identical results to

amueller (Member) commented Jun 8, 2017

"described in the IR book"?

amueller (Member) commented

LGTM, gonna check out the example now.

On sklearn/metrics/tests/test_ranking.py:

precision_recall_auc = _average_precision_slow(y_true, probas_pred)
interpolated_average_precision = _interpolated_average_precision_slow(
y_true, probas_pred)
assert_array_almost_equal(precision_recall_auc, 0.859, 3)

amueller (Member) commented Jun 8, 2017

Where does that number come from?

GaelVaroquaux (Member) commented Jun 8, 2017

Agreed. Ping @ndingwall: we don't like to have hard-coded numbers in our test suites without being able to rederive them.

On sklearn/metrics/tests/test_ranking.py:

assert_equal(p.size, r.size)
assert_equal(p.size, thresholds.size + 1)
# Smoke test in the case of proba having only one value
p, r, thresholds = precision_recall_curve(y_true,
np.zeros_like(probas_pred))
precision_recall_auc = auc(r, p)
assert_array_almost_equal(precision_recall_auc, 0.75, 3)
assert_array_almost_equal(precision_recall_auc, 0.75)

amueller (Member) commented Jun 8, 2017

Where does that number come from?

GaelVaroquaux (Member) commented Jun 8, 2017

This number is actually meaningless, as the way the AUC of the PR curve is computed above is not the right way: it does not cater for edge effects. On a constant prediction, the average precision score is the TPR, here .5.

I am removing these lines. Anyhow, I added a correct test (which passes) at a different location.
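
As a side note for readers, a tiny sketch (toy data of my own, not part of the PR) of the claim above, namely that a constant prediction yields an average precision equal to the fraction of positives:

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 1])               # 50% positives
y_score = np.zeros_like(y_true, dtype=float)  # constant prediction

# with the step-wise definition introduced by this fix, the only operating
# point has precision 0.5 and recall 1.0, so AP = (1 - 0) * 0.5 = 0.5
print(average_precision_score(y_true, y_score))
```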

On sklearn/metrics/tests/test_ranking.py:

@@ -510,15 +552,15 @@ def test_precision_recall_curve_toydata():
auc_prc = average_precision_score(y_true, y_score)
assert_array_almost_equal(p, [0.5, 0., 1.])
assert_array_almost_equal(r, [1., 0., 0.])
assert_almost_equal(auc_prc, 0.25)
assert_almost_equal(auc_prc, 0.5)

amueller (Member) commented Jun 8, 2017

Maybe add a short explanation? Why is this one right and the other one wasn't?

amueller (Member) commented Jun 9, 2017

Explanation still missing ;)

On sklearn/metrics/tests/test_ranking.py:

def test_average_precision_constant_values():
# Check the average_precision_score of a constant predictor is
# the tps

amueller (Member) commented Jun 8, 2017

tps? tpr?

amueller (Member) commented Jun 9, 2017

Not addressed.

y_true[::4] = 1
# And a constant score
y_score = np.ones(100)
# The precision is then the fraction of positive whatever the recall

amueller (Member) commented Jun 8, 2017

"for all thresholds and recall values"?

GaelVaroquaux (Member) commented Jun 8, 2017

No, because there is only one threshold.

On doc/whats_new.rst:

............
- Added a `'eleven-point'` interpolated average precision option to
:func:`metrics.ranking.average_precision_score` as used in the `PASCAL

amueller (Member) commented Jun 8, 2017

Is it?

On examples/model_selection/plot_precision_recall.py:

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title('Precision-Recall example: AUC={0:0.2f}'.format(average_precision[0]))
plt.title('Precision-Recall example: AUC={0:0.2f}'.format(
average_precision["micro"]))
plt.legend(loc="lower left")
plt.show()

amueller (Member) commented Jun 8, 2017

I'd remove the show so both pop up at once.

y_score.ravel())
average_precision["micro"] = average_precision_score(y_test, y_score,
average_precision["micro"] = average_precision_score(Y_test, y_score,

amueller (Member) commented Jun 8, 2017

Is this standard? I'm not sure I understand what that means. Maybe do binary for this example?

amueller (Member) commented Jun 8, 2017

So I think multi-label is not that common, and I'm not sure this is a common strategy for multi-class. As a matter of fact, we don't have multi-class average precision (there is a PR for multi-class AUC and there is a bunch of literature on that!).
So I vote for going binary here.

GaelVaroquaux (Member) commented Jun 8, 2017

I'll do a binary example first, and then illustrate micro.

vene (Member) commented Jun 9, 2017

I have certainly seen this before in NLP, even for multiclass.
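
For readers unfamiliar with the micro-averaging discussed here, a minimal sketch (toy indicator matrix and scores of my own) of what average="micro" does in this multi-label setting:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# toy multi-label data: binary indicator matrix and per-class scores
Y_test = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.3],
                    [0.1, 0.8, 0.7],
                    [0.6, 0.4, 0.2]])

# "micro" flattens both matrices, so every (sample, class) cell is treated
# as a single binary decision before computing one average precision
print(average_precision_score(Y_test, y_score, average="micro"))
```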

On sklearn/metrics/ranking.py:

"""Compute average precision (AP) from prediction scores
This score corresponds to the area under the precision-recall curve.
Optionally, this will compute an eleven-point interpolated average

amueller (Member) commented Jun 8, 2017

Trailing whitespace.

GaelVaroquaux (Member) commented Jun 9, 2017

All green. Can I haz reviews / mrg?

On doc/whats_new.rst:

@@ -6,6 +6,35 @@ Release history
===============
Version 0.19
==============

amueller (Member) commented Jun 9, 2017

Still a rebase issue.

amueller (Member) commented

Mostly nitpicks. I think it would be good to have an explanation in the tests, and the whatsnew needs to be fixed, but I don't want to delay this because of minor issues in the example.

On doc/whats_new.rst:

by the change in recall since the last operating point, as per the
`Wikipedia entry <http://en.wikipedia.org/wiki/Average_precision>`_.
(`#7356 <https://github.com/scikit-learn/scikit-learn/pull/7356>`_). By
`Nick Dingwall`_.

low false negative rate. High scores for both show that the classifier is
returning accurate results (high precision), as well as returning a majority of
all positive results (high recall).
Precision-Recall is a useful measure of success of prediction when the

amueller (Member) commented Jun 9, 2017

Precision and recall? Or the precision-recall curve?

vene (Member) commented Jun 9, 2017

Yeah, and I would change the example title too: "Using precision and recall for classifier evaluation" or something. It's very awkward now.

And maybe replace "very imbalanced" below with "imbalanced"?

relevant results are returned.
The precision-recall curve shows the tradeoff between precision and
recall for different threshold. A high area under the curve represents

amueller (Member) commented Jun 9, 2017

Decision thresholds?

vene (Member) commented Jun 9, 2017

high area -> large area?

On examples/model_selection/plot_precision_recall.py:

random_state=random_state)
# Create a simple classifier
classifier = svm.SVC(kernel='linear', probability=True,

amueller (Member) commented Jun 9, 2017

Why????? How about LogisticRegression? Or not using probability=True? We don't need probabilities for the precision-recall curve and many people are not aware of that. So maybe using SVC(linear) is actually good, but we shouldn't use probability=True. What's the reason not to use LinearSVC()?

amueller (Member) commented Jun 9, 2017

We're not even using predict_proba, right?

GaelVaroquaux (Member) commented Jun 9, 2017

Aaaaaaaaaargh!

On examples/model_selection/plot_precision_recall.py:

random_state=random_state)
# We use OneVsRestClassifier for multi-label prediction
from sklearn.multiclass import OneVsRestClassifier
# Run classifier
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,

amueller (Member) commented Jun 9, 2017

Again, why probability=True, why SVC(linear) instead of LinearSVC?

On examples/model_selection/plot_precision_recall.py:

print("Target recall Selected recall Precision")
for i in range(11):
print(" >= {} {:3.3f} {:3.3f}".format(i / 10,

amueller (Member) commented Jun 9, 2017

Nitpick: if we're using format strings, maybe we should, you know, actually use them for the padding?

On examples/model_selection/plot_precision_recall.py:

plt.fill_between(recall[iris_cls], precision[iris_cls], step='post', alpha=0.1,
color='g')
for i in range(11):
plt.annotate('',

amueller (Member) commented Jun 9, 2017

If you put the '' on the next line you have more space ;)

GaelVaroquaux (Member) commented Jun 9, 2017

Yeah, but I don't feel it improves things.

where :math:`P_n` and :math:`R_n` are the precision and recall at the
nth threshold. A pair :math:`(R_k, P_k)` is referred to as an
*operating point*.

amueller (Member) commented Jun 9, 2017

Maybe mention that this is related to the integral under the curve but there are subtle differences?
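
For context, the non-interpolated definition referenced here (the one average_precision_score uses after this fix, matching the Wikipedia entry cited in whats_new.rst) weights each precision by the change in recall between consecutive operating points:

:math:`\text{AP} = \sum_n (R_n - R_{n-1}) P_n`

i.e. the area under a step function through the operating points, rather than the trapezoidal area that auc(recall, precision) would compute.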

GaelVaroquaux (Member) commented Jun 9, 2017

Addressed all of @amueller's comments.

GaelVaroquaux (Member) commented Jun 9, 2017

@amueller @vene: 👍 ?

On doc/whats_new.rst:

- Added a `'eleven-point'` interpolated average precision option to
:func:`metrics.ranking.average_precision_score` as described in the
`PASCAL
Visual Object Classes (VOC) Challenge <http://citeseerx.ist.psu.edu/viewdoc/

vene (Member) commented Jun 9, 2017

I guess line wrapping here is weird?

On doc/whats_new.rst:

@@ -193,6 +201,13 @@ Enhancements
Bug fixes
.........
- :func:`metrics.ranking.average_precision_score` no longer linearly
interpolates between operating points, and instead weights precisions

vene (Member) commented Jun 9, 2017

weighs?

On examples/model_selection/plot_precision_recall.py:

Precision-recall curves are typically used in binary classification to study
the output of a classifier. In order to extend Precision-recall curve and
the output of a classifier. In order to extend the Precision-recall curve and

vene (Member) commented Jun 9, 2017

Lowercase p.

On examples/model_selection/plot_precision_recall.py:

# In multi-label settings
# ------------------------
#
# Create multli-label data, fit, and predict

vene (Member) commented Jun 9, 2017

multi

from sklearn.preprocessing import label_binarize
# Use label_binarize to be multi-label like settings

vene (Member) commented Jun 9, 2017

wat 😛

GaelVaroquaux (Member) commented Jun 9, 2017

Yeah :/. I don't like this part of the example, but that's what we have.

vene (Member) commented Jun 9, 2017

I just mean I don't understand what the comment is saying.

random_state=random_state)
# We use OneVsRestClassifier for multi-label prediction
from sklearn.multiclass import OneVsRestClassifier

vene (Member) commented Jun 9, 2017

Newline after.

On examples/model_selection/plot_precision_recall.py:

# Compute Precision-Recall and plot curve
###############################################################################
# The precision-Recall score in multi-label settings

vene (Member) commented Jun 9, 2017

Inconsistent capitalization. I'd suggest precision-recall everywhere.

I'm very confused now about what "the precision-recall score" is. Is it the same thing as average precision? I've never heard the former before (just as two distinct scores). The narrative at the top doesn't make it clear.

On examples/model_selection/plot_precision_recall.py:

plt.title('Precision-Recall example: AUC={0:0.2f}'.format(average_precision[0]))
plt.legend(loc="lower left")
plt.show()
plt.title('Precision-Recall micro-averaged over all classes: AUC={0:0.2f}'

vene (Member) commented Jun 9, 2017

Same confusion. Maybe "precision and recall micro-averaged..."?

GaelVaroquaux (Member) commented Jun 9, 2017

Or "micro average precision score"?

On examples/model_selection/plot_precision_recall.py:

# ------------------------------
#
# In *interpolated* average precision, a set of desired recall values is
# specified and for each desired value, we average the best precision

vene (Member) commented Jun 9, 2017

I'd move the comma to after the "and" on this line.

On sklearn/metrics/ranking.py:

"""Compute average precision (AP) from prediction scores
This score corresponds to the area under the precision-recall curve.
Optionally, this will compute an eleven-point interpolated average
precision score: for each of the 11 evenly-spaced target recall values

vene (Member) commented Jun 9, 2017

Maybe say eleven in words to make it easier to spot the connection to the interpolation arg below.

On sklearn/metrics/ranking.py:

precision, recall, thresholds = precision_recall_curve(
y_true, y_score, sample_weight=sample_weight)
return auc(recall, precision)
# Return the step function integral
return -np.sum(np.diff(recall) * np.array(precision)[:-1])

vene (Member) commented Jun 9, 2017

I think this works because the last entry of precision is guaranteed to be 1, as written in the docstring of precision_recall_curve; do you think this warrants a note in a comment here, for posterity?

GaelVaroquaux (Member) commented Jun 9, 2017

Agreed.
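
A quick sketch (toy data of my own, not from the PR's tests) of the step integral in the diff above, for readers checking it against auc: precision_recall_curve returns recall in decreasing order and ends at the point (recall=0, precision=1), so the negated sum accumulates (R_n - R_{n-1}) * P_n over the operating points.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_score)

ap_step = -np.sum(np.diff(recall) * np.array(precision)[:-1])  # step integral
ap_trapezoid = auc(recall, precision)                          # linear interpolation

# the two differ whenever the curve is not flat between operating points
print(ap_step, ap_trapezoid)
```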

On sklearn/metrics/tests/test_ranking.py:

precision = list(reversed(precision))
recall = list(reversed(recall))
indices = np.searchsorted(recall, np.arange(0, 1.1, 0.1))
return np.mean([max(precision[i:]) for i in indices])

vene (Member) commented Jun 9, 2017

Confused: other than using lists instead of arrays, and other than not ignoring the first p-r pair with p=1.0, this seems identical to the actual implementation in this PR. What am I missing?

GaelVaroquaux (Member) commented Jun 9, 2017

Agreed.

Should we remove this test?

On sklearn/metrics/tests/test_ranking.py:

assert_almost_equal(auc_prc, 0.25)
# Here we are doing a terrible prediction: we are always getting
# it wrong, hence the average_precision_score is the accuracy at
# change: 50%

vene (Member) commented Jun 9, 2017

chance

amueller (Member) commented Jun 9, 2017

LGTM, I haven't recalculated the test results though.

DOC: Simpler precision-recall example, remove 11pt
Remove the eleven average precision score

Add better tests.

GaelVaroquaux (Member) commented Jun 9, 2017

After discussion with @vene, I have removed the eleven-point average precision from this PR. Sorry @ndingwall: we feel that getting the fixes of this PR merged in is very important, but we are not sure that we can warrant the correctness of the eleven-point variant.

I will issue a new PR so that the work does not get lost, and @ndingwall can take over.

On doc/whats_new.rst:

@@ -6,7 +6,7 @@ Release history
===============
Version 0.19
============
==============

vene (Member) commented Jun 9, 2017

What's up here?

GaelVaroquaux (Member) commented Jun 9, 2017

@vene: fixed.

GaelVaroquaux (Member) commented Jun 9, 2017

@amueller and @vene gave me 👍 at the bar. Merging.

GaelVaroquaux (Member) commented Jun 9, 2017

Merged

jnothman (Member) commented Jun 10, 2017

I think this needs more of a loud warning in the changelog.
