error in average_precision_score #5379

Closed
tomsvds opened this issue Oct 9, 2015 · 18 comments

@tomsvds commented Oct 9, 2015

I believe there is an error in sklearn.metrics.average_precision_score. Here is a script to show the result:

from sklearn.metrics import average_precision_score

y_true = [
       1, 1,
       1, 1,
       1, 1,
       1, -1,      # Single negative here, pos 8
       1, 1
       ]
y_score = list(range(len(y_true)))

for average in ('micro', 'macro', 'weighted', 'samples'):
    print( "Average:", average)
    print(average_precision_score(y_true, y_score, average=average))

What we have are essentially ten documents, nine of which are positive; the
example at position eight is negative. According to my understanding (and
Wikipedia, see https://en.wikipedia.org/wiki/Information_retrieval#Average_precision,
the third equation for AveP on that page), the average precision score should be:

| Pos | Value | Precision | Note                                     |
| --- | ----- | --------- | ---------------------------------------- |
|   1 |     1 | 1/1       |                                           |
|   2 |     1 | 2/2       |                                           |
|   3 |     1 | 3/3       |                                           |
|   4 |     1 | 4/4       |                                           |
|   5 |     1 | 5/5       |                                           |
|   6 |     1 | 6/6       |                                           |
|   7 |     1 | 7/7       |                                           |
|   8 |     0 | 7/8       | Contributes zero since negative example   |
|   9 |     1 | 8/9       |                                           |
|  10 |     1 | 9/10      |                                           |

The number of true positives (TP) here is 9.
The average precision = (7×1 + 8/9 + 9/10) / TP ≈ 0.976.
The value produced by the script (for all averaging schemes) is 0.865.
I believe average_precision_score is producing this value because it is dividing by N instead of TP.

Regards,
-Tom

@amueller (Member) commented Oct 9, 2015

ping @arjoly @jnothman

@absolutelyNoWarranty (Contributor)

Your script works fine if you reverse y_score. In y_score, the greater the score, the higher the position in the ranking (like probabilities).
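
A minimal sketch (illustrative, not from this thread) of that fix, with the scores reversed so the highest score comes first:

from sklearn.metrics import average_precision_score

# Same labels as in the script above; the single negative is at position 8.
y_true = [1, 1, 1, 1, 1, 1, 1, -1, 1, 1]
# Reversed scores: position 1 now gets the highest score.
y_score = list(range(len(y_true), 0, -1))

# With higher-is-better scores the intended ranking is recovered; the
# result is close to the hand-computed 0.976 (the exact value depends
# on the interpolation discussed later in this thread).
print(average_precision_score(y_true, y_score))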

@tomsvds (Author) commented Oct 9, 2015

Ah, OK, that's most of the problem. Thanks! My bad.

But it exposes another thing that still looks wrong. Try this even simpler example:

y_true = [1, -1, 1]
y_score = [3, 2, 1]

Average precision should be (1/1 + 2/3) / 2 = 5/6 ≈ 0.83.
average_precision_score produces 0.79.

Regards,
-Tom

@absolutelyNoWarranty (Contributor)

The wikipedia page says:

That is the area under the precision-recall curve. This integral is in practice replaced with a finite sum over every position in the ranked sequence of documents:

That formula is more of an approximation than what sklearn is doing. In your example, the precision-recall graph has the points (precision, recall) = (1, 0), (1, 0.5), (0.5, 0.5), and (2/3, 1), and sklearn calculates the area under the curve formed by connecting those points with straight lines. Notice the upward slope going from (0.5, 0.5) to (2/3, 1), similar to what we see here: http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html

The finite-sum formula is equivalent to calculating the area under a curve that has no upward slopes, only stairs going downward. (In your example, it'd be as if the point (0.5, 0.5) were magically moved up to (2/3, 0.5) to match the height of the next point.)
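
A minimal sketch (illustrative, not from this thread) contrasting the two conventions on the example above, using precision_recall_curve and auc from sklearn.metrics:

import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([1, -1, 1])
y_score = np.array([3, 2, 1])

# Linear interpolation: trapezoidal area under the PR points.
precision, recall, _ = precision_recall_curve(y_true, y_score)
print(auc(recall, precision))  # ~0.79, the value reported above

# Finite-sum definition: average the precision at the rank of each
# relevant document (the Wikipedia formula).
order = np.argsort(-y_score)          # best score first
rel = y_true[order] == 1              # relevance at each rank
p_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
print(p_at_k[rel].mean())             # (1/1 + 2/3) / 2 = 5/6 ~ 0.83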

@amueller (Member)

@tomsvds more questions? Or can we close?

@tomsvds (Author) commented Oct 13, 2015

On Oct 12, 2015, at 2:53 PM, Andreas Mueller notifications@github.com wrote:
@tomsvds more questions? Or can we close?

Well, I can’t use the function, but I appreciate the answer and I have no more questions.

Regards,
-Tom

@amueller (Member)

Why can't you use the function? Because you don't want the standard definition of average precision?

@amueller (Member)

We could have options "interpolated", "triangle" and "quadrature"?

@glouppe changed the title from "error in" to "error in average_precision_score" Oct 19, 2015
@ndingwall (Contributor)

I just submitted a pull request to change the interpolation from linear to a sort of step function (for reasons I explain in some detail in the comments of the pull request!). I'd support making linear interpolation an option, but using it as the default is (IMHO!) fatally flawed.

@roadjiang

@tomsvds @absolutelyNoWarranty

I think this is an issue that should be fixed in the current average precision implementation.

If we follow the definition of NIST (National Institute of Standards and Technology),
http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf

They define two versions of average precision (interpolated and non-interpolated).

According to them, for the non-interpolated average precision:
"As an example, consider a query that has four relevant documents which are retrieved at ranks 1, 2, 4, and 7. The actual precision obtained when each relevant document is retrieved is 1, 1, 0.75, and 0.57, respectively, the mean of which is 0.83. Thus, the average precision over all relevant documents for this query is 0.83"

However, the current code yields 0.81101190476190477 rather than 0.83
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1])
y_scores = np.array([7, 6, 5, 4, 3, 2, 1])
print(average_precision_score(y_true, y_scores))
# 0.81101190476190477

Besides, the current algorithm differs from the one on Wikipedia, which calculates

\operatorname{AveP} = \frac{\sum_{k=1}^n P(k) \times \operatorname{rel}(k)}{\text{number of relevant documents}}
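
A minimal sketch (illustrative; finite_sum_ap is a hypothetical helper, not a sklearn function) of that formula, reproducing NIST's 0.83 on the same data:

import numpy as np

def finite_sum_ap(y_true, y_score):
    """Non-interpolated AP: mean of P(k) over the relevant documents."""
    order = np.argsort(-y_score)                   # ranks, best score first
    rel = y_true[order] == 1                       # rel(k) in the formula
    p_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # P(k)
    return p_at_k[rel].mean()

print(finite_sum_ap(np.array([1, 1, 0, 1, 0, 0, 1]),
                    np.array([7, 6, 5, 4, 3, 2, 1])))
# (1 + 1 + 0.75 + 4/7) / 4 ~ 0.83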

@achalddave

Tentatively, it would be good to at least document this confusion about average precision. The overestimation of the true mAP as shown in #6377 is a rather critical bug.

@amueller (Member) commented Sep 6, 2016

@achalddave which confusion exactly? Whether it's interpolated? Also see #4577 and #6711, and check out the paper at http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf. If you want real interpolation, you need a validation set (or do it on the training set, which is possibly bad). Doing interpolation on the test set is methodologically flawed.

@amueller (Member) commented Sep 6, 2016

@roadjiang I'm actually not quite sure why our definition disagrees with the Wikipedia one. It shouldn't.

@ndingwall (Contributor)

@amueller @roadjiang Your comments here yesterday inspired me to finish a blog post and an update to average_precision_score that offers step-wise interpolation as an option. Step-wise interpolation is equivalent to randomly breaking ties, which removes a bias towards models that assign few discrete scores. This blog post discusses all this in more detail.
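
A minimal sketch (illustrative, not from the pull request) of step-wise interpolation: the PR curve is treated as downward stairs, so AP is the sum of precision times the recall increment at each threshold. On the [1, -1, 1] example it matches the finite-sum value 5/6:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = [1, -1, 1]
y_score = [3, 2, 1]

precision, recall, _ = precision_recall_curve(y_true, y_score)
# recall comes back in decreasing order, so the diffs are negative.
ap = -np.sum(np.diff(recall) * precision[:-1])
print(ap)  # 5/6 ~ 0.83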

@bkj commented Jan 24, 2017

I agree it'd be a good idea to put a disclaimer on the docs page that this doesn't agree with the most straightforward definition of AP. Took a little bit of time to figure out what was going on.

@amueller (Member)

@bkj I'm for more documentation, but I disagree with "this doesn't agree with the most straightforward definition of AP". If you look at the definition of AP on Wikipedia, which is pretty straightforward, that's what we do. It's also the same in the IR book. "Area under the PR curve" is far less straightforward, very ambiguous, and mostly wrong: the PR curve is a series of points, and linear interpolation between those points is meaningless, so it's very unclear what "area under the curve" means.

@amueller (Member)

I think we should close this. Adding interpolation as an option is still possible, but we should be very clear with the documentation.

@jnothman (Member) commented Aug 21, 2017 via email
