error in average_precision_score #5379
Comments
Your script works fine if you reverse 'y_score'.
Ah, OK, that's most of the problem. Thanks! My bad. But it exposes another thing that still looks wrong. Try this even simpler example:
Average precision should be (1/1 + 2/3) / 2 = 5/6 ≈ 0.83.
Regards,
The Wikipedia page says:
That formula is more of an approximation than what sklearn is doing. In your example, the precision-recall graph has the points (1, 0), (1, 0.5), (0.5, 0.5) and (2/3, 1), and sklearn calculates the area under the curve formed by connecting those points with straight lines. Notice the upward slope going from (0.5, 0.5) to (2/3, 1), similar to what we see here: http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html

The finite-sum formula is equivalent to calculating the area under a curve which has no upward slopes, only stairs going downward. (In your example, it'd be as if the point (0.5, 0.5) were magically moved up to (2/3, 0.5) to match the height of the next point.)
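The two areas being contrasted here can be sketched in a few lines of plain Python. The (recall, precision) points below are my reconstruction of the example's PR curve, converted from the (precision, recall) pairs listed in the comment:

```python
# PR curve of the small example, as (recall, precision) pairs,
# in order of decreasing decision threshold.
points = [(0.0, 1.0), (0.5, 1.0), (0.5, 0.5), (1.0, 2 / 3)]

# Area under the linearly interpolated curve (trapezoidal rule),
# which is what the comment describes sklearn computing at the time.
trapezoid = sum(
    (r1 - r0) * (p0 + p1) / 2
    for (r0, p0), (r1, p1) in zip(points, points[1:])
)

# "Stairs" area: each recall increment is weighted only by the precision
# reached at its right endpoint. This matches the finite-sum definition.
step = sum(
    (r1 - r0) * p1
    for (r0, p0), (r1, p1) in zip(points, points[1:])
)

print(trapezoid)  # 19/24 ≈ 0.79
print(step)       # 5/6  ≈ 0.83, the finite-sum value from the comment above
```

The difference between the two numbers is exactly the triangle created by the upward slope from (0.5, 0.5) to (1, 2/3).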
@tomsvds more questions? Or can we close?
Why can't you use the function? Because you don't want the standard definition of average precision?
We could have options "interpolated", "triangle" and "quadrature"? |
I just submitted a pull request to change the interpolation from linear to a sort of step function (for reasons I explain in some detail in the comments of the pull request!). I'd support making linear interpolation an option, but using it as the default is (IMHO!) fatally flawed.
@tomsvds @absolutelyNoWarranty I think this is an issue to fix in the current average precision algorithm. NIST (the National Institute of Standards and Technology) defines two versions of average precision, interpolated and non-interpolated. Under their non-interpolated definition, this example's average precision should be 0.83; however, the current code yields 0.81101190476190477:

Out[520]: 0.81101190476190477

Besides, the current algorithm differs from the one on Wikipedia, which computes

\operatorname{AveP} = \frac{\sum_{k=1}^{n} P(k) \times \operatorname{rel}(k)}{\text{number of relevant documents}}
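For concreteness, here is a minimal sketch of the non-interpolated, finite-sum formula above. The function name and the ranking `[1, 0, 1]` (relevant documents at ranks 1 and 3) are my reconstruction of the small example discussed earlier in this thread:

```python
def finite_sum_ap(relevance):
    """Non-interpolated AP over a ranked list of 0/1 relevance labels:
    AveP = sum_k P(k) * rel(k) / (number of relevant documents)."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k): precision at cut-off k
    return total / sum(relevance)

# Relevant documents at ranks 1 and 3:
print(finite_sum_ap([1, 0, 1]))  # (1/1 + 2/3) / 2 = 5/6 ≈ 0.83
```

Note that ranks where rel(k) = 0 contribute nothing to the sum but still lower P(k) at later ranks.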
Tentatively, it would be good to at least document this confusion about average precision. The overestimation of the true mAP as shown in #6377 is a rather critical bug. |
@achalddave which confusion exactly? Whether it's interpolated? Also see #4577 and #6711, and check out the paper at http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf If you want real interpolation, you need a validation set (or do it on the training set, which is possibly bad). Doing interpolation on the test set is methodologically flawed.
@roadjiang I'm actually not quite sure why our definition disagrees with the wikipedia one. It shouldn't. |
@amueller @roadjiang Your comments here yesterday inspired me to finish a blog post and an update to |
I agree it'd be a good idea to put a disclaimer on the docs page that this doesn't agree with the most straightforward definition of AP. Took a little bit of time to figure out what was going on. |
@bkj I'm for more documentation, but I disagree with "this doesn't agree with the most straightforward definition of AP". If you look at the definition of AP on Wikipedia, which is pretty straightforward, that's what we do. It's also the same in the IR book. "Area under the PR curve" is far less straightforward, very ambiguous, and also mostly wrong: the PR curve is a series of points, and linear interpolation between those points is meaningless, so it's very unclear what "area under the curve" means.
I think we should close this. Adding interpolation as an option is still possible, but we should be very clear with the documentation. |
Well, the implementation has changed since this issue was raised anyway.
Closing.
I believe there is an error in sklearn.metrics.average_precision_score. Here is a script to show the result:
What we have are essentially ten documents, nine of which are positive. The example at position eight is negative. According to my understanding (and Wikipedia, see https://en.wikipedia.org/wiki/Information_retrieval#Average_precision, the third equation for AveP on that page), the average precision score should be:
The True Positive (TP) count here is 9.
The average precision = (7×1 + 8/9 + 9/10) / TP ≈ 0.976
The value produced by the script (for all averaging schemes) = 0.865
I believe average_precision_score is producing this value because it is dividing by N instead of TP.
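The arithmetic above can be double-checked with a few lines of Python. The relevance list is a reconstruction of the described ranking (ten documents, all positive except the one at position eight):

```python
relevance = [1] * 7 + [0] + [1] * 2  # position eight is the negative

hits, total = 0, 0.0
for k, rel in enumerate(relevance, start=1):
    if rel:
        hits += 1
        total += hits / k  # precision at each relevant rank

ap = total / sum(relevance)  # divide by TP = 9, not by N = 10
print(ap)  # (7 + 8/9 + 9/10) / 9 ≈ 0.9765
```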
Regards,
-Tom