hammering home the point #1
Comments
Oh, the "Hammering home the point" section is still about P-R curves. I now realize that it's confusing because we talk about ROC AUC and then go back to P-R without making the switch clear. I'll update the text to clarify this. Anyway, I think we agree: linear interpolation is fine for ROC AUC, but not for P-R.

I'm not sure I follow your second point. Is there any reason to think that a better-than-chance classifier could be improved by randomly increasing or decreasing all of the scores? (Okay, not quite randomly, but the classifier doesn't know how rounding works, so from its point of view these changes are essentially random!)

Finally, I would assume that anyone using this function is computing the operating points on a test set, so it's just a matter of how we convert the list of operating points into a single number. It makes sense for that number to represent the area under a curve defined by those operating points, so we just need to choose how to interpolate. We agree that we shouldn't interpolate linearly. Step interpolation arises naturally in the same way that coin-flipping leads to linear interpolation for ROC, so that seems like the right choice. Another option is to return to ROC space, compute the curve there, and transform it into P-R space. But that would require a pretty big change to the API: lists containing precision and recall numbers aren't enough to compute ROC.

I might have missed your point entirely, though, in which case please let me know! Anyway, thanks for your feedback on this!
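To make the two interpolation choices concrete, here is a minimal sketch (not code from the notebook; the function names are mine) contrasting step interpolation with the linear (trapezoidal) rule on the same list of operating points. Both assume the points are sorted by increasing recall and start at recall 0.

```python
import numpy as np

def average_precision_step(precision, recall):
    """Step interpolation: each recall increment is weighted by the
    precision at the right-hand operating point. This matches the
    usual 'sum over thresholds' definition of average precision."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    return float(np.sum(np.diff(recall) * precision[1:]))

def average_precision_linear(precision, recall):
    """Trapezoidal (linear) interpolation -- the overly optimistic
    variant, since straight lines in P-R space are not achievable."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2))

# Example: three operating points where linear interpolation inflates the score.
precision = [1.0, 1.0, 1/3]
recall = [0.0, 0.5, 1.0]
print(average_precision_step(precision, recall))    # 2/3
print(average_precision_linear(precision, recall))  # 5/6
```

On this toy input the trapezoidal rule credits the classifier with precision it never achieves between the last two points.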
I think we agree on most things. I really recommend reading the paper that I cited, though ;) I commented on your PR about what I think would be the right thing to do.
Hm, but maybe the simple point that I'm not sure got across is: in the IR book, they remove the dips by computing a maximum (on the test set!!). You don't do that in your code at all.
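For reference, the IR-book "dip removal" amounts to interpolating precision as p_interp(r) = max over r' ≥ r of p(r'), i.e. a running maximum from the right. A minimal sketch (my own function name, assuming points sorted by increasing recall):

```python
import numpy as np

def ir_interpolated_precision(precision):
    """IR-book style interpolated precision: replace each precision
    value with the maximum precision at any equal-or-higher recall.
    Note that doing this on the test set peeks at later operating
    points, which is the objection raised above."""
    precision = np.asarray(precision, dtype=float)
    # Running maximum from the right removes the dips in the curve.
    return np.maximum.accumulate(precision[::-1])[::-1]

print(ir_interpolated_precision([1.0, 0.5, 0.7, 0.6]))
# The dip at 0.5 is lifted to 0.7: [1.0, 0.7, 0.7, 0.6]
```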
Yes, I was confused by that as well:
This seems wrong to me, because you wouldn't know whether the precision of the larger set is higher without peeking at the gold labels. My reference to the paper was just for their use of horizontal segments to interpolate between points; I'm not convinced by their overall strategy. I'm about to get on a flight, so I'll review the rest of your comments and read that paper then. Thanks for all your feedback - super helpful!
Yeah, I just discussed exactly the same point with a colleague and we both …
Great - happy to help! |
Just some comment on this here:
https://github.com/roaminsight/roamresearch/blob/master/BlogPosts/Average_precision/Average_precision_post.ipynb
It's true that linear interpolation for PR curves is overly optimistic and we should fix that. See this paper:
http://machinelearning.wustl.edu/mlpapers/paper_files/icml2006_DavisG06.pdf
However, linear interpolation is totally fine for ROC curves and your argument in "Hammering home the point" is actually wrong.
The way you picked the rounding point, you actually ended up with a better classifier. You can achieve any classifier that lies on the linear interpolation between two points by flipping a weighted coin to decide which of the two end-points to use (for ROC, not PR).
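The coin-flip construction can be sketched in a few lines (a hypothetical illustration, not code from the notebook): a classifier that uses operating point A with probability p and point B otherwise has, in expectation, the convex combination of the two points' rates, so its expected (FPR, TPR) lies exactly on the straight line between them.

```python
def coinflip_roc_point(point_a, point_b, p):
    """Expected (FPR, TPR) of a randomized classifier that uses
    operating point A with probability p and point B otherwise.
    The result is the convex combination of the end-points, which
    is why linear interpolation is legitimate in ROC space."""
    (fpr_a, tpr_a), (fpr_b, tpr_b) = point_a, point_b
    return (p * fpr_a + (1 - p) * fpr_b,
            p * tpr_a + (1 - p) * tpr_b)

# A fair coin between (0.1, 0.6) and (0.4, 0.9) lands at their midpoint.
print(coinflip_roc_point((0.1, 0.6), (0.4, 0.9), 0.5))
```

No such randomized construction exists in P-R space, because precision is not a linear function of the confusion-matrix counts.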
There is a more subtle issue with how you do the interpolation. Choosing any point for interpolation based on its P/R values means that you have already observed these points, so you can no longer use them as your test set. Interpolation that skips "bad" points is therefore only allowed when using a validation set.