New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Built-in box for ROC curve #197
Conversation
The BoxOutputState may not exist because we may not have opened the inside of the custom box. But the BoxOutputState is actually a table GUID. The user is ignored in this case, but the original code didn't offer real access control either on this endpoint.
select | ||
label, score, | ||
sum(label) over ( | ||
order by score desc rows between | ||
unbounded preceding and current row) | ||
/ (select sum(label) from input) | ||
as tpr, | ||
|
||
sum(1 - label) over ( | ||
order by score desc rows between | ||
unbounded preceding and current row) | ||
/ (select sum(1 - label) from input) | ||
as fpr | ||
|
||
from input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erbenpeter when you are not on vacation I'd appreciate if you could take a look at this SQL query for calculating the curve.
select sum(1 - fpr) / count(1) as AUC | ||
from input where label == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus this one for calculating the AUC. I made up both queries myself instead of copying them from somewhere, so I'm not terribly confident in them. 😅 Thanks!
Everyone's on vacation but I want to prepare a release candidate. I'm merging this while hoping for a future review. |
I'm back from vacation, so now I'll take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice and clean!
I almost like it. My issue is with the possibility of items have the same score but different labels (which I think is possible). See my more detailed comments inline.
@@ -392,8 +392,7 @@ class ProductionJsonServer @javax.inject.Inject() ( | |||
|
|||
def downloadCSV = asyncAction(parse.anyContent) { (user: User, r: mvc.Request[mvc.AnyContent]) => | |||
val request = parseJson[GetTableOutputRequest](user, r) | |||
implicit val metaManager = workspaceController.metaManager | |||
val table = workspaceController.getOutput(user, request.id).table | |||
val table = workspaceController.metaManager.table(java.util.UUID.fromString(request.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part I don't understand even after reading the long commit command. How is it related to the newly added built-in box?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part I don't understand even after reading the long commit command. How is it related to the newly added built-in box?
The plot refers to the table GUID and the frontend sends this request to get the data. The box output state for a table also has the table's GUID. So you can access the same table either by looking up the GUID as a box output state ID and then taking the table from it (the old code) or looking up the GUID as a table (the new code).
Box output states are not persisted. We assume you only want to look at a box output that we have returned in this run. So if you restart LynxKite after creating a plot, and then look at the plot without looking at the box that generated it, you get an error. This is an edge case I didn't consider originally. You don't typically look at box outputs when not looking at the box. Except this happens with custom boxes!
from vertices | ||
where isnotnull(${`true label`}) | ||
and isnotnull(${`predicted score`}) | ||
limit ${`sample size`} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we regard this a a true random sample because the rows are in random order in the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we regard this a a true random sample because the rows are in random order in the query?
Yes. The input is a graph where this usually holds.
select | ||
label, score, | ||
sum(label) over ( | ||
order by score desc rows between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only found a small issue here: If it's possible that the classifier assigns the same score for multiple items, for that specific score we will get a lot of different (fpr, tpr)
pairs. Only the last one (order by score desc) is really meaningful.
On the chart it can create weird blobs I think.
One possible solution is to add a second sql to pick the last value from each score-group, but it only works if we can assume that the order of the records remains the same between the first and the second sql. Or we can pre-aggregate by score (num_1, num_0)
and the then your SQL will work even for repeated score values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only found a small issue here: If it's possible that the classifier assigns the same score for multiple items, for that specific score we will get a lot of different
(fpr, tpr)
pairs. Only the last one (order by score desc) is really meaningful.On the chart it can create weird blobs I think.
Nope. Looks fine:
parameters: | ||
persist: 'no' | ||
sql: | | ||
select sum(1 - fpr) / count(1) as AUC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a tricky implementation (compared to the first 10 I've found googling) but it also assumes that scores are unique (which I think is not realistic: if two customers have the same ingredients the (deterministic) classifier needs to assign the same score to them, but it's possible that their labels are different in reality.)
Mathematically speaking this formula only works when the segments of the ROC curve are all vertical or horizontal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To phrase my problem in a different way: if it's possible to have repeated scores with different labels, your curve is not defined, its shape (and consequently the AUC value) )depends on the order of the data points inside a group with the same score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To phrase my problem in a different way: if it's possible to have repeated scores with different labels, your curve is not defined, its shape (and consequently the AUC value) )depends on the order of the data points inside a group with the same score.
You need to have them in a random order, which is the case here. So you get a nearly diagonal line and an AUC close to 0.5 in the case of the above screenshot. (Random 0/1 label and random 0/1 prediction.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mathematically speaking this formula only works when the segments of the ROC curve are all vertical or horizontal.
Then I have great news! The segments of the ROC curve are all vertical or horizontal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice and clean!
I almost like it. My issue is with the possibility of items have the same score but different labels (which I think is possible). See my more detailed comments inline.
I've checked how scikit-learn does the ROC curve when scores are repeated and it does it like you say. 😓 Sorry I was confidently wrong. I've sent a fix in #202. Thanks! |
Resolves #181.
In action:
And inside: