Returning scores along with the curves? #5

Open
lukauskas opened this issue Jan 12, 2018 · 1 comment
@lukauskas

Currently the PRC curve returns essentially a DataFrame with three columns:
x, y, and a boolean column orig_points.

Is it possible to somehow map the non-interpolated points (orig_points = 1) to the actual score thresholds for the resulting precision/recall measurements? Something along the lines of how sklearn handles it. This is sometimes needed to answer questions like 'what is the minimum threshold at which precision is >= 75%?' or similar.

I assume a sorted increasing list of unique scores should map 1:1 to the orig_points, but this seems a bit hacky. Maybe there is a way to get it out of precrec directly?

@takayasaito takayasaito self-assigned this Jan 13, 2018
@takayasaito
Member

It is not easy to use orig_points to retrieve the corresponding scores, but you can use mode = "basic" for that purpose. For instance, the following snippet shows how to get the original scores when precision is greater than or equal to 0.75.

library("precrec")

# Dataset with 10 positives and 10 negatives
data(P10N10)

# Calculate basic evaluation measures
sspoints <- evalmod(mode = "basic", scores = P10N10$scores, labels = P10N10$labels)

# Convert sspoints to data.frame
df <- data.frame(sspoints)

# Get normalized threshold values for precision >= 0.75
xs <- df[df$type == "precision" & df$y >= 0.75, "x"]

# Show scores and precision values corresponding to xs
df[df$x %in% xs & df$type %in% c("score", "precision"), ]

In the data frame above, the x column contains the threshold values normalized to the range [0, 1], and the y column contains the value of the measure named in the type column.

Unlike ROC curves, precision-recall curves are not monotonic, so precision can cross 0.75 more than once. In some cases you may need to add one more condition, such as 'recall is greater than 0.5'.
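
One way to combine several conditions is to put the measures side by side, one row per normalized threshold. The following is only a rough sketch in base R: `measures_by_threshold` is a hypothetical helper (not part of precrec), the `toy` data frame is hand-made for illustration (four scores with labels P, N, P, P), and it assumes that recall is stored under the type "sensitivity" and that the data frame holds a single model and dataset.

```r
# Hypothetical helper (not part of precrec): turn a long-format data frame
# with columns x, y, and type into one row per normalized threshold.
measures_by_threshold <- function(df,
                                  types = c("score", "precision", "sensitivity")) {
  parts <- lapply(types, function(t) {
    d <- df[as.character(df$type) == t, c("x", "y")]
    names(d)[2] <- t  # rename y to the measure it holds
    d
  })
  # Join the per-measure tables on the shared normalized threshold x
  Reduce(function(a, b) merge(a, b, by = "x"), parts)
}

# Hand-made toy data: scores 0.9, 0.7, 0.4, 0.2 with labels P, N, P, P
toy <- data.frame(
  x = rep(c(0.25, 0.5, 0.75, 1), times = 3),
  y = c(0.9, 0.7, 0.4, 0.2,   # score
        1, 0.5, 2 / 3, 0.75,  # precision
        1 / 3, 1 / 3, 2 / 3, 1), # sensitivity (recall)
  type = rep(c("score", "precision", "sensitivity"), each = 4)
)

wide <- measures_by_threshold(toy)

# Precision alone is met at two separate thresholds
loose <- wide[wide$precision >= 0.75, ]

# Adding a recall condition keeps only the useful one
strict <- wide[wide$precision >= 0.75 & wide$sensitivity > 0.5, ]
min_score <- min(strict$score)
```

On real data, the same helper could presumably be called on the `df` built from `data.frame(sspoints)`, provided the type names match the assumptions stated here.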
