Calculating AUC/AUPRC confidence intervals #13

Open
micdonato opened this issue Nov 19, 2019 · 9 comments

@micdonato

Hello.

I love precrec, but every time I use it I have to jump through hoops integrating it with pROC to get confidence intervals for the AUCs (I still haven't managed to do so for AUPRCs).

Since precrec already computes confidence bands for the curves, would it be possible to have the auc function return confidence intervals as well?

@takayasaito takayasaito self-assigned this Nov 20, 2019
@takayasaito
Member

I checked the source code of pROC for its CI calculation and found that it offers a bootstrapping approach. With that method, pROC generates 2000 bootstrap samples (resampling with replacement) by default, so 2000 AUCs are calculated. It then simply takes the 0.025 and 0.975 quantiles of the calculated AUCs when the significance level (alpha) is 0.05.
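
To make the percentile idea concrete, here is a minimal base-R sketch of a bootstrap CI for the ROC AUC. It is not pROC's or precrec's code: the helper names `auc_rank` and `boot_auc_ci` are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formula, and resampling within each class is a choice made for this sketch so that both classes are always present.

```r
# Rank-based (Mann-Whitney) estimate of the ROC AUC; ties get average ranks.
auc_rank <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Percentile bootstrap CI: resample (score, label) pairs with replacement,
# recompute the AUC each time, and take the alpha/2 and 1 - alpha/2 quantiles.
boot_auc_ci <- function(scores, labels, n_boot = 2000, alpha = 0.05) {
  aucs <- replicate(n_boot, {
    # Resample indices within each class (assumes at least 2 cases per class)
    idx <- c(sample(which(labels == 1), replace = TRUE),
             sample(which(labels == 0), replace = TRUE))
    auc_rank(scores[idx], labels[idx])
  })
  quantile(aucs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Toy data (made up for illustration): positives score higher on average
set.seed(1)
labels <- rep(c(0, 1), each = 50)
scores <- rnorm(100, mean = labels)

ci <- boot_auc_ci(scores, labels, n_boot = 500)
ci
```

With `alpha = 0.05` the two returned values are the 0.025 and 0.975 quantiles of the bootstrapped AUCs.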

Since precrec doesn't provide bootstrapping, we can't apply the same method to calculate CIs. Alternatively, you can still use precrec to calculate a CI when you are dealing with cross-validation data. I added a simple helper function called auc_ci that performs CI calculation on precrec objects.

library(precrec)

# Create sample datasets with 100 positives and 100 negatives
samps <- create_sim_samples(4, 100, 100, "all")
mdat <- mmdata(samps[["scores"]], samps[["labels"]],
               modnames = samps[["modnames"]],
               dsids = samps[["dsids"]])

# Generate an mscurve object that contains ROC and Precision-Recall curves
mmcurves <- evalmod(mdat)

# Calculate CI of AUCs
auc_ci(mmcurves)

# Calculate CI with alpha = 0.01
auc_ci(mmcurves, alpha = 0.01)

# Calculate CI with t-distribution
auc_ci(mmcurves, dtype = "t")

I have submitted precrec v0.11 to CRAN, and it is already available for several platforms. You can check the availability status here.
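
For readers wondering what an interval over cross-validation AUCs looks like numerically, the following base-R sketch computes the textbook mean ± quantile × standard error interval from a vector of per-dataset AUCs. This is an illustration only: the helper `mean_ci` is hypothetical, and whether `auc_ci` uses exactly this formula internally is an assumption. The four values are the ROC AUCs printed later in this thread.

```r
# CI for the mean of per-dataset AUCs, using either normal or t quantiles.
mean_ci <- function(x, alpha = 0.05, dtype = c("normal", "t")) {
  dtype <- match.arg(dtype)
  n <- length(x)
  se <- sd(x) / sqrt(n)  # standard error of the mean
  q <- if (dtype == "normal") {
    qnorm(1 - alpha / 2)
  } else {
    qt(1 - alpha / 2, df = n - 1)
  }
  c(mean = mean(x), lower = mean(x) - q * se, upper = mean(x) + q * se)
}

# ROC AUCs of four test sets (taken from the output shown further down)
aucs <- c(0.8364, 0.7677, 0.8218, 0.8139)

ci_norm <- mean_ci(aucs)
ci_t <- mean_ci(aucs, dtype = "t")
ci_norm
ci_t
```

With only four datasets the t-based interval is noticeably wider than the normal one, which is the usual argument for `dtype = "t"` when the number of folds is small.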

@JanaFe

JanaFe commented Nov 26, 2020

Hi, I am also trying to calculate confidence intervals for the area under the precision recall curve with R version 4.0.3.

I have a vector of scores (value range 0-100), and a vector of labels (0 or 1).
Running this code:

mdat <- mmdata(scores, labels)  
mmcurves <- evalmod(mdat)  
mm_auc_ci <- auc_ci(mmcurves, alpha=0.05, dtype='t')  

Gives an error:
Error: 'curves' must contain multiple datasets.

What am I doing wrong?

@takayasaito
Member

precrec doesn't calculate a confidence band or confidence interval for a single test set, only for cross-validation results with multiple test sets. Your example looks like a single test set to me. It is of course possible to use a bootstrapping approach to simulate the result of your model with a single test set, but I don't know whether it's a good idea.

  1. Your example
library(precrec)

# Create scores and labels
n <- 100
scores <- runif(n)*100
labels <- sample(c(0, 1), n, replace=TRUE)

# Calculate curves (single model with single dataset)
mdat <- mmdata(scores, labels)
sscurves <- evalmod(mdat)
plot(sscurves)
  2. Resample scores r1 times
# Create bootstrapped scores
r1 <- 10
resampled_scores <- replicate(r1, sample(scores, replace=TRUE))

# Calculate curves (single model with multiple datasets)
smdat1 <- mmdata(resampled_scores, labels, modnames=rep("m1", r1), dsids=1:r1)
smcurves1 <- evalmod(smdat1)
plot(smcurves1)
auc_ci(smcurves1)  
  3. Resample labels r2 times
# Create bootstrapped labels
r2 <- 10
resampled_labels <- replicate(r2, sample(labels, replace=TRUE))

# Calculate curves (single model with multiple datasets)
smdat2 <- mmdata(replicate(r2, scores), resampled_labels, modnames=rep("m1", r2), dsids=1:r2)
smcurves2 <- evalmod(smdat2)
plot(smcurves2)
auc_ci(smcurves2)  
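
A variant of steps 2 and 3 that keeps each score paired with its label is to resample row indices instead of resampling scores or labels on their own. The sketch below is an assumption-laden illustration, not a recommended workflow: the toy data are made up, the resampling is stratified by class so every bootstrap set contains both classes, and it relies on mmdata accepting lists of score and label vectors, as in the create_sim_samples example earlier in the thread.

```r
library(precrec)

# Toy data (made up): 50 negatives and 50 positives
set.seed(42)
n <- 100
labels <- rep(c(0, 1), each = n / 2)
scores <- rnorm(n, mean = labels)

# Draw b sets of row indices, resampling within each class
b <- 10
idx <- replicate(b,
                 c(sample(which(labels == 1), replace = TRUE),
                   sample(which(labels == 0), replace = TRUE)),
                 simplify = FALSE)

# Apply the same indices to scores and labels so pairs stay together
boot_scores <- lapply(idx, function(i) scores[i])
boot_labels <- lapply(idx, function(i) labels[i])

# Single model with b bootstrapped datasets
bdat <- mmdata(boot_scores, boot_labels,
               modnames = rep("m1", b), dsids = seq_len(b))
bcurves <- evalmod(bdat)
boot_ci <- auc_ci(bcurves)
boot_ci
```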

To assess the performance of your model accurately, it would be much better to perform cross-validation than to bootstrap the result of your model on a test dataset (resampling scores and labels as in the examples above). I would avoid bootstrapping approaches where possible.

@JanaFe

JanaFe commented Nov 29, 2020

That helps, thanks a lot!

@bblodfon

Hi @takayasaito! Happy to have found your package! I am trying to do something similar to the above (i.e. we have predictions from a single model and do a stratified bootstrap on both labels and scores to see the variability of the PR curve) and would like a bit of your help, since you know the internal functions better than I do :)

So, how can I get the Precision-Recall data in a data.frame from an smcurves object (before plotting)? E.g.:

library(precrec)

samps = create_sim_samples(4, 100, 100, "good_er")
mdat  = mmdata(samps[["scores"]], samps[["labels"]],
  modnames = samps[["modnames"]],
  dsids = samps[["dsids"]]
)
smcurves = evalmod(mdat, type = "rocpr")

# how can I get a `data.frame` with colnames `c(recall, precision, threshold)` for each dataset ID?
# ie a list of `data.frame`s with that info? My problem especially using `PRROC` doing the same 
# thing is that the multiplicity and number of thresholds is different so merging them is really a 
# pain :) - which I think you have solved since we can call `plot(smcurves)`!
smcurves
#> 
#>     === AUCs ===
#> 
#>      Model name Dataset ID Curve type       AUC
#>    1    good_er          1        ROC 0.8364000
#>    2    good_er          1        PRC 0.8593735
#>    3    good_er          2        ROC 0.7677000
#>    4    good_er          2        PRC 0.8169513
#>    5    good_er          3        ROC 0.8218000
#>    6    good_er          3        PRC 0.8520650
#>    7    good_er          4        ROC 0.8139000
#>    8    good_er          4        PRC 0.8528955
#> 
#> 
#>     === Input data ===
#> 
#>      Model name Dataset ID # of negatives # of positives
#>    1    good_er          1            100            100
#>    2    good_er          2            100            100
#>    3    good_er          3            100            100
#>    4    good_er          4            100            100

Created on 2024-04-26 with reprex v2.0.2

@bblodfon

bblodfon commented Apr 26, 2024

Ah, OK, it's available via res = precrec::evalmod(data, raw_curves = TRUE), so I can extract it from there. Nice.
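
A minimal sketch of that extraction, for anyone landing here with the same question. It assumes that each element of `res$prcs` exposes the curve coordinates as `$x` (taken here to be recall) and `$y` (precision); the `$prcs` and `$y` names appear in the follow-up comment, while `$x` and the column names chosen below are assumptions of this sketch. Thresholds are not included.

```r
library(precrec)

samps <- create_sim_samples(4, 100, 100, "good_er")
mdat <- mmdata(samps[["scores"]], samps[["labels"]],
               modnames = samps[["modnames"]],
               dsids = samps[["dsids"]])

# Keep the per-dataset curves instead of only the averaged ones
res <- evalmod(mdat, raw_curves = TRUE)

# One data frame per dataset ID with the Precision-Recall points
pr_list <- lapply(res$prcs, function(obj) {
  data.frame(recall = obj$x, precision = obj$y)
})

length(pr_list)
head(pr_list[[1]])
```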

@bblodfon

bblodfon commented Apr 26, 2024

So, the number of thresholds might not be equal across datasets as far as I can see (I thought x_bins controlled that). Maybe it's a bug? I have another example where there are way more unique lengths. Maybe filling the shorter vectors up with the last precision value in each respective vector makes sense? (Without breaking the 1-1 correspondence between the thresholds, I guess, if that makes sense...)

library(precrec)

samps = create_sim_samples(100, 20, 20, "good_er")
mdat  = mmdata(samps[["scores"]], samps[["labels"]],
  modnames = samps[["modnames"]],
  dsids = samps[["dsids"]]
)

# Generate an smcurve object that contains ROC and Precision-Recall curves
smcurves = evalmod(mdat, type = "rocpr", raw_curves = TRUE)
# extract precision vectors per dataset
precision = lapply(smcurves$prcs, function(obj) obj$y)
unique(unlist(lapply(precision, length)))
#> [1] 1024 1023

Created on 2024-04-26 with reprex v2.0.2
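
The "fill up with the last value" idea from the comment above can be sketched in base R as follows. This only shows the mechanics on made-up vectors; the helper `pad_with_last` is hypothetical, and whether such padding is statistically appropriate for Precision-Recall curves is an open question that this sketch does not settle.

```r
# Pad every vector to the length of the longest one by repeating its last value.
pad_with_last <- function(vectors) {
  target <- max(lengths(vectors))
  lapply(vectors, function(v) {
    c(v, rep(v[length(v)], target - length(v)))
  })
}

# Made-up precision vectors of unequal length
precision <- list(c(1, 0.9, 0.8, 0.75), c(1, 0.85, 0.7))

padded <- pad_with_last(precision)
lengths(padded)

# Equal lengths make it possible to bind the vectors into a matrix
precision_mat <- do.call(cbind, padded)
dim(precision_mat)
```

Vectors that already have the maximum length are returned unchanged, since `rep()` with a count of zero adds nothing.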

@takayasaito
Member

For the first example, you can simply call data.frame(smcurves).

data.frame(smcurves) |> head()
#      x      y      ymin      ymax modname type
#1 0.000 0.0000 0.0000000 0.0000000 good_er  ROC
#2 0.000 0.2975 0.1912348 0.4037652 good_er  ROC
#3 0.001 0.2975 0.1912348 0.4037652 good_er  ROC
#4 0.002 0.2975 0.1912348 0.4037652 good_er  ROC
#5 0.003 0.2975 0.1912348 0.4037652 good_er  ROC
#6 0.004 0.2975 0.1912348 0.4037652 good_er  ROC

Similarly, you can use data.frame to convert an AUC object to a data.frame.

auc(smcurves) |> data.frame() |> head()
#  modnames dsids curvetypes      aucs
#1  good_er     1        ROC 0.7683000
#2  good_er     1        PRC 0.8108477
#3  good_er     2        ROC 0.8287000
#4  good_er     2        PRC 0.8626605
#5  good_er     3        ROC 0.7498000
#6  good_er     3        PRC 0.7995740

@takayasaito
Member

For the second question, you can convert the object to a data frame in order to check the actual values.

library(dplyr)

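# Note: this keeps the x column of the ROC rows for dataset 1. The result is a
# one-column data frame, so length() below counts columns rather than rows.
precision <- data.frame(smcurves) |>
  dplyr::filter(type == "ROC" & modname == "good_er" & dsid == 1) |>
  dplyr::select(x)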

length(precision) == length(unique(precision))
# [1] TRUE

4 participants