Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binomial family with > 1 trials: mu and y in get_stat() don't match #342

Closed
fweber144 opened this issue Jul 15, 2022 · 2 comments
Closed

Comments

@fweber144
Copy link
Collaborator

In the "aggregated" binomial case (i.e., if there is more than one Bernoulli trial for at least one observation), mu and y in get_stat() don't match (tested with projpred v2.1.2):

set.seed(6834)
nobs <- 99L
x1 <- rnorm(nobs)
mu <- binomial()$linkinv(-0.42 + x1)
trials <- sample.int(30L, size = nobs, replace = TRUE)
y <- rbinom(nobs, size = trials, prob = mu)
dat <- data.frame(y = y, x1 = x1, trials = trials)

library(rstanarm)
options(mc.cores = parallel::detectCores(logical = FALSE))
r_fit <- stan_glm(
  cbind(y, trials - y) ~ 1 + x1,
  family = binomial(),
  data = dat,
  seed = 101
)

library(projpred)
vs <- varsel(r_fit,
             method = "forward",
             nclusters = 3,
             nclusters_pred = 5,
             seed = 12354)
debug(projpred:::get_stat)
summary(vs, stats = "mse", seed = 43465)

When debugging until line

value <- mean(weights * (mu - y)^2, na.rm = TRUE)
, we can see that we have

range(mu)
[1] 0.05699402 0.87109673

and

range(y)
[1]  0 25

so taking mu - y doesn't make sense.

@fweber144
Copy link
Collaborator Author

Just for the sake of completeness: In case of stats = "acc" (or stats = "pctcorr"), mu and y also don't match: Simply replace the last line from the reprex above by

summary(vs, stats = "acc")

and debug get_stat() until line

value <- mean(weights * (round(mu) == y), na.rm = TRUE)
. Then we have

table(round(mu))

giving


 0  1 
58 41

and

table(y)

giving

y
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 23 25 
 4  9  8  6  9  8  6  5  8  3  3  3  2  1  3  2  4  2  5  3  3  1  1

which shows that taking round(mu) == y doesn't make sense.

@fweber144
Copy link
Collaborator Author

For the AUC, see #329 and #330.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant