Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datafits: type = "diff" and type = "diff.se" #349

Closed
fweber144 opened this issue Sep 14, 2022 · 1 comment · Fixed by #350
Closed

datafits: type = "diff" and type = "diff.se" #349

fweber144 opened this issue Sep 14, 2022 · 1 comment · Fixed by #350

Comments

@fweber144
Copy link
Collaborator

fweber144 commented Sep 14, 2022

For datafits, summary.vsel() with type = "diff" and type = "diff.se" should give NAs (because the "reference model"'s predictions consists of NAs, see lines

projpred/R/refmodel.R

Lines 710 to 717 in c5a7800

ref_predfun <- function(fit, newdata = NULL) {
stopifnot(is.null(fit))
if (is.null(newdata)) {
return(matrix(rep(NA, NROW(y))))
} else {
return(matrix(rep(NA, NROW(newdata))))
}
}
), but it doesn't (data uploaded here: dat.csv):

library(projpred)
.extrmoddat_datafit <- function(object, newdata = NULL, wrhs = NULL,
                                orhs = NULL, resp_form = NULL) {
  if (is.null(newdata)) {
    newdata <- object$data
  }
  weights <- rep(1, nrow(newdata))
  offset <- rep(0, nrow(newdata))
  if (inherits(resp_form, "formula")) {
    y <- projpred:::eval_rhs(resp_form, newdata)
  } else {
    y <- NULL
  }
  return(projpred:::nlist(y, weights, offset))
}
fml <- y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
dat <- read.csv("dat.csv", stringsAsFactors = TRUE)
extrmoddat <- function(object, newdata = NULL, wrhs = NULL, orhs = NULL,
                       extract_y = TRUE) {
  resp_form <- if (!extract_y) NULL else projpred:::lhs(fml)
  if (is.null(newdata)) {
    newdata <- dat
  }
  args <- projpred:::nlist(object, newdata, wrhs, orhs, resp_form)
  return(do.call(.extrmoddat_datafit, args))
}
dfit <- init_refmodel(object = NULL,
                      data = dat,
                      formula = fml,
                      family = binomial(),
                      extract_model_data = extrmoddat)
vs <- varsel(dfit, nclusters = 5, nclusters_pred = 10, seed = 5734)
print(vs)
#
# Family: binomial
# Link function: logit
#
# Formula: y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
# Observations: 41
# Search method: l1, maximum number of terms 5
# Number of draws used for selection: 1
# Number of draws used for prediction: 1
# Suggested Projection Size: 3
#
# Selection Summary:
#  size solution_terms  elpd  se diff diff.se
#     0           <NA> -24.8 2.6    0       0
#     1          xco.3 -24.6 2.6    0       0
#     2          xco.1 -24.4 2.6    0       0
#     3          xca.1 -21.1 3.0    0       0
#     4          xco.2 -21.1 2.7    0       0
#     5          xca.2 -20.2 3.3    0       0
smmry <- summary(vs, stats = c("elpd", "mlpd", "mse", "rmse", "acc", "auc"),
                 type = c("diff", "diff.se"), seed = 46283)
options(width = 72)
print(smmry)
#
# Family: binomial
# Link function: logit
#
# Formula: y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
# Observations: 41
# Search method: l1, maximum number of terms 5
# Number of draws used for selection: 1
# Number of draws used for prediction: 1
# Suggested Projection Size: 3
#
# Selection Summary:
#  size solution_terms elpd.diff elpd.diff.se mlpd.diff mlpd.diff.se
#     0           <NA>         0            0         0            0
#     1          xco.3         0            0         0            0
#     2          xco.1         0            0         0            0
#     3          xca.1         0            0         0            0
#     4          xco.2         0            0         0            0
#     5          xca.2         0            0         0            0
#  mse.diff mse.diff.se rmse.diff rmse.diff.se acc.diff acc.diff.se
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#  auc.diff auc.diff.se
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
### Full summary table:
# summary(vs, stats = c("elpd", "mlpd", "mse", "rmse", "acc", "auc"),
#         type = c("mean", "se", "lower", "upper", "diff", "diff.se"),
#         seed = 46283)
###

(tested on branch master, commit c5a7800).

@fweber144
Copy link
Collaborator Author

Btw, in this case, the messages concerning ndraws <= 20 thrown on branch master (commit c5a7800) should be suppressed.

fweber144 added a commit that referenced this issue Sep 14, 2022
…n't throw

the message concerning `ndraws <= 20`, see <#349 (comment)>.
@fweber144 fweber144 mentioned this issue Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant