`datafit`s: `type = "diff"` and `type = "diff.se"` #349

fweber144 · 2022-09-14T10:46:50Z

For datafits, summary.vsel() with type = "diff" and type = "diff.se" should give NAs (because the "reference model"'s predictions consists of NAs, see lines

projpred/R/refmodel.R

Lines 710 to 717 in c5a7800

    
           ref_predfun <- function(fit, newdata = NULL) { 
        
             stopifnot(is.null(fit)) 
        
             if (is.null(newdata)) { 
        
               return(matrix(rep(NA, NROW(y)))) 
        
             } else { 
        
               return(matrix(rep(NA, NROW(newdata)))) 
        
             } 
        
           }

), but it doesn't (data uploaded here: dat.csv):

library(projpred)
.extrmoddat_datafit <- function(object, newdata = NULL, wrhs = NULL,
                                orhs = NULL, resp_form = NULL) {
  if (is.null(newdata)) {
    newdata <- object$data
  }
  weights <- rep(1, nrow(newdata))
  offset <- rep(0, nrow(newdata))
  if (inherits(resp_form, "formula")) {
    y <- projpred:::eval_rhs(resp_form, newdata)
  } else {
    y <- NULL
  }
  return(projpred:::nlist(y, weights, offset))
}
fml <- y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
dat <- read.csv("dat.csv", stringsAsFactors = TRUE)
extrmoddat <- function(object, newdata = NULL, wrhs = NULL, orhs = NULL,
                       extract_y = TRUE) {
  resp_form <- if (!extract_y) NULL else projpred:::lhs(fml)
  if (is.null(newdata)) {
    newdata <- dat
  }
  args <- projpred:::nlist(object, newdata, wrhs, orhs, resp_form)
  return(do.call(.extrmoddat_datafit, args))
}
dfit <- init_refmodel(object = NULL,
                      data = dat,
                      formula = fml,
                      family = binomial(),
                      extract_model_data = extrmoddat)
vs <- varsel(dfit, nclusters = 5, nclusters_pred = 10, seed = 5734)
print(vs)
#
# Family: binomial
# Link function: logit
#
# Formula: y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
# Observations: 41
# Search method: l1, maximum number of terms 5
# Number of draws used for selection: 1
# Number of draws used for prediction: 1
# Suggested Projection Size: 3
#
# Selection Summary:
#  size solution_terms  elpd  se diff diff.se
#     0           <NA> -24.8 2.6    0       0
#     1          xco.3 -24.6 2.6    0       0
#     2          xco.1 -24.4 2.6    0       0
#     3          xca.1 -21.1 3.0    0       0
#     4          xco.2 -21.1 2.7    0       0
#     5          xca.2 -20.2 3.3    0       0
smmry <- summary(vs, stats = c("elpd", "mlpd", "mse", "rmse", "acc", "auc"),
                 type = c("diff", "diff.se"), seed = 46283)
options(width = 72)
print(smmry)
#
# Family: binomial
# Link function: logit
#
# Formula: y_glm_brnll ~ xco.1 + xco.2 + xco.3 + xca.1 + xca.2
# Observations: 41
# Search method: l1, maximum number of terms 5
# Number of draws used for selection: 1
# Number of draws used for prediction: 1
# Suggested Projection Size: 3
#
# Selection Summary:
#  size solution_terms elpd.diff elpd.diff.se mlpd.diff mlpd.diff.se
#     0           <NA>         0            0         0            0
#     1          xco.3         0            0         0            0
#     2          xco.1         0            0         0            0
#     3          xca.1         0            0         0            0
#     4          xco.2         0            0         0            0
#     5          xca.2         0            0         0            0
#  mse.diff mse.diff.se rmse.diff rmse.diff.se acc.diff acc.diff.se
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#       NaN           0       NaN           NA      NaN           0
#  auc.diff auc.diff.se
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
#        NA          NA
### Full summary table:
# summary(vs, stats = c("elpd", "mlpd", "mse", "rmse", "acc", "auc"),
#         type = c("mean", "se", "lower", "upper", "diff", "diff.se"),
#         seed = 46283)
###

(tested on branch master, commit c5a7800).

The text was updated successfully, but these errors were encountered:

fweber144 · 2022-09-14T10:58:31Z

Btw, in this case, the messages concerning ndraws <= 20 thrown on branch master (commit c5a7800) should be suppressed.

…n't throw the message concerning `ndraws <= 20`, see <#349 (comment)>.

Fix #349

fweber144 added a commit that referenced this issue Sep 14, 2022

datafits (and other cases where nclusters == S and S <= 20): Do…

61aa08c

…n't throw the message concerning `ndraws <= 20`, see <#349 (comment)>.

fweber144 mentioned this issue Sep 14, 2022

Fix #349 #350

Merged

fweber144 closed this as completed in 81d2bf0 Sep 14, 2022

fweber144 added a commit that referenced this issue Sep 14, 2022

Merge pull request #350 from fweber144/datafit_stats_NA

e7080db

Fix #349

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`datafit`s: `type = "diff"` and `type = "diff.se"` #349

`datafit`s: `type = "diff"` and `type = "diff.se"` #349

fweber144 commented Sep 14, 2022 •

edited

Loading

fweber144 commented Sep 14, 2022

datafits: type = "diff" and type = "diff.se" #349

datafits: type = "diff" and type = "diff.se" #349

Comments

fweber144 commented Sep 14, 2022 • edited Loading

fweber144 commented Sep 14, 2022

`datafit`s: `type = "diff"` and `type = "diff.se"` #349

`datafit`s: `type = "diff"` and `type = "diff.se"` #349

fweber144 commented Sep 14, 2022 •

edited

Loading