Unexpected behaviour when incompatible full formulas included in search_terms argument #362

@sor16

Hi,

I noticed that when full formulas are passed to the search_terms argument and some of them are incompatible with each other, cv_varsel doesn't check for the incompatibility and includes variables from two incompatible formulas. This can be seen in the following reprex:

library(brms)
library(projpred)
set.seed(2)
N <- 100
p <- 5 # number of covariates
dat <- as.data.frame(matrix(rnorm(N * p), nrow = N, ncol = p)) # initialize data frame with p covariates
names(dat) <- paste0('x', 1:p)

betas <- rnorm(p) # simulate effect values
dat$y <- rnorm(N, mean = as.matrix(dat[, paste0('x', 1:p)]) %*% betas) # y is a noisy observation of the linear combination of covariates

formula_all <- as.formula(paste0('y~', paste(paste0('x', 1:p), collapse = '+')))
ref_mod <- brm(formula_all, data = dat, refresh = 0)
cv_select_prj <- cv_varsel(ref_mod, method = 'forward', cv_method = 'LOO', refit_prj = FALSE, search_terms = c('x1+x2', 'x1+x3', 'x1+x2+x4'))

Here, I would assume that either x1, x2 and x4 would be included, or x1 and x3 only, since x1+x2+x4 is not compatible with x1+x3. However, cv_varsel returns:

Selection Summary:
 size solution_terms elpd.loo  se  diff diff.se
    0           <NA>   -181.9 6.0 -36.8     7.0
    1        x1 + x3   -161.3 6.0 -16.2     5.3
    2             x2   -149.1 6.1  -4.1     3.1
    3             x4   -147.4 5.8  -2.3     2.5

Is this the intended behaviour, or should we try to resolve this? The problem seems to lie in select_possible_terms_size in formula.R.
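
To make the notion of incompatibility concrete, here is a minimal base-R sketch of the kind of check I have in mind. It assumes that two search terms are "compatible" only when the variables of one are a subset of the variables of the other, so that one can be reached from the other by forward steps. The helpers term_vars and is_nested are hypothetical names used for illustration; they are not part of projpred and do not reflect how select_possible_terms_size works internally.

```r
# Hypothetical helper (not part of projpred): split a search term into its
# variable names, e.g. 'x1+x2' -> c("x1", "x2").
term_vars <- function(term) all.vars(as.formula(paste("~", term)))

# Assumption: two terms are compatible if one term's variables are a subset
# of the other's.
is_nested <- function(a, b) {
  va <- term_vars(a)
  vb <- term_vars(b)
  all(va %in% vb) || all(vb %in% va)
}

search_terms <- c("x1+x2", "x1+x3", "x1+x2+x4")

# All unordered pairs of search terms, one pair per row.
pairs <- t(combn(search_terms, 2))
compat <- data.frame(
  term_a = pairs[, 1],
  term_b = pairs[, 2],
  nested = mapply(is_nested, pairs[, 1], pairs[, 2], USE.NAMES = FALSE)
)
print(compat)
```

Under this assumption, only the pair x1+x2 and x1+x2+x4 is nested, while x1+x3 is nested with neither of the other two terms, which matches the expectation described above.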
