search_terms argument in cv_varsel not working properly #345

Closed
sor16 opened this issue Aug 8, 2022 · 2 comments
Labels
bug Bugs.

sor16 commented Aug 8, 2022

Hi,

I am interested in doing variable selection on just a subset of variables while including the rest in all submodels. This naturally leads to the search_terms argument of cv_varsel. Suppose I have a data set with 5 covariates, a treatment variable t, and a response y. One of the 5 covariates, x1, is a confounder, so I want to do variable selection on the remaining 4 covariates while including t and x1 in every submodel. A reproducible example of the data-generating process, the reference model, and the cv_varsel call is below:

```r
library(brms)
library(projpred)

# Build the space of candidate submodels: every non-empty subset of the
# covariates in `idx_select`, always combined with the treatment t and the
# covariates in `idx_fixed`. The first element is the base model.
get_search_terms <- function(idx_select, idx_fixed = NULL) {
  select_vars <- paste0("x", idx_select)
  # All subsets, e.g. "x2", "x3", ..., "x2+x3", ...; passing a character vector
  # to combn() avoids its combn(n, m) shortcut when idx_select has length 1
  search_terms <- unlist(lapply(seq_along(select_vars), function(i) {
    combn(select_vars, i, FUN = paste, collapse = "+")
  }))
  if (length(idx_fixed) != 0) {
    fixed_terms <- paste(paste0("x", idx_fixed), collapse = "+")
    search_terms <- paste(fixed_terms, search_terms, sep = "+")
    base_model <- paste("t", fixed_terms, sep = "+")
  } else {
    base_model <- "t"
  }
  c(base_model, paste0("t+", search_terms))
}

set.seed(2)
N <- 100
p <- 5               # number of covariates besides t
p_conf <- 1          # number of confounders
p_rel <- p_conf + 2  # number of "relevant" covariates affecting y, excluding t
dat <- as.data.frame(matrix(rnorm(N * p), nrow = N, ncol = p))  # p covariates
names(dat) <- paste0("x", 1:p)

if (p_conf > 0) {
  t_betas <- rnorm(p_conf)  # effects of the confounders on the treatment
  # treatment: noisy observation of a linear combination of the confounders
  dat$t <- rnorm(N, as.matrix(dat[, paste0("x", 1:p_conf), drop = FALSE]) %*% t_betas)
} else {
  dat$t <- rnorm(N)  # no confounders
}

y_betas <- rnorm(p_rel + 1)  # effects of t and the "relevant" covariates on y
# y: noisy observation of a linear combination of t and the relevant covariates
dat$y <- rnorm(N, mean = as.matrix(dat[, c("t", paste0("x", 1:p_rel))]) %*% y_betas)
formula_all <- as.formula(paste0("y ~ ", paste(c("t", paste0("x", 1:p)), collapse = " + ")))
ref_mod <- brm(formula_all, data = dat, refresh = 0)  # fit the reference model
possible_mods <- get_search_terms(idx_select = seq(p_conf + 1, p), idx_fixed = seq_len(p_conf))
# refit_prj = FALSE and validate_search = FALSE only for speed
cv_select_prj <- cv_varsel(ref_mod, method = "forward", cv_method = "LOO",
                           refit_prj = FALSE, validate_search = FALSE,
                           search_terms = possible_mods)
```

This produces the following error:

Error in cv_varsel.refmodel(refmodel, ...) : 
  Unexpected number of rows in `solution_terms_cv_chr`. Please notify the package maintainer.

It seems to have to do with the dimensions of solution_terms_mat, which uses nterms_max to infer the number of submodels to fit. In this case, however, t and x1 are both added in the same step, so nterms_max does not reflect the number of submodels on the search path. This issue is related to #307, and fixing it would seem to require significant changes to the search_forward function. I would be happy to contribute such changes if there is interest.

Best,
Sölvi
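
The mismatch described above can be illustrated without fitting any model. The sketch below uses base R only; `always_in`, `selectable` and the other names are hypothetical, chosen for illustration, and are not projpred internals. It rebuilds the same 16 candidate strings as `possible_mods` in the example and counts the terms in each. The largest submodel has 6 terms, yet no candidate has exactly 1 term, because t and x1 always enter together; only 5 distinct sizes (2 to 6) occur, so a search path indexed by term count from 1 to nterms_max has a gap.

```r
# Base R only; names are illustrative, not projpred internals.
always_in <- c("t", "x1")       # terms forced into every submodel
selectable <- paste0("x", 2:5)  # terms subject to selection

# All non-empty subsets of the selectable terms, e.g. "x2", "x2+x3", ...
combos <- unlist(lapply(seq_along(selectable), function(k) {
  combn(selectable, k, FUN = paste, collapse = "+")
}))
base <- paste(always_in, collapse = "+")  # "t+x1"
search_terms <- c(base, paste(base, combos, sep = "+"))

# Number of individual terms in each candidate, e.g. "t+x1+x2" -> 3
n_terms <- lengths(strsplit(search_terms, "+", fixed = TRUE))

n_terms_max <- max(n_terms)                              # largest submodel size
path_sizes <- sort(unique(n_terms))                      # sizes that actually occur
missing_sizes <- setdiff(seq_len(n_terms_max), n_terms)  # sizes with no candidate

print(length(search_terms))  # 16 candidates
print(n_terms_max)           # 6
print(path_sizes)            # 2 3 4 5 6
print(missing_sizes)         # 1
```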

@fweber144
Collaborator

Thank you for creating the issue. Yes, this needs to be fixed.

@fweber144 fweber144 added the bug Bugs. label Aug 8, 2022
fweber144 added a commit to fweber144/projpred that referenced this issue Sep 15, 2022
sor16 pushed a commit to sor16/projpred that referenced this issue Oct 18, 2022
…rms in case search_terms argument is NOT NULL. Importantly, it also changes a detail in the way new solution_terms are represented, by removing all previously selected variables from them, see change in select_possible_terms_size function.
@sor16 sor16 mentioned this issue Oct 18, 2022
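
The commit message above mentions representing newly selected solution terms by removing all previously selected variables from them. As a rough base-R illustration of that idea (the helper `added_terms()` and the example `path` are hypothetical; this is not the code from the commit or from PR #360), a forward path over `search_terms` strings can be turned into per-step increments, so the first step records both t and x1 while each later step records a single term:

```r
# Hypothetical helper, not projpred code: return only the terms that a
# candidate submodel adds on top of the terms already selected.
added_terms <- function(selected, candidate) {
  cand <- strsplit(candidate, "+", fixed = TRUE)[[1]]
  setdiff(cand, selected)
}

# Example forward path through candidates of the form used in the issue
path <- c("t+x1", "t+x1+x3", "t+x1+x3+x2")

selected <- character(0)
steps <- list()
for (cand in path) {
  new <- added_terms(selected, cand)
  steps[[length(steps) + 1]] <- new  # increment recorded for this step
  selected <- c(selected, new)
}

print(lengths(steps))  # 2 1 1
print(selected)        # "t" "x1" "x3" "x2"
```

Storing increments rather than cumulative strings keeps one entry per search step, even when a step adds more than one term.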
@fweber144
Collaborator

Fixed by PR #360.
