inline functions in formulas cause error #2

Closed
topepo opened this issue May 11, 2017 · 3 comments

@topepo
Contributor

commented May 11, 2017

It looks like there are errors if any inline functions are used on either side of the formula:

> library(olsrr)
> library(caret)
> data("Sacramento")
> 
> lm_fit1 <- lm(log(price)  ~ . - city, data = Sacramento)
> 
> stepw <- ols_stepwise(lm_fit1)
We are selecting variables based on p value...
Error in eval(expr, envir, enclos) : object 'price' not found
> 
> lm_fit2 <- lm(price  ~ beds + baths + log(sqft), data = Sacramento)
> stepw <- ols_stepwise(lm_fit2)
We are selecting variables based on p value...
Error in eval(expr, envir, enclos) : object 'sqft' not found
> # this works
> lm_fit3 <- lm(price  ~ beds + baths + sqft, data = Sacramento)
> stepw <- ols_stepwise(lm_fit3)
We are selecting variables based on p value...
1 variable(s) added....
1 variable(s) added...
No more variables to be added or removed.
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-76       ggplot2_2.2.1.9000 lattice_0.20-34    olsrr_0.1.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10       nloptr_1.0.4       plyr_1.8.4         iterators_1.0.8    tools_3.3.3       
 [6] goftest_1.1-1      lme4_1.1-13        tibble_1.3.0.9002  nlme_3.1-131       gtable_0.2.0      
[11] mgcv_1.8-17        rlang_0.1.9000     Matrix_1.2-8       foreach_1.4.3      parallel_3.3.3    
[16] SparseM_1.77       gridExtra_2.2.1    dplyr_0.5.0.9004   stringr_1.2.0      MatrixModels_0.4-1
[21] stats4_3.3.3       grid_3.3.3         nnet_7.3-12        nortest_1.0-4      glue_1.0.0        
[26] R6_2.2.0           minqa_1.2.4        tidyr_0.6.2        purrr_0.2.2        reshape2_1.4.2    
[31] car_2.1-4          magrittr_1.5       scales_0.4.1       codetools_0.2-15   ModelMetrics_1.1.0
[36] MASS_7.3-45        splines_3.3.3      assertthat_0.2.0   pbkrtest_0.4-7     colorspace_1.3-2  
[41] quantreg_5.33      stringi_1.1.5      lazyeval_0.2.0     munsell_0.4.3    

It looks like reg_comp gets fed a data frame or matrix of predictors from ols_regress. You might want to pass the formula through model.frame or something to resolve any functions that are embedded. You might also want to look out for an inline like ns since it will create 2+ columns from the original variable (in case it messes up how d.f. are counted).
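To illustrate the model.frame suggestion, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for Sacramento (so it runs without caret) and does not reflect how olsrr is implemented; it only shows that model.frame() evaluates inline functions and returns their results as ordinary columns.

```r
# mtcars (built into R) stands in for the Sacramento data: an assumption
# made so the sketch runs without installing caret.
fit <- lm(log(mpg) ~ wt + log(hp), data = mtcars)

# model.frame() evaluates the inline calls and stores the results as
# columns named after the expressions that produced them.
mf <- model.frame(fit)
names(mf)

# The (already transformed) response can be extracted without knowing
# what its column is called.
y <- model.response(mf)
head(y)
```

Working from the model frame means downstream code never has to re-evaluate log(price) or log(sqft) against the raw data.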

@aravindhebbali

Member

commented May 11, 2017

Thank you @topepo. Yes, the variable selection methods currently work only when the formula is straightforward, i.e., without any variable transformations or inline functions (which is a shortcoming). We will work on it as you suggested.

@aravindhebbali aravindhebbali self-assigned this May 11, 2017

@aravindhebbali aravindhebbali added the bug label May 11, 2017

@aravindhebbali aravindhebbali added this to the v0.2.0 milestone Jun 5, 2017

aravindhebbali added a commit that referenced this issue Jun 5, 2017

@aravindhebbali

Member

commented Jun 5, 2017

We have changed the way reg_comp uses data from the model. It now uses eval(model$call$data) to access the data and colnames(attr(model$terms, 'factors')) to access the independent terms.

> library(olsrr)
> library(caret)
> data("Sacramento")

> lm_fit1 <- lm(log(price)  ~ . - city, data = Sacramento)
> stepw <- ols_stepwise(lm_fit1)
We are selecting variables based on p value...
1 variable(s) added....
1 variable(s) added...
1 variable(s) added...
1 variable(s) added...
No more variables to be added or removed.

> stepw
Stepwise Selection Method                                                               

Candidate Terms:                                                                        

1 . zip                                                                                 
2 . beds                                                                                
3 . baths                                                                               
4 . sqft                                                                                
5 . type                                                                                
6 . latitude                                                                            
7 . longitude                                                                           

---------------------------------------------------------------------------------------
                              Stepwise Selection Summary                                
---------------------------------------------------------------------------------------
                      Added/                   Adj.                                        
Step    Variable     Removed     R-Square    R-Square      C(p)        AIC        RMSE     
---------------------------------------------------------------------------------------
   1      sqft       addition       0.533       0.533    478.8286    740.1894    0.3592    
   2    longitude    addition       0.561       0.560    398.6749    686.1102    0.3493    
   3    latitude     addition       0.562       0.561    395.5536    684.5000    0.3479    
   4      baths      addition       0.564       0.562    393.1857    683.4099    0.3479    
---------------------------------------------------------------------------------------

> lm_fit2 <- lm(price  ~ beds + baths + log(sqft), data = Sacramento)
> stepw <- ols_stepwise(lm_fit2)
We are selecting variables based on p value...
1 variable(s) added....
1 variable(s) added...
No more variables to be added or removed.

> stepw
Stepwise Selection Method                                                                   

Candidate Terms:                                                                            

1 . beds                                                                                    
2 . baths                                                                                   
3 . log(sqft)                                                                               

-------------------------------------------------------------------------------------------
                                Stepwise Selection Summary                                  
-------------------------------------------------------------------------------------------
                      Added/                   Adj.                                            
Step    Variable     Removed     R-Square    R-Square     C(p)         AIC          RMSE       
-------------------------------------------------------------------------------------------
   1    log(sqft)    addition       0.568       0.567    52.6943    23833.1040    86242.3553    
   2      beds       addition       0.591       0.590     2.9559    23784.5900    83981.7543    
-------------------------------------------------------------------------------------------
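
The two lookups described in this comment can be tried on any lm object. The sketch below is an illustration under assumptions, not the actual reg_comp code: it uses mtcars instead of Sacramento, and the refit step at the end is a hypothetical example of how the recovered pieces might be used.

```r
fit <- lm(mpg ~ wt + log(hp), data = mtcars)

# Re-evaluate the `data` argument recorded in the original call.
# eval() looks the name up from the calling frame, so the data object
# must still be visible there; the result is NULL if lm() was called
# without a `data` argument.
dat <- eval(fit$call$data)

# Column names of the terms "factors" matrix are the right-hand-side
# term labels, with inline functions kept as written.
preds <- colnames(attr(fit$terms, "factors"))
preds

# Hypothetical use: refit with a single candidate term, reusing the
# original response expression as a language object.
sub_fit <- lm(reformulate(preds[2], response = formula(fit)[[2]]), data = dat)
names(coef(sub_fit))
```

Because the term labels are evaluated against the raw data rather than an already transformed matrix, a label such as log(hp) can be resolved again when the candidate model is refit.
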
@aravindhebbali

Member

commented Jun 5, 2017

I am closing this issue for the time being. We might have to reopen it if we come across a model that uses natural splines (ns).
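
The natural-spline concern can be previewed with a small sketch. It assumes mtcars as example data and is not taken from olsrr. It shows that a single ns() term appears once among the term labels but expands into several design-matrix columns, which matters wherever degrees of freedom are counted per term.

```r
library(splines)  # part of the standard R distribution; provides ns()

fit <- lm(mpg ~ wt + ns(disp, df = 3), data = mtcars)

# One label per term, as returned by the terms "factors" matrix.
term_labels <- colnames(attr(fit$terms, "factors"))

# The "assign" attribute maps each design-matrix column to its term
# (0 is the intercept, which tabulate() ignores).
X <- model.matrix(fit)
cols_per_term <- setNames(
  tabulate(attr(X, "assign"), nbins = length(term_labels)),
  term_labels
)
cols_per_term
```

Counting columns through the assign attribute, rather than counting term labels, keeps the per-term degrees of freedom correct for spline and factor terms alike.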
