best subset regression fails when model formula contains inline functions or interaction variables #5

Closed
aravindhebbali opened this issue Jun 3, 2017 · 1 comment

commented Jun 3, 2017

ols_best_subset() returns an error when the formula in the model contains inline functions or interaction variables.

> library(olsrr)
> library(caret)
> data("Sacramento")
> lm_fit2 <- lm(price ~ beds + baths + log(sqft), data = Sacramento)
> ols_best_subset(lm_fit2)
Error in eval(predvars, data, env) : object 'sqft' not found
Called from: eval(predvars, data, env)

# interaction variables
> lm_fit3 <- lm(mpg ~ disp + hp + wt + am * disp, data = mtcars)
> ols_best_subset(lm_fit3)
Error in ols_mallows_cp(out$model, model) : 
  model must be a subset of full model
Called from: ols_mallows_cp(out$model, model)

Session Info

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
[5] LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.12.7     gridExtra_2.0.0 tidyr_0.6.0     tibble_1.2     
[5] purrr_0.2.2     dplyr_0.5.0     caret_6.0-76    ggplot2_2.2.1  
[9] lattice_0.20-35

loaded via a namespace (and not attached):
 [1] magrittr_1.5       splines_3.4.0      MASS_7.3-47       
 [4] munsell_0.4.3      colorspace_1.2-7   R6_2.2.1          
 [7] foreach_1.4.3      minqa_1.2.4        stringr_1.1.0     
[10] car_2.1-2          plyr_1.8.4         tools_3.4.0       
[13] parallel_3.4.0     nnet_7.3-12        pbkrtest_0.4-6    
[16] grid_3.4.0         gtable_0.2.0       nlme_3.1-125      
[19] mgcv_1.8-17        quantreg_5.19      DBI_0.5-1         
[22] MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-11       
[25] lazyeval_0.2.0     assertthat_0.2.0   Matrix_1.2-9      
[28] nloptr_1.0.4       reshape2_1.4.2     ModelMetrics_1.1.0
[31] codetools_0.2-15   stringi_1.1.2      compiler_3.4.0    
[34] scales_0.4.1       stats4_3.4.0       SparseM_1.7  

@aravindhebbali aravindhebbali added the bug label Jun 3, 2017

@aravindhebbali aravindhebbali self-assigned this Jun 3, 2017

@aravindhebbali aravindhebbali added this to the v0.2.0 milestone Jun 5, 2017

aravindhebbali added a commit that referenced this issue Jun 5, 2017

@aravindhebbali

commented Jun 5, 2017

ols_best_subset() no longer returns an error when the model formula contains inline functions or interaction variables.

> library(olsrr)
> library(caret)
> data("Sacramento")

> lm_fit2 <- lm(price ~ beds + baths + log(sqft), data = Sacramento)
> ols_best_subset(lm_fit2)
      Best Subsets Regression      
-----------------------------------
Model Index    Predictors
-----------------------------------
     1         log(sqft)            
     2         beds log(sqft)       
     3         beds baths log(sqft) 
-----------------------------------

                                                                  Subsets Regression Summary                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------------------
                       Adj.        Pred                                                                                                                        
Model    R-Square    R-Square    R-Square     C(p)         AIC           SBIC          SBC             MSEP                FPE              HSP          APC  
--------------------------------------------------------------------------------------------------------------------------------------------------------------
  1        0.5680      0.5670      0.5655    52.6943    23833.1040    21187.9990    23847.6160    7453739033.0154    7453704671.7157    8006182.8289    0.4340 
  2        0.5910      0.5900      0.5873     2.9559    23784.5900    21139.7082    23803.9393    7075719175.8852    7075637629.2594    7600145.5265    0.4120 
  3        0.5910      0.5900      0.5856     4.0000    23785.6305    21140.7635    23809.8172    7083688577.2837    7083541628.0341    7608705.5907    0.4124 
--------------------------------------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria 
 SBIC: Sawa's Bayesian Information Criteria 
 SBC: Schwarz Bayesian Criteria 
 MSEP: Estimated error of prediction, assuming multivariate normality 
 FPE: Final Prediction Error 
 HSP: Hocking's Sp 
 APC: Amemiya Prediction Criteria 

# interaction variables
> lm_fit3 <- lm(mpg ~ disp + hp + wt + am * disp, data = mtcars)
> ols_best_subset(lm_fit3)
      Best Subsets Regression       
------------------------------------
Model Index    Predictors
------------------------------------
     1         wt                    
     2         hp wt                 
     3         hp wt am              
     4         hp wt am disp:am      
     5         disp hp wt am disp:am 
------------------------------------

                                                  Subsets Regression Summary                                                   
-------------------------------------------------------------------------------------------------------------------------------
                       Adj.        Pred                                                                                         
Model    R-Square    R-Square    R-Square     C(p)        AIC        SBIC        SBC        MSEP      FPE       HSP       APC  
-------------------------------------------------------------------------------------------------------------------------------
  1        0.7530      0.7450      0.7087    15.7814    166.0294    77.0143    170.4266    9.8972    9.8572    0.3199    0.2801 
  2        0.8270      0.8150      0.7811     4.6820    156.6523    70.3199    162.5153    7.4314    7.3563    0.2402    0.2090 
  3        0.8400      0.8230      0.7879     4.3607    156.1348    72.9956    163.4635    7.3780    7.2438    0.2385    0.2059 
  4        0.8530      0.8310      0.7871     4.0081    155.3638    76.1548    164.1582    7.2864    7.0804    0.2355    0.2012 
  5        0.8530      0.8250      0.7704     6.0000    157.3538    81.8096    167.6140    7.8669    7.5490    0.2543    0.2145 
-------------------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria 
 SBIC: Sawa's Bayesian Information Criteria 
 SBC: Schwarz Bayesian Criteria 
 MSEP: Estimated error of prediction, assuming multivariate normality 
 FPE: Final Prediction Error 
 HSP: Hocking's Sp 
 APC: Amemiya Prediction Criteria 
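
For readers curious how candidate subsets can be built without tripping over inline functions or interactions, here is a minimal sketch in base R. It is not taken from the olsrr source and may differ from how the fix was actually implemented. It assumes the subsets are formed from the formula's term labels (which keep `log(sqft)` and `disp:am` intact) rather than from bare variable names, and that each candidate is refit against the original data. The names `full_formula`, `term_labels`, `subsets`, `fits` and `best` are made up for illustration.

```r
# Illustrative sketch in base R; not olsrr's actual implementation.
full_formula <- mpg ~ disp + hp + wt + am * disp

# all.vars() returns bare variable names only; interactions and calls are lost.
bare_vars <- all.vars(full_formula)

# Term labels keep each model term intact, e.g. "disp:am" or "log(sqft)".
term_labels <- attr(terms(full_formula), "term.labels")

# Enumerate every non-empty subset of the term labels.
subsets <- unlist(
  lapply(seq_along(term_labels),
         function(k) combn(term_labels, k, simplify = FALSE)),
  recursive = FALSE
)

# Refit each subset against the original data, not the fitted model frame,
# so that raw columns remain available to inline functions.
fits <- lapply(subsets, function(s) {
  lm(reformulate(s, response = "mpg"), data = mtcars)
})

# Keep the index of the highest R-squared model for each subset size.
r2 <- vapply(fits, function(f) summary(f)$r.squared, numeric(1))
best <- tapply(seq_along(fits), lengths(subsets),
               function(i) i[which.max(r2[i])])

# Predictors of the best model per size, one string per size.
lapply(subsets[best], paste, collapse = " ")
```

This naive enumeration can select an interaction term without its main effects, and it refits every subset, so it only suits models with a small number of terms.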