Extracting model data #159

Closed
aravindhebbali opened this issue Apr 28, 2020 · 9 comments
@aravindhebbali
Member

aravindhebbali commented Apr 28, 2020

Use model$model to extract underlying data instead of eval(model$call$data)
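A minimal sketch of why this matters, assuming the model is fitted inside a helper function. The names `fit_inside` and `local_data` are hypothetical and only serve the illustration; `model$model` is available because `lm()` keeps the model frame by default (`model = TRUE`).

```r
# Hypothetical helper: the data argument only exists inside the function.
fit_inside <- function(local_data) {
  lm(mpg ~ disp + hp, data = local_data)
}

model <- fit_inside(mtcars)

# The stored call holds the symbol `local_data`, not the data itself,
# so evaluating it outside the function fails; tryCatch turns that into NULL.
from_call <- tryCatch(eval(model$call$data), error = function(e) NULL)
is.null(from_call)

# The model frame travels with the fitted object and needs no lookup.
from_frame <- model$model
names(from_frame)
```

Note that the model frame contains only the variables used in the formula, not every column of the original data.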

@aravindhebbali aravindhebbali self-assigned this Apr 28, 2020
@aravindhebbali aravindhebbali changed the title Calling variable selection procedures inside functions Extracting model data May 2, 2020
@jens-daniel-mueller

Running ols_step_all_possible() returns a data.frame with the following columns:

"mindex" "n" "predictors" "rsquare" "adjr" "predrsq" "cp" "aic" "sbic" "sbc" "msep" "fpe" "apc" "hsp"

This does not match the output described in the manual, which lists:

n model number
predictors predictors in the model
rsquare rsquare of the model
adjr adjusted rsquare of the model
predrsq predicted rsquare of the model
cp mallow's Cp
aic akaike information criteria
sbic sawa bayesian information criteria
sbc schwarz bayes information criteria
gmsep estimated MSE of prediction, assuming multivariate normality
jp final prediction error
pc amemiya prediction criteria
sp hocking's Sp
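The mismatch can be listed mechanically. This is a small sketch that uses only the two sets of names quoted in this comment; the vectors `returned` and `documented` are typed in by hand for illustration, not read from the package.

```r
# Column names returned by ols_step_all_possible(), as quoted above.
returned <- c("mindex", "n", "predictors", "rsquare", "adjr", "predrsq",
              "cp", "aic", "sbic", "sbc", "msep", "fpe", "apc", "hsp")

# Column names listed in the manual, as quoted above.
documented <- c("n", "predictors", "rsquare", "adjr", "predrsq", "cp",
                "aic", "sbic", "sbc", "gmsep", "jp", "pc", "sp")

# Returned but not documented.
undocumented <- setdiff(returned, documented)
undocumented

# Documented but not returned.
missing_cols <- setdiff(documented, returned)
missing_cols
```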

Can this be corrected?
Is it currently possible to extract the root mean square error (RMSE) for each model?

Thanks! Jens

@aravindhebbali
Member Author

Yes, there is a mismatch between the column names and what the documentation lists. We will fix this. The current version of ols_step_all_possible() does not return RMSE, but we can add it to the output.
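Until that lands, RMSE per candidate model can be computed with base R alone. The following is a rough sketch under stated assumptions, not how olsrr does it: the predictor set and the names `subsets` and `rmse_table` are chosen for illustration, and RMSE is taken as `sqrt(mean(residuals^2))`.

```r
predictors <- c("disp", "hp")

# Enumerate every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one model per subset and compute its RMSE from the residuals.
rmse <- vapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
  sqrt(mean(residuals(fit)^2))
}, numeric(1))

rmse_table <- data.frame(
  predictors = vapply(subsets, paste, character(1), collapse = " "),
  rmse = rmse
)
rmse_table
```

Enumerating all subsets grows exponentially with the number of predictors, so this is only practical for small models.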

@jens-daniel-mueller

Cool, thanks! Just to give me an idea: can you estimate how long it might take to add the RMSE output? It would be very helpful for me.

@aravindhebbali
Member Author

I have pushed the changes. Install the develop branch and you are good to go 😃

library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
model <- lm(mpg ~ disp + hp, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#> [1] 3.251454 3.862962 3.126601

Created on 2020-07-27 by the reprex package (v0.3.0)

@jens-daniel-mueller

Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly!
The only thing I noticed is that now

names(lm_all)
[1] "result"

whereas before, the same command returned the output column names directly. I preferred the previous output structure, but I guess there is a reason for the change.

@jens-daniel-mueller

Sorry, have to come back to this.
After testing the rmse output of ols_step_all_possible() against three ways of calculating it manually (see reprex below), it seems that the function returns the residual standard deviation rather than the root mean squared error.
Can you fix this again? Thanks a lot!

library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers

# reproducing example from
# https://olsrr.rsquaredacademy.com/articles/variable_selection.html

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)

# Apply three approaches to calculate rmse as found on:
# https://stackoverflow.com/questions/43123462/how-to-obtain-rmse-out-of-lm-result

sqrt(sum((model$residuals)^2)/length(model$residuals))
#> [1] 2.408548
sqrt(mean(model$residuals^2))
#> [1] 2.408548

RSS <- c(crossprod(model$residuals))
MSE <- RSS / length(model$residuals)
RMSE <- sqrt(MSE)
RMSE
#> [1] 2.408548

# extract rmse from olsrr output

k$result[k$result$predictors == "disp hp wt qsec",]$rmse
#> [1] 2.622095

# calculate the residual standard deviation

sigma(model)
#> [1] 2.622095

Created on 2020-07-28 by the reprex package (v0.3.0)
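The two numbers in the reprex differ only in the divisor: RMSE divides the residual sum of squares by the number of observations, while the residual standard deviation divides it by the residual degrees of freedom. A short base R sketch of that relationship follows; the variable names `rss`, `n`, `rmse`, and `rse` are chosen here for illustration.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

rss <- sum(residuals(model)^2)  # residual sum of squares
n   <- nobs(model)              # number of observations

rmse <- sqrt(rss / n)                   # root mean squared error
rse  <- sqrt(rss / df.residual(model))  # residual standard deviation

# rse matches sigma(model); rmse is sigma rescaled by sqrt(df / n).
all.equal(rse, sigma(model))
all.equal(rmse, sigma(model) * sqrt(df.residual(model) / n))
```

With 32 observations and 27 residual degrees of freedom, the factor is sqrt(27 / 32), which accounts for the gap between 2.622095 and 2.408548.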

@aravindhebbali
Member Author

Let me look into this and get back to you.

@aravindhebbali
Member Author

library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#>  [1] 2.949163 3.148207 3.740297 5.387066 2.468854 2.471485 2.776478 2.976436
#>  [9] 3.130180 3.574623 2.411297 2.468493 2.471479 2.941023 2.408548

Created on 2020-07-28 by the reprex package (v0.3.0)

@aravindhebbali
Member Author

> Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly!
> The only thing I noticed is that now
>
> names(lm_all)
> [1] "result"
>
> whereas before, the same command returned the output column names directly. I preferred the previous output structure, but I guess there is a reason for the change.

We are doing a complete review and redesign of the olsrr API to make it as user-friendly as possible. We will take your feedback about the output names into account 😃
