transform xgboostImpute and rangerImpute into a generic function with methods for formula and data #73
Comments
@alexkowa looks good! I just made some slight changes to rangerImpute() for the case of factors. Now, when a factor is imputed, the imputed value is drawn at random using the predicted probabilities from the model output. I would personally opt for version A.
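As a rough illustration of the sampling idea described above (not the actual rangerImpute() code), one can draw each imputed factor level from a matrix of predicted class probabilities. The helper name `sample_from_probs` and the hand-built probability matrix are assumptions made for this sketch; in practice the matrix would come from the fitted model, for example a ranger probability forest.

```r
# Hypothetical helper: draw one factor level per row of a probability matrix.
# Rows are observations to impute, columns are factor levels.
sample_from_probs <- function(prob, levels = colnames(prob)) {
  idx <- vapply(
    seq_len(nrow(prob)),
    function(i) sample.int(ncol(prob), size = 1, prob = prob[i, ]),
    integer(1)
  )
  factor(levels[idx], levels = levels)
}

# Illustrative probabilities for three missing values (made up for the sketch)
prob <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.1, 0.8,
    0.3, 0.4, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("low", "mid", "high"))
)

set.seed(42)
imputed <- sample_from_probs(prob)
print(imputed)
```

Compared with always taking the most probable level, random draws keep the imputed values from being concentrated on the modal class.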
Good idea to sample. Yes, let's do that for XGBoost too if someone has time to implement it.
I'm not sure introducing breaking changes is worth it, although I totally agree that having the dataset as the first argument makes more sense, especially for use with the native pipe introduced in R 4.1. S3 dispatch should allow this change without breaking old code:
rangerImpute.formula <- function(x, ...) {}
rangerImpute.data.frame <- function(x, ...) {}
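A minimal sketch of how such an S3 dispatch could look is given here. All names (`impute_sketch`, `impute_core`) are hypothetical, and a plain `lm()` fit stands in for the ranger or XGBoost model so that the example runs with base R only. It is not the VIM implementation.

```r
# Hypothetical generic: dispatches on the class of the first argument.
impute_sketch <- function(x, ...) UseMethod("impute_sketch")

# Old calling convention: formula first, data second.
impute_sketch.formula <- function(x, data, ...) {
  impute_core(formula = x, data = data, ...)
}

# Pipe-friendly convention: data first, formula second.
impute_sketch.data.frame <- function(x, formula, ...) {
  impute_core(formula = formula, data = x, ...)
}

# Shared worker. lm() is only a stand-in for the real model.
impute_core <- function(formula, data, ...) {
  target <- all.vars(formula)[1]          # left-hand side variable
  miss <- is.na(data[[target]])
  if (any(miss)) {
    fit <- lm(formula, data = data[!miss, , drop = FALSE])
    data[[target]][miss] <- predict(fit, newdata = data[miss, , drop = FALSE])
  }
  data
}

d <- data.frame(y = c(1, 2, NA, 4), x = c(1, 2, 3, 4))
old_style <- impute_sketch(y ~ x, d)   # existing code keeps working
new_style <- impute_sketch(d, y ~ x)   # data-first call
# With R >= 4.1 the data-first method also allows: d |> impute_sketch(y ~ x)
```

Both methods forward to the same worker, so the argument order is the only thing that differs between the two calling conventions.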
At the end of the day, it would also be good to compare both rangerImpute and xgboostImpute with missRanger and mixgb (although both can be used in a chain to impute multivariate missingness), not only on precision measures (comparing imputed and original data values in a simulation) but also on coverage rates and root mean squared errors of estimators. I can do this when there is a bit of time for it. It might give an idea of whether imputation uncertainty and model uncertainty are treated well. One argument I often hear against almost all imputation methods in VIM is that we only account for imputation uncertainty (drawing from predictive distributions; one can also think of PMM and midastouch) but not for model uncertainty (e.g. with a bootstrap, which would be very simple to implement, at least as an option). I recently implemented PMM and midastouch in the function imputeRobust (just committed).
What is missing is testing and code improvement (almost no checks are implemented). It's currently a working solution and, of course, there has been no time to do this for months :-( If somebody is interested...? So, one might use the PMM and midastouch from
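To make the distinction between the two sources of uncertainty concrete, here is a small sketch under stated assumptions: the function name `impute_boot` is hypothetical, `lm()` stands in for the actual imputation model, and normally distributed residual noise is used instead of PMM or midastouch. Refitting on a bootstrap resample of the observed rows reflects model uncertainty, and the added noise reflects imputation uncertainty.

```r
# Hypothetical sketch: m imputed data sets, each from a bootstrapped model fit.
impute_boot <- function(formula, data, m = 5) {
  target <- all.vars(formula)[1]
  miss <- is.na(data[[target]])
  obs <- data[!miss, , drop = FALSE]
  lapply(seq_len(m), function(i) {
    # Model uncertainty: refit on a bootstrap resample of the observed rows
    boot <- obs[sample.int(nrow(obs), replace = TRUE), , drop = FALSE]
    fit <- lm(formula, data = boot)
    pred <- predict(fit, newdata = data[miss, , drop = FALSE])
    # Imputation uncertainty: draw around the prediction (normal noise assumed)
    noise <- rnorm(sum(miss), mean = 0, sd = sigma(fit))
    out <- data
    out[[target]][miss] <- pred + noise
    out
  })
}

# Simulated data, made up for the sketch
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
y[1:5] <- NA
d <- data.frame(x = x, y = y)

imputations <- impute_boot(y ~ x, d, m = 3)
```

The spread of the imputed values across the m data sets can then feed into the coverage and RMSE comparisons mentioned above.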
@GregorDeCillia I included a new function, xgboostImpute, which is very similar to your rangerImpute function. At first sight, it performs very well. Both functions take a formula as their first input.
To make them more pipe-friendly and aligned with other imputation functions, should we
A) simply change the order of parameters so the first input is the data set (possibly breaking some existing code), or
B) create new generic functions that dispatch to a method based on the first input?
@GregorDeCillia and also @matthias-da @JohannesGuss what do you think?