simputation
An R package to make imputation simple. Currently supported methods include:
- Model based (optionally add [non-]parametric random residual)
  - linear regression
  - robust linear regression (M-estimation)
  - ridge/elasticnet/lasso regression (version >= 0.2.1)
  - CART models
  - Random forest
- Model based, multivariate
  - Imputation based on EM-estimated parameters (version >= 0.2.1)
  - missForest (version >= 0.2.1)
- Donor imputation (including various donor pool specifications)
  - k-nearest neighbour (based on Gower's distance)
  - sequential hotdeck (LOCF, NOCB)
  - random hotdeck
  - Predictive mean matching
- Other
  - (groupwise) median imputation (optional random residual)
  - Proxy imputation (copy from other variable)
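As a rough illustration of how three of these families look in code, the sketch below applies one method each from the "model based", "donor" and "other" groups to a copy of `iris`. It assumes the `impute_<method>(data, formula)` calling convention used elsewhere in this README, and that the right-hand side of the formula acts as a grouping specification for `impute_rhd` and `impute_median`. The particular cells set to `NA` and the name `out` are made up for the example.

```r
library(simputation)

# Illustrative data: blank out a few cells in three numeric columns
dat <- iris
dat[1:3, "Sepal.Length"] <- NA
dat[4:6, "Sepal.Width"]  <- NA
dat[7:9, "Petal.Width"]  <- NA

set.seed(1)  # the residual and the donor draws below are random

# Model based: linear regression plus a normally distributed random residual
out <- impute_lm(dat, Sepal.Length ~ Petal.Length + Species,
                 add_residual = "normal")

# Donor imputation: random hot deck, donors drawn within each Species
out <- impute_rhd(out, Sepal.Width ~ Species)

# Other: groupwise median, with groups defined by Species
out <- impute_median(out, Petal.Width ~ Species)

# Count the remaining missing values per column
colSums(is.na(out))
```

Each call only fills in cells that are missing in the variable on the left-hand side, so the calls can be applied one after another in any order that keeps the required predictors available.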
Installation
To install simputation and all packages needed to support the various imputation models, run the following.
install.packages("simputation", dependencies=TRUE)
Example usage
Create some data with missing values.
library(simputation) # current package
library(magrittr) # for the %>% not-a-pipe operator
dat <- iris
# empty a few fields
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA
head(dat,10)
Now impute Sepal.Length and Sepal.Width by regression on Petal.Length and Species, and impute Species using a CART model that uses all other variables (including the imputed variables in this case).
dat %>%
  impute_lm(Sepal.Length + Sepal.Width ~ Petal.Length + Species) %>%
  impute_cart(Species ~ .) %>% # use all variables except 'Species' as predictor
  head(10)
Materials
Beta versions can be installed from my drat repo. If you use the OS whose name shall not be spoken, first install Rtools.
if(!require(drat)) install.packages("drat")
drat::addRepo("markvanderloo")
install.packages("simputation",type="source")
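Several methods in the list at the top are marked as requiring version 0.2.1 or later. A minimal way to check the installed version, whichever way it was installed, is sketched below; the variable name `has_new_methods` is only illustrative.

```r
# TRUE if the installed simputation is at least version 0.2.1
has_new_methods <- packageVersion("simputation") >= "0.2.1"
has_new_methods
```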