The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.
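As a small illustration of the one-model-per-variable idea, the sketch below lists the default univariate method that mice would assign to each column of the nhanes example data. It assumes the package is already installed and is version 3.0 or later, where make.method() is available; the names meth and incomplete are only illustrative.

```r
library(mice, warn.conflicts = FALSE)

# default imputation method per variable; "" means there is nothing to impute
meth <- make.method(nhanes)
print(meth)

# variables with a non-empty method are the incomplete ones
incomplete <- names(meth)[meth != ""]
print(incomplete)
```

Each entry of meth can be replaced by the name of another univariate method before the vector is passed to mice() through its method argument.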
The mice package can be installed from CRAN as follows:

    install.packages("mice")

The latest version can be installed from GitHub as follows:

    install.packages("devtools")
    devtools::install_github(repo = "stefvanbuuren/mice")
    library(mice, warn.conflicts = FALSE)
    #> Loading required package: lattice

    # show the missing data pattern
    md.pattern(nhanes)
    #>    age hyp bmi chl   
    #> 13   1   1   1   1  0
    #> 3    1   1   1   0  1
    #> 1    1   1   0   1  1
    #> 1    1   0   0   1  2
    #> 7    1   0   0   0  3
    #>      0   8   9  10 27
The table and the graph summarize where the missing data occur in the nhanes dataset.
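The margins of the md.pattern(nhanes) table can be cross-checked with base R. The following is a minimal sketch; n_missing and n_complete are illustrative names, not part of mice.

```r
library(mice, warn.conflicts = FALSE)

# number of missing values per variable (the bottom row of the pattern table)
n_missing <- colSums(is.na(nhanes))
print(n_missing)

# number of completely observed rows (the count in front of the first pattern)
n_complete <- sum(complete.cases(nhanes))
print(n_complete)
```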
    # multiple impute the missing values
    imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
    #> 
    #>  iter imp variable
    #>   1   1  bmi  hyp  chl
    #>   1   2  bmi  hyp  chl
    #>   2   1  bmi  hyp  chl
    #>   2   2  bmi  hyp  chl

    # inspect quality of imputations
    stripplot(imp, chl, pch = 19, xlab = "Imputation number")
In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.
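Besides the strip plot, plausibility can be checked numerically. The sketch below repeats the imputation call (with printFlag = FALSE to silence the iteration log) and compares imputed and observed chl values. It assumes the default method for numeric variables, predictive mean matching, which copies each imputation from an observed donor value; the names observed_range and imputed_range are illustrative.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# imputed chl values: one row per missing cell, one column per imputation
print(imp$imp$chl)

# compare the range of observed and imputed values
observed_range <- range(nhanes$chl, na.rm = TRUE)
imputed_range <- range(unlist(imp$imp$chl))
print(rbind(observed = observed_range, imputed = imputed_range))

# under predictive mean matching every imputation is an observed value
print(all(unlist(imp$imp$chl) %in% nhanes$chl))
```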
    # fit complete-data model
    fit <- with(imp, lm(chl ~ age + bmi))

    # pool and summarize the results
    summary(pool(fit))
    #>             estimate std.error statistic    df  p.value
    #> (Intercept)   -54.50     54.75    -0.995 14.33 0.331861
    #> age            33.70     12.09     2.788  2.29 0.011629
    #> bmi             6.86      1.65     4.170 19.26 0.000506
The complete-data model is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.
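To make the combination step concrete, the sketch below recomputes the pooled estimates and standard errors by hand with Rubin's rules: the pooled estimate is the average over imputations, and the total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance. This is an illustration only, not the internal code of pool(); the names est, se2, qbar, ubar, b, total and by_hand are illustrative.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))

# one column of coefficients and of squared standard errors per imputed dataset
est <- sapply(fit$analyses, coef)
se2 <- sapply(fit$analyses, function(f) diag(vcov(f)))
m <- ncol(est)

qbar <- rowMeans(est)             # pooled estimate
ubar <- rowMeans(se2)             # within-imputation variance
b <- apply(est, 1, var)           # between-imputation variance
total <- ubar + (1 + 1 / m) * b   # total variance

by_hand <- data.frame(estimate = qbar, std.error = sqrt(total))
print(by_hand)

# for comparison: the estimate and std.error columns reported by mice
print(summary(pool(fit)))
```

The degrees of freedom and p-values need further formulas that are left to pool().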
Version 3.0 represents a major update that implements the following features:
- blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block was equivalent to one variable, which is still the default. The blocks argument allows mixing univariate and multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;
- where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;
- Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses from multiply-imputed data;
- formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R's native formulas;
- Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;
- Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables;
- Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
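The blocks, where and formulas arguments and the D1() test from the list above can be tried on the nhanes data. The following is a rough sketch rather than recommended practice: the particular block grouping, the choice to overimpute bmi, and the formulas are arbitrary illustrations, and all object names (imp_blocks, where_mat, forms, and so on) are made up for this example. D1() may require the mitml package to be installed.

```r
library(mice, warn.conflicts = FALSE)

# blocks: visit bmi and chl together as one block, hyp as another
imp_blocks <- mice(nhanes, blocks = list(c("bmi", "chl"), "hyp"),
                   m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# where: overimpute every bmi cell, plus the missing cells of hyp and chl
where_mat <- is.na(nhanes)
where_mat[, "bmi"] <- TRUE
imp_where <- mice(nhanes, where = where_mat,
                  m = 2, maxit = 2, seed = 1, printFlag = FALSE)
print(nrow(imp_where$imp$bmi))  # one row per imputed cell

# formulas: specify each imputation model as an R formula
forms <- list(bmi = bmi ~ age + hyp + chl,
              hyp = hyp ~ age + bmi,
              chl = chl ~ age + bmi + hyp)
imp_forms <- mice(nhanes, formulas = forms,
                  m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# multivariate Wald test (D1): does bmi add to a model that already has age?
imp5 <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit1 <- with(imp5, lm(chl ~ age + bmi))
fit0 <- with(imp5, lm(chl ~ age))
test_d1 <- D1(fit1, fit0)
print(test_d1)
```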
See MICE: Multivariate Imputation by Chained Equations for more resources.
I'll be happy to take feedback and discuss suggestions. Please submit these through GitHub's issues facility.
- Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.
- Ad hoc methods and the MICE algorithm
- Convergence and pooling
- Inspecting how the observed data and missingness are related
- Passive imputation and post-processing
- Imputing multilevel data
- Sensitivity analysis with mice
- Generate missing values with ampute