Missing data frequently complicate data analysis. Multiple imputation is a robust technique for addressing them. In R, multiple imputation is commonly performed with the `mice` package, which implements the multiple imputation by chained equations (MICE) algorithm. MICE imputes missing values iteratively, one variable at a time, and can yield unbiased and confidence-valid inferences under many missing-data conditions. However, no comparable standard has yet been established for Python.
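The chained-equations idea can be illustrated with a minimal NumPy sketch. It is not the `mice` algorithm itself and makes simplifying assumptions: all variables are numeric and continuous, each incomplete column is filled by a linear regression on all other columns plus Gaussian noise, and, unlike proper MICE, the regression parameters are not drawn from their posterior. The function name `chained_equations_impute` and its parameters are hypothetical.

```python
import numpy as np


def chained_equations_impute(X, n_iter=10, rng=None):
    """Return one stochastically completed copy of a 2-D float array with NaNs.

    Hypothetical sketch: linear-regression chained equations with residual noise.
    """
    rng = np.random.default_rng(rng)
    X = np.array(X, dtype=float)          # work on a copy, never the caller's array
    missing = np.isnan(X)

    # Initialise every missing cell with the observed mean of its column.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(n_iter):
        # Visit the variables one at a time (the "chained" part).
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j
            # Design matrix: intercept plus all other, currently completed, columns.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs_j], X[obs_j, j], rcond=None)
            resid = X[obs_j, j] - A[obs_j] @ beta
            p = A.shape[1]
            sigma = resid.std(ddof=p) if obs_j.sum() > p else 0.0
            # Stochastic regression draw: prediction plus Gaussian noise.
            X[miss_j, j] = A[miss_j] @ beta + rng.normal(0.0, sigma, miss_j.sum())
    return X
```

Calling the function with different seeds, for example `[chained_equations_impute(data, rng=s) for s in range(5)]`, produces several completed data sets; the noise term is what makes them differ, so that the between-imputation variability reflects uncertainty about the missing values.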
This repository contains code for a model-based simulation study that evaluates whether different Python imputation methods produce valid inferences under different missingness mechanisms and missingness proportions. The Python imputation methods `KNNImputer`, `IterativeImputer`, `miceforest` and `MIDASpy` are considered.
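Whether a method yields valid inferences is usually judged by analysing each of the m completed data sets, pooling the results with Rubin's rules, and checking bias and confidence-interval coverage against the known true parameter across many simulation replications. The stdlib-only sketch below shows one way such pooling and coverage checks could look; it is an assumption, not this repository's code, and `pool_rubin` and `coverage_rate` are hypothetical names. For simplicity it builds the interval from a normal quantile; Rubin's rules proper use a t reference distribution with the returned degrees of freedom, so this interval is slightly too narrow when `df` is small.

```python
import math
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Hypothetical helper; uses a normal quantile instead of the t distribution.
    """
    m = len(estimates)
    q_bar = mean(estimates)              # pooled point estimate
    u_bar = mean(variances)              # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b          # total variance
    r = (1 + 1 / m) * b / u_bar          # relative increase in variance due to missingness
    df = math.inf if b == 0 else (m - 1) * (1 + 1 / r) ** 2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(t)
    return q_bar, t, df, (q_bar - half_width, q_bar + half_width)


def coverage_rate(intervals, true_value):
    """Share of simulation replications whose interval contains the true value."""
    hits = sum(lo <= true_value <= hi for lo, hi in intervals)
    return hits / len(intervals)
```

In a simulation study, `pool_rubin` would be called once per replication and the resulting intervals passed to `coverage_rate`; a confidence-valid method should reach at least the nominal level (here 95 %).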