A Python implementation of missing-value imputation using k-nearest neighbours (kNN).
Requires scikit-learn, NumPy and pandas to be installed. Initialise:
from imputer import Imputer
impute = Imputer()
Default usage (X should be a pandas.DataFrame; column is the name or index of the column to impute):
X_imputed = impute.knn(X = data, column = 'age')  # default: k = 10
Change the number of neighbours k:
X_imputed = impute.knn(X = data, column = 'age', k = 3)
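To give a rough idea of what k controls, here is a minimal sketch of kNN imputation for a numerical column. It is not this package's implementation: the function name knn_impute_numeric is hypothetical, and the sketch assumes that all other columns are numeric and complete, that distances are plain Euclidean, and that the neighbours are averaged without weights.

```python
import numpy as np
import pandas as pd


def knn_impute_numeric(df, column, k=10):
    """Fill NaNs in `column` with the mean over the k nearest complete rows.

    Hypothetical helper for illustration: assumes every other column is
    numeric and has no missing values, and that at least one row has a
    known value in `column`.
    """
    out = df.copy()
    features = out.drop(columns=[column]).to_numpy(dtype=float)
    missing = out[column].isna().to_numpy()
    known_features = features[~missing]
    known_values = out.loc[~missing, column].to_numpy(dtype=float)
    k = min(k, len(known_values))  # cannot use more neighbours than known rows
    col_pos = out.columns.get_loc(column)

    for pos in np.flatnonzero(missing):
        # Euclidean distance from the incomplete row to every complete row
        dist = np.linalg.norm(known_features - features[pos], axis=1)
        nearest = np.argsort(dist)[:k]
        out.iloc[pos, col_pos] = known_values[nearest].mean()
    return out


data = pd.DataFrame({
    "height": [150.0, 152.0, 180.0, 182.0, 151.0],
    "weight": [50.0, 52.0, 80.0, 82.0, 51.0],
    "age": [20.0, 22.0, 40.0, 42.0, np.nan],
})
X_imputed = knn_impute_numeric(data, column="age", k=2)
```

In this toy data the two rows closest to the incomplete one have ages 20 and 22, so the missing age is filled with their mean. A smaller k follows local structure more closely; a larger k smooths over more rows.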
By default, the column is imputed as a numerical feature. To impute a categorical feature, set is_categorical = True:
X_imputed = impute.knn(X = data, column = 'gender', k = 10, is_categorical = True)
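For a categorical column, averaging neighbours is not meaningful, so a common approach is a majority vote among the k nearest rows. The sketch below illustrates that idea only; whether this package does exactly the same is an assumption, and the function name knn_impute_categorical is hypothetical. As before, the other columns are assumed to be numeric and complete.

```python
from collections import Counter

import numpy as np
import pandas as pd


def knn_impute_categorical(df, column, k=10):
    """Fill missing labels in `column` with the most common label among
    the k nearest complete rows.

    Hypothetical helper for illustration: assumes every other column is
    numeric and has no missing values, and that at least one row has a
    known label in `column`.
    """
    out = df.copy()
    features = out.drop(columns=[column]).to_numpy(dtype=float)
    missing = out[column].isna().to_numpy()
    known_features = features[~missing]
    known_labels = out.loc[~missing, column].to_numpy()
    k = min(k, len(known_labels))  # cannot use more neighbours than known rows
    col_pos = out.columns.get_loc(column)

    for pos in np.flatnonzero(missing):
        # Euclidean distance from the incomplete row to every complete row
        dist = np.linalg.norm(known_features - features[pos], axis=1)
        nearest = np.argsort(dist)[:k]
        # Majority vote over the labels of the k nearest rows
        label = Counter(known_labels[nearest]).most_common(1)[0][0]
        out.iloc[pos, col_pos] = label
    return out


data = pd.DataFrame({
    "height": [150.0, 152.0, 153.0, 180.0, 182.0, 151.0],
    "weight": [50.0, 52.0, 53.0, 80.0, 82.0, 51.0],
    "gender": ["F", "F", "M", "M", "M", None],
})
X_imputed = knn_impute_categorical(data, column="gender", k=3)
```

Here the three nearest rows carry the labels F, F and M, so the missing entry is filled with F.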
Troyanskaya O, Cantor M, Sherlock G, et al. Missing value estimation methods for DNA microarrays[J]. Bioinformatics, 2001, 17(6): 520-525.