# About

This notebook demonstrates several third-party Python packages for missing value imputation.

Sources:
1. [fancyimpute](https://github.com/iskandr/fancyimpute)
1. [missingpy](https://github.com/epsilon-machine/missingpy)
1. [sklearn](https://scikit-learn.org/stable/modules/impute.html)

# Setup

In [1]:
import numpy as np
np.random.seed(1)  # call the function; assigning to np.random.seed would overwrite it without seeding

In [2]:
# prepare data without missing values

X = np.random.rand(200, 5)
X_orig = X.copy()
assert not np.isnan(X_orig.sum())  # no NaN



In [3]:
# artificially introduce some missing values into the dataset

lines = np.random.choice(X.shape[0], 10)
cols = np.random.choice(X.shape[1], 10)
X[lines, cols] = np.nan
assert np.isnan(X.sum())  # at least one NaN in X
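
`np.random.choice` samples with replacement by default, so the (row, column) pairs drawn above can repeat, and fewer than 10 cells may end up missing. The sketch below is one possible way to mask an exact number of distinct cells. The names `X_demo`, `n_missing`, and `rng` are illustrative and not part of the original notebook, and the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)   # seeded generator; the seed value is arbitrary

X_demo = rng.random((200, 5))    # hypothetical stand-in for X
n_missing = 10

# Draw n_missing distinct flat indices, then convert them to (row, col) pairs.
flat_idx = rng.choice(X_demo.size, size=n_missing, replace=False)
rows, col_idx = np.unravel_index(flat_idx, X_demo.shape)

X_demo[rows, col_idx] = np.nan
n_nan = int(np.isnan(X_demo).sum())
print(n_nan)
```

Sampling flat indices without replacement guarantees that every masked cell is distinct, which keeps the number of missing values fixed across runs.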

# fancyimpute

In [4]:
# installation takes a few minutes; you can run the command in the conda prompt to watch its progress

!pip install fancyimpute



ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\anaconda3\\envs\\ids\\lib\\site-packages\\~1mpy\\.libs\\libopenblas.GK7GX5KEQ4F6UYO3P26ULGBQYHGQO7J4.gfortran-win_amd64.dll'
Consider using the `--user` option or check the permissions.




Collecting numpy>=1.10
  Using cached numpy-1.19.5-cp38-cp38-win_amd64.whl (13.3 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.20.2
    Uninstalling numpy-1.20.2:
      Successfully uninstalled numpy-1.20.2


**Note:** if the import below fails with an error such as:
`RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd`
then upgrade numpy with:
```
pip install -U numpy
```

Installing fancyimpute may downgrade the numpy package, in which case it must be restored.

You can, of course, run the pip commands directly in the conda prompt, inside the dedicated virtual environment.

In [5]:
!pip install -U numpy

Collecting numpy
  Using cached numpy-1.20.2-cp38-cp38-win_amd64.whl (13.7 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
Successfully installed numpy-1.20.2


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.20.2 which is incompatible.


In [6]:
from fancyimpute import KNN, NuclearNormMinimization, SoftImpute, BiScaler

In [7]:
X_filled_fancyimpute = KNN(k=3).fit_transform(X)
assert not np.isnan(X_filled_fancyimpute.sum())

Imputing row 1/200 with 0 missing, elapsed time: 0.035
Imputing row 101/200 with 0 missing, elapsed time: 0.040


In [8]:
np.linalg.norm(X_filled_fancyimpute-X_orig)

0.9584087274831389
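
The norm above sums squared errors over the whole matrix. Assuming the imputer leaves observed cells unchanged, only the imputed cells contribute to it, so the value grows with the number of missing cells. A per-cell RMSE restricted to the masked positions can be easier to compare across experiments. The helper `masked_rmse` below is a hypothetical illustration, not a function from any of the packages used here.

```python
import numpy as np

def masked_rmse(X_true, X_filled, missing_mask):
    """RMSE computed only over the cells flagged in missing_mask."""
    diff = (X_filled - X_true)[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Tiny hand-made example with two missing cells.
X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
X_missing = X_true.copy()
X_missing[0, 1] = np.nan
X_missing[1, 0] = np.nan
mask = np.isnan(X_missing)

X_filled = X_missing.copy()
X_filled[mask] = 2.5   # pretend an imputer wrote 2.5 into each hole

rmse = masked_rmse(X_true, X_filled, mask)
print(rmse)
```

In the notebook, the mask would have to be captured with `np.isnan(X)` before imputation, since the filled matrices no longer contain NaN.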

# missingpy

In [9]:
!pip install missingpy



In [10]:
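# workaround from https://stackoverflow.com/questions/60145652/no-module-named-sklearn-neighbors-base
# missingpy imports sklearn.neighbors.base, which newer scikit-learn versions expose as sklearn.neighbors._base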

import sys
import sklearn.neighbors._base
sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base
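
The workaround relies on the fact that Python's import machinery checks `sys.modules` before searching for a module on disk. A minimal sketch of the same mechanism using only the standard library follows. The name `legacy_json` is made up for illustration.

```python
import sys
import json

# Register the already-imported module under a second, hypothetical name.
sys.modules["legacy_json"] = json

# The import statement finds the entry in sys.modules and does not search the disk.
import legacy_json

same_object = legacy_json is json
print(same_object)
```

Both names refer to the same module object, so code written against the old name keeps working as long as the attributes it expects still exist under the new one.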

In [11]:
from missingpy import MissForest

In [12]:
imputer = MissForest()

In [13]:
X_filled_missingpy = imputer.fit_transform(X)

Iteration: 0
Iteration: 1
Iteration: 2


In [14]:
assert not np.isnan(X_filled_missingpy.sum())  # no NaN in X_filled_missingpy

In [15]:
np.linalg.norm(X_filled_missingpy-X_orig)


0.8320616210802555

# sklearn

In [16]:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

In [17]:
imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit(X)

IterativeImputer(random_state=0)

In [19]:
X_filled_sklearn = imp.transform(X)
assert not np.isnan(X_filled_sklearn.sum())

In [21]:
np.linalg.norm(X_filled_sklearn-X_orig)

0.6716997300055806
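
To give the three norms a reference point, it can help to compare them with a naive baseline. The sketch below fills each missing cell with its column mean using only NumPy. It builds its own synthetic data (`X_full`, `X_nan` are illustrative names), so its error value is not comparable with the numbers recorded above.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
X_full = rng.random((200, 5))           # complete synthetic data
X_nan = X_full.copy()

# Mask 10 distinct cells.
flat = rng.choice(X_nan.size, size=10, replace=False)
X_nan[np.unravel_index(flat, X_nan.shape)] = np.nan

mask = np.isnan(X_nan)
col_means = np.nanmean(X_nan, axis=0)   # per-column mean, ignoring NaN

# col_means has shape (5,) and broadcasts across the 200 rows.
X_mean_filled = np.where(mask, col_means, X_nan)

baseline_error = np.linalg.norm(X_mean_filled - X_full)
print(baseline_error)
```

An imputer that does not beat this baseline on the same data and mask is probably not worth its extra cost.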