# Imputation of missing values

This is a component for imputing missing values using the mean, the median, or the most frequent value of each column. It uses the `SimpleImputer` implementation from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer).
<br>
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
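
As a rough illustration of the three strategies, the pure-Python sketch below fills the gaps in a single column. It is not how Scikit-learn implements them. `impute_column` is a hypothetical helper, `None` marks a missing entry, and the ages are made-up values. For ties under "most_frequent" it picks the smallest value, which is the tie-breaking rule documented for `SimpleImputer`.

```python
from collections import Counter
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries in one column with a single fill value.

    Hypothetical helper for illustration only; SimpleImputer does the
    same kind of per-column computation on whole arrays.
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        counts = Counter(observed)
        top = max(counts.values())
        # On ties, keep the smallest of the most frequent values
        fill = min(v for v, c in counts.items() if c == top)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


ages = [22.0, None, 38.0, 26.0, None, 38.0]
print(impute_column(ages, "mean"))           # [22.0, 31.0, 38.0, 26.0, 31.0, 38.0]
print(impute_column(ages, "median"))         # [22.0, 32.0, 38.0, 26.0, 32.0, 38.0]
print(impute_column(ages, "most_frequent"))  # [22.0, 38.0, 38.0, 26.0, 38.0, 38.0]
```

Only "most_frequent" makes sense for text columns; the mean or median of strings is undefined.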

This notebook shows:
- how to use the SDK to load and save a dataset and to save a model.
- how to receive parameters from the platform.

```python
dataset = "titanic" #@param {type:"string"}
out_dataset = "titanic-imputed" #@param {type:"string"}
target = "Survived" #@param {type:"string"}
experiment_id = "b48b03b1-5581-4e1d-a429-f31a34f78e1c" #@param {type:"string"}
operator_id = "76f521c2-ff17-4850-944d-09188eeb27c1" #@param {type:"string"}
strategy = "most_frequent" #@param {type:"string"}
```
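
The `strategy` parameter arrives from the platform as a free-form string, so a typo only surfaces when `SimpleImputer` is fitted. A small check such as the hypothetical `check_strategy` helper below (not part of the PlatIAgro SDK) could reject bad values up front. It assumes that only the three strategies this component advertises should be accepted, although `SimpleImputer` itself also offers `"constant"`.

```python
# Assumption: only the strategies this component advertises are allowed.
SUPPORTED_STRATEGIES = {"mean", "median", "most_frequent"}


def check_strategy(strategy):
    """Normalize the platform parameter and fail early if it is unsupported."""
    normalized = strategy.strip().lower()
    if normalized not in SUPPORTED_STRATEGIES:
        raise ValueError(
            f"strategy must be one of {sorted(SUPPORTED_STRATEGIES)}, got {strategy!r}"
        )
    return normalized


print(check_strategy(" Median "))  # median
```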

## Load dataset

Load the whole dataset into a `pandas.DataFrame`, then split it into the feature matrix `X` and the target vector `y`.

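```python
from platiagro import load_dataset

df = load_dataset(name=dataset)

# Features as a numpy.ndarray, without the target column
X = df.drop(target, axis=1).to_numpy()
# Target values, kept aside so they are not imputed
y = df[target].to_numpy()
```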

```python
# Feature names, in the same order as the columns of X
columns = df.columns.tolist()
columns.remove(target)
```
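
The `columns` list is needed because `to_numpy()` drops the column labels, so the feature names are required later to rebuild a DataFrame from the imputed array. The sketch below replaces `load_dataset` with a tiny hand-written DataFrame (made-up values, with column names assumed to resemble the Titanic dataset) so that it runs without the platform.

```python
import pandas as pd

# Stand-in for load_dataset(name="titanic"); the values are made up.
df = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Embarked": ["S", "C", None],
    "Survived": [0, 1, 1],
})
target = "Survived"

X = df.drop(target, axis=1).to_numpy()  # features only; labels are lost here
y = df[target].to_numpy()               # target kept aside

columns = df.columns.tolist()
columns.remove(target)                  # feature names, same order as X's columns

print(columns)  # ['Age', 'Embarked']
print(X.shape)  # (3, 2)
```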

## Fit a model using sklearn.impute.SimpleImputer

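```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Replace None with np.nan so that SimpleImputer treats it as a missing value
df.fillna(value=np.nan, inplace=True)

# Put the features back in a numpy.ndarray
X = df.drop(target, axis=1).to_numpy()

# "mean" and "median" need numeric columns; "most_frequent" also handles text
estimator = SimpleImputer(missing_values=np.nan, strategy=strategy)

# Learn one fill value per column and replace the missing entries
X = estimator.fit_transform(X)

# Rebuild the DataFrame with the original feature names and re-attach the target
df = pd.DataFrame(data=X, columns=columns)
df[target] = pd.Series(y)
```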
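
How the strategies behave on mixed data can be checked on a toy array. The values are made up, and `dtype=object` imitates what `to_numpy()` returns for a DataFrame that has both numeric and text columns (here assumed to look like Titanic's `Age` and `Embarked`). "most_frequent" fills both columns, while "mean" rejects the text column with a `ValueError`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix: numeric column, then a text column (made-up values).
X = np.array(
    [[22.0, "S"],
     [np.nan, np.nan],
     [26.0, "C"],
     [22.0, "S"]],
    dtype=object,
)

# "most_frequent" works on numbers and strings alike.
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
X_filled = imputer.fit_transform(X)
print(imputer.statistics_)  # learned fill value for each column
print(X_filled[1])          # the row that was fully missing

# "mean" cannot convert the strings to floats and raises ValueError.
try:
    SimpleImputer(strategy="mean").fit(X)
except ValueError as err:
    print("mean failed:", err)

# On purely numeric data, "mean" fills with the column average.
ages = np.array([[22.0], [np.nan], [26.0]])
print(SimpleImputer(strategy="mean").fit_transform(ages).ravel())  # [22. 24. 26.]
```

With the default `strategy = "most_frequent"` the notebook therefore runs on the whole Titanic table, whereas "mean" or "median" would need the text columns removed or encoded first.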

## Save dataset

Stores the transformed dataset in the object storage.

```python
from platiagro import save_dataset

save_dataset(name=out_dataset, df=df)
```

## Save model

Stores the model artifacts in the object storage.<br>
It will make the model available for future deployments.

```python
from platiagro import save_model

# Persist the fitted imputer together with the feature names it expects
save_model(experiment_id=experiment_id,
           model={"estimator": estimator,
                  "columns": columns})
```
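
Because the saved artifact is a dict holding the fitted `estimator` and the training `columns`, a later deployment could apply it to new rows roughly as follows. `impute_new_rows` is a hypothetical helper, not part of the PlatIAgro SDK, and the model is built in memory from made-up data instead of being loaded from storage. It calls `transform`, not `fit_transform`, so new rows are filled with the values learned during training.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_new_rows(model, df):
    """Apply a previously fitted imputer to new data.

    `model` is assumed to have the shape passed to save_model:
    {"estimator": fitted SimpleImputer, "columns": list of feature names}.
    """
    columns = model["columns"]
    # Same preprocessing as in training: feature columns in training order,
    # with None turned into NaN so the imputer sees it as missing.
    X = df[columns].fillna(value=np.nan).to_numpy()
    X = model["estimator"].transform(X)
    return pd.DataFrame(data=X, columns=columns)


# Toy training data (made up), packaged the way the notebook saves it.
train = pd.DataFrame({"Age": [22.0, 26.0, 22.0], "Embarked": ["S", "C", "S"]})
columns = train.columns.tolist()
estimator = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
estimator.fit(train.to_numpy())
model = {"estimator": estimator, "columns": columns}

# New rows arrive with a different column order and some gaps.
new = pd.DataFrame({"Embarked": [None, "C"], "Age": [np.nan, 30.0]})
print(impute_new_rows(model, new))
```

Selecting `df[columns]` also puts the incoming columns back in training order, which matters because the imputer works on column positions, not names.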