# Quantile Transformer

This component transforms features to follow a uniform or a normal distribution. For a given feature, the transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers, which makes it a robust preprocessing scheme. It uses the `QuantileTransformer` implementation from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html).
<br>
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
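
To make the effect described above concrete, here is a minimal sketch on synthetic data. The log-normal feature, the appended outlier, and the variable names are illustrative assumptions and are not part of this component.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Synthetic skewed feature (assumption for illustration): 99 log-normal
# samples plus one extreme outlier, shaped as a single column.
x = np.append(rng.lognormal(mean=0.0, sigma=1.0, size=99), 1_000.0).reshape(-1, 1)

# Map the feature to a uniform distribution on [0, 1].
uniform = QuantileTransformer(n_quantiles=10, output_distribution="uniform")
x_uniform = uniform.fit_transform(x)

# Map the same feature to a standard normal distribution.
normal = QuantileTransformer(n_quantiles=10, output_distribution="normal")
x_normal = normal.fit_transform(x)

# The outlier no longer dominates the value range after the transformation.
print("raw range:    ", float(x.min()), float(x.max()))
print("uniform range:", float(x_uniform.min()), float(x_uniform.max()))
print("normal range: ", float(x_normal.min()), float(x_normal.max()))
```

Because the mapping is based on ranks (quantiles) rather than raw magnitudes, the order of the samples is preserved while the distance between the outlier and the rest of the data shrinks.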

This notebook shows:
- how to use the SDK to load the dataset and save a model.
- how to receive parameters from the platform.

In [None]:
dataset = "titanic" #@param {type:"string"}
out_dataset = "titanic-quantile-transformer" #@param {type:"string"}
target = "Survived" #@param {type:"string"}
experiment_id = "f2ca4337-bb78-4c0c-845b-1e6c5bb7b832" #@param {type:"string"}
operator_id = "30c43969-9e4e-42b7-8d7c-a42516628145" #@param {type:"string"}
n_quantiles = 10 #@param {type:"number"}

## Load dataset

Load the whole dataset into a pandas.DataFrame, then split it into a feature matrix and a target vector.

In [None]:
from platiagro import load_dataset

df = load_dataset(name=dataset)
X = df.drop(target, axis=1).to_numpy()
y = df[target].to_numpy()

In [None]:
import numpy as np

columns = df.columns.to_numpy()
target_index = np.argwhere(columns == target)
columns = np.delete(columns, target_index)

## Load metadata about the dataset
For example, below we get the feature type of each column in the dataset (e.g. categorical, numerical, or datetime).

In [None]:
from platiagro import stat_dataset
from platiagro.featuretypes import infer_featuretypes

try:
    metadata = stat_dataset(name=dataset)
    featuretypes = metadata["featuretypes"]
except KeyError:
    featuretypes = infer_featuretypes(df)

featuretypes = np.array(featuretypes)
featuretypes = np.delete(featuretypes, target_index)

## Fit a model using sklearn.preprocessing.QuantileTransformer

In [None]:
import pandas as pd
from platiagro.featuretypes import NUMERICAL
from sklearn.preprocessing import QuantileTransformer

# Boolean mask that marks which feature columns are numerical
numerical_indexes = (featuretypes == NUMERICAL)

estimator = QuantileTransformer(n_quantiles=n_quantiles)

if numerical_indexes.any():
    X[:, numerical_indexes] = estimator.fit_transform(X[:, numerical_indexes])

    # Put data back in a pandas.DataFrame
    df = pd.DataFrame(data=X, columns=columns)
    df[target] = pd.Series(y)
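
The step above can be tried in isolation with a small stand-in dataset. The sketch below does not use platiagro. It assumes a tiny hand-made DataFrame and uses `DataFrame.select_dtypes` as a substitute for the platform's feature types, so only the numerical columns are transformed and the categorical column is left untouched.

```python
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# Hand-made stand-in for the dataset (assumption for illustration).
toy_df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "Sex": ["male", "female", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

# Stand-in for the feature types: treat numeric dtypes as numerical features.
numeric_cols = toy_df.select_dtypes(include="number").columns

# n_quantiles must not exceed the number of samples (5 here).
toy_estimator = QuantileTransformer(n_quantiles=5, random_state=0)

# Transform only the numerical columns; keep the others as they are.
toy_transformed = toy_df.copy()
toy_transformed[numeric_cols] = toy_estimator.fit_transform(toy_df[numeric_cols])

print(toy_transformed)
```

Working on a copy keeps the original values available for comparison, which is convenient when checking that non-numerical columns were not modified.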

## Save dataset

Stores the transformed dataset in an object storage.<br>

In [None]:
from platiagro import save_dataset

save_dataset(name=out_dataset, df=df)

## Save model

Stores the model artifacts in an object storage.<br>
It will make the model available for future deployments.

In [None]:
from platiagro import save_model

save_model(experiment_id=experiment_id,
           model={"estimator": estimator,
                  "columns": columns.tolist(),
                  "numerical_indexes": numerical_indexes})
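
How the saved artifacts might be used at deployment time is not shown in this notebook. As a rough sketch, the example below uses a `pickle` round trip as a stand-in for the platform's storage (an assumption, since the platform's loading API is not covered here) and applies the restored estimator to new rows using the same dictionary keys as above. The training and new data are made up for illustration.

```python
import pickle

import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Made-up training data with mixed column types (assumption for illustration).
X_train = np.array([
    [22.0, "male", 7.25],
    [38.0, "female", 71.28],
    [26.0, "female", 7.92],
    [35.0, "female", 53.10],
    [54.0, "male", 51.86],
], dtype=object)
demo_columns = ["Age", "Sex", "Fare"]
demo_mask = np.array([True, False, True])  # which columns are numerical

demo_estimator = QuantileTransformer(n_quantiles=5, random_state=0)
demo_estimator.fit(X_train[:, demo_mask].astype(float))

# Same dictionary layout as the model saved above.
model = {"estimator": demo_estimator,
         "columns": demo_columns,
         "numerical_indexes": demo_mask}

# pickle is only a stand-in for the object storage round trip.
restored = pickle.loads(pickle.dumps(model))

# Apply the restored estimator to unseen rows, numerical columns only.
X_new = np.array([[30.0, "male", 10.0], [80.0, "female", 500.0]], dtype=object)
mask = restored["numerical_indexes"]
X_new[:, mask] = restored["estimator"].transform(X_new[:, mask].astype(float))

print(X_new)
```

Storing the column names and the numerical mask alongside the estimator lets the deployment code select the same columns that were used during fitting.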