# Robust Scaler

This component scales features using statistics that are robust to outliers. The scaler subtracts the median and scales the data according to a quantile range (by default the IQR, the interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). The component uses the implementation from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html).
<br>
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
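
To make the median/IQR description concrete, here is a minimal sketch of the default behavior, `(x - median) / IQR`, using only the Python standard library. The helper name `robust_scale` and the sample values are hypothetical, chosen for illustration; this is not the Scikit-learn implementation itself.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR (illustrative helper, not the sklearn API)."""
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - median) / iqr for x in values]


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
scaled = robust_scale(values)
print(scaled)
```

Because the median (3.0) and the IQR (2.0) are barely affected by the single outlier, the four ordinary values stay in a narrow range while the outlier remains clearly separated.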

This notebook shows:
- how to use the [SDK](https://platiagro.github.io/sdk/) to load datasets, save models and other artifacts.
- how to declare parameters and use them to build reusable components.

## Declare parameters
Components may declare (and use) these default parameters:
- dataset
- target

Use these parameters to load/save datasets, models, metrics, and figures with the help of [PlatIAgro SDK](https://platiagro.github.io/sdk/). <br />
You may also declare custom parameters to set when running an experiment.

Select the hyperparameters and their respective values to be used when training the model:
- with_centering
- with_scaling

These are only a few of the parameters offered by the model class; you may also use any other existing parameter. <br />
Check the [model parameters](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) for more information.
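
The effect of the two hyperparameters can be seen by toggling them independently. The sketch below assumes a small single-column array (`X_demo`, a made-up example) and compares centering only against scaling only.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# hypothetical single-feature data with one outlier
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# subtract the median, keep the original spread
centered_only = RobustScaler(with_centering=True, with_scaling=False).fit_transform(X_demo)

# divide by the IQR, keep the original location
scaled_only = RobustScaler(with_centering=False, with_scaling=True).fit_transform(X_demo)

print(centered_only.ravel())
print(scaled_only.ravel())
```

With both flags set to `True` (the defaults used in this notebook), the two steps are combined.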

In [None]:
# parameters
dataset = "iris" #@param {type:"string"}
target = "Species" #@param {type:"string"}

# hyperparameters
with_centering = True #@param {type:"boolean", label:"Centering", description:"Center the data before scaling. Raises an exception when used with sparse matrices"}
with_scaling = True #@param {type:"boolean", label:"Scaling", description:"Scale the data to the interquartile range"}

## Load dataset

Import and put the whole dataset in a pandas.DataFrame.

In [None]:
from platiagro import load_dataset

df = load_dataset(name=dataset)
X = df.drop(target, axis=1).to_numpy()
y = df[target].to_numpy()

## Load metadata about the dataset
For example, below we get the feature type of each column in the dataset (e.g., categorical, numerical, or datetime).

In [None]:
import numpy as np
from platiagro import stat_dataset

metadata = stat_dataset(name=dataset)
featuretypes = metadata["featuretypes"]

columns = df.columns.to_numpy()
featuretypes = np.array(featuretypes)
target_index = np.argwhere(columns == target)
columns = np.delete(columns, target_index)
featuretypes = np.delete(featuretypes, target_index)
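
The cell above removes the target from both `columns` and `featuretypes` so that the two arrays stay aligned with the columns of `X`. The following standalone sketch shows the same steps on made-up data; the column names and the feature type strings are placeholders for illustration and are not taken from the SDK.

```python
import numpy as np

# hypothetical column names and feature types, for illustration only
demo_columns = np.array(["SepalLength", "Species", "CollectedAt"])
demo_featuretypes = np.array(["Numerical", "Categorical", "DateTime"])
demo_target = "Species"

# locate the target column and drop it from both arrays
demo_target_index = np.argwhere(demo_columns == demo_target)
demo_columns = np.delete(demo_columns, demo_target_index)
demo_featuretypes = np.delete(demo_featuretypes, demo_target_index)

# boolean mask that selects the numerical feature columns
demo_mask = demo_featuretypes == "Numerical"
print(demo_columns, demo_featuretypes, demo_mask)
```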

## Fit a model using sklearn.preprocessing.RobustScaler

In [None]:
import pandas as pd
from platiagro.featuretypes import NUMERICAL
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# selects the indexes of numerical features
numerical_indexes = (featuretypes == NUMERICAL)

pipeline = Pipeline(steps=[
    ('estimator', RobustScaler(with_centering=with_centering,
                               with_scaling=with_scaling))
])

X[:, numerical_indexes] = pipeline.fit_transform(X[:, numerical_indexes])

# Put data back in a pandas.DataFrame
df = pd.DataFrame(data=X, columns=columns)
df[target] = pd.Series(y)

## Save dataset

Stores the transformed dataset in an object storage.

In [None]:
from platiagro import save_dataset

save_dataset(name=dataset, df=df)

## Save model

Stores the model artifacts in an object storage.<br>
It will make the model available for future deployments.

In [None]:
from platiagro import save_model

save_model(pipeline=pipeline,
           columns=columns,
           numerical_indexes=numerical_indexes)
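
The saved `pipeline` and `numerical_indexes` are what a later step would need to scale new rows the same way. The sketch below illustrates that idea with Scikit-learn only, on toy data; the `demo_*` and `X_*` names are hypothetical, and how the artifacts are loaded back from storage is outside the scope of this example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# toy training data: column 0 is numerical, column 1 is categorical
X_train = np.array(
    [[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "b"], [100.0, "a"]],
    dtype=object,
)
demo_numerical_indexes = np.array([True, False])

# fit the scaler on the numerical columns only
demo_pipeline = Pipeline(steps=[("estimator", RobustScaler())])
demo_pipeline.fit(X_train[:, demo_numerical_indexes].astype(float))

# new rows are transformed with the median and IQR learned during fit
X_new = np.array([[5.0, "b"]], dtype=object)
X_new[:, demo_numerical_indexes] = demo_pipeline.transform(
    X_new[:, demo_numerical_indexes].astype(float)
)
print(X_new)
```

Only the numerical column changes; the categorical column passes through untouched, mirroring the fit step in this notebook.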