# Normalizer

This component normalizes each sample individually to unit norm, using the implementation from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html).
<br>
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
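
To make "unit norm" concrete: with the default L2 norm, each row is divided by its own Euclidean length, independently of every other row. The following is a dependency-free sketch of that arithmetic, not Scikit-learn's implementation. The helper name `normalize_rows` is hypothetical, and leaving all-zero rows unchanged is a choice made here to avoid dividing by zero.

```python
import math

def normalize_rows(rows):
    """Scale each row to unit L2 norm (hypothetical helper, for illustration only)."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(value * value for value in row))
        # An all-zero row has no direction, so it is left unchanged.
        scale = norm if norm > 0 else 1.0
        normalized.append([value / scale for value in row])
    return normalized

samples = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]]
unit_samples = normalize_rows(samples)
```

Here `[3.0, 4.0]` has length 5, so it becomes `[0.6, 0.8]`, while `[1.0, 0.0]` already has length 1 and is unchanged.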

This notebook shows:
- how to use the [SDK](https://platiagro.github.io/sdk/) to load datasets, save models and other artifacts.
- how to declare parameters and use them to build reusable components.

## Declare parameters
Components may declare (and use) these default parameters:
- dataset
- target
- experiment_id
- operator_id

Use these parameters to load/save datasets, models, metrics, and figures with the help of [PlatIAgro SDK](https://platiagro.github.io/sdk/).

You may also declare custom parameters to set when running an experiment.
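
As a sketch of a custom parameter, the cell below exposes the `norm` argument of `sklearn.preprocessing.Normalizer`, which accepts `"l1"`, `"l2"`, or `"max"`. This parameter is not part of the original notebook. The list-of-options annotation follows the Colab-style `#@param` form syntax, and whether PlatIAgro renders it as a dropdown is an assumption.

```python
# Hypothetical custom parameter: the norm used to scale each sample.
norm = "l2" #@param ["l1", "l2", "max"] {type:"string"}

# Validate early so a typo fails before any data is loaded.
allowed_norms = ("l1", "l2", "max")
if norm not in allowed_norms:
    raise ValueError(f"norm must be one of {allowed_norms}, got {norm!r}")
```

The value could then be passed as `Normalizer(norm=norm)` when the estimator is created.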

In [None]:
dataset = "boston" #@param {type:"string"}
target = "medv" #@param {type:"string"}
experiment_id = "e6c07a4e-4902-43c3-b02f-efe8f051cb0c" #@param {type:"string"}
operator_id = "c0deb81a-540e-4d51-bf8f-c332f9b9fd73" #@param {type:"string"}

## Load dataset

Load the whole dataset into a pandas.DataFrame.

In [None]:
from platiagro import load_dataset

df = load_dataset(name=dataset)
X = df.drop(target, axis=1).to_numpy()
y = df[target].to_numpy()

## Load metadata about the dataset
For example, the cell below gets the feature type of each column in the dataset (e.g. categorical, numerical, or datetime).

In [None]:
import numpy as np
from platiagro import stat_dataset
from platiagro.featuretypes import infer_featuretypes

try:
    metadata = stat_dataset(name=dataset)
    featuretypes = metadata["featuretypes"]
except KeyError:
    featuretypes = infer_featuretypes(df)

featuretypes = np.array(featuretypes)

## Replace NaN values
Remove features whose values are all NA.<br>
If only some values are missing, replace them with the mean for numerical features and the mode for categorical features.
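
The imputation rule can be sketched without pandas as follows. This is a minimal illustration on plain lists, where `None` marks a missing value and the helper `impute` is hypothetical; the notebook cells below apply the same rule to the DataFrame.

```python
from statistics import mean, mode

def impute(values, numerical):
    """Fill None entries with the mean (numerical) or mode (categorical) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if numerical else mode(observed)
    return [fill if v is None else v for v in values]

ages = impute([20.0, None, 40.0], numerical=True)
colors = impute(["red", "blue", None, "red"], numerical=False)
```

A column with no observed values has neither a mean nor a mode (`statistics` raises `StatisticsError` on empty input), which is why all-NA features are removed first.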

In [None]:
na_free = df.dropna(axis="columns", how="all")
only_na = df.loc[:, ~df.columns.isin(na_free.columns)]

featuretypes = featuretypes[df.columns.isin(na_free.columns)]
df = na_free

In [None]:
from platiagro.featuretypes import CATEGORICAL, NUMERICAL

numerical_indexes = (featuretypes == NUMERICAL)
numerical_nan_replacement = df.iloc[:, numerical_indexes].mean(axis=0)
df.fillna(numerical_nan_replacement, inplace=True)

categorical_indexes = (featuretypes == CATEGORICAL)
categorical_nan_replacement = df.iloc[:, categorical_indexes].mode(axis=0).iloc[0]
df.fillna(categorical_nan_replacement, inplace=True)

In [None]:
X = df.drop(target, axis=1).to_numpy()
columns = df.columns.to_numpy()
target_index = np.argwhere(columns == target)
columns = np.delete(columns, target_index)
featuretypes = np.delete(featuretypes, target_index)

## Fit a model using sklearn.preprocessing.Normalizer

In [None]:
from sklearn.preprocessing import Normalizer

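# Recompute the mask: featuretypes no longer includes the target column
numerical_indexes = (featuretypes == NUMERICAL)
estimator = Normalizer()

# Normalize only the numerical columns; skip if there are none
if numerical_indexes.any():
    X[:, numerical_indexes] = estimator.fit_transform(X[:, numerical_indexes])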
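
A quick sanity check on a toy matrix can confirm the effect of this cell. The sketch below assumes NumPy and Scikit-learn are installed and uses made-up values; only the first two columns are treated as numerical, so the third column should come out unchanged.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Made-up data: columns 0 and 1 are numerical, column 2 is left untouched.
X_toy = np.array([[3.0, 4.0, 7.0],
                  [6.0, 8.0, 9.0]])
toy_mask = np.array([True, True, False])

X_toy[:, toy_mask] = Normalizer().fit_transform(X_toy[:, toy_mask])

# Euclidean length of the normalized part of each row.
row_norms = np.linalg.norm(X_toy[:, toy_mask], axis=1)
```

Both rows point in the same direction, so both become `[0.6, 0.8]` in the numerical columns: normalization keeps the direction of a sample and discards its magnitude. Because each row is scaled using only its own values, `fit` learns nothing from the data.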

## Save dataset

Stores the transformed dataset in an object storage.

In [None]:
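from platiagro import save_dataset

# X is a separate array, so copy the normalized features back into df before saving
if numerical_indexes.any():
    df[list(columns[numerical_indexes])] = X[:, numerical_indexes].astype(float)
save_dataset(name=dataset, df=df)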

## Save model

Stores the model artifacts in an object storage.<br>
This makes the model available for future deployments.
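
To illustrate what a later deployment could do with these artifacts, the sketch below replays the imputation and normalization on new rows. It relies only on the dictionary keys saved in the cell below. The helper `apply_saved_preprocessing` and the toy dictionary are hypothetical, and how the dictionary is retrieved from storage is left out because it depends on the SDK version.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

def apply_saved_preprocessing(model, new_df):
    """Replay the saved imputation and normalization on new rows (hypothetical helper)."""
    features = new_df[list(model["columns"])]
    # Fill gaps with the statistics computed at training time.
    features = features.fillna(model["numerical_nan_replacement"])
    features = features.fillna(model["categorical_nan_replacement"])
    # Copy so the array is writable regardless of the pandas version.
    X_new = features.to_numpy(copy=True)
    mask = model["numerical_indexes"]
    if mask.any():
        X_new[:, mask] = model["estimator"].transform(X_new[:, mask])
    return X_new

# Toy stand-in for the saved dictionary; all values are made up.
toy_model = {
    "estimator": Normalizer().fit(np.zeros((1, 2))),
    "columns": np.array(["a", "b"]),
    "numerical_indexes": np.array([True, True]),
    "numerical_nan_replacement": pd.Series({"a": 3.0, "b": 4.0}),
    "categorical_nan_replacement": pd.Series(dtype=object),
}
new_rows = pd.DataFrame({"a": [3.0, None], "b": [4.0, 4.0]})
result = apply_saved_preprocessing(toy_model, new_rows)
```

The missing value in column `a` is filled with the stored mean before normalization, so both rows are treated as `[3.0, 4.0]`.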

In [None]:
from platiagro import save_model

save_model(experiment_id=experiment_id,
           model={"estimator": estimator,
                  "columns": columns,
                  "numerical_indexes": numerical_indexes,
                  "numerical_nan_replacement": numerical_nan_replacement,
                  "categorical_nan_replacement": categorical_nan_replacement})