# Robust Scaler

This component scales features using statistics that are robust to outliers. It removes the median and scales the data according to a quantile range, which defaults to the interquartile range (IQR): the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). It uses the implementation from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html).
<br>
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities.
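
As a minimal sketch of the scaling rule described above, the snippet below fits a `RobustScaler` with its default settings on a toy column (the values are made up for illustration) and reproduces the result by hand as `(x - median) / IQR`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy single-feature column with one extreme outlier (illustrative values only).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Defaults: with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0).
scaler = RobustScaler()
scaled = scaler.fit_transform(X)

# Reproduce the same result by hand: subtract the median, divide by the IQR.
median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
manual = (X - median) / (q3 - q1)

print("center (median):", scaler.center_)
print("scale (IQR):", scaler.scale_)
print("scaled:", scaled.ravel())
print("matches manual computation:", np.allclose(scaled, manual))
```

Because the median and the IQR depend only on the middle of the distribution, the outlier at 100 does not change how the remaining values are scaled.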

This notebook shows:
- how to use the [SDK](https://platiagro.github.io/sdk/) to load a model and other artifacts.
- how to use a model to provide real-time transformations.

In [None]:
%%writefile Model.py
import logging
from typing import List, Iterable, Dict, Union

import numpy as np
import pandas as pd
from platiagro import load_model

logger = logging.getLogger(__name__)


class Model(object):
    def __init__(self, dataset: str = None, target: str = None, experiment_id: str = None):
        logger.info(f"dataset: {dataset}")
        logger.info(f"target: {target}")
        logger.info(f"experiment_id: {experiment_id}")

        # Load Artifacts: Estimator, etc
        model = load_model(experiment_id=experiment_id)
        self.estimator = model["estimator"]
        self.columns = model["columns"]
        self.numerical_indexes = model["numerical_indexes"]

    def class_names(self):
        return self.columns

    def predict(self, X: np.ndarray, feature_names: Iterable[str], meta: Dict = None) -> Union[np.ndarray, List, str, bytes]:
        """Scales the numerical features of X with the loaded estimator.

        Args:
            X (numpy.ndarray): Array-like with data.
            feature_names (iterable): Names of the columns in X.
            meta (dict, optional): Dict of metadata.

        Returns:
            numpy.ndarray: Data with columns ordered as self.columns and numerical columns scaled.
        """
        # Put data in a pandas.DataFrame
        df = pd.DataFrame(X, columns=feature_names)

        # Put data back in a numpy.ndarray
        X = df[self.columns].to_numpy()

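        # Perform Transformation only when at least one numerical column is selected.
        # Checking the selected column count (rather than np.ma.any) avoids skipping
        # the transformation when the only numerical index is 0.
        numerical = X[:, self.numerical_indexes]
        if numerical.shape[1] > 0:
            X[:, self.numerical_indexes] = self.estimator.transform(numerical)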

        return X
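
The column handling in `predict` can be sketched outside the service as follows. This is an illustration under stated assumptions, not the deployed code: `columns`, `numerical_indexes`, and `estimator` are hypothetical stand-ins for the artifacts that `load_model` would return, and the data values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Hypothetical artifacts standing in for what load_model() would return.
columns = ["SepalLengthCm", "Species"]
numerical_indexes = np.array([0])
estimator = RobustScaler().fit(np.array([[4.0], [5.0], [6.0]]))

# Incoming request whose columns arrive in a different order than `columns`.
X = np.array([["setosa", 5.5]], dtype=object)
feature_names = ["Species", "SepalLengthCm"]

# Name the columns, then select them in the order stored with the model.
df = pd.DataFrame(X, columns=feature_names)
X = df[columns].to_numpy()

# Scale only the numerical columns; the explicit float cast is added here
# for clarity and is not part of the original predict method.
numerical = X[:, numerical_indexes].astype(float)
X[:, numerical_indexes] = estimator.transform(numerical)

print(X)
```

The categorical column passes through unchanged, while the numerical column is centered on the median and divided by the IQR learned at fit time.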

## Deployment Test

This section simulates a model deployed by PlatIAgro.

In [None]:
%%writefile env.sh
export MODEL_NAME="Model"
export API_TYPE="REST"
export SERVICE_TYPE="MODEL"
export PERSISTENCE=0
export LOG_LEVEL="DEBUG"
export PARAMETERS='[
{"type":"STRING","name":"dataset","value":"iris"},
{"type":"STRING","name":"target","value":"Species"},
{"type":"STRING","name":"experiment_id","value":"94c3e6b9-0420-4d48-a5df-2d31fc2ad3af"}]'

In [None]:
%%bash
source env.sh
seldon-core-microservice "$MODEL_NAME" "$API_TYPE" \
    --service-type "$SERVICE_TYPE" \
    --persistence "$PERSISTENCE" \
    --parameters "$PARAMETERS" \
    --log-level "$LOG_LEVEL" > log.txt 2>&1 &

ATTEMPT=0
until curl --output /dev/null --silent --head --fail http://localhost:5000/health/ping; do
    # exit process if not healthy after 10 seconds
    if [ "$ATTEMPT" -ge 10 ]; then
        cat log.txt
        exit 1
    fi
    ATTEMPT=$((ATTEMPT + 1))
    sleep 1
done
echo "Deployment successful. Waiting for requests."

## Make transformations

In [None]:
%%bash
curl -sSL localhost:5000/predict --data-binary @- << EOF
json={
    "data": {
        "names": ["SepalLengthCm","SepalWidthCm","PetalLengthCm","PetalWidthCm"],
        "ndarray": [
            [5.1,3.5,1.4,0.2]
        ]
    }
}
EOF
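
The same request could also be built from Python using only the standard library. The sketch below assumes, as in the `curl` call above, that the service listens on `localhost:5000` and accepts the payload in a form field named `json`; `build_request` is a hypothetical helper, not part of the SDK.

```python
import json
import urllib.parse
import urllib.request


def build_request(names, rows, url="http://localhost:5000/predict"):
    """Builds a form-encoded POST request carrying the payload in a `json` field."""
    payload = {"data": {"names": list(names), "ndarray": [list(row) for row in rows]}}
    body = urllib.parse.urlencode({"json": json.dumps(payload)}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


request = build_request(
    ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    [[5.1, 3.5, 1.4, 0.2]],
)
print(request.full_url)
print(request.data.decode("utf-8"))

# To send it while the microservice is running, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The `names` list is what reaches `predict` as `feature_names`, and `ndarray` is what reaches it as `X`.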

## View logs

In [None]:
!cat log.txt

## Clean up the test

In [None]:
!ps -ef | grep '[s]eldon-core-microservice' | awk '{print $2}' | xargs -r kill