# AutoML Regressor

This component trains a regression model using [auto-sklearn](https://github.com/automl/auto-sklearn).
<br>
auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

This notebook shows:
- how to use the SDK to load a dataset and save a model.
- how to receive parameters from the platform.

In [None]:
dataset = "boston" #@param {type:"string"}
target = "medv" #@param {type:"string"}
experiment_id = "6db0fff7-ba9d-4f64-8cbe-9a31bd8b3644" #@param {type:"string"}
duration = 60 #@param {type:"integer"}

## Load dataset

Load the whole dataset into a `pandas.DataFrame`, then separate it into the feature matrix `X` and the target vector `y`.

In [None]:
from platiagro import load_dataset

df = load_dataset(name=dataset)
X = df.drop(target, axis=1).to_numpy()
y = df[target].to_numpy()

## Load metadata
In this context, metadata means information about the dataset.<br>
For example, the cell below gets the feature type (e.g. categorical, numerical, or datetime) of each column in the dataset. If the metadata has no feature types, they are inferred from the data.

In [None]:
from platiagro import load_metadata
from platiagro.featuretypes import infer_featuretypes

try:
    metadata = load_metadata(name=dataset)
    featuretypes = metadata["featuretypes"]
except KeyError:
    featuretypes = infer_featuretypes(df)

## Encode categorical features

Many machine learning algorithms cannot operate on categorical data directly. They require all input variables and output variables to be numeric.<br>
This means that categorical data must be converted to a numerical form.<br>
The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.
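
As a rough illustration of the mapping described above, here is a minimal pure-Python sketch of ordinal encoding for a single column. The `ordinal_encode` helper and the `soil` values are hypothetical and exist only for this example; the notebook itself relies on `sklearn.preprocessing.OrdinalEncoder`, which applies the same idea to each column and stores the learned categories.

```python
def ordinal_encode(column):
    """Map each distinct category to an integer in 0..n_categories - 1."""
    # Sorting the distinct values makes the mapping deterministic.
    categories = sorted(set(column))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[value] for value in column], categories


# Hypothetical toy column, not taken from the dataset used in this notebook.
soil = ["clay", "sand", "loam", "sand", "clay"]
codes, categories = ordinal_encode(soil)

print(categories)  # ['clay', 'loam', 'sand']
print(codes)       # [0, 2, 1, 2, 0]
```

Note that the integers imply an order between categories that may not be meaningful, so this encoding is best suited to models that do not depend on that order.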

In [None]:
from platiagro.featuretypes import CATEGORICAL
from sklearn.preprocessing import OrdinalEncoder

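target_idx = df.columns.tolist().index(target)
# X no longer contains the target column, so drop its feature type first
# to keep the indices below aligned with the columns of X
X_featuretypes = [ft for idx, ft in enumerate(featuretypes) if idx != target_idx]
# selects the categorical features
categorical_idxs = [idx for idx, ft in enumerate(X_featuretypes) if ft == CATEGORICAL]
feature_encoder = OrdinalEncoder()

# encodes the categorical columns in place, if there are any
if len(categorical_idxs) > 0:
    X[:, categorical_idxs] = feature_encoder.fit_transform(X[:, categorical_idxs])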

## Split dataset into train/test splits

Training Dataset: the sample of data used to fit the model.

Test Dataset: the sample of data used to provide an unbiased evaluation of a model fit on the training dataset.
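
To make the two definitions concrete, here is a minimal pure-Python sketch of a shuffled 70/30 split. The `split_rows` helper, its `seed` argument, and the toy `rows` are assumptions made for illustration; the notebook uses `sklearn.model_selection.train_test_split` for the actual split.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices, then cut them into a train part and a test part."""
    indices = list(range(len(rows)))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


# Hypothetical toy data: ten rows identified by their position.
rows = list(range(10))
train_rows, test_rows = split_rows(rows)

print(len(train_rows), len(test_rows))  # 7 3
```

Every row ends up in exactly one of the two parts, so the model never sees the test rows during fitting.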

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

## Fit a model using autosklearn.regression.AutoSklearnRegressor

In [None]:
from autosklearn.regression import AutoSklearnRegressor

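# removes the target column from the feature types without mutating
# 'featuretypes', so this cell can be re-run safely
target_idx = df.columns.tolist().index(target)
feat_type = [ft for idx, ft in enumerate(featuretypes) if idx != target_idx]

estimator = AutoSklearnRegressor(
    time_left_for_this_task=duration,  # total search budget, in seconds
    per_run_time_limit=duration,  # budget for a single model fit, in seconds
)
estimator.fit(X_train, y_train, feat_type=feat_type)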
estimator.refit(X_train, y_train)
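
The test split is held out for evaluation, but this notebook does not compute a score. As one possible way to use it, here is a minimal sketch of the root mean squared error (RMSE) in pure Python. The `rmse` helper and the numbers are made up for illustration and are not part of the component.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values standing in for held-out targets and model predictions.
example_true = [3.0, 5.0, 7.0]
example_pred = [2.0, 5.0, 9.0]

print(rmse(example_true, example_pred))
```

In the notebook, the true values would be `y_test` and the predictions would come from `estimator.predict(X_test)`.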

## Save model 

In [None]:
from platiagro import save_model

save_model(experiment_id=experiment_id, model={"estimator": estimator, "feature_encoder": feature_encoder})