In [None]:
using Lale

In [None]:
using Random
using Statistics
using Test
using DataFrames: DataFrame, nrow

In [None]:
iris = getiris();
# split the row indices into a training set (trx) and a test set (tstx)
trx, tstx = holdout(nrow(iris), 0.30)
training = iris[trx,:]
testing = iris[tstx,:];

In [None]:
# columns 1-4 are the numeric features, column 5 is the class label
clf_tr_X = training[:,1:4] |> DataFrame
clf_tr_y = training[:,5]   |> Vector
clf_tst_X = testing[:,1:4] |> DataFrame
clf_tst_y = testing[:,5]   |> Vector;

## AutoML for classifier pipeline

This example uses Lale for combined algorithm selection and hyperparameter tuning
on a classifier pipeline.
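"Combined" means that choosing an algorithm also determines which hyperparameters exist. A minimal sketch of such a conditional search space in plain Julia follows. The function `sample_config`, the hyperparameter ranges, and the restriction to the two classifiers used below are illustrative assumptions, not Lale's actual search space; only the hyperparameter names `C`, `n_estimators`, and `max_depth` are taken from the corresponding scikit-learn estimators.

```julia
using Random

# Hypothetical search space (illustrative ranges, not Lale's defaults):
# picking an algorithm decides which hyperparameters are sampled at all.
function sample_config(rng::AbstractRNG)
    if rand(rng, Bool)
        return (algorithm = "LogisticRegression",
                C = 10.0^rand(rng, -3.0:0.5:3.0))       # inverse regularization strength
    else
        return (algorithm = "RandomForestClassifier",
                n_estimators = rand(rng, 10:10:200),  # number of trees
                max_depth = rand(rng, 2:10))          # maximum tree depth
    end
end

rng = MersenneTwister(0)
configs = [sample_config(rng) for _ in 1:5]
foreach(println, configs)
```

A tuner searches over configurations like these, so algorithm selection and hyperparameter tuning happen in a single search rather than in two separate stages.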

The first step in creating a pipeline is to instantiate its operators. `laleoperator` takes the operator's name and an optional package argument. The default package is sklearn, so by default the operator is looked up in scikit-learn; operators provided by Lale itself, such as `ConcatFeatures`, are requested with the package `"lale"`.

In [None]:
PCA = laleoperator("PCA")
RobustScaler = laleoperator("RobustScaler")
ConcatFeatures = laleoperator("ConcatFeatures", "lale")
LogisticRegression = laleoperator("LogisticRegression")
RandomForestClassifier = laleoperator("RandomForestClassifier");

The next step is to compose a pipeline from these operators using Lale's combinators. The table below summarizes the combinators available for defining the pipeline's directed acyclic graph (DAG):

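| Symbol | Name | Description | scikit-learn equivalent |
| ------ | ---- | ----------- | ----------------------- |
| `>>`   | pipe | Feed the output of the left operator to the right one | `make_pipeline` |
| `&`    | and  | Run both operators on the same input | `make_union` (which also concatenates the outputs) |
| `\|`   | or   | Choose one of the operators; the optimizer decides which | none |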
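To make the combinator semantics concrete, here is a toy model in plain Julia. The types `Op`, `Seq`, `And`, `Or` and the functions `expand` and `render` are hypothetical illustrations, not Lale's internal representation. Each `|` is treated as a choice point, and `expand` enumerates every concrete pipeline an optimizer could pick from a planned graph.

```julia
# Toy model of the combinators (hypothetical types, not Lale's internals).
abstract type Node end
struct Op <: Node
    name::String
end
struct Seq <: Node   # a >> b : feed a's output to b
    a::Node
    b::Node
end
struct And <: Node   # a & b : run a and b on the same input
    a::Node
    b::Node
end
struct Or <: Node    # a | b : choose exactly one of a and b
    a::Node
    b::Node
end

Base.:(>>)(a::Node, b::Node) = Seq(a, b)
Base.:(&)(a::Node, b::Node) = And(a, b)
Base.:(|)(a::Node, b::Node) = Or(a, b)

# Every `|` is a choice point: enumerate all choice-free pipelines.
expand(n::Op) = Node[n]
expand(n::Seq) = Node[Seq(x, y) for x in expand(n.a) for y in expand(n.b)]
expand(n::And) = Node[And(x, y) for x in expand(n.a) for y in expand(n.b)]
expand(n::Or) = vcat(expand(n.a), expand(n.b))

# Render a node back into combinator notation.
render(n::Op) = n.name
render(n::Seq) = "$(render(n.a)) >> $(render(n.b))"
render(n::And) = "($(render(n.a)) & $(render(n.b)))"
render(n::Or) = "($(render(n.a)) | $(render(n.b)))"

pca, scaler, concat = Op("PCA"), Op("RobustScaler"), Op("ConcatFeatures")
lr, rf = Op("LogisticRegression"), Op("RandomForestClassifier")
clf_choices = expand((pca & scaler) >> concat >> (lr | rf))
foreach(println, render.(clf_choices))
# (PCA & RobustScaler) >> ConcatFeatures >> LogisticRegression
# (PCA & RobustScaler) >> ConcatFeatures >> RandomForestClassifier

noop, linreg, rfr = Op("NoOp"), Op("LinearRegression"), Op("RandomForestRegressor")
reg_choices = expand((pca | noop) >> (linreg | rfr))
length(reg_choices)   # 4: two preprocessing choices times two regressors
```

Julia gives `>>` higher precedence than `&`, and `&` higher than `|`, so the parentheses used in the planned pipelines in this notebook are what make the graphs group as intended.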


In [None]:
clf_planned = (PCA & RobustScaler) >> ConcatFeatures >> (LogisticRegression | RandomForestClassifier);

`LalePipeOptimizer` takes the planned pipeline graph from above, a budget for the number of optimizer iterations (`max_evals`), and further parameters such as the number of cross-validation folds (`cv`). Internally it uses hyperopt to perform the combined algorithm selection and hyperparameter tuning.
Lale follows the scikit-learn API, so `fit` trains the pipeline and `predict` returns its predictions.
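The roles of `max_evals` and `cv` can be sketched in plain Julia. This is a minimal illustration under stated assumptions: plain random search stands in for hyperopt's own search algorithm, `cv` becomes a k-fold split, and each candidate is a hypothetical `fit_predict(Xtr, ytr, Xva)` function rather than a Lale pipeline. The names `kfold_indices`, `cv_accuracy`, and `random_search` are illustrative only.

```julia
using Random, Statistics

# Split 1:n into k disjoint validation folds of near-equal size; each fold is
# paired with the remaining indices, which serve as its training set.
function kfold_indices(n::Integer, k::Integer; rng::AbstractRNG=Random.default_rng())
    perm = randperm(rng, n)
    return [(sort(setdiff(1:n, perm[i:k:n])), sort(perm[i:k:n])) for i in 1:k]
end

# Mean validation accuracy of one candidate over the given folds.
# `fit_predict(Xtr, ytr, Xva)` is a hypothetical callback: it trains on
# (Xtr, ytr) and returns predicted labels for the rows of Xva.
function cv_accuracy(fit_predict, X::AbstractMatrix, y::AbstractVector, folds)
    return mean(mean(fit_predict(X[tr, :], y[tr], X[va, :]) .== y[va]) for (tr, va) in folds)
end

# Random search standing in for hyperopt: draw `max_evals` candidates, score
# each on the same `cv` folds, and keep the best one.
function random_search(candidates, X, y; max_evals::Int=10, cv::Int=3,
                       rng::AbstractRNG=Random.default_rng())
    folds = kfold_indices(length(y), cv; rng=rng)
    best, best_score = first(candidates), -Inf
    for _ in 1:max_evals
        cand = rand(rng, candidates)
        acc = cv_accuracy(cand, X, y, folds)
        if acc > best_score
            best, best_score = cand, acc
        end
    end
    return best, best_score
end

# Usage with two trivial candidates on toy data where every label is 1.
always1(Xtr, ytr, Xva) = fill(1, size(Xva, 1))
always2(Xtr, ytr, Xva) = fill(2, size(Xva, 1))
X = randn(MersenneTwister(0), 12, 2)
y = ones(Int, 12)
best, best_acc = random_search([always1, always2], X, y; max_evals=50, cv=3,
                               rng=MersenneTwister(1))
```

Scoring every candidate on the same folds means that differences in score come from the candidates themselves rather than from a lucky split.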

In [None]:
clf_hopt = LalePipeOptimizer(clf_planned, max_evals=10, cv=3)
clf_trained = fit(clf_hopt, clf_tr_X, clf_tr_y);

In [None]:
clf_pred = predict(clf_trained, clf_tst_X)
clf_accu = score(:accuracy, clf_pred, clf_tst_y)

## AutoML for regressor pipeline

This example uses Lale for combined algorithm selection and hyperparameter tuning
on a regressor pipeline.
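The regressor search below is evaluated with `score(:rmse, ...)` instead of accuracy. As a sketch, here are the standard definitions of both metrics in plain Julia; the helper names `rmse` and `accuracy` are hypothetical, and we assume, without having checked Lale's source, that `score` follows these definitions.

```julia
using Statistics

# Root-mean-squared error: square root of the mean squared residual
# (same units as the target, lower is better).
rmse(ŷ::AbstractVector{<:Real}, y::AbstractVector{<:Real}) = sqrt(mean((ŷ .- y) .^ 2))

# Accuracy: fraction of predictions that exactly match the true label
# (higher is better).
accuracy(ŷ::AbstractVector, y::AbstractVector) = mean(ŷ .== y)

rmse([2.0, 1.0, 4.0, 3.0], [1.0, 2.0, 3.0, 4.0])   # every residual is ±1, so 1.0
accuracy(["setosa", "virginica", "versicolor"],
         ["setosa", "virginica", "virginica"])      # 2 of 3 correct
```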

In [None]:
# regression target: numeric column 4, predicted from numeric columns 1-3
reg_tr_X = training[:,1:3] |> DataFrame
reg_tr_y = training[:,4]   |> Vector
reg_tst_X = testing[:,1:3] |> DataFrame
reg_tst_y = testing[:,4]   |> Vector;

In [None]:
PCA = laleoperator("PCA")
NoOp = laleoperator("NoOp", "lale")
LinearRegression = laleoperator("LinearRegression")
RandomForestRegressor = laleoperator("RandomForestRegressor");

In [None]:
reg_planned = (PCA | NoOp) >> (LinearRegression | RandomForestRegressor);

In [None]:
reg_hopt = LalePipeOptimizer(reg_planned, max_evals=10, cv=3)
reg_trained = fit(reg_hopt, reg_tr_X, reg_tr_y);

In [None]:
reg_pred = predict(reg_trained, reg_tst_X)
reg_rmse = score(:rmse, reg_pred, reg_tst_y)

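## Repeated AutoML trials in parallel

A single hyperopt search is randomized, so its test accuracy varies from run to run. The cells below use Julia's `Distributed` standard library to run several independent classifier searches on worker processes and average their test accuracies. If Julia is running with a single process, `addprocs` starts one worker per CPU thread (`Sys.CPU_THREADS`), each launched with the current project, and `@everywhere` loads the required packages on every process.

In [None]:
using Distributed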

In [None]:
nprocs() == 1 && addprocs(; exeflags="--project")
@everywhere using Lale
@everywhere using Statistics
@everywhere using Random: seed!
@everywhere using DataFrames
@everywhere using DataFrames: DataFrame, nrow

In [None]:
workers()

In [None]:
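trials = 10
# Run `trials` independent searches spread across the workers; the `(vcat)`
# reducer collects the per-trial accuracies into one vector on the main process.
results = @distributed (vcat) for i in 1:trials
    clf_planned = (PCA & RobustScaler) >> ConcatFeatures >> (LogisticRegression | RandomForestClassifier)
    clf_hopt = LalePipeOptimizer(clf_planned, max_evals=5, cv=3)  # smaller budget per trial
    clf_trained = fit(clf_hopt, clf_tr_X, clf_tr_y)
    clf_pred = predict(clf_trained, clf_tst_X)
    clf_accu = score(:accuracy, clf_pred, clf_tst_y)
    println(clf_accu)   # printed from the worker process
    clf_accu            # value contributed by this trial
end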

In [None]:
results |> mean