# Introduction
<hr style="border:2px solid black"> </hr>

<div class="alert alert-warning">
<font color=black>

**What?** HyperOpt with Scikit-Learn

</font>
</div>

# Import modules
<hr style="border:2px solid black"> </hr>

In [None]:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from hpsklearn import HyperoptEstimator
from hpsklearn import any_classifier
from hpsklearn import any_preprocessing
from hpsklearn import any_regressor
from hyperopt import tpe
from sklearn.metrics import mean_absolute_error

# HyperOpt
<hr style="border:2px solid black"> </hr>

<div class="alert alert-info">
<font color=black>

- **HyperOpt** is an open-source Python library for Bayesian optimization. It is *challenging* to use directly, because both the optimization procedure and the search space must be specified by hand.

- **HyperOpt-Sklearn** is an extension to HyperOpt that applies the HyperOpt procedure to the data preparation and machine learning models provided by the scikit-learn library.
- In short, HyperOpt-Sklearn is an open-source AutoML library built on scikit-learn data preparation and machine learning models.

</font>
</div>

# Classification
<hr style="border:2px solid black"> </hr>

In [None]:
# load dataset
url = '../DATASETS/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]

# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1)

# define search
model = HyperoptEstimator(
    classifier=any_classifier('cla'),
    preprocessing=any_preprocessing('pre'),
    algo=tpe.suggest,
    max_evals=50,
    trial_timeout=30)

# perform the search
model.fit(X_train, y_train)

# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)

# summarize the best model
print(model.best_model())

# Regression
<hr style="border:2px solid black"> </hr>

In [None]:
# load dataset
url = '../DATASETS/housing_1.csv'
dataframe = read_csv(url, header=None)

# split into input and output elements
data = dataframe.values
data = data.astype('float32')
X, y = data[:, :-1], data[:, -1]

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

# define search
model = HyperoptEstimator(
    regressor=any_regressor('reg'),
    preprocessing=any_preprocessing('pre'),
    loss_fn=mean_absolute_error,
    algo=tpe.suggest,
    max_evals=50,
    trial_timeout=30)

# perform the search
model.fit(X_train, y_train)

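# summarize performance
# score() reports the best model's default score (R^2 for regressors), not loss_fn,
# so compute the MAE explicitly from the test-set predictions
mae = mean_absolute_error(y_test, model.predict(X_test))
print("MAE: %.3f" % mae)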

# summarize the best model
print(model.best_model())

# References
<hr style="border:2px solid black"> </hr>

<div class="alert alert-warning">
<font color=black>

- https://hyperopt.github.io/hyperopt-sklearn/
- https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/
- https://conference.scipy.org/proceedings/scipy2014/pdfs/komer.pdf
    
</font>
</div>