# Hyperopt-sklearn

Following Jason Brownlee's example:
https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/

See more documentation at
https://hyperopt.github.io/hyperopt-sklearn/

In [1]:
# example of hyperopt-sklearn for the sonar classification dataset
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from hpsklearn import HyperoptEstimator
from hpsklearn import any_classifier
from hpsklearn import any_preprocessing
from hyperopt import tpe

WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREADS=1 or risk subprocess calls hanging indefinitely
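One way to act on the warning above is to set the variable at the very top of the notebook, before cell [1] imports hpsklearn (and, through it, numpy). This is a minimal sketch that assumes a fresh kernel. OpenBLAS typically reads its thread settings when it is first loaded, so setting the variable afterwards may have no effect. Exporting `OMP_NUM_THREADS=1` in the shell before launching Jupyter is an equivalent alternative.

```python
import os

# Pin OpenMP/OpenBLAS to one thread, as the warning suggests.
# setdefault keeps any value the user has already exported.
# Run this before numpy/sklearn/hpsklearn are imported for the first time.
os.environ.setdefault("OMP_NUM_THREADS", "1")

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```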


This example uses the "sonar" data set. Its description is at
https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.names

In [2]:
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)

In [3]:
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
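`LabelEncoder` gives each class its position in the sorted list of distinct labels. In this data set the targets are the strings 'M' (mine) and 'R' (rock), so 'M' becomes 0 and 'R' becomes 1. Here is a pure-Python sketch of the same mapping. The `label_encode` helper is hypothetical and only for illustration.

```python
def label_encode(labels):
    """Map each distinct label to its index in the sorted list of distinct labels."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in labels]

# Toy sample of sonar-style targets.
classes, encoded = label_encode(['R', 'M', 'M', 'R'])
print(classes)   # ['M', 'R']
print(encoded)   # [1, 0, 0, 1]
```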

In [4]:
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
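When `test_size` is a float, scikit-learn rounds the test count up and assigns the remaining rows to training. The sonar.names description lists 208 patterns, which gives 139 training rows and 69 test rows. The sketch below reproduces that arithmetic. `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Train/test sizes for a float test_size: the test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(208, 0.33))  # (139, 69)
```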

In [5]:
# define search
# see more options in the documentation at
# https://hyperopt.github.io/hyperopt-sklearn/

model = HyperoptEstimator(
    classifier=any_classifier('cla')
    , preprocessing=any_preprocessing('pre')
    , algo=tpe.suggest
    , max_evals=50
    , trial_timeout=30
) 
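Conceptually, `max_evals` limits how many candidate pipelines are tried. The "best loss" in the training log of cell [6] is the lowest validation loss seen so far. The sketch below shows only that bookkeeping. It is not TPE: configurations are sampled uniformly at random, and there is no per-trial time limit like `trial_timeout`. The names `search`, `sample_x`, and `toy_loss` are hypothetical.

```python
import random

def search(objective, sample_config, max_evals, seed=0):
    """Try max_evals sampled configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    best_so_far = []  # running minimum, like the "best loss" in the progress bar
    for _ in range(max_evals):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
        best_so_far.append(best_loss)
    return best_config, best_loss, best_so_far

# Toy problem (hypothetical): find x in [-5, 5] minimizing (x - 2) ** 2.
def sample_x(rng):
    return {"x": rng.uniform(-5.0, 5.0)}

def toy_loss(config):
    return (config["x"] - 2.0) ** 2

best_config, best_loss, best_so_far = search(toy_loss, sample_x, max_evals=50)
print(best_config, round(best_loss, 4))
```

TPE differs from this sketch in one main way. It does not sample blindly: it uses the losses from earlier trials to choose which configurations to try next.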

## Classifiers
- SVC
- LinearSVC
- KNeighborsClassifier
- RandomForestClassifier
- ExtraTreesClassifier
- SGDClassifier
- MultinomialNB
- BernoulliRBM
- ColumnKMeans

## Preprocessing
- PCA
- TfidfVectorizer
- StandardScaler
- MinMaxScaler
- Normalizer
- OneHotEncoder

In [6]:
# perform the search
model.fit(X_train, y_train)

100%|██████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.17s/trial, best loss: 0.1785714285714286]
100%|██████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.40s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.15s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 4/4 [00:01<00:00,  1.15s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 5/5 [00:01<00:00,  1.19s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 6/6 [00:01<00:00,  1.15s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 7/7 [00:01<00:00,  1.15s/trial, best loss: 0.0357142857142857]
100%|██████████████████████████████████████████████████| 8/8 [00:01<00:00,  1.63s/trial, best loss: 0.0357142857142857]
100%|███████████████████████████████████
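The logged losses fit a loss of 1 - accuracy computed on a validation holdout of 28 rows: 5/28 ≈ 0.179 and 1/28 ≈ 0.036. A holdout of that size is what you would get by taking 20% of the 139 training rows and rounding up. That reading assumes `fit` uses hpsklearn's default `valid_size=0.2` and has not been confirmed against the run. A quick check of the arithmetic:

```python
import math

n_train = 139                       # training rows from the 0.33 test split
n_valid = math.ceil(0.2 * n_train)  # assumed default valid_size=0.2, rounded up
print(n_valid)                      # 28

# Loss = 1 - accuracy, i.e. the fraction of validation rows misclassified.
losses = {wrong: 1 - (n_valid - wrong) / n_valid for wrong in (5, 1)}
for wrong, loss in losses.items():
    print(wrong, "wrong ->", round(loss, 4))
```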

In [7]:
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)

Accuracy: 0.797
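The test set has 69 rows, so each row shifts accuracy by about 0.014. An accuracy of 0.797 therefore means 55 correct predictions and 14 errors; no other count rounds to 0.797. A quick check:

```python
n_test = 69    # test rows from the 0.33 split of 208 rows
correct = 55   # the only integer count whose ratio to 69 rounds to 0.797
accuracy = correct / n_test
print("Accuracy: %.3f" % accuracy)                    # Accuracy: 0.797
print("One test row is worth %.4f" % (1 / n_test))   # One test row is worth 0.0145
```

With a test set this small, a difference of one or two rows between models changes the reported accuracy noticeably.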


In [8]:
# summarize the best model
print(model.best_model())

{'learner': RandomForestClassifier(bootstrap=False, max_features='sqrt', n_estimators=307,
                       n_jobs=1, random_state=0, verbose=False), 'preprocs': (StandardScaler(with_std=False),), 'ex_preprocs': ()}
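In the selected pipeline, `StandardScaler(with_std=False)` only subtracts each feature's mean and does not divide by the standard deviation. Below is a pure-Python sketch of that transform using a hypothetical `center_columns` helper and toy data. It learns and applies the means on the same rows. scikit-learn instead learns the means in `fit` and reuses them in `transform`.

```python
def center_columns(rows):
    """Subtract each column's mean (StandardScaler with with_std=False)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return means, [[x - m for x, m in zip(row, means)] for row in rows]

means, centered = center_columns([[1.0, 10.0], [3.0, 30.0]])
print(means)     # [2.0, 20.0]
print(centered)  # [[-1.0, -10.0], [1.0, 10.0]]
```

A tree ensemble splits each feature on thresholds, and shifting a feature by a constant leaves the possible partitions unchanged. This preprocessing step therefore has essentially no effect on the RandomForestClassifier that follows it.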


## Specify loss function
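By default, the search minimizes a loss based on accuracy for classifiers and on R-squared for regressors (see the `loss_fn` handling in https://github.com/hyperopt/hyperopt-sklearn/blob/master/hpsklearn/estimator.py). Pass `loss_fn` to use a different metric. Hyperopt minimizes the loss, so lower values count as better.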
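A custom `loss_fn` must be lower-is-better. Below are two such losses in plain Python. The sketch assumes hpsklearn calls `loss_fn(y_true, y_pred)` on the validation predictions; check the linked source for the exact call. `error_rate` mirrors the accuracy-based default (1 - accuracy). `balanced_error_rate` is a hypothetical alternative that gives both classes equal weight.

```python
def error_rate(y_true, y_pred):
    """1 - accuracy: the fraction of predictions that differ from the truth."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_error_rate(y_true, y_pred):
    """1 - mean per-class recall; unlike error_rate, each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return 1 - sum(recalls) / len(recalls)

# A model that always predicts the majority class.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(error_rate(y_true, y_pred))           # 0.25
print(balanced_error_rate(y_true, y_pred))  # 0.5
```

To use it, pass the function itself, e.g. `loss_fn=balanced_error_rate`, to `HyperoptEstimator`.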

In [None]:
from sklearn.metrics import mean_absolute_error

In [None]:
model = HyperoptEstimator(
    classifier=any_classifier('cla')
    , preprocessing=any_preprocessing('pre')
    , algo=tpe.suggest
    , max_evals=50
    , trial_timeout=30
    , loss_fn=mean_absolute_error
) 
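The target here is binary and label-encoded as 0/1, so every prediction is off by exactly 0 or 1. Mean absolute error therefore reduces to the misclassification rate (1 - accuracy), and this search optimizes essentially the same objective as the default. MAE is normally a regression metric. With more than two encoded classes it would penalize errors by label distance, which is not meaningful. A plain-Python check follows; `mae` and `error_rate` are illustrative stand-ins, not the sklearn functions.

```python
def mae(y_true, y_pred):
    """Mean absolute error for 1-D numeric labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true, y_pred):
    """Fraction of mismatched predictions (1 - accuracy)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 1]
print(mae(y_true, y_pred), error_rate(y_true, y_pred))  # 0.4 0.4
```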

In [None]:
# perform the search
model.fit(X_train, y_train)