# HyperOpt-Sklearn for Classification
In this notebook, we use HyperOpt-Sklearn to automatically search for a classification model for the sonar dataset.

In [1]:
# import libraries
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from hpsklearn import HyperoptEstimator
from hpsklearn import any_classifier
from hpsklearn import any_preprocessing
from hyperopt import tpe

WARN: OMP_NUM_THREADS=None =>
... If you are using openblas if you are using openblas set OMP_NUM_THREADS=1 or risk subprocess calls hanging indefinitely
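The warning above suggests pinning OpenBLAS to a single thread. A minimal sketch of one way to do that from Python, assuming the variable has to be set before the numerical libraries are imported (so it would go at the very top of the notebook, ahead of the import cell):

```python
import os

# Set this before importing numpy / scikit-learn / hpsklearn so the BLAS
# backend can see it when it is first loaded.
# setdefault keeps any value that is already present in the environment.
os.environ.setdefault("OMP_NUM_THREADS", "1")

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

Restarting the kernel after adding the line is probably needed, since libraries that are already loaded will not re-read the variable.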


In [2]:
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

(208, 60) (208,)


In [3]:
...
# minimally prepare dataset
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

In [4]:
...
# define search
model = HyperoptEstimator(
    classifier=any_classifier('cla'),
    preprocessing=any_preprocessing('pre'),
    algo=tpe.suggest,
    max_evals=50,
    trial_timeout=30,
)
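The two budget settings in the search definition bound how long the search can run. As a rough sketch, assuming `max_evals` is the number of trials and `trial_timeout` is the per-trial limit in seconds, and ignoring any overhead between trials:

```python
max_evals = 50       # number of candidate pipelines to evaluate
trial_timeout = 30   # seconds allowed for a single trial

# Rough upper bound: every trial runs until it hits the timeout.
worst_case_seconds = max_evals * trial_timeout
print(f"Worst-case search time: {worst_case_seconds} s "
      f"({worst_case_seconds / 60:.0f} min)")
```

In practice most trials on a dataset this small finish well under the limit, so the real search is usually much shorter than this bound.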

In [5]:
...
# perform the search
model.fit(X_train, y_train)

100%|██████████| 1/1 [00:00<00:00,  2.17trial/s, best loss: 0.1785714285714286]
100%|██████████| 2/2 [00:00<00:00, 33.56trial/s, best loss: 0.1785714285714286]
100%|██████████| 3/3 [00:00<00:00, 17.62trial/s, best loss: 0.0714285714285714]
100%|██████████| 4/4 [00:00<00:00, 62.39trial/s, best loss: 0.0714285714285714]
100%|██████████| 5/5 [00:04<00:00,  1.02trial/s, best loss: 0.0714285714285714]
100%|██████████| 6/6 [00:00<00:00,  9.84trial/s, best loss: 0.0714285714285714]
100%|██████████| 7/7 [00:00<00:00, 47.49trial/s, best loss: 0.0714285714285714]
100%|██████████| 8/8 [00:00<00:00,  9.00trial/s, best loss: 0.0714285714285714]
100%|██████████| 9/9 [00:00<00:00, 29.57trial/s, best loss: 0.0714285714285714]
100%|██████████| 10/10 [00:00<00:00, 96.02trial/s, best loss: 0.0714285714285714]
100%|██████████| 11/11 [00:00<00:00, 42.79trial/s, best loss: 0.0714285714285714]
100%|██████████| 12/12 [00:00<00:00, 214.73trial/s, best loss: 0.0714285714285714]
100%|██████████| 13/13 [00:05<00:00,  2.45trial/s, best loss: 0.0714285714285714]
100%|██████████| 14/14 [00:00<00:00, 79.10trial/s, best loss: 0.0714285714285714]
100%|██████████| 15/15 [00:00<00:00, 84.94trial/s, best loss: 0.0714285714285714]
100%|██████████| 16/16 [00:00<00:00, 86.36trial/s, best loss: 0.0714285714285714]
100%|██████████| 17/17 [00:00<00:00, 97.97trial/s, best loss: 0.0714285714285714]
100%|██████████| 18/18 [00:00<00:00, 21.58trial/s, best loss: 0.0714285714285714]
100%|██████████| 19/19 [00:00<00:00, 64.04trial/s, best loss: 0.0714285714285714]
100%|██████████| 20/20 [00:00<00:00, 150.34trial/s, best loss: 0.0714285714285714]
100%|██████████| 21/21 [00:03<00:00,  6.30trial/s, best loss: 0.0714285714285714]
100%|██████████| 22/22 [00:01<00:00, 17.62trial/s, best loss: 0.0357142857142857]
100%|█████████




100%|██████████| 32/32 [00:00<00:00, 142.40trial/s, best loss: 0.0357142857142857]
100%|██████████| 33/33 [00:02<00:00, 11.97trial/s, best loss: 0.0357142857142857]
100%|██████████| 34/34 [00:00<00:00, 85.47trial/s, best loss: 0.0357142857142857]
100%|██████████| 35/35 [00:04<00:00,  7.23trial/s, best loss: 0.0357142857142857]
100%|██████████| 36/36 [00:00<00:00, 174.55trial/s, best loss: 0.0357142857142857]
100%|██████████| 37/37 [00:00<00:00, 177.39trial/s, best loss: 0.0357142857142857]
100%|██████████| 38/38 [00:00<00:00, 56.73trial/s, best loss: 0.0357142857142857]
100%|██████████| 39/39 [00:00<00:00, 200.09trial/s, best loss: 0.0357142857142857]
100%|██████████| 40/40 [00:00<00:00, 64.25trial/s, best loss: 0.0357142857142857]
100%|██████████| 41/41 [00:00<00:00, 62.10trial/s, best loss: 0.0357142857142857]
100%|██████████| 42/42 [00:01<00:00, 24.59trial/s, best loss: 0.0357142857142857]
100%|██████████| 43/43 [00:00<00:00, 171.84trial/s, best loss: 0.0357142857142857]
100%|██████████| 44/44 [00:00<00:00, 160.37trial/s, best loss: 0.0357142857142857]
100%|██████████| 45/45 [00:01<00:00, 32.48trial/s, best loss: 0.0357142857142857]
100%|██████████| 46/46 [00:00<00:00, 151.86trial/s, best loss: 0.0357142857142857]
100%|██████████| 47/47 [00:00<00:00, 150.15trial/s, best loss: 0.0357142857142857]
100%|██████████| 48/48 [00:00<00:00, 227.08trial/s, best loss: 0.0357142857142857]
100%|██████████| 49/49 [00:00<00:00, 105.60trial/s, best loss: 0.0357142857142857]
100%|██████████| 50/50 [00:00<00:00, 176.07trial/s, best loss: 0.0357142857142857]
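The `best loss` values in the log can be read as error counts on an internal validation split. The sketch below assumes that the loss is 1 minus accuracy and that hpsklearn holds out 20% of the training rows for validation by default; both are assumptions about the library's defaults, not something the log itself states. Under those assumptions, the three distinct losses map to whole numbers of misclassified rows:

```python
import math

n_samples = 208
n_test = math.ceil(0.33 * n_samples)    # scikit-learn rounds the test share up
n_train = n_samples - n_test            # rows passed to model.fit

# Assumption: 80% of the training rows are used for fitting,
# and the remainder forms the validation split.
n_valid = n_train - int(n_train * 0.8)

losses = (0.1785714285714286, 0.0714285714285714, 0.0357142857142857)
errors_per_loss = [round(loss * n_valid) for loss in losses]

for loss, errors in zip(losses, errors_per_loss):
    print(f"loss {loss:.4f} ~ {errors} of {n_valid} validation rows misclassified")
```

With so few validation rows, a single misclassification moves the loss by several percentage points, so small differences between trials should be read with caution.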


In [6]:
...
# summarize performance
acc = model.score(X_test, y_test)
print("Accuracy: %.3f" % acc)
# summarize the best model
print(model.best_model())

Accuracy: 0.855
{'learner': ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='entropy', max_depth=None, max_features='log2',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=579, n_jobs=1,
                     oob_score=False, random_state=2, verbose=False,
                     warm_start=False), 'preprocs': (), 'ex_preprocs': ()}
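To put the reported test accuracy in concrete terms, it can be converted back to a count of correctly classified rows. This is a small sketch using only the split settings from the notebook (208 rows, `test_size=0.33`) and the printed accuracy:

```python
import math

n_samples = 208
n_test = math.ceil(0.33 * n_samples)   # size of the held-out test set
reported_accuracy = 0.855              # value printed by the notebook

# Recover the number of correct predictions from the rounded accuracy.
correct = round(reported_accuracy * n_test)
print(f"{correct} of {n_test} correct -> accuracy {correct / n_test:.3f}")
```

The test accuracy is lower than the best validation score seen during the search, which is not surprising given how small the validation split is.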


Read more in this article: https://machinelearningmastery.com/hyperopt-for-automated-machine-learning-with-scikit-learn/