# Scikit-learn hyperparameter search wrapper

Iaroslav Shcherbatyi, June 2017.

## Introduction

This example assumes basic familiarity with scikit-learn.

Hyperparameter search is important for getting the best performance out of machine learning models. The standard approach in scikit-learn is `GridSearchCV`, which enumerates every parameter combination exhaustively and is therefore non-trivial to scale. `RandomizedSearchCV` scales better, but it does not take advantage of the structure of the search problem.
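
As a point of comparison, the two scikit-learn searches mentioned above can be sketched as follows. This is only an illustration on the iris data: the grid values, the `loguniform` ranges and the budget of 18 candidates are assumptions chosen for the example, not recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

# Exhaustive search: every combination is evaluated (3 * 3 * 2 = 18 candidates).
grid = GridSearchCV(
    SVC(),
    {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0], 'kernel': ['linear', 'rbf']},
    cv=3,
)
grid.fit(X_train, y_train)

# Random search: a fixed budget of candidates, each drawn independently
# of the scores obtained so far.
rand = RandomizedSearchCV(
    SVC(),
    {'C': loguniform(1e-3, 1e3), 'gamma': loguniform(1e-4, 1e1), 'kernel': ['linear', 'rbf']},
    n_iter=18, cv=3, random_state=0,
)
rand.fit(X_train, y_train)

n_grid = len(grid.cv_results_['params'])
n_rand = len(rand.cv_results_['params'])
print(n_grid, grid.score(X_test, y_test))
print(n_rand, rand.score(X_test, y_test))
```

Adding one more value to each of the three grid lists would raise the grid from 18 to 4 * 4 * 3 = 48 candidates, whereas the random search budget stays whatever `n_iter` says.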

Scikit-optimize provides a scalable drop-in replacement for `GridSearchCV` that uses Bayesian optimization to arrive at good solutions as quickly as possible.


## Minimal example
 
A minimal example of optimizing the hyperparameters of an `SVC` is given below.


```python
from skopt.space import Real, Categorical, Integer
from skopt.wrappers import SkoptSearchCV

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# return_X_y must be passed by keyword in recent scikit-learn versions
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

opt = SkoptSearchCV(
    SVC(),
    [{
        # log-uniform priors spread samples evenly across orders of magnitude
        'C': Real(1e-6, 1e+6, prior='log-uniform'),
        'gamma': Real(1e-6, 1e+1, prior='log-uniform'),
        'degree': Integer(1, 8),
        'kernel': Categorical(['linear', 'poly', 'rbf']),
    }],
    n_jobs=1, n_iter=32,
)

opt.fit(X_train, y_train)
print(opt.score(X_test, y_test))
```

```
0.973684210526
```
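
The `prior='log-uniform'` setting matters because `C` spans twelve orders of magnitude. The sketch below illustrates the idea in plain NumPy; it mimics the concept only and is not scikit-optimize's implementation, and the sample size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-6, 1e+6

# Uniform prior: almost every sample lands near the upper end of the range.
uniform = rng.uniform(low, high, size=10_000)

# Log-uniform prior: sample the exponent uniformly, then exponentiate.
log_uniform = 10 ** rng.uniform(np.log10(low), np.log10(high), size=10_000)

# Fraction of samples below 1.0 under each prior.
frac_uniform = np.mean(uniform < 1.0)
frac_log = np.mean(log_uniform < 1.0)
print(frac_uniform, frac_log)
```

Under the uniform prior essentially no candidate falls below 1.0, while under the log-uniform prior about half of them do, so small and large values of `C` are explored equally often.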


## Advanced example 

In many practical cases one wants to search over several model classes, where a different number of evaluations is expected to yield satisfactory results for each class. An example of this using the `Pipeline` class is given below.

```python
from skopt.space import Real, Categorical, Integer
from skopt.wrappers import SkoptSearchCV

from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

# return_X_y must be passed by keyword in recent scikit-learn versions
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

# single-step pipeline; the 'model' step is swapped out by each search space
pipe = Pipeline([
    ('model', SVC())
])

lin_search = {
    'model': Categorical([LinearSVC()]),
    'model__C': Real(1e-6, 1e+6, prior='log-uniform'),
}

svc_search = {
    'model': Categorical([SVC()]),
    'model__C': Real(1e-6, 1e+6, prior='log-uniform'),
    'model__gamma': Real(1e-6, 1e+1, prior='log-uniform'),
    'model__degree': Integer(1, 8),
    'model__kernel': Categorical(['linear', 'poly', 'rbf']),
}

dtc_search = {
    'model': Categorical([DecisionTreeClassifier()]),
    'model__max_depth': Integer(1, 32),
    'model__min_samples_split': Real(1e-3, 1.0, prior='log-uniform'),
}

# each entry pairs a search space with its number of evaluations
opt = SkoptSearchCV(
    pipe,
    [(lin_search, 16), (dtc_search, 24), (svc_search, 32)],
)

opt.fit(X_train, y_train)
print(opt.score(X_test, y_test))
```

```
0.973684210526
```
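
The `model` and `model__C` keys above follow scikit-learn's `Pipeline` parameter naming: `model` replaces the whole step, and `model__<name>` sets a parameter on whatever estimator currently occupies it. The same layout can be tried with plain `GridSearchCV`, as in this sketch; the grid values are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pipe = Pipeline([('model', SVC())])

# One dict per model class; each dict swaps the 'model' step and
# tunes parameters that only exist on that estimator.
param_grid = [
    {'model': [SVC()], 'model__C': [0.1, 1.0, 10.0]},
    {'model': [DecisionTreeClassifier(random_state=0)], 'model__max_depth': [1, 4, 16]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

best_model = search.best_estimator_.named_steps['model']
print(type(best_model).__name__, search.best_params_)
```

Keeping one dict per model class avoids invalid combinations such as `max_depth` being set on an `SVC`.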


## Iterative search utilizing `step` function

The class also provides a `step` function, which makes it possible to use custom stopping criteria and makes recovery from failures easier to set up.

A minimal example is shown below.

```python
from skopt.space import Real, Categorical, Integer
from skopt.wrappers import SkoptSearchCV

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# return_X_y must be passed by keyword in recent scikit-learn versions
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

search_space = {
    'C': Real(1e-6, 1e+6, prior='log-uniform'),
    'gamma': Real(1e-6, 1e+1, prior='log-uniform'),
    'degree': Integer(1, 8),
    'kernel': Categorical(['linear', 'poly', 'rbf']),
}

# no search space is given here; it is passed to every step() call instead
opt = SkoptSearchCV(
    SVC(),
    None
)

for i in range(32):
    opt.step(X_train, y_train, search_space)
    # save the model and use custom stopping criterion here
    # ...
    print(i, opt.score(X_test, y_test))
```

```
0 0.236842105263
1 0.868421052632
2 0.973684210526
3 0.973684210526
4 0.973684210526
5 0.973684210526
6 0.973684210526
7 0.973684210526
8 0.973684210526
9 0.973684210526
10 0.973684210526
11 0.973684210526
12 0.973684210526
13 0.973684210526
14 0.973684210526
15 0.973684210526
16 0.973684210526
17 0.973684210526
18 0.973684210526
19 0.973684210526
20 0.973684210526
21 0.973684210526
22 0.973684210526
23 0.973684210526
24 0.973684210526
25 0.973684210526
26 0.973684210526
27 0.973684210526
28 0.973684210526
29 0.973684210526
30 0.973684210526
31 0.973684210526
```
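
One possible custom stopping criterion for such a loop is patience-based: stop once the best score has not improved for a fixed number of steps. The helper below is a hypothetical sketch, not part of scikit-optimize; `should_stop`, `patience` and `min_delta` are names invented for illustration, and the scores are the ones printed above, replayed from a list instead of calling `opt.step`.

```python
def should_stop(history, patience=5, min_delta=0.0):
    """Return True when the last `patience` scores did not beat the earlier best."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent <= best_before + min_delta


# Scores from the run above, replayed to show where the loop would end.
scores = [0.236842105263, 0.868421052632] + [0.973684210526] * 30

history = []
stopped_at = None
for i, score in enumerate(scores):
    history.append(score)
    if should_stop(history):
        stopped_at = i
        break

print(stopped_at)
```

With these scores the loop ends at iteration 7, five steps after the best score first appears at iteration 2. In practice the score fed to such a criterion would better come from a validation split than from the test set, so that the stopping decision does not leak test information.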
