# Hyperparameter Search with Hyperband

The Hyperband algorithm finds good hyperparameters for incremental models, that is, models that are trained on data one chunk at a time. It works by trying many parameter sets on small pieces of data and then following up only with the parameter sets that seem to be converging quickly.
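The "try many, follow up on few" idea can be sketched with a single round of successive halving, the building block that Hyperband runs several times with different trade-offs between the number of parameter sets and the training budget. This is a minimal illustration, not the `dask_ml` implementation: `successive_halving`, `toy_score`, and the halving schedule are hypothetical names and choices made for this sketch, and `toy_score` is a made-up stand-in for training on a few chunks and scoring.

```python
import math


def successive_halving(configs, partial_fit_and_score, n_rounds=3, keep=0.5):
    """Score every config on a small budget, keep the best fraction, repeat."""
    survivors = list(configs)
    budget = 1  # number of data chunks each survivor sees in this round
    for _ in range(n_rounds):
        scored = [(partial_fit_and_score(c, budget), c) for c in survivors]
        # Sort on the score only, so configs (dicts) are never compared.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        n_keep = max(1, int(len(scored) * keep))
        survivors = [c for _, c in scored[:n_keep]]
        budget *= 2  # survivors earn a larger budget in the next round
    return survivors[0]


def toy_score(config, budget):
    # HYPOTHETICAL stand-in for "partial_fit on `budget` chunks, then score":
    # the score peaks at alpha = 1e-2 and improves slightly with more data.
    distance = abs(math.log10(config["alpha"]) + 2)
    return 1.0 - 0.1 * distance - 0.2 / budget


configs = [{"alpha": 10.0 ** e} for e in (-4, -3.5, -3, -2.5, -2, -1, 0, 1)]
best = successive_halving(configs, toy_score)
print(best)
```

With eight starting configurations and `keep=0.5`, the rounds evaluate 8, 4, and 2 configurations, so most of the budget goes to the candidates that looked best early on.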

This example searches over two parameters of scikit-learn's `SGDClassifier`: `alpha` and `loss`. `alpha` controls how much regularization is present, and `loss` controls which objective function we minimize.

Hyperband is similar to `RandomizedSearchCV` and works best with continuous random variables. We simulate a log-uniform random variable with many samples: `np.logspace(-4, 1, num=1000)`.
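To see why a dense `logspace` grid approximates a log-uniform variable, the sketch below rebuilds the same values in plain Python (equal to `np.logspace(-4, 1, num=1000)` up to floating-point rounding) and counts how many fall in each decade. The variable names are chosen for this illustration only.

```python
num = 1000
start, stop = -4, 1  # exponents, as in np.logspace(-4, 1, num=1000)

# The exponents are evenly spaced, so the values are evenly spaced on a log scale.
exponents = [start + (stop - start) * i / (num - 1) for i in range(num)]
alphas = [10 ** e for e in exponents]

# Count how many values land in each decade: [1e-4, 1e-3), ..., [1, 10].
per_decade = [0] * (stop - start)
for e in exponents:
    decade = min(int(e - start), stop - start - 1)  # clamp the endpoint 10.0
    per_decade[decade] += 1

print(alphas[0], alphas[-1])
print(per_decade)
```

Each of the five decades receives 200 values, so drawing uniformly from this list samples small and large values of `alpha` equally often on a log scale.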

In [None]:
import numpy as np
import dask.array as da

from sklearn.linear_model import SGDClassifier

import dask
import dask_ml
from dask_ml.datasets import make_classification
from dask_ml.model_selection import HyperbandCV
from dask_ml.model_selection import train_test_split

In [None]:
from distributed import Client
client = Client()
client

## Set up example problem

In [None]:
n, d = 100000, 100
X, y = make_classification(n_features=d, 
                           n_samples=n,
                           n_informative=d // 10,
                           chunks=(n // 50, d))
classes = da.unique(y)
X_train, X_test, y_train, y_test = dask.persist(*train_test_split(X, y))

model = SGDClassifier(
    penalty='elasticnet',
    max_iter=1,  # must be an integer; one pass over the data per fit call
    warm_start=True,
)

params = {'alpha': np.logspace(-4, 1, num=1000),
          'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge']}

## Fit quickly with Hyperband

Hyperband spends most of its computation on the parameter sets that look promising early, so it usually finds a good set of hyperparameters quickly.

In [None]:
alg = HyperbandCV(model, params)

In [None]:
%%time
alg.fit(X_train, y_train, classes=classes)

In [None]:
alg.score(X_test, y_test)

In [None]:
alg.best_params_

## Compare to GridSearchCV

We can compare with the traditional `GridSearchCV` algorithm, which is exhaustive but comparatively slow.

In [None]:
from dask_ml.model_selection import GridSearchCV
params = {'alpha': np.logspace(-4, 1, num=100),
          'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge']}
grid = GridSearchCV(model, params, return_train_score=True)
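To make "exhaustive" concrete, the sketch below counts the candidates in this grid. The candidate count follows directly from the `params` dictionary above; the number of cross-validation splits is an assumption chosen for illustration, not a value taken from this notebook.

```python
from itertools import product

n_alphas = 100  # length of np.logspace(-4, 1, num=100)
losses = ["hinge", "log", "modified_huber", "squared_hinge"]

# Every (alpha, loss) pair is a separate candidate model.
candidates = list(product(range(n_alphas), losses))
n_candidates = len(candidates)

n_splits = 3  # ASSUMPTION: illustrative fold count, not taken from the notebook
n_fits = n_candidates * n_splits

print(n_candidates, n_fits)
```

Grid search trains every one of these candidates to completion on every split, whereas Hyperband stops most candidates early.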

In [None]:
%%time
grid.fit(X_train, y_train)  # fit on the training split only; this may take a few minutes

In [None]:
grid.best_params_

In [None]:
grid.score(X_test, y_test)

## Compare results

We find that the parameters are not exactly the same, but the results are quite similar.

In [None]:
import pandas as pd
df = pd.DataFrame(grid.cv_results_)
df.columns

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots()
for loss in df.param_loss.unique():
    show = df[df.param_loss == loss]
    show.plot(x='param_alpha', y='mean_test_score',
              logx=True, ax=ax,
              label=loss)
# Mark the alpha chosen by Hyperband (alg), not the one from the grid search
plt.plot(2 * [alg.best_params_['alpha']], plt.ylim(), 'p--',
         label='Hyperband alpha')
plt.legend(loc='best')
plt.ylabel('mean_test_score')
print('Hyperband loss function =', alg.best_params_['loss'])
plt.show()