# Hyperband
Hyperband is useful when computational resources are limited. Some example cases are when

* there are many parameters to search over
* models take a long time to train

Hyperband requires only *one* input: the computational budget. For more information on this, see the documentation: https://dask-ml.readthedocs.io/en/latest/hyper-parameter-search.html

Hyperband is an *adaptive* algorithm: it spends as much time as possible on high-performing models by "killing" off the lower-performing portion. More detail is given in the `HyperbandCV` class description: https://dask-ml.readthedocs.io/en/latest/modules/generated/dask_ml.model_selection.GridSearchCV.html#dask_ml.model_selection.HyperbandCV
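The "killing off" idea can be illustrated with a toy round of successive halving, the kind of pruning that Hyperband builds on. This is only a sketch under stated assumptions: `successive_halving` and `toy_score` are hypothetical helpers invented for this illustration, the scoring function is made up, and none of this reflects how `HyperbandCV` is implemented internally.

```python
import numpy as np


def successive_halving(candidates, score_fn, keep_fraction=0.5, rounds=3):
    """Toy sketch: repeatedly keep only the best-scoring fraction of candidates.

    `score_fn(candidate, budget)` is a hypothetical stand-in for
    "train this model for `budget` iterations and report its score".
    """
    survivors = list(candidates)
    budget = 1
    history = []  # number of survivors after each round
    for _ in range(rounds):
        scores = [score_fn(c, budget) for c in survivors]
        order = np.argsort(scores)[::-1]  # best score first
        n_keep = max(1, int(len(survivors) * keep_fraction))
        survivors = [survivors[i] for i in order[:n_keep]]
        history.append(len(survivors))
        budget *= 2  # survivors get more training in the next round
    return survivors, history


def toy_score(alpha, budget):
    # Made-up score: best near alpha = 1e-2, slightly better with more budget.
    return -abs(np.log10(alpha) + 2) + 0.01 * budget


candidates = np.logspace(-4, 1, num=16)
survivors, history = successive_halving(candidates, toy_score)
print(history)    # how many candidates survived each round
print(survivors)  # the remaining values of alpha, best first
```

Each round trains the surviving candidates a little longer and discards the rest, so most of the budget ends up on the candidates that looked best early on.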

Below, we'll simulate having many parameters to search over by using a single parameter. We would use two, but we want an easy-to-interpret visualization at the end.

Hyperband is very similar to `RandomizedSearchCV` and works best with continuous random variables. We simulate a log-uniform random variable with many samples: `np.logspace(-4, 1, num=1000)`.
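As a quick check of why a dense `np.logspace` grid can stand in for a log-uniform random variable, the sketch below confirms that the grid is evenly spaced in `log10(alpha)`, so drawing uniformly from it is roughly a log-uniform draw. The variable names and the random seed are arbitrary choices for this illustration.

```python
import numpy as np

alphas = np.logspace(-4, 1, num=1000)

# The grid is evenly spaced in log10(alpha), not in alpha itself.
exponents = np.log10(alphas)
steps = np.diff(exponents)
print(np.allclose(steps, steps[0]))

# Drawing uniformly from this grid approximates a log-uniform draw
# over [1e-4, 1e1].
rng = np.random.default_rng(0)
sample = rng.choice(alphas, size=5)
print(sample)
```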

In [None]:
import numpy as np
import dask.array as da

from sklearn.linear_model import SGDClassifier

import dask_ml
from dask_ml.datasets import make_classification
from dask_ml.wrappers import Incremental
from dask_ml.model_selection import HyperbandCV, GridSearchCV, train_test_split

In [None]:
from distributed import Client, LocalCluster
client = Client()

In [None]:
n, d = int(10e3), int(100)
X, y = make_classification(n_features=d, n_samples=n,
                           n_informative=d // 10,
                           chunks=(n // 10, d))
classes = da.unique(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# max_iter must be an integer; one pass per call keeps each training step small
kwargs = dict(penalty='elasticnet', max_iter=1, warm_start=True, loss='log')
model = Incremental(SGDClassifier(**kwargs))
params = {'alpha': np.logspace(-4, 1, num=1000)}

In [None]:
alg = HyperbandCV(model, params, max_iter=81)

In [None]:
%%time
alg.fit(X_train, y_train, classes=classes)

In [None]:
alg.score(X_test, y_test)

In [None]:
hyperband_alpha = alg.best_params_['alpha']
alg.best_params_

Now we will compare with an exhaustive evaluation, which we can do because we're only simulating being computationally limited.

We will use `GridSearchCV` and set the loss of the model to be the loss Hyperband found. This is really the only way to show a fair visualization: otherwise we'd be comparing `alpha`s across loss functions, which doesn't make sense.

Note that this visualization hides the fact that Hyperband was searching between 5 different loss functions.

In [None]:
%%time
params = {'alpha': np.logspace(-4, 1, num=50)}
grid = GridSearchCV(model.estimator, params, return_train_score=False)
grid.fit(X, y)

In [None]:
grid.best_params_

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(grid.cv_results_)

fig, ax = plt.subplots()
df.plot(x='param_alpha', y='mean_test_score',
        yerr='std_test_score',
        logx=True, ax=ax)
# Vertical dashed line marking the alpha that Hyperband selected
ax.plot(2 * [hyperband_alpha], plt.ylim(), 'r--',
        label="Hyperband's chosen alpha")
plt.legend(loc='lower left')
plt.ylabel('mean_test_score')
plt.show()