# Hyperband
Hyperband is useful when computational resources are limited. Some example cases are when

* there are many parameters to search over
* models take a long time to train

Hyperband requires only *one* input, the computational budget. For more information, see the documentation: https://dask-ml.readthedocs.io/en/latest/hyper-parameter-search.html

Hyperband is an *adaptive* algorithm: it spends as much time as possible on high-performing models by "killing" off the lower-performing portion. More detail is given in the `HyperbandCV` class description: https://dask-ml.readthedocs.io/en/latest/modules/generated/dask_ml.model_selection.GridSearchCV.html#dask_ml.model_selection.HyperbandCV
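The "killing off" step can be illustrated with a minimal sketch of successive halving, the routine Hyperband builds on. This is not the `HyperbandCV` implementation: `successive_halving`, `toy_score`, and the `eta=3` default are hypothetical names and values chosen for illustration, and the score function is a made-up stand-in for training a real model.

```python
import numpy as np

def successive_halving(configs, evaluate, eta=3, min_iter=1):
    """Train all survivors, keep the top 1/eta by score, and repeat."""
    survivors = list(configs)
    n_iter = min_iter
    while len(survivors) > 1:
        # Score every surviving configuration at the current budget
        scores = [evaluate(c, n_iter) for c in survivors]
        n_keep = max(1, len(survivors) // eta)
        order = np.argsort(scores)[::-1]  # highest score first
        survivors = [survivors[i] for i in order[:n_keep]]
        # Survivors earn a larger training budget in the next round
        n_iter *= eta
    return survivors[0]

def toy_score(alpha, n_iter):
    # Hypothetical score: peaks at alpha = 1e-2 and improves with more iterations
    return -abs(np.log10(alpha) + 2) - 1.0 / n_iter

alphas = np.logspace(-4, 1, num=27)
best_alpha = successive_halving(alphas, toy_score)
print(best_alpha)
```

With 27 candidates and `eta=3`, the rounds keep 9, then 3, then 1 configuration, so most of the budget goes to the candidates that scored well early.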

Below, we'll simulate having many parameters to search over by using just two. These are the most basic parameters of scikit-learn's `SGDClassifier`: `loss` and `alpha`. They control which objective function we're minimizing and how much regularization is present, respectively.
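For reference, the objective that `SGDClassifier` minimizes can be written as below. This follows the form given in the scikit-learn SGD documentation; the symbols $w$, $b$, $L$, and $R$ are notation introduced here for illustration.

$$
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \alpha R(w), \qquad f(x) = w^T x + b
$$

Here `loss` selects the function $L$ (for example, the hinge loss $L(y, f) = \max(0, 1 - y f)$), and `alpha` scales the regularization penalty $R(w)$.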

Hyperband is very similar to `RandomizedSearchCV` and works best with continuous random variables. We simulate a log-uniform random variable with many samples: `np.logspace(-4, 1, num=1000)`.
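As a quick check of why a dense `np.logspace` grid approximates a log-uniform variable, the sketch below verifies that the grid is evenly spaced in $\log_{10}$ and that uniform draws from it spend about four of the five decades' worth of samples below 1. The names `log_steps`, `draws`, and `frac_below_one` are illustrative and not used elsewhere in this notebook.

```python
import numpy as np

alphas = np.logspace(-4, 1, num=1000)

# Consecutive grid points are a constant distance apart in log10 space
log_steps = np.diff(np.log10(alphas))
print(log_steps.min(), log_steps.max())

# Drawing uniformly from the grid mimics sampling a log-uniform variable:
# each decade between 1e-4 and 1e1 receives roughly the same share of draws
rng = np.random.default_rng(0)
draws = rng.choice(alphas, size=10_000)
frac_below_one = np.mean(draws < 1.0)
print(frac_below_one)
```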

In [None]:
from time import time

import numpy as np
import dask.array as da

from sklearn.linear_model import SGDClassifier

from dask_ml.datasets import make_classification
from dask_ml.wrappers import Incremental
from dask_ml.model_selection import HyperbandCV
# Use the dask-ml splitter (not scikit-learn's) so the splits stay dask arrays
from dask_ml.model_selection import train_test_split

In [None]:
from distributed import Client

# With no arguments, Client() starts a local cluster
client = Client()

In [None]:
n, d = int(10e3), int(100)
X, y = make_classification(n_features=d, n_samples=n,
                           n_informative=d // 10,
                           chunks=(n // 10, d))
classes = da.unique(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# max_iter must be an integer
kwargs = dict(loss='hinge', penalty='elasticnet',
              max_iter=1, warm_start=True)
model = Incremental(SGDClassifier(**kwargs))
params = {'alpha': np.logspace(-4, 1, num=1000),
          'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge']}

In [None]:
alg = HyperbandCV(model, params)

In [None]:
start = time()
alg.fit(X_train, y_train, classes=da.unique(y))
actual_time = time() - start
print(f"Fitting time = {actual_time}")

In [None]:
alg.score(X_test, y_test)

In [None]:
alg.best_params_

In [None]:
from dask_ml.model_selection import GridSearchCV
params = {'alpha': np.logspace(-4, 1, num=10),
          'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge']}
grid = GridSearchCV(model.estimator, params, return_train_score=True)
start = time()
# Fit on the same training split as Hyperband so the comparison is fair
grid.fit(X_train, y_train)
print("Grid search time =", time() - start)

In [None]:
# Hyperband's choices, which the plot below overlays on the grid search scores
opt_alpha = alg.best_params_['alpha']
opt_loss = alg.best_params_['loss']
grid.best_params_

In [None]:
import pandas as pd
df = pd.DataFrame(grid.cv_results_)
df.columns

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for loss in df.param_loss.unique():
    show = df[df.param_loss == loss]
    show.plot(x='param_alpha', y='mean_test_score',
              logx=True, ax=ax,
              label=loss)
plt.plot(2 * [opt_alpha], plt.ylim(), 'p--',
         label='Hyperband alpha')
plt.legend(loc='best')
plt.ylabel('mean_test_score')
print('Hyperband loss function =', opt_loss)
plt.savefig('hyperband.png', dpi=300)
plt.show()