# Asynchronous Successive Halving (ASHA)
Successive halving is a hyperparameter optimization algorithm based on the multi-armed bandit methodology. ASHA, its asynchronous variant, combines random search with principled early stopping. We highly recommend this blog post by the authors of the method: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
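To make the idea concrete, here is a minimal sketch of plain (synchronous) successive halving on a toy problem. It is not Sherpa's implementation and it leaves out the asynchronous promotion that ASHA adds. The names `successive_halving`, `toy_score`, and the target value `0.3` are hypothetical and exist only for illustration.

```python
import random

def successive_halving(sample_config, evaluate, n_configs=9, min_budget=1, eta=3):
    """Synchronous successive halving over randomly sampled configurations."""
    # Random search part: draw all candidates up front.
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while True:
        # Score every surviving configuration at the current budget.
        ranked = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        if len(ranked) == 1:
            return ranked[0], budget
        # Early stopping part: keep the top 1/eta, then grow the budget by eta.
        configs = ranked[:max(1, len(ranked) // eta)]
        budget *= eta

def toy_score(config, budget):
    # Hypothetical objective: configurations closer to 0.3 score higher,
    # and every score improves as the budget grows.
    return (1 - abs(config - 0.3)) * (1 - 0.5 ** budget)

rng = random.Random(0)
best, final_budget = successive_halving(rng.random, toy_score)
print(best, final_budget)
```

With the defaults, 9 configurations are scored at budget 1, the best 3 at budget 3, and the single survivor at budget 9.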

In [None]:
import sherpa
import keras
from keras.models import Sequential, load_model
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.optimizers import Adam
import tempfile
import os
import shutil

## Dataset Preparation

In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0

## Sherpa Setup
In this example we use $R=9$ and $\eta=3$. To obtain one finished configuration, we train 9 configurations for 1 epoch each, pick the best 3 of those and train them for 3 more epochs, then pick the best one of those and train it for another 9 epochs. Increase the *max_finished_configs* argument to run a larger search.
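The schedule described above can be worked out with a few lines of Python. This is a sketch based only on the description in this section, not on Sherpa's internals, and `rung_schedule` is a hypothetical helper.

```python
def rung_schedule(r=1, R=9, eta=3):
    """Return (num_configs, epochs_per_config) for each rung, starting at rung 0."""
    # Resources per rung: r, r*eta, r*eta**2, ... up to R.
    resources = []
    resource = r
    while resource <= R:
        resources.append(resource)
        resource *= eta
    top = len(resources) - 1
    # To end with one configuration at the top rung,
    # rung k has to start with eta**(top - k) configurations.
    return [(eta ** (top - k), res) for k, res in enumerate(resources)]

schedule = rung_schedule()
total_epochs = sum(n * epochs for n, epochs in schedule)
print(schedule)      # [(9, 1), (3, 3), (1, 9)]
print(total_epochs)  # 27
```

Each rung costs 9 epochs of training in total, and the configuration that survives all three rungs has trained for 1 + 3 + 9 = 13 epochs.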

In [20]:
parameters = [sherpa.Continuous('learning_rate', [1e-4, 1e-2], 'log'),
              sherpa.Discrete('num_units', [32, 128]),
              sherpa.Choice('activation', ['relu', 'tanh', 'sigmoid'])]
algorithm = sherpa.algorithms.SuccessiveHalving(r=1, R=9, eta=3, s=0, max_finished_configs=1)
study = sherpa.Study(parameters=parameters,
                     algorithm=algorithm,
                     lower_is_better=False,
                     dashboard_port=8995)

INFO:sherpa.core:
-------------------------------------------------------
SHERPA Dashboard running. Access via
http://128.195.75.106:8995 if on a cluster or
http://localhost:8995 if running locally.
-------------------------------------------------------


Create a temporary directory to store model files. Successive halving trains hyperparameter configurations with increasingly large budgets (training epochs), so intermediate models have to be saved and reloaded.

In [21]:
model_dir = tempfile.mkdtemp()

## Hyperparameter Optimization
**Note**: we manually infer the number of epochs the model has already trained for, so that we can pass it to Keras via the `initial_epoch` argument.
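The lookup table `{1: 0, 3: 1, 9: 4}` used in the training loop can also be computed instead of hard-coded. The sketch below assumes, as this notebook does, that each rung's resource is the number of *additional* epochs and that budgets grow by a factor of `eta` starting at `r`. The helper name `initial_epoch_for` is hypothetical.

```python
def initial_epoch_for(resource, r=1, eta=3):
    """Epochs already completed before a trial with this resource starts."""
    done = 0
    budget = r
    # Add up the budgets of all lower rungs.
    while budget < resource:
        done += budget
        budget *= eta
    if budget != resource:
        raise ValueError(f"{resource} is not a rung budget for r={r}, eta={eta}")
    return done

offsets = {res: initial_epoch_for(res) for res in (1, 3, 9)}
print(offsets)  # {1: 0, 3: 1, 9: 4}
```

This would matter mostly when changing `R` or `eta`, because the hard-coded table then has to be updated by hand.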

In [22]:
for trial in study:
    # Getting number of training epochs
    initial_epoch = {1: 0, 3: 1, 9: 4}[trial.parameters['resource']]
    epochs = trial.parameters['resource'] + initial_epoch
    
    print("-"*100)
    print(f"Trial:\t{trial.id}\nEpochs:\t{initial_epoch} to {epochs}\nParameters:{trial.parameters}\n")
    
    if trial.parameters['load_from'] == "":
        print(f"Creating new model for trial {trial.id}...\n")
        
        # Get hyperparameters
        lr = trial.parameters['learning_rate']
        num_units = trial.parameters['num_units']
        act = trial.parameters['activation']

        # Create model
        model = Sequential([Flatten(input_shape=(28, 28)),
                            Dense(num_units, activation=act),
                            Dense(10, activation='softmax')])
        optimizer = Adam(lr=lr)
        model.compile(loss='sparse_categorical_crossentropy',
                      optimizer=optimizer,
                      metrics=['accuracy'])
    else:
        print("Loading model from: ", os.path.join(model_dir, trial.parameters['load_from']), "...\n")
        
        # Loading model
        model = load_model(os.path.join(model_dir, trial.parameters['load_from']))
        

    # Train model
    for i in range(initial_epoch, epochs):
        model.fit(x_train, y_train, initial_epoch=i, epochs=i+1)
        loss, accuracy = model.evaluate(x_test, y_test)
        
        print("Validation accuracy: ", accuracy)
        study.add_observation(trial=trial, iteration=i,
                              objective=accuracy,
                              context={'loss': loss})
    
    study.finalize(trial=trial)
    print("Saving model at: ", os.path.join(model_dir, trial.parameters['save_to']))
    model.save(os.path.join(model_dir, trial.parameters['save_to']))
    
    study.save(model_dir)

----------------------------------------------------------------------------------------------------
Trial:	1
Epochs:	0 to 1
Parameters:{'learning_rate': 0.0006779922111149317, 'num_units': 67, 'activation': 'tanh', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '1'}
Creating new model for trial 1...

Epoch 1/1
Validation accuracy:  0.9426
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/1
----------------------------------------------------------------------------------------------------
Trial:	2
Epochs:	0 to 1
Parameters:{'learning_rate': 0.0007322493943507595, 'num_units': 53, 'activation': 'sigmoid', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '2'}
Creating new model for trial 2...

Epoch 1/1
Validation accuracy:  0.9213
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/2
----------------------------------------------------------------------------------------------------
Trial:	3
Epochs:	0 to 1
Parameters:{

The best hyperparameter configuration found is:

In [23]:
study.get_best_result()

{'Iteration': 6,
 'Objective': 0.9744,
 'Trial-ID': 12,
 'activation': 'tanh',
 'learning_rate': 0.0025240507488864423,
 'load_from': '11',
 'loss': 0.08811961327217287,
 'num_units': 124,
 'resource': 9,
 'rung': 2,
 'save_to': '12'}

This model is stored at:

In [24]:
print(os.path.join(model_dir, study.get_best_result()['save_to']))

/var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/12


To remove the model directory:

In [25]:
# Remove model_dir
shutil.rmtree(model_dir)
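As an alternative to removing the directory by hand, the standard library's `tempfile.TemporaryDirectory` context manager deletes the directory on exit. The sketch below only demonstrates that behavior with a placeholder file. It assumes the whole search would run inside the `with` block, and the best model would have to be copied elsewhere before the block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for model.save(os.path.join(tmp_dir, trial.parameters['save_to']))
    checkpoint = os.path.join(tmp_dir, "1")
    with open(checkpoint, "w") as f:
        f.write("placeholder for a saved model")
    existed_inside = os.path.exists(checkpoint)

# The directory and everything in it are gone once the block exits.
removed_after = not os.path.exists(tmp_dir)
print(existed_inside, removed_after)  # True True
```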