Choosing good hyperparameters is one of the most important steps in building a model, but it is also tedious and time-consuming. As long as you are tuning only a few hyperparameters, you may find a good setting after a few trials. That is not the case, however, for complex models such as neural networks. The basic algorithms for tuning them are grid search, random search, and Bayesian optimization.

# Background
Hyperparameter optimization can mostly be treated as black-box optimization, which is defined as follows:

> "Black Box" optimization refers to a problem setup in which an optimization algorithm is supposed to optimize (e.g., minimize) an objective function through a so-called black-box interface: the algorithm may query the value f(x) for a point x, but it does not obtain gradient information, and in particular it cannot make any assumptions on the analytic form of f (e.g., being linear or quadratic). We think of such an objective function as being wrapped in a black-box. The goal of optimization is to find an as good as possible value f(x) within a predefined time, often defined by the number of available queries to the black box. Problems of this type regularly appear in practice, e.g., when optimizing parameters of a model that is either in fact hidden in a black box (e.g., a third party software library) or just too complex to be modeled explicitly.

> by [Black-Box Optimization Competition homepage](https://bbcomp.ini.rub.de/).

\* Some hyperparameter optimization methods do make use of gradient information from the model, e.g., [Maclaurin et al., 2015](http://proceedings.mlr.press/v37/maclaurin15.pdf).

When optimizing hyperparameters, the only information available is usually the score of a chosen metric (e.g., accuracy for classification) for each set of hyperparameters. Thus, we query a set of hyperparameters and get a score back as the response. How to query efficiently depends on the problem you are working on. In this article, we go through the most basic algorithms: grid search, random search, and Bayesian optimization. Then, we compare their performance on toy problems.
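The query-and-score loop described above can be sketched in a few lines. This is a minimal illustration using random search, not the implementation of any library used later in this article; `random_search`, `toy_score`, and `toy_sampler` are hypothetical names, and the toy score is made up so that it peaks at a learning rate of 1e-3.

```python
import math
import random


def random_search(score_func, sample_params, num_iter, seed=0):
    """Query the black box num_iter times and keep the best-scoring parameters."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(num_iter):
        params = sample_params(rng)   # choose a query point
        score = score_func(params)    # the score is the only feedback we get
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical black box: the score peaks (at 0) when learning_rate = 1e-3.
    return -(math.log10(params["learning_rate"]) + 3.0) ** 2


def toy_sampler(rng):
    # Sample the exponent uniformly, so learning_rate is log-uniform.
    return {"learning_rate": 10 ** rng.uniform(-5, -1)}


best_params, best_score = random_search(toy_score, toy_sampler, num_iter=200)
```

The optimizer never looks inside `toy_score`; it only sees the returned value, which is exactly the black-box setting.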
 

# Grid Search
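Grid search is the simplest of the three methods: it evaluates every combination of values on a predefined grid and keeps the best one.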
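As a rough sketch of the idea, grid search can be written with `itertools.product`. The grid values and the score function below are made up for illustration, and `grid_search` is a hypothetical helper, not part of the libraries used in this article.

```python
import itertools


def grid_search(score_func, grid):
    """Evaluate every combination in grid and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_func(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid: 3 x 3 = 9 combinations are evaluated.
toy_grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "num_hidden_layers": [0, 1, 2],
}


def toy_grid_score(params):
    # Made-up stand-in for a validation score; it peaks at (1e-3, 1).
    return (-abs(params["learning_rate"] - 1e-3)
            - abs(params["num_hidden_layers"] - 1))


grid_best_params, grid_best_score = grid_search(toy_grid_score, toy_grid)
```

The number of evaluations is the product of the grid sizes, which is why grid search becomes impractical as the number of hyperparameters grows.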

In [1]:
import tensorflow as tf


mnist = tf.contrib.learn.datasets.load_dataset("mnist")



Extracting MNIST-data/train-images-idx3-ubyte.gz
Extracting MNIST-data/train-labels-idx1-ubyte.gz
Extracting MNIST-data/t10k-images-idx3-ubyte.gz
Extracting MNIST-data/t10k-labels-idx1-ubyte.gz


In [2]:
from sklearn.preprocessing import OneHotEncoder
import numpy as np


train = mnist.train
X = train.images
train_X = X
train_y = np.expand_dims(train.labels, -1)
train_y = OneHotEncoder().fit_transform(train_y)

valid = mnist.validation
X = valid.images
valid_X = X 
valid_y = np.expand_dims(valid.labels, -1)
valid_y = OneHotEncoder().fit_transform(valid_y)

test = mnist.test
X = test.images
test_X = X
test_y = test.labels

For the sake of simplicity, we are going to tune the following hyperparameters:

- the number of layers
- the number of hidden units
- learning rate
- weight regularizer
- optimization algorithm

In [3]:
from hedgeable_ai.optimizer.tuner import BayesOptimizer, RandomOptimizer
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout
from keras.layers import Activation, Reshape
from keras.optimizers import Adam, Adadelta, SGD, RMSprop
from keras.regularizers import l1, l2


def get_optimizer(name, **kwargs):
    # Map an optimizer name from the search space to a Keras optimizer.
    if name == "rmsprop":
        return RMSprop(**kwargs)
    elif name == "adam":
        return Adam(**kwargs)
    elif name == "sgd":
        return SGD(**kwargs)
    elif name == "adadelta":
        return Adadelta(**kwargs)
    else:
        raise ValueError(name)


def construct_NN(params):
    model = Sequential()
    model.add(Reshape((784,), input_shape=(784,)))
    
    def update_model(_model, _params, name):
        # One block: dropout -> dense -> optional batch norm -> optional activation.
        _model.add(Dropout(_params[name + "_drop_rate"]))
        _model.add(Dense(units=_params[name + "_num_units"],
                    activation=None,
                    kernel_regularizer=l2(_params[name + "_w_reg"])))
        if _params[name + "_is_batch"]:
            _model.add(BatchNormalization())
        if _params[name + "_activation"] is not None:
            _model.add(Activation(_params[name + "_activation"]))
        return _model
    
    # Add input layer    
    model = update_model(model, params, "input")
    # Add hidden layers (all of them share the "hidden_*" settings)
    for i in range(params["num_hidden_layers"]):
        model = update_model(model, params, "hidden")
    # Add output layer
    model = update_model(model, params, "output")
    optimizer = get_optimizer(params["optimizer"],
                              lr=params["learning_rate"])
    model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
    return model
        

def score_func(params):
    print("parameters", params)
    model = construct_NN(params)
    model.fit(train_X, train_y,
              epochs=params["epochs"],
              batch_size=params["batch_size"], verbose=1)
    print("###################", model.metrics_names)
    score = model.evaluate(valid_X, valid_y,
                  batch_size=params["batch_size"])
    idx = model.metrics_names.index("acc")
    score = score[idx]
    return score

params_conf = [
    {"name": "num_hidden_layers", "type": "integer",
     "domain": (0, 5)},
    {"name": "batch_size", "type": "integer",
     "domain": (16, 128), "scale": "log"},
    {"name": "learning_rate", "type": "continuous",
     "domain": (1e-5, 1e-1), "scale": "log"},
    {"name": "epochs", "type": "fixed",
     "domain": 1, "scale": "log"},
    {"name": "optimizer", "type": "categorical",
     "domain": ("rmsprop", "sgd", "adam", "adadelta")},
    
    {"name": "input_drop_rate", "type": "continuous",
     "domain": (0, 0.5)},
    {"name": "input_num_units", "type": "integer",
     "domain": (32, 256), "scale": "log"},
    {"name": "input_w_reg", "type": "continuous",
     "domain": (1e-10, 1e-1), "scale": "log"},
    {"name": "input_is_batch", "type": "categorical",
     "domain": (True, False)},
    {"name": "input_activation", "type": "categorical",
     "domain": ("relu", "sigmoid", "tanh")},
    
    {"name": "hidden_drop_rate", "type": "continuous",
     "domain": (0, 0.75)},
    {"name": "hidden_num_units", "type": "integer",
     "domain": (32, 256), "scale": "log"},
    {"name": "hidden_w_reg", "type": "continuous",
     "domain": (1e-10, 1e-1), "scale": "log"},
    {"name": "hidden_is_batch", "type": "categorical",
     "domain": (True, False)},
    {"name": "hidden_activation", "type": "categorical",
     "domain": ("relu", "sigmoid", "tanh")},
    
    {"name": "output_drop_rate", "type": "continuous",
     "domain": (0, 0.5)},
    {"name": "output_num_units", "type": "fixed",
     "domain": 10},
    {"name": "output_w_reg", "type": "continuous",
     "domain": (1e-10, 1e-1), "scale": "log"},
    {"name": "output_is_batch", "type": "categorical",
     "domain": (True, False)},
    {"name": "output_activation", "type": "fixed",
     "domain": "softmax"},
    
]

Using TensorFlow backend.
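Several entries in `params_conf` set `"scale": "log"`. How the optimizer libraries treat this key internally is not shown here, so the following is only a sketch of one plausible reading: draw uniformly in log10 space and map back. The helper names `sample_continuous` and `sample_integer` are hypothetical.

```python
import math
import random


def sample_continuous(domain, scale=None, rng=random):
    """Draw one value from (low, high), uniformly or log-uniformly."""
    low, high = domain
    if scale == "log":
        # Uniform in log10 space, then mapped back to the original scale.
        return 10 ** rng.uniform(math.log10(low), math.log10(high))
    return rng.uniform(low, high)


def sample_integer(domain, scale=None, rng=random):
    """Draw one integer from [low, high] by rounding a continuous draw."""
    low, high = domain
    value = int(round(sample_continuous(domain, scale, rng)))
    return min(max(value, low), high)  # guard against rounding past the bounds


rng = random.Random(0)
learning_rates = [sample_continuous((1e-5, 1e-1), "log", rng) for _ in range(1000)]
batch_sizes = [sample_integer((16, 128), "log", rng) for _ in range(1000)]
```

With a log scale, about half of the sampled learning rates fall below 1e-3, the midpoint of the exponent range. A linear scale would put almost all samples above 1e-3.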


In [4]:
from bboptimizer.samplers.random import RandomSampler
from bboptimizer import Optimizer

opt = Optimizer(score_func, params_conf, sampler="random")

In [5]:
opt.search()

parameters {'num_hidden_layers': 0, 'batch_size': 117, 'learning_rate': 0.0055451421884380216, 'epochs': 1, 'optimizer': 'adadelta', 'input_drop_rate': 0.27502278061733854, 'input_num_units': 243, 'input_w_reg': 0.005581602649551183, 'input_is_batch': False, 'input_activation': 'relu', 'hidden_drop_rate': 0.22328253884527435, 'hidden_num_units': 132, 'hidden_w_reg': 5.942947680444149e-07, 'hidden_is_batch': False, 'hidden_activation': 'tanh', 'output_drop_rate': 0.2973426860984296, 'output_num_units': 10, 'output_w_reg': 2.4430326159511334e-07, 'output_is_batch': False, 'output_activation': 'softmax'}
Epoch 1/1
################### ['loss', 'acc']
parameters {'num_hidden_layers': 2, 'batch_size': 32, 'learning_rate': 0.0009684732902796724, 'epochs': 1, 'optimizer': 'adam', 'input_drop_rate': 0.2952232471941791, 'input_num_units': 38, 'input_w_reg': 0.0026921354562820204, 'input_is_batch': True, 'input_activation': 'sigmoid', 'hidden_drop_rate': 0.41563823220032997, 'hidden_num_units': 3

({'batch_size': 16,
  'epochs': 1,
  'hidden_activation': 'tanh',
  'hidden_drop_rate': 0.35704411145920645,
  'hidden_is_batch': True,
  'hidden_num_units': 256,
  'hidden_w_reg': 1.4273219587170685e-09,
  'input_activation': 'relu',
  'input_drop_rate': 0.13539594937933058,
  'input_is_batch': False,
  'input_num_units': 70,
  'input_w_reg': 0.0019045750018549848,
  'learning_rate': 1.5338416143333576e-05,
  'num_hidden_layers': 4,
  'optimizer': 'adadelta',
  'output_activation': 'softmax',
  'output_drop_rate': 0.17539879252232166,
  'output_is_batch': True,
  'output_num_units': 10,
  'output_w_reg': 1.850005755930935e-06},
 0.0746)

In [7]:
opt._sampler.sample(3)

[{'batch_size': 122,
  'epochs': 1,
  'hidden_activation': 'relu',
  'hidden_drop_rate': 0.41800143601312734,
  'hidden_is_batch': False,
  'hidden_num_units': 58,
  'hidden_w_reg': 1.3843095452312436e-09,
  'input_activation': 'relu',
  'input_drop_rate': 0.19152239944957644,
  'input_is_batch': True,
  'input_num_units': 180,
  'input_w_reg': 1.75967787780324e-08,
  'learning_rate': 0.003043047288864343,
  'num_hidden_layers': 0,
  'optimizer': 'sgd',
  'output_activation': 'softmax',
  'output_drop_rate': 0.24792116099023453,
  'output_is_batch': False,
  'output_num_units': 10,
  'output_w_reg': 1.0194675255696844e-07},
 {'batch_size': 19,
  'epochs': 1,
  'hidden_activation': 'tanh',
  'hidden_drop_rate': 0.03856142201832366,
  'hidden_is_batch': False,
  'hidden_num_units': 117,
  'hidden_w_reg': 4.170240016976096e-08,
  'input_activation': 'tanh',
  'input_drop_rate': 0.43644352622194327,
  'input_is_batch': False,
  'input_num_units': 42,
  'input_w_reg': 0.0012481399408733633,

In [4]:
opt = BayesOptimizer(score_func=score_func,
                      params_conf=params_conf, is_display=True, timeout=None)
opt.search(num_iter=30)

parameters {'num_hidden_layers': 4, 'batch_size': 42, 'learning_rate': 0.00028985783786979956, 'optimizer': 'sgd', 'input_drop_rate': 0.0683744739733505, 'input_num_units': 164, 'input_w_reg': 2.3580152177448804e-07, 'input_is_batch': True, 'input_activation': 'relu', 'hidden_drop_rate': 0.5749382175908537, 'hidden_num_units': 107, 'hidden_w_reg': 0.03589453327038853, 'hidden_is_batch': True, 'hidden_activation': 'tanh', 'output_drop_rate': 0.08960135174196227, 'output_w_reg': 0.0007725908760472369, 'output_is_batch': True, 'epochs': 1, 'output_num_units': 10, 'output_activation': 'softmax'}
Epoch 1/1
################### ['loss', 'acc']
parameters {'num_hidden_layers': 0, 'batch_size': 23, 'learning_rate': 0.012815775018421094, 'optimizer': 'adam', 'input_drop_rate': 0.02827919776496557, 'input_num_units': 123, 'input_w_reg': 0.0004914563612437838, 'input_is_batch': True, 'input_activation': 'relu', 'hidden_drop_rate': 0.6736427441705218, 'hidden_num_units': 60, 'hidden_w_reg': 2.54316

################### ['loss', 'acc']
parameters {'num_hidden_layers': 5, 'batch_size': 16, 'learning_rate': 1e-05, 'optimizer': 'adadelta', 'input_drop_rate': 0.0, 'input_num_units': 32, 'input_w_reg': 2.8722806174478596e-08, 'input_is_batch': True, 'input_activation': 'tanh', 'hidden_drop_rate': 0.75, 'hidden_num_units': 32, 'hidden_w_reg': 0.1, 'hidden_is_batch': True, 'hidden_activation': 'tanh', 'output_drop_rate': 0.5, 'output_w_reg': 1e-10, 'output_is_batch': True, 'epochs': 1, 'output_num_units': 10, 'output_activation': 'softmax'}
Epoch 1/1
################### ['loss', 'acc']
parameters {'num_hidden_layers': 5, 'batch_size': 16, 'learning_rate': 1e-05, 'optimizer': 'sgd', 'input_drop_rate': 0.0, 'input_num_units': 32, 'input_w_reg': 0.00010911849070218321, 'input_is_batch': True, 'input_activation': 'tanh', 'hidden_drop_rate': 0.75, 'hidden_num_units': 32, 'hidden_w_reg': 3.8138052664855576e-05, 'hidden_is_batch': True, 'hidden_activation': 'tanh', 'output_drop_rate': 0.0, 'outp

Epoch 1/1
################### ['loss', 'acc']
parameters {'num_hidden_layers': 5, 'batch_size': 16, 'learning_rate': 0.00010485153024168855, 'optimizer': 'rmsprop', 'input_drop_rate': 0.0, 'input_num_units': 32, 'input_w_reg': 1.1075240606183763e-05, 'input_is_batch': True, 'input_activation': 'sigmoid', 'hidden_drop_rate': 0.75, 'hidden_num_units': 32, 'hidden_w_reg': 0.1, 'hidden_is_batch': True, 'hidden_activation': 'sigmoid', 'output_drop_rate': 0.5, 'output_w_reg': 1e-10, 'output_is_batch': True, 'epochs': 1, 'output_num_units': 10, 'output_activation': 'softmax'}
Epoch 1/1
################### ['loss', 'acc']
parameters {'num_hidden_layers': 5, 'batch_size': 16, 'learning_rate': 1e-05, 'optimizer': 'sgd', 'input_drop_rate': 0.0, 'input_num_units': 32, 'input_w_reg': 0.1, 'input_is_batch': True, 'input_activation': 'sigmoid', 'hidden_drop_rate': 0.75, 'hidden_num_units': 32, 'hidden_w_reg': 0.1, 'hidden_is_batch': True, 'hidden_activation': 'relu', 'output_drop_rate': 0.0, 'output_

({'batch_size': 16,
  'epochs': 1,
  'hidden_activation': 'tanh',
  'hidden_drop_rate': 0.75,
  'hidden_is_batch': True,
  'hidden_num_units': 32,
  'hidden_w_reg': 0.1,
  'input_activation': 'sigmoid',
  'input_drop_rate': 0.0,
  'input_is_batch': True,
  'input_num_units': 32,
  'input_w_reg': 8.552940899039238e-05,
  'learning_rate': 1e-05,
  'num_hidden_layers': 5,
  'optimizer': 'adadelta',
  'output_activation': 'softmax',
  'output_drop_rate': 0.5,
  'output_is_batch': True,
  'output_num_units': 10,
  'output_w_reg': 3.4993703550624235e-10},
 0.0614)

In [5]:
import tensorflow as tf

tf.reset_default_graph()
x = tf.placeholder(tf.float32)

In [13]:
%pylab inline
import GPy
import GPyOpt
import matplotlib.pyplot as plt

Populating the interactive namespace from numpy and matplotlib


In [49]:
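def myf(x):
    # GPyOpt passes x as a 2-D array with one row per query point.
    print(x)
    # Return one objective value per row, shaped (n_points, 1).
    return np.sum((2 * x) ** 2, axis=-1, keepdims=True)

# Give each variable a distinct name.
bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)},
         {'name': 'var_2', 'type': 'continuous', 'domain': (-1,1)}]

# bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)}]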

In [50]:
max_iter = 15
opt = GPyOpt.methods.BayesianOptimization(myf,bounds)
opt.run_optimization(max_iter, verbosity=True)

[[ 0.83472294 -0.35045541]]
[[-0.81677674 -0.20893279]]
[[0.56892437 0.98436479]]
[[-0.70499574  0.15934302]]
[[ 0.36236546 -0.37212878]]
[[ 0.3276707 -0.3794702]]
[[ 0.19993882 -0.76537162]]
[[ 0.16673283 -0.17536047]]
[[-2.46435936e-02 -7.75201312e-05]]
[[-1.  1.]]
[[0.05523789 0.08312356]]
[[-0.0745971  0.0846349]]
[[-0.13251386 -0.12509652]]
[[-0.0692017   0.49375632]]
[[-0.99652227 -0.99553451]]
[[-0.28496292 -0.90985205]]
[[0.12828897 0.05891816]]
[[0.47730358 0.37926197]]
[[-0.58558844 -0.02097188]]
[[-0.67679943  0.899019  ]]
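The queries above mix points near the minimum at the origin with points near the corners of the domain. Bayesian optimization chooses each query by maximizing an acquisition function over a surrogate model, and one commonly used acquisition function is expected improvement. The sketch below computes it for a minimization problem from a predictive mean `mu` and standard deviation `sigma`; it illustrates the formula only and is not taken from the GPyOpt source.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` at a point with prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (best - mu) * cdf + sigma * pdf


# Same predicted mean, different uncertainty: the more uncertain point scores higher.
ei_certain = expected_improvement(mu=0.5, sigma=0.1, best=0.4)
ei_uncertain = expected_improvement(mu=0.5, sigma=1.0, best=0.4)
```

The first term rewards points predicted to be better than the current best (exploitation); the second rewards points where the model is uncertain (exploration).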


In [38]:
print(opt.X)
print(opt.x_opt)

[[ 9.87769699e-01 -5.22068678e-01]
 [-2.60640122e-01 -5.92168000e-01]
 [ 3.62671315e-01  3.08434924e-01]
 [ 4.23333810e-01 -5.07602750e-01]
 [ 6.87402157e-01  2.10656733e-01]
 [ 3.12278334e-01  1.83631054e-01]
 [ 8.31197264e-04  5.77286352e-02]
 [-1.00000000e+00  9.75483572e-01]
 [ 6.73192012e-02 -9.58554176e-02]
 [-1.00000000e+00 -1.00000000e+00]
 [-6.75609333e-02 -3.06036996e-02]
 [ 1.39019074e-02 -7.44473136e-04]
 [-9.62346335e-01 -2.63916597e-02]
 [ 9.52792460e-01  9.90318356e-01]
 [ 3.20600327e-01 -3.70925950e-01]
 [ 8.19614846e-01 -5.49740227e-01]
 [ 1.98597971e-02 -4.48193390e-01]
 [ 8.05345803e-01  7.37027584e-01]
 [-4.62028321e-01  4.55104405e-01]
 [-7.58883852e-01  1.00326861e-02]]
[ 0.01390191 -0.00074447]


In [56]:
opt.Y - np.sum((2 * opt.X) ** 2, axis=-1, keepdims=True)

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [55]:
opt.X.shape

(20, 2)

In [29]:
opt.Y_best

array([6.61410245e-01, 7.33240385e-02, 7.33240385e-02, 7.33240385e-02,
       7.33240385e-02, 4.15628331e-04, 4.15628331e-04, 4.15628331e-04,
       4.15628331e-04, 4.15628331e-04, 4.15628331e-04, 4.15628331e-04,
       4.15628331e-04, 4.15628331e-04, 4.15628331e-04, 4.15628331e-04,
       4.15628331e-04, 4.15628331e-04, 4.15628331e-04, 4.15628331e-04])
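`opt.Y_best` above never increases, which is what a best-so-far trace of a minimization looks like. Assuming that reading, the same kind of trace can be computed from any list of observed scores as a running minimum. The scores below are made up for illustration.

```python
from itertools import accumulate


def running_best(values):
    """Return the best (lowest) value seen up to and including each query."""
    return list(accumulate(values, min))


observed = [0.66, 0.07, 0.31, 0.52, 0.09, 0.0004, 0.2]  # made-up scores
best_so_far = running_best(observed)
```

Plotting such a trace against the number of queries is a simple way to compare how quickly different search methods converge.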

In [1]:
from hedgeable_ai.optimizer.tuner import RandomOptimizer


mixed_domain =[{'name': 'var1', 'type': 'continuous', 'domain': (-5,5),'dimensionality': 3},
               {'name': 'var3', 'type': 'discrete', 'domain': (3,8,10),'dimensionality': 2},
               {'name': 'var4', 'type': 'categorical', 'domain': ('hey', 'what up', 'oops'),'dimensionality': 1},
               {'name': 'var5', 'type': 'integer', 'domain': (1, 10), 'scale':'log'}]

opt = RandomOptimizer(score_func=lambda x: x, params_conf=mixed_domain)



<hedgeable_ai.optimizer.tuner.variables.ContinuousVariable object at 0x7f513b731908>
{'name': 'var1', 'type': 'continuous', 'domain': (-5, 5), 'dimensionality': 3}
3
<hedgeable_ai.optimizer.tuner.variables.DiscreteVariable object at 0x7f513b7477b8>
{'name': 'var3', 'type': 'discrete', 'domain': (3, 8, 10), 'dimensionality': 2}
2
<hedgeable_ai.optimizer.tuner.variables.CategoricalVariable object at 0x7f513b747978>
{'name': 'var4', 'type': 'categorical', 'domain': ('hey', 'what up', 'oops'), 'dimensionality': 1}
1
<hedgeable_ai.optimizer.tuner.variables.IntegerVariable object at 0x7f513b747b38>
{'name': 'var5', 'type': 'integer', 'domain': (1, 10), 'scale': 'log'}
1


In [2]:
x = [0., 0., 0., 8., 1., 1., 0., 0., 1.]
param = opt.vec2params(x)

[0.0, 0.0, 0.0, 8, 3, 'hey', 10]


In [3]:
param

{'var1_1': 0.0,
 'var1_2': 0.0,
 'var1_3': 0.0,
 'var3_1': 8,
 'var3_2': 3,
 'var4_1': 'hey',
 'var5': 10}

In [4]:
opt.params2vec(param)

array([0., 0., 0., 8., 3., 1., 0., 0., 1.])

In [5]:
opt.design_space.dimensionality

7

In [6]:
opt.design_space.model_dimensionality

9
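The gap between the two numbers (7 and 9) is consistent with each categorical variable being one-hot encoded in the vector the model sees. The sketch below reproduces the decoding shown in the outputs above under that assumption. It is inferred from those outputs only, not from the `hedgeable_ai` source, and the function names are hypothetical.

```python
mixed_domain = [
    {"name": "var1", "type": "continuous", "domain": (-5, 5), "dimensionality": 3},
    {"name": "var3", "type": "discrete", "domain": (3, 8, 10), "dimensionality": 2},
    {"name": "var4", "type": "categorical", "domain": ("hey", "what up", "oops"),
     "dimensionality": 1},
    {"name": "var5", "type": "integer", "domain": (1, 10), "scale": "log"},
]


def dimensionality(space):
    # One slot per parameter the user sees.
    return sum(var.get("dimensionality", 1) for var in space)


def model_dimensionality(space):
    # Categorical variables take one slot per category (one-hot).
    total = 0
    for var in space:
        width = len(var["domain"]) if var["type"] == "categorical" else 1
        total += width * var.get("dimensionality", 1)
    return total


def decode(vec, space):
    """Map a flat numeric vector to a list of parameter values."""
    values, pos = [], 0
    for var in space:
        for _ in range(var.get("dimensionality", 1)):
            if var["type"] == "categorical":
                width = len(var["domain"])
                one_hot = vec[pos:pos + width]
                values.append(var["domain"][one_hot.index(max(one_hot))])
                pos += width
                continue
            raw = vec[pos]
            if var["type"] == "discrete":
                # Snap to the nearest allowed value.
                values.append(min(var["domain"], key=lambda d: abs(d - raw)))
            elif var["type"] == "integer":
                # Assumed: log-scaled variables are stored as log10 values.
                raw = 10 ** raw if var.get("scale") == "log" else raw
                values.append(int(round(raw)))
            else:
                values.append(float(raw))
            pos += 1
    return values


x = [0., 0., 0., 8., 1., 1., 0., 0., 1.]
decoded = decode(x, mixed_domain)
```

Under these assumptions, `decoded` matches the list printed by `opt.vec2params(x)` above, and the two helper functions return 7 and 9.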