In [1]:
%run ../talktools.py

# Hyperparameter Optimization (and Overfitting)


"The Theory Behind Overfitting, Cross Validation, Regularization, Bagging, and Boosting: Tutorial" (Benyamin Ghojogh, Mark Crowley)
https://arxiv.org/abs/1905.12787

We already saw `GridSearchCV`. This is an exhaustive approach to optimization and is prone to overfitting.  `RandomSearch` often gives comparable answers but protects, to some degree, overfitting. See "Random Search for Hyper-Parameter Optimization"
James Bergstra, Yoshua Bengio; 13(Feb):281−305, 2012 for a theoretical discussion.

There are Bayesian-like approaches as well, that seek to minimize an objective function over a large range of parameters.

[`hyperopt`](http://hyperopt.github.io/hyperopt/) is one such popular approach, which is used as `hyperas` in `keras` (see e.g., https://arxiv.org/abs/1801.01596)

In [6]:
#!pip install hyperopt hyperas

Collecting hyperas
  Downloading https://files.pythonhosted.org/packages/04/34/87ad6ffb42df9c1fa9c4c906f65813d42ad70d68c66af4ffff048c228cd4/hyperas-0.4.1-py3-none-any.whl
Installing collected packages: hyperas
Successfully installed hyperas-0.4.1


In [1]:
# see http://maxpumperla.com/hyperas/
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform


def data():
    '''
    Data providing function:

    Make sure to have every relevant import statement included here and return data as
    used in model function below. This function is separated from model() so that hyperopt
    won't reload data for each evaluation run.
    '''
    from tensorflow.keras.datasets import mnist
    from tensorflow.keras.utils import to_categorical

    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train = X_train.reshape(60000, 784)
    X_test = X_test.reshape(10000, 784)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    nb_classes = 10
    Y_train = to_categorical(y_train, nb_classes)
    Y_test = to_categorical(y_test, nb_classes)
    return X_train, Y_train, X_test, Y_test


def model(X_train, Y_train, X_test, Y_test):
    '''
    Model providing function:

    Create Keras model with double curly brackets dropped-in as needed.
    Return value has to be a valid python dictionary with two customary keys:
        - loss: Specify a numeric evaluation metric to be minimized
        - status: Just use STATUS_OK and see hyperopt documentation if not feasible
    The last one is optional, though recommended, namely:
        - model: specify the model just created so that we can later use it again.
    '''
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout, Activation

    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense({{choice([256, 512, 1024])}}))
    model.add(Activation({{choice(['relu', 'sigmoid'])}}))
    model.add(Dropout({{uniform(0, 1)}}))

    model.add(Dense(10))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer={{choice(['rmsprop', 'adam', 'sgd'])}})

    model.fit(X_train, Y_train,
              batch_size={{choice([64, 128])}},
              nb_epoch=1,
              verbose=1,
              validation_data=(X_test, Y_test))
    acc = model.evaluate(X_test, Y_test, verbose=0)
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}

Using TensorFlow backend.


In [2]:
X_train, Y_train, X_test, Y_test = data()

best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=5, verbose=False,
                                      trials=Trials(),
                                      notebook_name='03_hyperparameter_optimization')


W0606 22:48:08.857167 140735763825472 nn_ops.py:4224] Large dropout rate: 0.73717 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0606 22:48:08.921252 140735763825472 nn_ops.py:4224] Large dropout rate: 0.651797 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0606 22:48:08.985095 140735763825472 training.py:593] The `nb_epoch` argument in `fit` has been renamed `epochs`.
W0606 22:48:09.051643 140735763825472 deprecation.py:323] From /Users/jbloom/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Train on 60000 samples, validate on 10000 samples
  128/60000 [..............................]
 - ETA: 55s - loss: 3.2828

 1152/60000 [..............................]
 - ETA: 8s - loss: 2.9537 

 2176/60000 [>.............................]
 - ETA: 6s - loss: 2.8517

 3200/60000 [>.............................]
 - ETA: 4s - loss: 2.8160

 4224/60000 [=>............................]
 - ETA: 4s - loss: 2.7767

 5120/60000 [=>............................]
 - ETA: 4s - loss: 2.7546

 6016/60000 [==>...........................]
 - ETA: 3s - loss: 2.7467

 6912/60000 [==>...........................]
 - ETA: 3s - loss: 2.7330

 7808/60000 [==>...........................]
 - ETA: 3s - loss: 2.7266

 8704/60000 [===>..........................]
 - ETA: 3s - loss: 2.7180

 9600/60000 [===>..........................]
 - ETA: 3s - loss: 2.7114

10496/60000 [====>.........................]
 - ETA: 3s - loss: 2.7012

11520/60000 [====>.........................]
 - ETA: 3s - loss: 2.6934



W0606 22:48:13.906703 140735763825472 nn_ops.py:4224] Large dropout rate: 0.836667 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0606 22:48:14.045001 140735763825472 nn_ops.py:4224] Large dropout rate: 0.912829 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.


Test accuracy:
1.8747559463500976


W0606 22:48:14.109874 140735763825472 training.py:593] The `nb_epoch` argument in `fit` has been renamed `epochs`.


Train on 60000 samples, validate on 10000 samples
   64/60000 [..............................]
 - ETA: 2:22 - loss: 4.0212

  384/60000 [..............................]
 - ETA: 31s - loss: 3.3269 

  768/60000 [..............................]
 - ETA: 19s - loss: 3.0616

 1152/60000 [..............................]
 - ETA: 15s - loss: 2.9117

 1472/60000 [..............................]
 - ETA: 14s - loss: 2.8174

 1792/60000 [..............................]
 - ETA: 13s - loss: 2.7429

 2176/60000 [>.............................]
 - ETA: 12s - loss: 2.6709

 2496/60000 [>.............................]
 - ETA: 12s - loss: 2.6121

 2816/60000 [>.............................]
 - ETA: 11s - loss: 2.5693

 3136/60000 [>.............................]
 - ETA: 11s - loss: 2.5308

 3456/60000 [>.............................]
 - ETA: 11s - loss: 2.4910

 3776/60000 [>.............................]
 - ETA: 11s - loss: 2.4535

 4096/60000 [=>............................]
 - ETA: 10s - loss: 2.4223


 - ETA: 3s - loss: 1.1737

 - ETA: 3s - loss: 1.1684

 - ETA: 3s - loss: 1.1652

 - ETA: 3s - loss: 1.1621

 - ETA: 3s - loss: 1.1578

 - ETA: 3s - loss: 1.1537

 - ETA: 3s - loss: 1.1505

 - ETA: 3s - loss: 1.1473

 - ETA: 2s - loss: 1.1431

 - ETA: 2s - loss: 1.1395

 - ETA: 2s - loss: 1.1353

 - ETA: 2s - loss: 1.1304

 - ETA: 2s - loss: 1.1256

 - ETA: 2s - loss: 1.1212

 - ETA: 2s - loss: 1.1171

 - ETA: 2s - loss: 1.1143

 - ETA: 2s - loss: 1.1100

 - ETA: 2s - loss: 1.1071

 - ETA: 2s - loss: 1.1036

 - ETA: 2s - loss: 1.0995

 - ETA: 2s - loss: 1.0966

 - ETA: 2s - loss: 1.0942

 - ETA: 2s - loss: 1.0907

 - ETA: 2s - loss: 1.0874

 - ETA: 2s - loss: 1.0839

 - ETA: 2s - loss: 1.0819

 - ETA: 2s - loss: 1.0795

 - ETA: 1s - loss: 1.0772

 - ETA: 1s - loss: 1.0752

 - ETA: 1s - loss: 1.0719

 - ETA: 1s - loss: 1.0707

 - ETA: 1s - loss: 1.0682

 - ETA: 1s - loss: 1.0666

 - ETA: 1s - loss: 1.0648

 - ETA: 1s - loss: 1.0620

 - ETA: 1s - loss:

W0606 22:48:25.710746 140735763825472 nn_ops.py:4224] Large dropout rate: 0.975819 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0606 22:48:25.808398 140735763825472 training.py:593] The `nb_epoch` argument in `fit` has been renamed `epochs`.


Test accuracy:
0.31170902903676034
Train on 60000 samples, validate on 10000 samples
   64/60000 [..............................]
 - ETA: 1:52 - loss: 3.3697

  576/60000 [..............................]
 - ETA: 17s - loss: 2.9777 

 1024/60000 [..............................]
 - ETA: 12s - loss: 2.8745

 1472/60000 [..............................]
 - ETA: 11s - loss: 2.7416

 1856/60000 [..............................]
 - ETA: 10s - loss: 2.6849

 2240/60000 [>.............................]
 - ETA: 9s - loss: 2.6222 

 2688/60000 [>.............................]
 - ETA: 9s - loss: 2.5683

 3200/60000 [>.............................]
 - ETA: 8s - loss: 2.5135

 3712/60000 [>.............................]
 - ETA: 8s - loss: 2.4602

 4288/60000 [=>............................]
 - ETA: 7s - loss: 2.4140

 4800/60000 [=>............................]
 - ETA: 7s - loss: 2.3834

 5312/60000 [=>............................]
 - ETA: 7s - loss: 2.3585

 5824/60000 [=>....................


 - ETA: 1s - loss: 1.8048

 - ETA: 1s - loss: 1.8031

 - ETA: 1s - loss: 1.8013

 - ETA: 0s - loss: 1.7986

 - ETA: 0s - loss: 1.7975

 - ETA: 0s - loss: 1.7960

 - ETA: 0s - loss: 1.7944

 - ETA: 0s - loss: 1.7930

 - ETA: 0s - loss: 1.7918

 - ETA: 0s - loss: 1.7889

 - ETA: 0s - loss: 1.7873

 - ETA: 0s - loss: 1.7863

 - ETA: 0s - loss: 1.7839

 - ETA: 0s - loss: 1.7826

 - ETA: 0s - loss: 1.7804

 - ETA: 0s - loss: 1.7788

 - ETA: 0s - loss: 1.7771

 - ETA: 0s - loss: 1.7762

 - ETA: 0s - loss: 1.7750

 - ETA: 0s - loss: 1.7733

 - ETA: 0s - loss: 1.7730

 - ETA: 0s - loss: 1.7717

 - ETA: 0s - loss: 1.7714

 - ETA: 0s - loss: 1.7701

 - ETA: 0s - loss: 1.7684

 - 8s 136us/sample - loss: 1.7682 - val_loss: 0.9307



W0606 22:48:35.399667 140735763825472 training.py:593] The `nb_epoch` argument in `fit` has been renamed `epochs`.


Test accuracy:
0.9307292781829833
Train on 60000 samples, validate on 10000 samples
  128/60000 [..............................]
 - ETA: 56s - loss: 2.7554

 1152/60000 [..............................]
 - ETA: 8s - loss: 2.4448 

 2304/60000 [>.............................]
 - ETA: 5s - loss: 2.2362

 3584/60000 [>.............................]
 - ETA: 4s - loss: 2.0663

 4864/60000 [=>............................]
 - ETA: 3s - loss: 1.9279

 6144/60000 [==>...........................]
 - ETA: 3s - loss: 1.7939

 7296/60000 [==>...........................]
 - ETA: 3s - loss: 1.6974

 8320/60000 [===>..........................]
 - ETA: 3s - loss: 1.6274

 9216/60000 [===>..........................]
 - ETA: 2s - loss: 1.5600

10112/60000 [====>.........................]
 - ETA: 2s - loss: 1.5032

11008/60000 [====>.........................]
 - ETA: 2s - loss: 1.4584

12032/60000 [=====>........................]
 - ETA: 2s - loss: 1.4088

13056/60000 [=====>....................

W0606 22:48:39.731167 140735763825472 training.py:593] The `nb_epoch` argument in `fit` has been renamed `epochs`.


Test accuracy:
0.22913656501471996
Train on 60000 samples, validate on 10000 samples
  128/60000 [..............................]
 - ETA: 1:00 - loss: 2.5028

  896/60000 [..............................]
 - ETA: 12s - loss: 2.0000 

 1024/60000 [..............................]
 - ETA: 15s - loss: 1.9069

 1664/60000 [..............................]
 - ETA: 11s - loss: 1.5613

 2432/60000 [>.............................]
 - ETA: 8s - loss: 1.3332 

 3200/60000 [>.............................]
 - ETA: 7s - loss: 1.1617

 3968/60000 [>.............................]
 - ETA: 6s - loss: 1.0544

 4736/60000 [=>............................]
 - ETA: 6s - loss: 0.9721

 5504/60000 [=>............................]
 - ETA: 5s - loss: 0.9054

 6400/60000 [==>...........................]
 - ETA: 5s - loss: 0.8493

 7168/60000 [==>...........................]
 - ETA: 5s - loss: 0.8055

 8192/60000 [===>..........................]
 - ETA: 4s - loss: 0.7560

 9216/60000 [===>..................

In [3]:
print("Evalutation of best performing model:")
print(best_model.evaluate(X_test, Y_test))

Evalutation of best performing model:
1.8747559463500976


In [4]:
best_run

{'Activation': 1,
 'Dense': 1,
 'Dropout': 0.7371698374615214,
 'Dropout_1': 0.6517968154887782,
 'batch_size': 1,
 'optimizer': 2}

<img src="https://github.com/keras-team/autokeras/raw/master/logo.png?raw=true" width="40%">
> The ultimate goal of AutoML is to provide easily accessible deep learning tools to domain experts with limited data science or machine learning background. Auto-Keras provides functions to automatically search for architecture and hyperparameters of deep learning models.

https://github.com/keras-team/autokeras

This competes with Google's hosted service called AutoML (https://cloud.google.com/automl/).

There are more complex ways to train deep networks, such as using genetic algorithms or deep learning itself!  See "Learning to learn by gradient descent by gradient descent" (https://arxiv.org/abs/1606.04474).