Hyperparameter Optimization for PyTorch


Tune the hyperparameters of your PyTorch models with HyperSearch.



Note: We currently only support FC networks. ConvNet support coming soon!

  • Install requirements using:
pip install -r requirements.txt
  • Define your model in your model file. This should return an nn.Sequential object. Take note of the last layer: using nn.LogSoftmax() vs. nn.Softmax() may require changes in the training method. For example, let's define a 4-layer FC network as follows:
  (0): Linear(in_features=784, out_features=512)
  (1): ReLU()
  (2): Linear(in_features=512, out_features=256)
  (3): ReLU()
  (4): Linear(in_features=256, out_features=128)
  (5): ReLU()
  (6): Linear(in_features=128, out_features=10)
  (7): LogSoftmax()
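
For reference, a model with this architecture can be produced with a plain nn.Sequential. This is a minimal sketch; the function name below is illustrative, not part of the repo:

import torch.nn as nn

def build_model():
    # 4-layer FC network for 784-dim inputs (e.g. flattened MNIST) and 10 classes
    return nn.Sequential(
        nn.Linear(784, 512),
        nn.ReLU(),
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
        nn.LogSoftmax(dim=1),  # dim argument is required in recent PyTorch versions
    )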
  • Write your own data loader if you do not have a dataset that is supported by torchvision.datasets. Otherwise, lightly edit the provided loader to suit your dataset of choice: CIFAR-10, CIFAR-100, Fashion-MNIST, MNIST, etc.
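
A minimal torchvision-backed loader might look like the sketch below (assuming MNIST, with a flattening transform to match the FC model above):

import torch
from torchvision import datasets, transforms

def get_train_loader(batch_size=64):
    # flatten 28x28 images to 784-dim vectors to match the FC model
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Lambda(lambda x: x.view(-1)),
    ])
    dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)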
  • Create your hyperparameter dictionary. You must follow this syntax:
params = {
    '2_hidden': ['quniform', 512, 1000, 1],
    '4_hidden': ['quniform', 128, 512, 1],
    'all_act': ['choice', [[0], ['choice', ['selu', 'elu', 'tanh']]]],
    'all_dropout': ['choice', [[0], ['uniform', 0.1, 0.5]]],
    'all_batchnorm': ['choice', [0, 1]],
    'all_l2': ['uniform', 1e-8, 1e-5],
    'optim': ['choice', ["adam", "sgd"]],
}

Keys are of the form {layer_num}_{hyperparameter}, where layer_num can be a layer number from your nn.Sequential model or all to signify all layers. Values are of the form [distribution, args...], where distribution can be one of uniform, quniform, choice, etc.

For example, 2_hidden: ['quniform', 512, 1000, 1] means: sample the hidden size of layer 2 of the model (Linear(in_features=512, out_features=256)) from a quantized uniform distribution with lower bound 512, upper bound 1000, and q = 1.

all_dropout: ['choice', [[0], ['uniform', 0.1, 0.5]]] means: choose whether or not to apply dropout to all layers. choice means pick one element from a list; [0] means False, while the other option implicitly means True and samples the dropout probability from a uniform distribution with lower bound 0.1 and upper bound 0.5.
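To make these semantics concrete, here is a rough sketch of how such a spec could be sampled. This is illustrative only; HyperSearch's internal sampler may differ:

import random

def sample(spec):
    # spec is [distribution, args...], e.g. ['quniform', 512, 1000, 1]
    dist, args = spec[0], spec[1:]
    if dist == 'uniform':        # continuous uniform on [low, high]
        return random.uniform(args[0], args[1])
    if dist == 'quniform':       # uniform, then rounded to a multiple of q
        low, high, q = args
        return round(random.uniform(low, high) / q) * q
    if dist == 'choice':         # pick an element; nested specs are resampled
        picked = random.choice(args[0])
        if isinstance(picked, list) and picked and isinstance(picked[0], str):
            return sample(picked)
        return picked
    raise ValueError('unknown distribution: %s' % dist)

# sample(['quniform', 512, 1000, 1])               -> e.g. 713 (a hidden size)
# sample(['choice', [[0], ['uniform', 0.1, 0.5]]]) -> [0] or e.g. 0.37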

  • Edit the config file to suit your needs. Concretely, you can edit the hyperparameters of HyperBand, the default learning rate, the dataset of choice, etc. Three parameters control the HyperBand algorithm:
    • max_iter: maximum number of iterations allocated to a given hyperparam config
    • eta: controls the rate of elimination in each round of successive halving; only 1/eta of the configs survive each round.
    • epoch_scale: a boolean indicating whether max_iter should be computed in terms of mini-batch iterations or epochs. This is useful if you want to speed up HyperBand and don't want to evaluate a full pass on a large dataset.

Set max_iter to the amount you would usually train your networks for. It's mostly a rule of thumb, but something in the range [80, 150] epochs is typical. Larger values of eta correspond to a more aggressive elimination schedule and thus fewer rounds of elimination; increase it for faster results at the cost of potentially sub-optimal performance. The authors advise a value of 3 or 4.
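
For intuition, the bracket structure follows directly from max_iter and eta. The arithmetic below is taken from the HyperBand paper, not HyperSearch's exact code:

import math

max_iter, eta = 100, 3                  # e.g. 100 epochs; keep 1/3 per round
s_max = int(math.log(max_iter) / math.log(eta))  # index of the last bracket
B = (s_max + 1) * max_iter              # budget (in iterations) per bracket

for s in reversed(range(s_max + 1)):
    n = int(math.ceil((B / max_iter) * (eta ** s) / (s + 1)))  # initial configs
    r = max_iter * eta ** (-s)          # initial iterations per config
    print('bracket %d: %d configs, %.1f iters each' % (s, n, r))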

  • As a last step, depending on the last layer in your model, you may wish to edit the train_one_epoch() method accordingly. The default uses F.nll_loss because it assumes the model ends in LogSoftmax, but feel free to change the loss to suit your needs.
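
Concretely, the standard PyTorch pairing is: a LogSoftmax output goes with F.nll_loss, while raw logits go with F.cross_entropy, which fuses LogSoftmax and NLLLoss in one call. A quick sketch with dummy tensors:

import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)             # fake batch: 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))

# last layer is nn.LogSoftmax -> pair with the negative log-likelihood loss
loss_a = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# last layer emits raw logits -> pair with cross entropy instead
loss_b = F.cross_entropy(logits, targets)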

Finally, you can run the algorithm using:
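
For example, if the repository's entry-point script is main.py:

python main.py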


Hyperparameter Support

  • Activation
    • all
    • per layer
  • L1/L2 regularization (weights & biases)
    • all
    • per layer
  • Add Batch Norm
    • sandwiched between every layer
  • Add Dropout
    • sandwiched between every layer
  • Add Layers
    • conv layers
    • fc layers
  • Change Layer Params
    • change fc output size
    • change conv params
  • Optimization
    • batch size
    • learning rate
    • optimizer (adam, sgd)


Todo

  • conv nn support
  • max exploration option (s = s_max)
  • input error checking
  • improve plotting and logging
  • multi-gpu and multi-cpu support

