Update hyperband (#911)
* update hyperband

* change STEP to TRIAL_BUDGET

* modify the `get_n_r` function

* fix typo
PurityFan authored and chicm-ms committed Apr 8, 2019
1 parent b358278 commit c3074a8
Showing 7 changed files with 39 additions and 53 deletions.
4 changes: 2 additions & 2 deletions docs/en_US/Builtin_Tuner.md
@@ -15,7 +15,7 @@ Currently we support the following algorithms:
|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by nni is a wrapper on the SMAC3 Github repo. Notice that SMAC needs to be installed by the `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [Github Repo](https://github.com/automl/SMAC3)|
|[__Batch tuner__](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the `choice` type in the search space spec.|
|[__Grid Search__](#GridSearch)|Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the search space file. Note that the only acceptable types of search space are choice, quniform and qloguniform. The number q in quniform and qloguniform has a special meaning (different from the spec in the search space spec): it means the number of values that will be sampled evenly from the range low to high.|
-|[__Hyperband__](#Hyperband)|Hyperband tries to use the limited resource to explore as many configurations as possible, and finds out the promising ones to get the final result. The basic idea is generating many configurations and to run them for the small number of STEPs to find out promising one, then further training those promising ones to select several more promising one.[Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
+|[__Hyperband__](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and find the promising ones for the final result. The basic idea is to generate many configurations, run each for a small trial budget to find the promising ones, then train those further and keep narrowing the selection. [Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connection. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|

@@ -233,7 +233,7 @@ It is suggested when you have limited computation resource but have relatively l
**Requirement of classArg**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **R** (*int, optional, default = 60*) - the maximum STEPS (could be the number of mini-batches or epochs) can be allocated to a trial. Each trial should use STEPS to control how long it runs.
+* **R** (*int, optional, default = 60*) - the maximum budget (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use `TRIAL_BUDGET` to control how long it runs.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials (see the sketch below)
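
For intuition, the discard proportion can be checked with plain Python (an illustrative sketch, not part of the original docs):

```python
# With eta = 3, each round of successive halving keeps 1/eta of the
# pool and discards (eta-1)/eta of it, i.e. about 67%.
eta = 3
n = 81
while n > 1:
    kept = n // eta
    print(f"{n:>2} trials -> keep {kept:>2}, discard {(n - kept) / n:.0%}")
    n = kept
```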

**Usage example**
10 changes: 5 additions & 5 deletions docs/en_US/hyperbandAdvisor.md
@@ -17,7 +17,7 @@ advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS
+    #R: the maximum trial budget
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -26,13 +26,13 @@ advisor:
```

Note that once you use an advisor, you cannot add a tuner or assessor spec in the config file any more.
-If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there is one more key called `STEPS` besides the hyperparameters defined by user. **By using this `STEPS`, the trial can control how long it runs**.
+If you use Hyperband, the hyperparameters (i.e., key-value pairs) received by a trial contain one extra key, `TRIAL_BUDGET`, besides the user-defined hyperparameters. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.

For `report_intermediate_result(metric)` and `report_final_result(metric)` in your trial code, **`metric` should be either a number or a dict which has a key `default` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
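
Both forms are therefore valid (a minimal sketch; the metric values are made up):

```python
import nni

# Form 1: a bare number
nni.report_final_result(0.95)

# Form 2: a dict with a numeric 'default' key; extra keys such as
# 'loss' may be included alongside it
nni.report_final_result({'default': 0.95, 'loss': 0.12})
```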

-`R` and `eta` are the parameters of Hyperband that you can change. `R` means the maximum STEPS that can be allocated to a configuration. Here, STEPS could mean the number of epochs or mini-batches. This `STEPS` should be used by the trial to control how long it runs. Refer to the example under `examples/trials/mnist-hyperband/` for details.
+`R` and `eta` are the parameters of Hyperband that you can change. `R` is the maximum trial budget that can be allocated to a configuration; here, the budget could be the number of epochs or mini-batches. The trial should use this `TRIAL_BUDGET` to control how long it runs. Refer to the example under `examples/trials/mnist-advisor/` for details.

-`eta` means `n/eta` configurations from `n` configurations will survive and rerun using more STEPS.
+`eta` means that `n/eta` of the `n` configurations in a round will survive and be rerun with a larger budget.

Here is a concrete example of `R=81` and `eta=3`:

@@ -45,7 +45,7 @@ Here is a concrete example of `R=81` and `eta=3`:
|3 |3 27 |1 81 | | | |
|4 |1 81 | | | | |

-`s` means bucket, `n` means the number of configurations that are generated, the corresponding `r` means how many STEPS these configurations run. `i` means round, for example, bucket 4 has 5 rounds, bucket 3 has 4 rounds.
+`s` means bucket, `n` means the number of configurations that are generated, and the corresponding `r` means how much budget each of these configurations runs with. `i` means round; for example, bucket 4 has 5 rounds and bucket 3 has 4 rounds.
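
The schedule inside one bucket follows directly from `n` and `r`: each round keeps `n/eta` of the configurations and multiplies their budget by `eta`. A sketch that reproduces the `s=4` column of the table above (illustrative only; nothing beyond that recurrence is assumed):

```python
# Bucket s=4 of the R=81, eta=3 example: start with n=81 configurations
# at budget r=1, then shrink n by eta and grow r by eta each round.
eta, n, r = 3, 81, 1
for i in range(5):          # bucket s=4 runs 5 rounds, i = 0..4
    print(f"i={i}: n={n:>2}  r={r:>2}")
    n, r = n // eta, r * eta
```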

For details on how to write trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
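
In outline, a Hyperband-aware trial looks something like this (a minimal sketch rather than the shipped example; `build_model`, `train_one_epoch`, and `evaluate` are hypothetical helpers):

```python
import nni

def run_trial():
    params = nni.get_next_parameter()   # user hyperparameters + TRIAL_BUDGET
    budget = params['TRIAL_BUDGET']     # injected by the Hyperband advisor
    model = build_model(params)         # hypothetical helper
    for _ in range(budget):             # the budget bounds how long we train
        train_one_epoch(model)          # hypothetical helper
        nni.report_intermediate_result(evaluate(model))  # number or dict
    nni.report_final_result(evaluate(model))

if __name__ == '__main__':
    run_trial()
```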

@@ -9,11 +9,11 @@ searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
-  #choice: Hyperband
+  #choice: Hyperband, BOHB
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS (could be the number of mini-batches or epochs) can be
-    # allocated to a trial. Each trial should use STEPS to control how long it runs.
+    #R: the maximum trial budget (could be the number of mini-batches or epochs) that can be
+    # allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -9,10 +9,10 @@ searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
-  #choice: Hyperband
+  #choice: Hyperband, BOHB
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS
+    #R: the maximum trial budget
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -1,5 +1,6 @@
"""A deep MNIST classifier using convolutional layers."""

+import argparse
import logging
import math
import tempfile
@@ -17,7 +18,7 @@

class MnistNetwork(object):
    '''
-    MnistNetwork is for initlizing and building basic network for mnist.
+    MnistNetwork is for initializing and building basic network for mnist.
    '''
    def __init__(self,
                 channel_1_num,
@@ -188,7 +189,7 @@ def main(params):
                mnist_network.keep_prob: 1 - params['dropout_rate']}
            )

-            if i % 10 == 0:
+            if i % 100 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
@@ -207,38 +208,31 @@ def main(params):
    logger.debug('Final result is %g', test_acc)
    logger.debug('Send final result done.')


-def generate_default_params():
-    '''
-    Generate default parameters for mnist network.
-    '''
-    params = {
-        'data_dir': '/tmp/tensorflow/mnist/input_data',
-        'dropout_rate': 0.5,
-        'channel_1_num': 32,
-        'channel_2_num': 64,
-        'conv_size': 5,
-        'pool_size': 2,
-        'hidden_size': 1024,
-        'learning_rate': 1e-4,
-        'batch_size': 32}
-    return params

+def get_params():
+    ''' Get parameters from command line '''
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data_dir", type=str, default='/tmp/tensorflow/mnist/input_data', help="data directory")
+    parser.add_argument("--dropout_rate", type=float, default=0.5, help="dropout rate")
+    parser.add_argument("--channel_1_num", type=int, default=32)
+    parser.add_argument("--channel_2_num", type=int, default=64)
+    parser.add_argument("--conv_size", type=int, default=5)
+    parser.add_argument("--pool_size", type=int, default=2)
+    parser.add_argument("--hidden_size", type=int, default=1024)
+    parser.add_argument("--learning_rate", type=float, default=1e-4)
+    parser.add_argument("--batch_num", type=int, default=2700)
+    parser.add_argument("--batch_size", type=int, default=32)
+
+    args, _ = parser.parse_known_args()
+    return args

if __name__ == '__main__':
    try:
-        # get parameters form tuner
-        RCV_PARAMS = nni.get_next_parameter()
-        logger.debug(RCV_PARAMS)
-        # run
-        params = generate_default_params()
-        params.update(RCV_PARAMS)
-        '''
-        If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial,
-        there is one more key called `STEPS` besides the hyperparameters defined by user.
-        By using this `STEPS`, the trial can control how long it runs.
-        '''
-        params['batch_num'] = RCV_PARAMS['STEPS'] * 10
+        tuner_params = nni.get_next_parameter()
+        logger.debug(tuner_params)
+        tuner_params['batch_num'] = tuner_params['TRIAL_BUDGET'] * 100
+        params = vars(get_params())
+        params.update(tuner_params)
        main(params)
    except Exception as exception:
        logger.exception(exception)
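
In short, the example now draws its defaults from argparse instead of a hard-coded dict, so the script also runs standalone; tuner-supplied values, including the advisor's `TRIAL_BUDGET` (which now scales `batch_num` by 100 rather than 10), override those defaults through `params.update(tuner_params)`.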
14 changes: 3 additions & 11 deletions src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py
@@ -37,7 +37,7 @@
_logger = logging.getLogger(__name__)

_next_parameter_id = 0
-_KEY = 'STEPS'
+_KEY = 'TRIAL_BUDGET'
_epsilon = 1e-6

@unique
@@ -320,7 +320,7 @@ def __init__(self, R, eta=3, optimize_mode='maximize'):
    def load_checkpoint(self):
        pass

-    def save_checkpont(self):
+    def save_checkpoint(self):
        pass

    def handle_initialize(self, data):
@@ -351,15 +351,7 @@ def _request_one_trial_job(self):
"""get one trial job, i.e., one hyperparameter configuration."""
if not self.generated_hyper_configs:
if self.curr_s < 0:
# have tried all configurations
ret = {
'parameter_id': '-1_0_0',
'parameter_source': 'algorithm',
'parameters': ''
}
send(CommandType.NoMoreTrialJobs, json_tricks.dumps(ret))
self.credit += 1
return True
self.curr_s = self.s_max
_logger.debug('create a new bracket, self.curr_s=%d', self.curr_s)
self.brackets[self.curr_s] = Bracket(self.curr_s, self.s_max, self.eta, self.R, self.optimize_mode)
next_n, next_r = self.brackets[self.curr_s].get_n_r()
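
Note the behavioral change in this hunk: previously, once `curr_s` dropped below zero the advisor answered further requests with `NoMoreTrialJobs` and banked them as `credit`; now it wraps `curr_s` back to `s_max` and opens a fresh bracket, so Hyperband keeps proposing configurations and termination is left to the experiment-level limits in the config (e.g., `maxTrialNum`, `maxExecDuration`).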