Update hyperband (#911)
* update hyperband

* change STEP to TRIAL_BUDGET

* modify the `get_n_r` function

* fix typo
PurityFan authored and chicm-ms committed Apr 8, 2019
1 parent b358278 commit c3074a8
Showing 7 changed files with 39 additions and 53 deletions.
4 changes: 2 additions & 2 deletions docs/en_US/Builtin_Tuner.md
@@ -15,7 +15,7 @@ Currently we support the following algorithms:
|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by nni is a wrapper on the SMAC3 Github repo. Notice that SMAC needs to be installed by the `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [Github Repo](https://github.com/automl/SMAC3)|
|[__Batch tuner__](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the `choice` type in the search space spec.|
|[__Grid Search__](#GridSearch)|Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the search space file. Note that the only acceptable types of search space are choice, quniform and qloguniform. The number q in quniform and qloguniform has a special meaning (different from the spec in the search space spec): it means the number of values that will be sampled evenly from the range low to high.|
-|[__Hyperband__](#Hyperband)|Hyperband tries to use the limited resource to explore as many configurations as possible, and finds out the promising ones to get the final result. The basic idea is generating many configurations and to run them for the small number of STEPs to find out promising one, then further training those promising ones to select several more promising one.[Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
+|[__Hyperband__](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and find the promising ones for the final result. The basic idea is to generate many configurations, run each for a small trial budget to find the promising ones, then train those further and keep narrowing the selection. [Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connection. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|

@@ -233,7 +233,7 @@ It is suggested when you have limited computation resource but have relatively l
**Requirement of classArg**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **R** (*int, optional, default = 60*) - the maximum STEPS (could be the number of mini-batches or epochs) can be allocated to a trial. Each trial should use STEPS to control how long it runs.
+* **R** (*int, optional, default = 60*) - the maximum budget (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use `TRIAL_BUDGET` to control how long it runs.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials (see the sketch below)
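
For intuition, the discard proportion can be checked with plain Python (an illustrative sketch, not part of the original docs):

```python
# With eta = 3, each round of successive halving keeps 1/eta of the
# pool and discards (eta-1)/eta of it, i.e. about 67%.
eta = 3
n = 81
while n > 1:
    kept = n // eta
    print(f"{n:>2} trials -> keep {kept:>2}, discard {(n - kept) / n:.0%}")
    n = kept
```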

**Usage example**
10 changes: 5 additions & 5 deletions docs/en_US/hyperbandAdvisor.md
@@ -17,7 +17,7 @@ advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS
+    #R: the maximum trial budget
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -26,13 +26,13 @@ advisor:
```

Note that once you use an advisor, you cannot add a tuner or assessor spec in the config file any more.
-If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there is one more key called `STEPS` besides the hyperparameters defined by user. **By using this `STEPS`, the trial can control how long it runs**.
+If you use Hyperband, the hyperparameters (i.e., key-value pairs) received by a trial contain one extra key, `TRIAL_BUDGET`, besides the user-defined hyperparameters. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.

For `report_intermediate_result(metric)` and `report_final_result(metric)` in your trial code, **`metric` should be either a number or a dict which has a key `default` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
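
Both forms are therefore valid (a minimal sketch; the metric values are made up):

```python
import nni

# Form 1: a bare number
nni.report_final_result(0.95)

# Form 2: a dict with a numeric 'default' key; extra keys such as
# 'loss' may be included alongside it
nni.report_final_result({'default': 0.95, 'loss': 0.12})
```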

-`R` and `eta` are the parameters of Hyperband that you can change. `R` means the maximum STEPS that can be allocated to a configuration. Here, STEPS could mean the number of epochs or mini-batches. This `STEPS` should be used by the trial to control how long it runs. Refer to the example under `examples/trials/mnist-hyperband/` for details.
+`R` and `eta` are the parameters of Hyperband that you can change. `R` is the maximum trial budget that can be allocated to a configuration; here, the budget could be the number of epochs or mini-batches. The trial should use this `TRIAL_BUDGET` to control how long it runs. Refer to the example under `examples/trials/mnist-advisor/` for details.

-`eta` means `n/eta` configurations from `n` configurations will survive and rerun using more STEPS.
+`eta` means that `n/eta` of the `n` configurations in a round will survive and be rerun with a larger budget.

Here is a concrete example of `R=81` and `eta=3`:

@@ -45,7 +45,7 @@ Here is a concrete example of `R=81` and `eta=3`:
|3 |3 27 |1 81 | | | |
|4 |1 81 | | | | |

-`s` means bucket, `n` means the number of configurations that are generated, the corresponding `r` means how many STEPS these configurations run. `i` means round, for example, bucket 4 has 5 rounds, bucket 3 has 4 rounds.
+`s` means bucket, `n` means the number of configurations that are generated, and the corresponding `r` means how much budget each of these configurations runs with. `i` means round; for example, bucket 4 has 5 rounds and bucket 3 has 4 rounds.
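
The schedule inside one bucket follows directly from `n` and `r`: each round keeps `n/eta` of the configurations and multiplies their budget by `eta`. A sketch that reproduces the `s=4` column of the table above (illustrative only; nothing beyond that recurrence is assumed):

```python
# Bucket s=4 of the R=81, eta=3 example: start with n=81 configurations
# at budget r=1, then shrink n by eta and grow r by eta each round.
eta, n, r = 3, 81, 1
for i in range(5):          # bucket s=4 runs 5 rounds, i = 0..4
    print(f"i={i}: n={n:>2}  r={r:>2}")
    n, r = n // eta, r * eta
```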

For details on how to write trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
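
In outline, a Hyperband-aware trial looks something like this (a minimal sketch rather than the shipped example; `build_model`, `train_one_epoch`, and `evaluate` are hypothetical helpers):

```python
import nni

def run_trial():
    params = nni.get_next_parameter()   # user hyperparameters + TRIAL_BUDGET
    budget = params['TRIAL_BUDGET']     # injected by the Hyperband advisor
    model = build_model(params)         # hypothetical helper
    for _ in range(budget):             # the budget bounds how long we train
        train_one_epoch(model)          # hypothetical helper
        nni.report_intermediate_result(evaluate(model))  # number or dict
    nni.report_final_result(evaluate(model))

if __name__ == '__main__':
    run_trial()
```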

@@ -9,11 +9,11 @@ searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
-  #choice: Hyperband
+  #choice: Hyperband, BOHB
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS (could be the number of mini-batches or epochs) can be
-    # allocated to a trial. Each trial should use STEPS to control how long it runs.
+    #R: the maximum trial budget (could be the number of mini-batches or epochs) that can be
+    # allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -9,10 +9,10 @@ searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
advisor:
-  #choice: Hyperband
+  #choice: Hyperband, BOHB
  builtinAdvisorName: Hyperband
  classArgs:
-    #R: the maximum STEPS
+    #R: the maximum trial budget
    R: 100
    #eta: proportion of discarded trials
    eta: 3
@@ -1,5 +1,6 @@
"""A deep MNIST classifier using convolutional layers."""

+import argparse
import logging
import math
import tempfile
@@ -17,7 +18,7 @@

class MnistNetwork(object):
    '''
-    MnistNetwork is for initlizing and building basic network for mnist.
+    MnistNetwork is for initializing and building basic network for mnist.
    '''
    def __init__(self,
                 channel_1_num,
@@ -188,7 +189,7 @@ def main(params):
                mnist_network.keep_prob: 1 - params['dropout_rate']}
            )

-            if i % 10 == 0:
+            if i % 100 == 0:
                test_acc = mnist_network.accuracy.eval(
                    feed_dict={mnist_network.images: mnist.test.images,
                               mnist_network.labels: mnist.test.labels,
@@ -207,38 +208,31 @@ def main(params):
    logger.debug('Final result is %g', test_acc)
    logger.debug('Send final result done.')


-def generate_default_params():
-    '''
-    Generate default parameters for mnist network.
-    '''
-    params = {
-        'data_dir': '/tmp/tensorflow/mnist/input_data',
-        'dropout_rate': 0.5,
-        'channel_1_num': 32,
-        'channel_2_num': 64,
-        'conv_size': 5,
-        'pool_size': 2,
-        'hidden_size': 1024,
-        'learning_rate': 1e-4,
-        'batch_size': 32}
-    return params

+def get_params():
+    ''' Get parameters from command line '''
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data_dir", type=str, default='/tmp/tensorflow/mnist/input_data', help="data directory")
+    parser.add_argument("--dropout_rate", type=float, default=0.5, help="dropout rate")
+    parser.add_argument("--channel_1_num", type=int, default=32)
+    parser.add_argument("--channel_2_num", type=int, default=64)
+    parser.add_argument("--conv_size", type=int, default=5)
+    parser.add_argument("--pool_size", type=int, default=2)
+    parser.add_argument("--hidden_size", type=int, default=1024)
+    parser.add_argument("--learning_rate", type=float, default=1e-4)
+    parser.add_argument("--batch_num", type=int, default=2700)
+    parser.add_argument("--batch_size", type=int, default=32)
+
+    args, _ = parser.parse_known_args()
+    return args

if __name__ == '__main__':
    try:
-        # get parameters form tuner
-        RCV_PARAMS = nni.get_next_parameter()
-        logger.debug(RCV_PARAMS)
-        # run
-        params = generate_default_params()
-        params.update(RCV_PARAMS)
-        '''
-        If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial,
-        there is one more key called `STEPS` besides the hyperparameters defined by user.
-        By using this `STEPS`, the trial can control how long it runs.
-        '''
-        params['batch_num'] = RCV_PARAMS['STEPS'] * 10
+        tuner_params = nni.get_next_parameter()
+        logger.debug(tuner_params)
+        tuner_params['batch_num'] = tuner_params['TRIAL_BUDGET'] * 100
+        params = vars(get_params())
+        params.update(tuner_params)
        main(params)
    except Exception as exception:
        logger.exception(exception)
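
In short, the example now draws its defaults from argparse instead of a hard-coded dict, so the script also runs standalone; tuner-supplied values, including the advisor's `TRIAL_BUDGET` (which now scales `batch_num` by 100 rather than 10), override those defaults through `params.update(tuner_params)`.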
14 changes: 3 additions & 11 deletions src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py
@@ -37,7 +37,7 @@
_logger = logging.getLogger(__name__)

_next_parameter_id = 0
-_KEY = 'STEPS'
+_KEY = 'TRIAL_BUDGET'
_epsilon = 1e-6

@unique
@@ -320,7 +320,7 @@ def __init__(self, R, eta=3, optimize_mode='maximize'):
    def load_checkpoint(self):
        pass

-    def save_checkpont(self):
+    def save_checkpoint(self):
        pass

    def handle_initialize(self, data):
@@ -351,15 +351,7 @@ def _request_one_trial_job(self):
"""get one trial job, i.e., one hyperparameter configuration."""
if not self.generated_hyper_configs:
if self.curr_s < 0:
# have tried all configurations
ret = {
'parameter_id': '-1_0_0',
'parameter_source': 'algorithm',
'parameters': ''
}
send(CommandType.NoMoreTrialJobs, json_tricks.dumps(ret))
self.credit += 1
return True
self.curr_s = self.s_max
_logger.debug('create a new bracket, self.curr_s=%d', self.curr_s)
self.brackets[self.curr_s] = Bracket(self.curr_s, self.s_max, self.eta, self.R, self.optimize_mode)
next_n, next_r = self.brackets[self.curr_s].get_n_r()
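
Note the behavioral change in this hunk: previously, once `curr_s` dropped below zero the advisor answered further requests with `NoMoreTrialJobs` and banked them as `credit`; now it wraps `curr_s` back to `s_max` and opens a fresh bracket, so Hyperband keeps proposing configurations and termination is left to the experiment-level limits in the config (e.g., `maxTrialNum`, `maxExecDuration`).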