Configuring Hyperparameter Tuning

The Ray AIR Tuner <ray.tune.Tuner> is the recommended way to tune hyperparameters in Ray AIR.

The Tuner will take in a Trainer and execute multiple training runs, each with different hyperparameter configurations.

As part of Ray Tune, the Tuner provides an interface that works with AIR Trainers to perform distributed hyperparameter tuning. It offers a variety of state-of-the-art hyperparameter tuning algorithms for optimizing model performance.

The following sections cover the basics of what a Tuner is and how to use it. If you are interested in reading more, take a look at the Ray Tune documentation <tune-main>.

Key Concepts

There are a number of key concepts that dictate proper use of a Tuner:

  • A set of hyperparameters you want to tune in a search space.
  • A search algorithm to effectively optimize your parameters, and optionally a scheduler to stop searches early and speed up your experiments.
  • The search space, search algorithm, scheduler, and Trainer are passed to a Tuner, which runs the hyperparameter tuning workload by evaluating multiple hyperparameters in parallel.
  • Each individual hyperparameter evaluation run is called a trial.
  • The Tuner returns its results in a ResultGrid.

Note

Tuners can also be used to launch hyperparameter tuning without using Ray AIR Trainers. See the Ray Tune documentation <tune-main> for more guides and examples.

Basic usage

Below, we demonstrate how you can use a Trainer object with a Tuner.

doc_code/tuner.py
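As a rough sketch of this workflow (the XGBoostTrainer, the tiny synthetic dataset, and the tuned parameter below are chosen purely for illustration, and assume the XGBoost integration is installed):

```python
import ray
from ray import tune
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer
from ray.tune import Tuner

# Tiny synthetic binary-classification dataset, used only for illustration.
train_dataset = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(100)])

trainer = XGBoostTrainer(
    label_column="y",
    params={"objective": "binary:logistic"},
    datasets={"train": train_dataset},
    scaling_config=ScalingConfig(num_workers=2),
)

# The Tuner samples from param_space and launches one training run per sample.
tuner = Tuner(
    trainer,
    param_space={"params": {"max_depth": tune.randint(2, 10)}},
    tune_config=tune.TuneConfig(num_samples=4),
)
results = tuner.fit()
```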

How to configure a search space?

A Tuner takes in a param_space argument where you can define the search space from which hyperparameter configurations will be sampled.

Depending on the model and dataset, you may want to tune:

  • The training batch size
  • The learning rate for deep learning training (e.g., image classification)
  • The maximum depth for tree-based models (e.g., XGBoost)

The following examples show how to specify the param_space.

XGBoost

doc_code/tuner.py
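For example, a sketch of a param_space for an XGBoostTrainer might look like the following; the entries under "params" follow XGBoost's parameter names, and the values are illustrative:

```python
from ray import tune

# Illustrative search space for an XGBoostTrainer: entries under "params" are
# merged into the XGBoost parameters passed to the Trainer.
param_space = {
    "params": {
        "max_depth": tune.randint(2, 12),     # maximum tree depth
        "eta": tune.loguniform(1e-4, 1e-1),   # learning rate
        "subsample": tune.uniform(0.5, 1.0),  # row subsampling ratio
    },
}
```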

PyTorch

doc_code/tuner.py
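Similarly, a sketch of a param_space for a TorchTrainer: the keys under train_loop_config are whatever your train_loop_per_worker reads from its config argument, so the names here are purely illustrative:

```python
from ray import tune

# Illustrative search space for a TorchTrainer: entries under "train_loop_config"
# are passed to train_loop_per_worker as its config dictionary.
param_space = {
    "train_loop_config": {
        "batch_size": tune.choice([16, 32, 64]),  # training batch size
        "lr": tune.loguniform(1e-4, 1e-1),        # learning rate
    },
}
```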

Read more about Tune search spaces here <tune-search-space-tutorial>.

You can use a Tuner to tune most arguments and configurations in Ray AIR, including but not limited to:

  • Ray Data
  • Preprocessors
  • Scaling configurations
  • and other hyperparameters.

There are a couple of gotchas about parameter specification when using Tuners with Trainers:

  • By default, configuration dictionaries and config objects will be deep-merged.
  • Parameters that are duplicated in the Trainer and Tuner will be overwritten by the Tuner param_space.
  • Exception: all arguments of the RunConfig <ray.air.config.RunConfig> and TuneConfig <ray.tune.tune_config.TuneConfig> are inherently un-tunable.

See /tune/tutorials/tune_get_data_in_and_out for an example.
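To make the merging behavior concrete, here is a small hypothetical sketch (the values are made up): the Trainer's params act as defaults, and overlapping keys in the Tuner's param_space win.

```python
from ray import tune

# Defaults set on the Trainer (e.g. XGBoostTrainer(params=...)).
trainer_params = {"max_depth": 4, "eta": 0.3}

# Search space passed to the Tuner.
param_space = {"params": {"max_depth": tune.randint(2, 10)}}

# After the deep merge, each trial effectively trains with
#   {"max_depth": <sampled value>, "eta": 0.3}
# i.e. max_depth is overridden by the Tuner while eta is kept from the Trainer.
```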

How to configure a Tuner?

There are two main configuration objects that can be passed into a Tuner: the TuneConfig <ray.tune.tune_config.TuneConfig> and the RunConfig <ray.air.config.RunConfig>.

The TuneConfig <ray.tune.tune_config.TuneConfig> contains tuning-specific settings, including:

  • the tuning algorithm to use
  • the metric and mode to rank results
  • the amount of parallelism to use

Here are some common configurations for `TuneConfig`:

doc_code/tuner.py
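For instance, a commonly used TuneConfig might look like the following sketch (the metric name and values are illustrative):

```python
from ray import tune

tune_config = tune.TuneConfig(
    metric="loss",            # metric used to rank trial results...
    mode="min",               # ...and whether lower or higher is better
    num_samples=20,           # total number of trials to run
    max_concurrent_trials=4,  # upper bound on trials running at the same time
)
```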

See the TuneConfig API reference <ray.tune.tune_config.TuneConfig> for more details.

The RunConfig <ray.air.config.RunConfig> contains configurations that are more generic than the tuning-specific settings. These may include:

  • failure/retry configurations
  • verbosity levels
  • the name of the experiment
  • the logging directory
  • checkpoint configurations
  • custom callbacks
  • integration with cloud storage

Below we showcase some common configurations of RunConfig <ray.air.config.RunConfig>.

doc_code/tuner.py
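For instance, a RunConfig covering several of these settings might look like the following sketch (the experiment name and values are illustrative):

```python
from ray import air

run_config = air.RunConfig(
    name="my_tune_experiment",  # experiment name; results land under ~/ray_results/<name> by default
    verbose=1,                  # console output verbosity
    failure_config=air.FailureConfig(max_failures=2),       # retry failed trials up to two times
    checkpoint_config=air.CheckpointConfig(num_to_keep=2),  # keep only the two most recent checkpoints
)
```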

See the RunConfig API reference <ray.air.config.RunConfig> for more details.

How to specify parallelism?

You can specify parallelism via the TuneConfig <ray.tune.tune_config.TuneConfig> by setting the following flags:

  • num_samples which specifies the number of trials to run in total
  • max_concurrent_trials which specifies the max number of trials to run concurrently

Note that actual parallelism can be less than max_concurrent_trials and will be determined by how many trials can fit in the cluster at once (i.e., if you have a trial that requires 16 GPUs, your cluster has 32 GPUs, and max_concurrent_trials=10, the Tuner can only run 2 trials concurrently).

doc_code/tuner.py
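A minimal sketch of these two flags:

```python
from ray import tune

# Run 100 trials in total, with at most 10 running concurrently. The effective
# parallelism is also limited by how many trials fit into the cluster resources.
tune_config = tune.TuneConfig(
    num_samples=100,
    max_concurrent_trials=10,
)
```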

Read more about this in the Tune parallelism section <tune-parallelism>.

How to specify an optimization algorithm?

You can specify your hyperparameter optimization method via the TuneConfig <ray.tune.tune_config.TuneConfig> by setting the following flags:

  • search_alg which provides an optimizer for selecting the optimal hyperparameters
  • scheduler which provides a scheduling/resource allocation algorithm for accelerating the search process

doc_code/tuner.py
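For example, the following sketch combines a HyperOpt-based search algorithm with the ASHA scheduler (HyperOptSearch assumes the hyperopt package is installed; the metric name is illustrative):

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.hyperopt import HyperOptSearch  # requires `pip install hyperopt`

tune_config = tune.TuneConfig(
    metric="loss",
    mode="min",
    search_alg=HyperOptSearch(),  # picks promising hyperparameters based on past results
    scheduler=ASHAScheduler(),    # stops poorly performing trials early
    num_samples=20,
)
```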

Read more about this in the Search Algorithm <search-alg-ref> and Scheduler <schedulers-ref> sections.

How to analyze results?

Tuner.fit() generates a ResultGrid <result-grid-docstring> object. This object contains metrics, results, and checkpoints of each trial. Below is a simple example:

doc_code/tuner.py
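For example, assuming a tuner built as in the earlier sketches whose trials report a "loss" metric, the results could be inspected like this:

```python
results = tuner.fit()

best_result = results.get_best_result(metric="loss", mode="min")
print(best_result.config)      # hyperparameters of the best trial
print(best_result.metrics)     # last reported metrics of the best trial
print(best_result.checkpoint)  # latest checkpoint of the best trial, if any

# All trial results as a pandas DataFrame.
df = results.get_dataframe()
```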

See /tune/examples/tune_analyze_results for more usage examples.

Advanced Tuning

Tuners also offer the ability to tune different data preprocessing steps, as shown in the following snippet.

doc_code/tuner.py
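For instance, a sketch of grid-searching over two preprocessors might look like the following; it assumes the Trainer accepts a preprocessor argument that the Tuner can override, and the column name is illustrative:

```python
from ray import tune
from ray.data.preprocessors import MinMaxScaler, StandardScaler

param_space = {
    # Each trial uses one of the two preprocessors on the "x" column.
    "preprocessor": tune.grid_search(
        [StandardScaler(columns=["x"]), MinMaxScaler(columns=["x"])]
    ),
}
```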

Additionally, you can sample different train/validation datasets:

doc_code/tuner.py
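A sketch of this pattern, with two made-up candidate training datasets:

```python
import ray
from ray import tune

train_ds_small = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(100)])
train_ds_large = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(1000)])

param_space = {
    # Each trial trains on one of the candidate datasets.
    "datasets": {"train": tune.grid_search([train_ds_small, train_ds_large])},
}
```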

Restoring and resuming

A Tuner regularly saves its state, so that a tuning run can be resumed after being interrupted.

Additionally, if trials fail during a tuning run, they can be retried, either from scratch or from the latest available checkpoint.

To restore the Tuner state, pass the path to the experiment directory as an argument to Tuner.restore(...).

This path can be found in the output of a tuning run (the "Result logdir"). Alternatively, if you specified a name in the RunConfig <ray.air.config.RunConfig>, the experiment directory is located under ~/ray_results/<name>.

doc_code/tuner.py
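A sketch of restoring a run; the path is illustrative, and newer Ray versions additionally require re-passing the original trainable (e.g. trainable=trainer):

```python
from ray.tune import Tuner

tuner = Tuner.restore(
    path="~/ray_results/my_tune_experiment",  # use the "Result logdir" of the interrupted run
    resume_errored=True,                      # retry errored trials from their latest checkpoint
)
results = tuner.fit()
```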

For more resume options, please see the documentation of Tuner.restore() <ray.tune.tuner.Tuner.restore>.