[tune] Add documentation for reproducible runs (setting seeds) (#18849)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
krfricke and Yard1 authored Sep 24, 2021
1 parent 7c99aae commit 9b0d804
Showing 1 changed file with 63 additions and 0 deletions: doc/source/tune/user-guide.rst
@@ -218,6 +218,69 @@ During training, Tune will automatically log the below metrics in addition to th

All of these metrics can be seen in the ``Trial.last_result`` dictionary.

.. _tune-reproducible:

Reproducible runs
-----------------
Exact reproducibility of machine learning runs is hard to achieve. This
is even more true in a distributed setting, as additional non-determinism
is introduced. For instance, if two trials finish at the same time, the
convergence of the search algorithm might be influenced by which trial
result is processed first. This depends on the searcher: for random search
it shouldn't make a difference, but for most other searchers it will.

If you want to achieve some degree of reproducibility, there are two
places where you'll have to set random seeds:

1. In the driver program, e.g. for the search algorithm. This ensures
   that at least the initial configurations suggested by the search
   algorithm are the same.

2. In the trainable (if required). Neural networks are usually initialized
   with random numbers, and many classical ML algorithms, such as GBDTs,
   make use of randomness. Thus you'll want to set a seed here so that
   the initialization is always the same.

Here is an example that will always produce the same result (except for trial
runtimes).

.. code-block:: python

    import numpy as np
    from ray import tune


    def train(config):
        # Set seed for trainable random result.
        # If you remove this line, you will get different results
        # each time you run the trial, even if the configuration
        # is the same.
        np.random.seed(config["seed"])
        random_result = np.random.uniform(0, 100, size=1).item()
        tune.report(result=random_result)


    # Set seed for Ray Tune's random search.
    # If you remove this line, you will get different configurations
    # each time you run the script.
    np.random.seed(1234)
    tune.run(
        train,
        config={
            "seed": tune.randint(0, 1000)
        },
        search_alg=tune.suggest.BasicVariantGenerator(),
        num_samples=10)

Some searchers use their own random states to sample new configurations.
These searchers usually accept a ``seed`` parameter that can be passed on
initialization. Other searchers use NumPy's ``np.random`` interface -
these seeds can then be set with ``np.random.seed()``. We don't offer an
interface to do this in the searcher classes, as setting a random seed
globally could have side effects. For instance, it could influence the
way your dataset is split. Thus, we leave it up to the user to make
these global configuration changes.
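
For example, a searcher that manages its own random state can usually be
seeded when it is constructed. Below is a minimal sketch, assuming
``HyperOptSearch`` and its ``random_state_seed`` argument (check your
searcher's docstring for the exact parameter name in your version):

.. code-block:: python

    import numpy as np
    from ray.tune.suggest.hyperopt import HyperOptSearch

    # Searchers with their own random state usually take a seed on
    # initialization (assumed here to be ``random_state_seed``).
    searcher = HyperOptSearch(metric="result", mode="max",
                              random_state_seed=1234)

    # Searchers that rely on NumPy's global ``np.random`` interface
    # are instead seeded globally, with the side effects discussed
    # above.
    np.random.seed(1234)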

.. _tune-checkpoint:

Checkpointing
