[tune] Add documentation for reproducible runs (setting seeds) (#18849)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
krfricke and Yard1 authored Sep 24, 2021
1 parent 7c99aae commit 9b0d804
Showing 1 changed file with 63 additions and 0 deletions: doc/source/tune/user-guide.rst
@@ -218,6 +218,69 @@ During training, Tune will automatically log the below metrics in addition to th

All of these metrics can be seen in the ``Trial.last_result`` dictionary.

.. _tune-reproducible:

Reproducible runs
-----------------
Exact reproducibility of machine learning runs is hard to achieve. This
is even more true in a distributed setting, as additional non-determinism
is introduced. For instance, if two trials finish at the same time, the
convergence of the search algorithm might be influenced by which trial
result is processed first. This depends on the searcher: for random search
it shouldn't make a difference, but for most other searchers it will.

If you want to achieve some degree of reproducibility, there are two
places where you'll have to set random seeds:

1. In the driver program, e.g. for the search algorithm. This ensures
   that at least the initial configurations suggested by the search
   algorithm are the same.

2. In the trainable (if required). Neural networks are usually initialized
   with random numbers, and many classical ML algorithms, such as GBDTs,
   make use of randomness. Thus you'll want to set a seed here so that
   the initialization is always the same.

Here is an example that will always produce the same result (except for trial
runtimes).

.. code-block:: python

    import numpy as np
    from ray import tune


    def train(config):
        # Set seed for trainable random result.
        # If you remove this line, you will get different results
        # each time you run the trial, even if the configuration
        # is the same.
        np.random.seed(config["seed"])
        random_result = np.random.uniform(0, 100, size=1).item()
        tune.report(result=random_result)


    # Set seed for Ray Tune's random search.
    # If you remove this line, you will get different configurations
    # each time you run the script.
    np.random.seed(1234)
    tune.run(
        train,
        config={
            "seed": tune.randint(0, 1000)
        },
        search_alg=tune.suggest.BasicVariantGenerator(),
        num_samples=10)

Some searchers use their own random states to sample new configurations.
These searchers usually accept a ``seed`` parameter that can be passed on
initialization. Other searchers use NumPy's ``np.random`` interface -
these seeds can then be set with ``np.random.seed()``. We don't offer an
interface to do this in the searcher classes, as setting a random seed
globally could have side effects. For instance, it could influence the
way your dataset is split. Thus, we leave it up to the user to make
these global configuration changes.
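
For example, a searcher that manages its own random state can usually be
seeded when it is constructed. Below is a minimal sketch, assuming
``HyperOptSearch`` and its ``random_state_seed`` argument (check your
searcher's docstring for the exact parameter name in your version):

.. code-block:: python

    import numpy as np
    from ray.tune.suggest.hyperopt import HyperOptSearch

    # Searchers with their own random state usually take a seed on
    # initialization (assumed here to be ``random_state_seed``).
    searcher = HyperOptSearch(metric="result", mode="max",
                              random_state_seed=1234)

    # Searchers that rely on NumPy's global ``np.random`` interface
    # are instead seeded globally, with the side effects discussed
    # above.
    np.random.seed(1234)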

.. _tune-checkpoint:

Checkpointing
