[RLlib] Documentation do-over 01: Announce new API stack as alpha; add hints to all RLlib pages; describe how to use it in new page. (ray-project#44090)
sven1977 authored and ryanaoleary committed Jun 7, 2024
1 parent 9d16d30 commit d1ff9ad
Showing 44 changed files with 442 additions and 24 deletions.
11 changes: 11 additions & 0 deletions doc/source/_includes/rllib/new_api_stack.rst
@@ -0,0 +1,11 @@
.. note::

Ray 2.10.0 introduces the alpha stage of RLlib's "new API stack".
The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base,
thereby incrementally replacing the "old API stack" (e.g., ModelV2, Policy, RolloutWorker) over the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only)
support the "new API stack"; even these continue to run with the old APIs by default.
You can continue to use the existing custom (old stack) classes.

See the `new API stack </rllib/package_ref/rllib-new-api-stack.html>`__ page for more details on how to use it.
3 changes: 3 additions & 0 deletions doc/source/_includes/rllib/new_api_stack_component.rst
@@ -0,0 +1,3 @@
.. note::

This doc is related to RLlib's `new API stack </rllib/package_ref/rllib-new-api-stack.html>`__ and therefore experimental.
5 changes: 0 additions & 5 deletions doc/source/_includes/rllib/rlm_learner_migration_banner.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/source/_includes/rllib/rlmodules_rollout.rst

This file was deleted.

135 changes: 135 additions & 0 deletions doc/source/rllib/doc_code/new_api_stack.py
@@ -0,0 +1,135 @@
# __enabling-new-api-stack-sa-ppo-begin__

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner


config = (
PPOConfig().environment("CartPole-v1")
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
# Set the `env_runner_cls` to `SingleAgentEnvRunner` for single-agent setups and
# `MultiAgentEnvRunner` for multi-agent cases.
.rollouts(env_runner_cls=SingleAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future. It yields a small performance advantage, because value function
# predictions for PPO no longer need to happen on the sampler side (they now live
# entirely on the learner side, which might have GPUs available).
.training(model={"uses_new_env_runners": True})
)

# __enabling-new-api-stack-sa-ppo-end__

# Test whether it works.
print(config.build().train())
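The `.resources()` comments above reduce to a simple rule: set `num_learner_workers` to your GPU count (0 for CPU-only, local-learner training) and give each learner worker one GPU. The helper below is a hypothetical illustration of that rule only; its name and return shape are ours, not an RLlib API:

```python
# Hypothetical helper (NOT an RLlib API): derive the learner resource
# settings described in the comments above from the available GPU count.
def learner_resources(num_gpus: int) -> dict:
    return {
        # 0 GPUs -> local learner only; otherwise one learner worker per GPU.
        "num_learner_workers": num_gpus,
        "num_gpus_per_learner_worker": 1 if num_gpus > 0 else 0,
        "num_cpus_for_local_worker": 1,
    }

# CPU-only laptop setup vs. a 4-GPU multi-learner setup.
assert learner_resources(0)["num_learner_workers"] == 0
assert learner_resources(4) == {
    "num_learner_workers": 4,
    "num_gpus_per_learner_worker": 1,
    "num_cpus_for_local_worker": 1,
}
```

The resulting dict mirrors the keyword arguments passed to `.resources()` in the example above.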


# __enabling-new-api-stack-ma-ppo-begin__

from ray.rllib.algorithms.ppo import PPOConfig # noqa
from ray.rllib.env.multi_agent_env_runner import MultiAgentEnvRunner # noqa
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole # noqa


# A typical multi-agent setup (otherwise using the exact same parameters as before)
# looks like this.
config = (
PPOConfig().environment(MultiAgentCartPole, env_config={"num_agents": 2})
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
# Set the `env_runner_cls` to `SingleAgentEnvRunner` for single-agent setups and
# `MultiAgentEnvRunner` for multi-agent cases.
.rollouts(env_runner_cls=MultiAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future. It yields a small performance advantage, because value function
# predictions for PPO no longer need to happen on the sampler side (they now live
# entirely on the learner side, which might have GPUs available).
.training(model={"uses_new_env_runners": True})
# Because you are in a multi-agent env, you have to set up the usual multi-agent
# parameters:
.multi_agent(
policies={"p0", "p1"},
# Map agent 0 to p0 and agent 1 to p1.
policy_mapping_fn=lambda agent_id, episode, **kwargs: f"p{agent_id}",
)
)

# __enabling-new-api-stack-ma-ppo-end__

# Test whether it works.
print(config.build().train())
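The `policy_mapping_fn` above routes each agent to its own policy purely by string formatting. A standalone sketch of that mapping rule (plain Python, no RLlib imports; the integer agent IDs are an assumption based on MultiAgentCartPole's 0, 1, ... convention):

```python
# Sketch of the policy-mapping rule used above: agent 0 -> "p0",
# agent 1 -> "p1", etc. (assumes integer agent IDs).
def policy_mapping_fn(agent_id, episode=None, **kwargs):
    return f"p{agent_id}"

# Every mapped policy ID must exist in the `policies` set of the config.
policies = {"p0", "p1"}
for agent_id in (0, 1):
    assert policy_mapping_fn(agent_id) in policies
```

Because the mapping is computed per agent (and is passed the episode), the same pattern extends to schemes like self-play, where the returned policy ID can depend on the episode rather than only on the agent ID.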


# __enabling-new-api-stack-sa-sac-begin__

from ray.rllib.algorithms.sac import SACConfig # noqa
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner # noqa


config = (
SACConfig().environment("Pendulum-v1")
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
.rollouts(env_runner_cls=SingleAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future.
.training(
model={"uses_new_env_runners": True},
replay_buffer_config={"type": "EpisodeReplayBuffer"},
)
)
# __enabling-new-api-stack-sa-sac-end__


# Test whether it works.
print(config.build().train())
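The `replay_buffer_config={"type": "EpisodeReplayBuffer"}` setting above switches SAC to a buffer that stores whole episodes rather than individual timesteps. The toy class below only illustrates that idea; it is not RLlib's actual `EpisodeReplayBuffer`:

```python
import random

# Toy episode-based buffer: stores whole episodes, evicts the oldest when
# full, and samples one episode at random. Illustration only; RLlib's
# EpisodeReplayBuffer is far more featureful (timestep capacity, slicing, ...).
class ToyEpisodeBuffer:
    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.episodes = []

    def add(self, episode):
        # `episode` is a list of (obs, action, reward, next_obs) tuples.
        self.episodes.append(episode)
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # drop the oldest episode

    def sample(self):
        return random.choice(self.episodes)

buf = ToyEpisodeBuffer(capacity=2)
buf.add([("s0", 0, 1.0, "s1")])
buf.add([("s1", 1, 0.5, "s2")])
buf.add([("s2", 0, 0.0, "s3")])  # evicts the first episode
assert len(buf.episodes) == 2
```

Keeping episodes intact (instead of flattening them into independent timesteps) is what allows the new stack to reconstruct temporal context, e.g., n-step returns, at sampling time.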
3 changes: 3 additions & 0 deletions doc/source/rllib/index.rst
@@ -1,5 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-index:

RLlib: Industry-Grade Reinforcement Learning
@@ -14,6 +16,7 @@ RLlib: Industry-Grade Reinforcement Learning
rllib-algorithms
user-guides
rllib-examples
rllib-new-api-stack
package_ref/index


2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -1,7 +1,7 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. TODO: We need algorithms, environments, policies, models here. Likely in that order.
Execution plans are not a "core" concept for users. Sample batches should probably also be left out.
7 changes: 5 additions & 2 deletions doc/source/rllib/package_ref/algorithm.rst
@@ -1,6 +1,9 @@
.. algorithm-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. algorithm-reference-docs:
Algorithms
==========
10 changes: 7 additions & 3 deletions doc/source/rllib/package_ref/catalogs.rst
@@ -1,8 +1,12 @@
.. _catalog-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. include:: /_includes/rllib/rlmodules_rollout.rst

.. _catalog-reference-docs:

Catalog API
===========
6 changes: 6 additions & 0 deletions doc/source/rllib/package_ref/env.rst
@@ -1,3 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst


.. _env-reference-docs:

Environments
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/base_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _base-env-reference-docs:

BaseEnv API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/external_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _external-env-reference-docs:

ExternalEnv API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/multi_agent_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _multi-agent-env-reference-docs:

MultiAgentEnv API
4 changes: 4 additions & 0 deletions doc/source/rllib/package_ref/env/vector_env.rst
@@ -1,3 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _vector-env-reference-docs:

VectorEnv API
6 changes: 6 additions & 0 deletions doc/source/rllib/package_ref/evaluation.rst
@@ -1,3 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst


.. _evaluation-reference-docs:

Sampling the Environment or offline data
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/external-app.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

External Application API
------------------------

3 changes: 2 additions & 1 deletion doc/source/rllib/package_ref/index.rst
@@ -1,6 +1,7 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-reference-docs:

7 changes: 7 additions & 0 deletions doc/source/rllib/package_ref/learner.rst
@@ -1,3 +1,10 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. _learner-reference-docs:


7 changes: 5 additions & 2 deletions doc/source/rllib/package_ref/models.rst
@@ -1,6 +1,9 @@
.. _model-reference-docs:
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst

.. _model-reference-docs:

Model APIs
==========
5 changes: 4 additions & 1 deletion doc/source/rllib/package_ref/policy.rst
@@ -1,6 +1,9 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _policy-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst

Policy API
==========
3 changes: 3 additions & 0 deletions doc/source/rllib/package_ref/policy/custom_policies.rst
@@ -1,3 +1,6 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

Building Custom Policy Classes
------------------------------
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/replay-buffers.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _replay-buffer-api-reference-docs:

Replay Buffer API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/rl_modules.rst
@@ -1,4 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. _rlmodule-reference-docs:

5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/utils.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _utils-reference-docs:

RLlib Utilities
2 changes: 2 additions & 0 deletions doc/source/rllib/rllib-advanced-api.rst
@@ -1,4 +1,6 @@

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-advanced-api-doc:

Advanced Python APIs
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -1,6 +1,6 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-algorithms-doc:

4 changes: 3 additions & 1 deletion doc/source/rllib/rllib-catalogs.rst
@@ -1,6 +1,8 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlmodules_rollout.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst


Catalog (Alpha)
3 changes: 2 additions & 1 deletion doc/source/rllib/rllib-concepts.rst
@@ -1,6 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst


.. _rllib-policy-walkthrough:

2 changes: 2 additions & 0 deletions doc/source/rllib/rllib-connector.rst
@@ -1,5 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

Connectors (Beta)
==================
