[RLlib] Documentation do-over 01: Announce new API stack as alpha; add hints to all RLlib pages; describe how to use it in new page. (ray-project#44090)
sven1977 authored and ryanaoleary committed Jun 7, 2024
1 parent 9d16d30 commit d1ff9ad
Showing 44 changed files with 442 additions and 24 deletions.
11 changes: 11 additions & 0 deletions doc/source/_includes/rllib/new_api_stack.rst
@@ -0,0 +1,11 @@
.. note::

Ray 2.10.0 introduces the alpha stage of RLlib's "new API stack".
The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base,
thereby incrementally replacing the "old API stack" (e.g., ModelV2, Policy, RolloutWorker) over the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only)
support the "new API stack"; even these continue to run with the old APIs by default.
You can continue to use the existing custom (old stack) classes.

See the `new API stack </rllib/package_ref/rllib-new-api-stack.html>`__ page for more details on how to use it.
3 changes: 3 additions & 0 deletions doc/source/_includes/rllib/new_api_stack_component.rst
@@ -0,0 +1,3 @@
.. note::

This doc is related to RLlib's `new API stack </rllib/package_ref/rllib-new-api-stack.html>`__ and therefore experimental.
5 changes: 0 additions & 5 deletions doc/source/_includes/rllib/rlm_learner_migration_banner.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/source/_includes/rllib/rlmodules_rollout.rst

This file was deleted.

135 changes: 135 additions & 0 deletions doc/source/rllib/doc_code/new_api_stack.py
@@ -0,0 +1,135 @@
# __enabling-new-api-stack-sa-ppo-begin__

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner


config = (
PPOConfig().environment("CartPole-v1")
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
# Set the `env_runner_cls` to `SingleAgentEnvRunner` for single-agent setups and
# `MultiAgentEnvRunner` for multi-agent cases.
.rollouts(env_runner_cls=SingleAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future. It yields a small performance advantage, because value function
# predictions for PPO no longer need to happen on the sampler side (they now live
# entirely on the learner side, which might have GPUs available).
.training(model={"uses_new_env_runners": True})
)

# __enabling-new-api-stack-sa-ppo-end__

# Test whether it works.
print(config.build().train())
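The `.resources()` comments above reduce to a simple rule: set `num_learner_workers` to your GPU count (0 for CPU-only, local-learner training) and give each learner worker one GPU. The helper below is a hypothetical illustration of that rule only; its name and return shape are ours, not an RLlib API:

```python
# Hypothetical helper (NOT an RLlib API): derive the learner resource
# settings described in the comments above from the available GPU count.
def learner_resources(num_gpus: int) -> dict:
    return {
        # 0 GPUs -> local learner only; otherwise one learner worker per GPU.
        "num_learner_workers": num_gpus,
        "num_gpus_per_learner_worker": 1 if num_gpus > 0 else 0,
        "num_cpus_for_local_worker": 1,
    }

# CPU-only laptop setup vs. a 4-GPU multi-learner setup.
assert learner_resources(0)["num_learner_workers"] == 0
assert learner_resources(4) == {
    "num_learner_workers": 4,
    "num_gpus_per_learner_worker": 1,
    "num_cpus_for_local_worker": 1,
}
```

The resulting dict mirrors the keyword arguments passed to `.resources()` in the example above.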


# __enabling-new-api-stack-ma-ppo-begin__

from ray.rllib.algorithms.ppo import PPOConfig # noqa
from ray.rllib.env.multi_agent_env_runner import MultiAgentEnvRunner # noqa
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole # noqa


# A typical multi-agent setup (otherwise using the exact same parameters as before)
# looks like this.
config = (
PPOConfig().environment(MultiAgentCartPole, env_config={"num_agents": 2})
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
# Set the `env_runner_cls` to `SingleAgentEnvRunner` for single-agent setups and
# `MultiAgentEnvRunner` for multi-agent cases.
.rollouts(env_runner_cls=MultiAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future. It yields a small performance advantage, because value function
# predictions for PPO no longer need to happen on the sampler side (they now live
# entirely on the learner side, which might have GPUs available).
.training(model={"uses_new_env_runners": True})
# Because you are in a multi-agent env, you have to set up the usual multi-agent
# parameters:
.multi_agent(
policies={"p0", "p1"},
# Map agent 0 to p0 and agent 1 to p1.
policy_mapping_fn=lambda agent_id, episode, **kwargs: f"p{agent_id}",
)
)

# __enabling-new-api-stack-ma-ppo-end__

# Test whether it works.
print(config.build().train())
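The `policy_mapping_fn` above routes each agent to its own policy purely by string formatting. A standalone sketch of that mapping rule (plain Python, no RLlib imports; the integer agent IDs are an assumption based on MultiAgentCartPole's 0, 1, ... convention):

```python
# Sketch of the policy-mapping rule used above: agent 0 -> "p0",
# agent 1 -> "p1", etc. (assumes integer agent IDs).
def policy_mapping_fn(agent_id, episode=None, **kwargs):
    return f"p{agent_id}"

# Every mapped policy ID must exist in the `policies` set of the config.
policies = {"p0", "p1"}
for agent_id in (0, 1):
    assert policy_mapping_fn(agent_id) in policies
```

Because the mapping is computed per agent (and is passed the episode), the same pattern extends to schemes like self-play, where the returned policy ID can depend on the episode rather than only on the agent ID.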


# __enabling-new-api-stack-sa-sac-begin__

from ray.rllib.algorithms.sac import SACConfig # noqa
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner # noqa


config = (
SACConfig().environment("Pendulum-v1")
# Switch the new API stack flag to True (False by default).
# This enables the use of the RLModule (replaces ModelV2) AND Learner (replaces
# Policy) classes.
.experimental(_enable_new_api_stack=True)
# However, the above flag only activates the RLModule and Learner APIs. In order
# to utilize all of the new API stack's classes, you also have to specify the
# EnvRunner (replaces RolloutWorker) to use.
# Note that this step will be fully automated in the next release.
.rollouts(env_runner_cls=SingleAgentEnvRunner)
# We are using a simple 1-CPU setup here for learning. However, as the new stack
# supports arbitrary scaling on the learner axis, feel free to set
# `num_learner_workers` to the number of available GPUs for multi-GPU training (and
# `num_gpus_per_learner_worker=1`).
.resources(
num_learner_workers=0, # <- in most cases, set this value to the number of GPUs
num_gpus_per_learner_worker=0, # <- set this to 1, if you have at least 1 GPU
num_cpus_for_local_worker=1,
)
# When using RLlib's default models (RLModules) AND the new EnvRunners, you should
# set this flag in your model config. Setting it manually will no longer be required
# in the near future.
.training(
model={"uses_new_env_runners": True},
replay_buffer_config={"type": "EpisodeReplayBuffer"},
)
)
# __enabling-new-api-stack-sa-sac-end__


# Test whether it works.
print(config.build().train())
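The `replay_buffer_config={"type": "EpisodeReplayBuffer"}` setting above switches SAC to a buffer that stores whole episodes rather than individual timesteps. The toy class below only illustrates that idea; it is not RLlib's actual `EpisodeReplayBuffer`:

```python
import random

# Toy episode-based buffer: stores whole episodes, evicts the oldest when
# full, and samples one episode at random. Illustration only; RLlib's
# EpisodeReplayBuffer is far more featureful (timestep capacity, slicing, ...).
class ToyEpisodeBuffer:
    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.episodes = []

    def add(self, episode):
        # `episode` is a list of (obs, action, reward, next_obs) tuples.
        self.episodes.append(episode)
        if len(self.episodes) > self.capacity:
            self.episodes.pop(0)  # drop the oldest episode

    def sample(self):
        return random.choice(self.episodes)

buf = ToyEpisodeBuffer(capacity=2)
buf.add([("s0", 0, 1.0, "s1")])
buf.add([("s1", 1, 0.5, "s2")])
buf.add([("s2", 0, 0.0, "s3")])  # evicts the first episode
assert len(buf.episodes) == 2
```

Keeping episodes intact (instead of flattening them into independent timesteps) is what allows the new stack to reconstruct temporal context, e.g., n-step returns, at sampling time.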
3 changes: 3 additions & 0 deletions doc/source/rllib/index.rst
@@ -1,5 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-index:

RLlib: Industry-Grade Reinforcement Learning
@@ -14,6 +16,7 @@ RLlib: Industry-Grade Reinforcement Learning
rllib-algorithms
user-guides
rllib-examples
rllib-new-api-stack
package_ref/index


2 changes: 1 addition & 1 deletion doc/source/rllib/key-concepts.rst
@@ -1,7 +1,7 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. TODO: We need algorithms, environments, policies, models here. Likely in that order.
Execution plans are not a "core" concept for users. Sample batches should probably also be left out.
7 changes: 5 additions & 2 deletions doc/source/rllib/package_ref/algorithm.rst
@@ -1,6 +1,9 @@
.. algorithm-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. algorithm-reference-docs:
Algorithms
==========
10 changes: 7 additions & 3 deletions doc/source/rllib/package_ref/catalogs.rst
@@ -1,8 +1,12 @@
.. _catalog-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. include:: /_includes/rllib/rlmodules_rollout.rst

.. _catalog-reference-docs:

Catalog API
===========
6 changes: 6 additions & 0 deletions doc/source/rllib/package_ref/env.rst
@@ -1,3 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst


.. _env-reference-docs:

Environments
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/base_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _base-env-reference-docs:

BaseEnv API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/external_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _external-env-reference-docs:

ExternalEnv API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/env/multi_agent_env.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _multi-agent-env-reference-docs:

MultiAgentEnv API
4 changes: 4 additions & 0 deletions doc/source/rllib/package_ref/env/vector_env.rst
@@ -1,3 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _vector-env-reference-docs:

VectorEnv API
6 changes: 6 additions & 0 deletions doc/source/rllib/package_ref/evaluation.rst
@@ -1,3 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst


.. _evaluation-reference-docs:

Sampling the Environment or offline data
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/external-app.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

External Application API
------------------------

3 changes: 2 additions & 1 deletion doc/source/rllib/package_ref/index.rst
@@ -1,6 +1,7 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-reference-docs:

7 changes: 7 additions & 0 deletions doc/source/rllib/package_ref/learner.rst
@@ -1,3 +1,10 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. _learner-reference-docs:


7 changes: 5 additions & 2 deletions doc/source/rllib/package_ref/models.rst
@@ -1,6 +1,9 @@
.. _model-reference-docs:
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst

.. _model-reference-docs:

Model APIs
==========
5 changes: 4 additions & 1 deletion doc/source/rllib/package_ref/policy.rst
@@ -1,6 +1,9 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _policy-reference-docs:

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst

Policy API
==========
3 changes: 3 additions & 0 deletions doc/source/rllib/package_ref/policy/custom_policies.rst
@@ -1,3 +1,6 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

Building Custom Policy Classes
------------------------------
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/replay-buffers.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _replay-buffer-api-reference-docs:

Replay Buffer API
5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/rl_modules.rst
@@ -1,4 +1,9 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst

.. _rlmodule-reference-docs:

5 changes: 5 additions & 0 deletions doc/source/rllib/package_ref/utils.rst
@@ -1,3 +1,8 @@

.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _utils-reference-docs:

RLlib Utilities
2 changes: 2 additions & 0 deletions doc/source/rllib/rllib-advanced-api.rst
@@ -1,4 +1,6 @@

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-advanced-api-doc:

Advanced Python APIs
2 changes: 1 addition & 1 deletion doc/source/rllib/rllib-algorithms.rst
@@ -1,6 +1,6 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-algorithms-doc:

4 changes: 3 additions & 1 deletion doc/source/rllib/rllib-catalogs.rst
@@ -1,6 +1,8 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlmodules_rollout.rst
.. include:: /_includes/rllib/new_api_stack.rst

.. include:: /_includes/rllib/new_api_stack_component.rst


Catalog (Alpha)
3 changes: 2 additions & 1 deletion doc/source/rllib/rllib-concepts.rst
@@ -1,6 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/rlm_learner_migration_banner.rst
.. include:: /_includes/rllib/new_api_stack.rst


.. _rllib-policy-walkthrough:

2 changes: 2 additions & 0 deletions doc/source/rllib/rllib-connector.rst
@@ -1,5 +1,7 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

Connectors (Beta)
==================
