
[RLlib] Early improvements to Catalogs and RL Modules docs + Catalogs improvements #37245

Merged

Conversation

ArturNiederfahrenhorst (Contributor) commented on Jul 10, 2023:

Why are these changes needed?

As discussed offline, our first iteration of docs and examples on RL Modules and Catalogs needs some tweaking to provide better guidance on the most common user journeys and to clean up some general rough edges we left.

This PR executes on a collection of action items we gathered offline.
It also includes some changes to Catalog itself that don't change functionality but adjust naming based on user feedback.

Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
@@ -399,3 +399,50 @@ def setup(self):

module = spec.build()
# __pass-custom-marlmodule-shared-enc-end__

ArturNiederfahrenhorst (Contributor, Author):

This addition resolves a TODO from the RL Modules doc to show checkpointing.
It's not very extensive, but I think we should include more info in a general section on checkpointing, outside of the RL Modules guide, that covers S3 checkpointing, checkpointing with and without Tune, etc.

Contributor:

+1 on a separate user-guide on checkpointing and resuming experiments.

Contributor:

Added to backlog #37515
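
For context, a minimal sketch of the checkpointing flow this addition documents, assuming the `SingleAgentRLModuleSpec`, `save_to_checkpoint`, and `from_checkpoint` APIs from this era of RLlib (exact names and signatures may differ across versions):

```python
import tempfile

import gymnasium as gym

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

env = gym.make("CartPole-v1")

# Build a PPO torch RL Module from a spec.
spec = SingleAgentRLModuleSpec(
    module_class=PPOTorchRLModule,
    observation_space=env.observation_space,
    action_space=env.action_space,
    model_config_dict={"fcnet_hiddens": [32]},
    catalog_class=PPOCatalog,
)
module = spec.build()

# Save the module's architecture and weights to a local directory ...
ckpt_dir = tempfile.mkdtemp()
module.save_to_checkpoint(ckpt_dir)

# ... and restore an identical module from it later.
loaded_module = PPOTorchRLModule.from_checkpoint(ckpt_dir)
```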

ArturNiederfahrenhorst (Contributor, Author):

A general sketch that shows a couple of components in little space, at the cost of being somewhat imprecise.
This is on purpose: it should only convey the general idea and help form a mental model, rather than specifics.

@@ -44,6 +44,7 @@ The model that tries to maximize the expected sum over all future rewards is cal
The RL simulation feedback loop repeatedly collects data, for one (single-agent case) or multiple (multi-agent case) policies, trains the policies on these collected data, and makes sure the policies' weights are kept in sync. The collected environment data contains observations, taken actions, received rewards and so-called **done** flags, indicating the boundaries of different episodes the agents play through in the simulation.

The repeated simulation sequence of action -> reward -> next state -> train, until the end state, is called an **episode**, or in RLlib, a **rollout**.
The most common API to define environments is the `Farama-Foundation Gymnasium <rllib-env.html#gymnasium>`__ API, which we also use in most of our examples.
ArturNiederfahrenhorst (Contributor, Author):

This was found further down in the file, in the Policies section, where it doesn't belong as far as I can see.
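
To make the feedback-loop terminology above concrete, here is a minimal episode loop against the standard Gymnasium API (random actions for brevity):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

terminated = truncated = False
episode_return = 0.0
# One episode: act -> observe reward and next state -> repeat until done.
while not (terminated or truncated):
    action = env.action_space.sample()  # A trained policy would act here.
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Episode return: {episode_return}")
```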

@@ -115,40 +116,32 @@ You can `configure the parallelism <rllib-training.html#specifying-resources>`__
Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.


Policies
ArturNiederfahrenhorst (Contributor, Author):

We should leave Policies out of the concepts from now on.
They are only used on the sampling stack and are therefore transparent to users.


Catalog (Alpha)
Catalog (Beta)
ArturNiederfahrenhorst (Contributor, Author):

With this PR, the API should have largely settled.

===============

Catalogs are where `RLModules <rllib-rlmodule.html>`__ primarily get their models and action distributions from.
Each :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` has its own default
:py:class:`~ray.rllib.core.models.catalog.Catalog`. For example,
:py:class:`~ray.rllib.algorithms.ppo.ppo_torch_rl_module.PPOTorchRLModule` has the
:py:class:`~ray.rllib.algorithms.ppo.ppo_catalog.PPOCatalog`.
You can override Catalogs’ methods to alter the behavior of existing RLModules.
This makes Catalogs a means of configuration for RLModules.
ArturNiederfahrenhorst (Contributor, Author):

This is misleading.
Users who randomly land on this page shouldn't read the first few sentences and then attempt to configure an RL Module through Catalogs. This becomes clearer further down.

===============

Catalogs are where `RLModules <rllib-rlmodule.html>`__ primarily get their models and action distributions from.
Each :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` has its own default
:py:class:`~ray.rllib.core.models.catalog.Catalog`. For example,
:py:class:`~ray.rllib.algorithms.ppo.ppo_torch_rl_module.PPOTorchRLModule` has the
:py:class:`~ray.rllib.algorithms.ppo.ppo_catalog.PPOCatalog`.
You can override Catalogs’ methods to alter the behavior of existing RLModules.
This makes Catalogs a means of configuration for RLModules.
You interact with Catalogs when making deeper customization to what :py:class:`~ray.rllib.core.models.Model` and :py:class:`~ray.rllib.models.distributions.Distribution` RLlib creates by default.
ArturNiederfahrenhorst (Contributor, Author):

This is part of the information directly below.
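
As a concrete illustration of "overriding Catalogs' methods to alter the behavior of existing RLModules", a minimal sketch, assuming `get_action_dist_cls` as the override hook and a `catalog_class` argument on the spec (both may differ by RLlib version):

```python
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec


class MyPPOCatalog(PPOCatalog):
    def get_action_dist_cls(self, framework):
        # Inspect (or replace) the distribution class RLlib picked by default.
        dist_cls = super().get_action_dist_cls(framework)
        print(f"PPO will use {dist_cls} as its action distribution.")
        return dist_cls


# Attach the customized Catalog to an otherwise unchanged RL Module;
# the spec would then be passed to the algorithm config as usual.
spec = SingleAgentRLModuleSpec(catalog_class=MyPPOCatalog)
```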

.. literalinclude:: doc_code/rlmodule_guide.py
   :language: python
   :start-after: __write-custom-marlmodule-shared-enc-begin__
   :end-before: __write-custom-marlmodule-shared-enc-end__
ArturNiederfahrenhorst (Contributor, Author):

This was in its own tab, but with no alternative tabs.

ArturNiederfahrenhorst changed the title from "[RLlib] Early improvements to Catalogs and RL Modules docs to clear up common use cases" to "[RLlib] Early improvements to Catalogs and RL Modules docs + Catalogs improvements" on Jul 17, 2023
kouroshHakha (Contributor) left a comment:

This is high quality right there. Awesome PR.
I left some comments and modifications to make things a bit clearer.
Also, re: RL Module vs. RLModule, feel free to ignore the comment; I think you had something different in mind. We just need to be consistent in spelling across all the docs. Make sure everything is spelled "RL Modules" in text.

Comment on lines 16 to 17
Catalogs can be extended to offer more or different models or distributions to RLModules (e.g., to PPOTorchRLModule).
Catalogs can be written to build models for new RLModules (for new algorithms).
Contributor:

Change the first sentence above to the following:

Catalog is a utility abstraction that modularizes the construction of `RLModules <rllib-rlmodule.html>`. It includes information such as how input observation spaces should be encoded, what action distributions should be used, and so on. Each ...

Contributor:

Suggested change
Catalogs can be extended to offer more or different models or distributions to RLModules (e.g., to PPOTorchRLModule).
Catalogs can be written to build models for new RLModules (for new algorithms).
To customize existing RLModules, either change the RLModule directly by inheriting from the class and overriding the `setup()` method, or extend the `Catalog` class attributed to that `RLModule`. Use Catalogs only if your customization fits the abstractions that Catalog provides.

ArturNiederfahrenhorst (Contributor, Author):

Added this to a manual commit with proper doc links.

Whenever you create a Catalog, the decision tree is executed to find suitable configs for models and classes for distributions.
By default this happens in :py:meth:`~ray.rllib.core.models.catalog.Catalog.get_encoder_config` and :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_dist_cls_from_action_space`.
Whenever you build a model, the config is turned into a model.
Distributions are instantiated per forward pass of an RL Module and are therefore not built.
Contributor:

RLModule
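
A small sketch of the flow described above, assuming the base `Catalog` constructor and its `build_encoder`/`get_action_dist_cls` methods from this era of RLlib:

```python
import gymnasium as gym

from ray.rllib.core.models.catalog import Catalog

env = gym.make("CartPole-v1")

# Creating the Catalog runs the decision tree: an encoder config and a
# distribution class are chosen based on the given spaces and model config.
catalog = Catalog(
    observation_space=env.observation_space,
    action_space=env.action_space,
    model_config_dict={},
)

# Building turns the chosen config into an actual model ...
encoder = catalog.build_encoder(framework="torch")
# ... while distributions are only returned as classes here and get
# instantiated per forward pass inside the RLModule.
action_dist_cls = catalog.get_action_dist_cls(framework="torch")
```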

- Does the Algorithm need a special Encoder? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_encoder_config`.
- Does the Algorithm need an additional network? Write a method to build it. You can use RLlib's model configurations to build models from dimensions.
- Does the Algorithm need a custom distribution? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_dist_cls_from_action_space`.
- Does the Algorithm need a special tokenizer? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog.get_tokenizer_config`.
Contributor:

One thing that's not clear is the difference between a tokenizer and an encoder. We need a proper definition of tokenizer somewhere.

ArturNiederfahrenhorst (Contributor, Author):

Added some more info above now!
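
For the "additional network" bullet in the checklist above, a sketch of what such a builder method could look like, assuming the `MLPHeadConfig` helper and the Catalog's `latent_dims` property (the method name and config fields here are illustrative; check your RLlib version):

```python
from ray.rllib.core.models.catalog import Catalog
from ray.rllib.core.models.configs import MLPHeadConfig


class MyAlgoCatalog(Catalog):
    """Catalog for a hypothetical new algorithm needing one extra network."""

    def build_my_auxiliary_head(self, framework: str):
        # Build the extra head from the latent dims of the encoder this
        # Catalog already decided on.
        head_config = MLPHeadConfig(
            input_dims=self.latent_dims,
            hidden_layer_dims=[256],
            output_layer_dim=1,
        )
        return head_config.build(framework=framework)
```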

@@ -334,15 +330,21 @@ To construct this custom multi-agent RL module, pass the class to the :py:class:
Extending Existing RLlib RL Modules
-----------------------------------

RLlib provides a number of RL Modules for different frameworks (e.g., PyTorch, TensorFlow, etc.). Extend these modules by inheriting from them and overriding the methods you need to customize. For example, extend :py:class:`~ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module.PPOTorchRLModule` and augment it with your own customization. Then pass the new customized class into the algorithm configuration.
RLlib provides a number of RL Modules for different frameworks (e.g., PyTorch, TensorFlow, etc.).
Extend these modules by inheriting from them and overriding the methods you need to customize.
Contributor:

Say which method people should override. `setup()`?
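
For reference, a minimal sketch of the inheritance pattern in question, overriding `setup()` (the extra head added here is purely hypothetical):

```python
import torch.nn as nn

from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule


class MyPPOTorchRLModule(PPOTorchRLModule):
    def setup(self):
        # Let the parent build all default PPO components first.
        super().setup()
        # Then augment or swap components, e.g., a (hypothetical) extra head:
        self.my_aux_head = nn.Linear(256, 1)
```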

kouroshHakha (Contributor):
There is also a lint error.

kouroshHakha merged commit ab131bb into ray-project:master on Jul 19, 2023
42 of 45 checks passed