[RLlib] Early improvements to Catalogs and RL Modules docs + Catalogs improvements #37245
Conversation
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
@@ -399,3 +399,50 @@ def setup(self):

module = spec.build()
# __pass-custom-marlmodule-shared-enc-end__
This addition solves a todo from the RL Modules doc to show checkpointing.
It's not very extensive, but I think we should put more info in a general section on checkpointing that covers S3 checkpointing, checkpointing with and without Tune, etc., outside of the RL Modules guide.
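As a rough illustration of the save/restore pattern such a checkpointing section would demonstrate, here is a minimal, self-contained sketch. `ToyModule`, `save_to_checkpoint`, and `from_checkpoint` are illustrative stand-ins, not RLlib's actual API:

```python
import json
import os
import tempfile

# Hypothetical stand-in for an RL Module's state; real modules hold
# framework-specific weights, but the save/restore pattern is the same.
class ToyModule:
    def __init__(self, weights):
        self.weights = weights

    def save_to_checkpoint(self, checkpoint_dir):
        # Persist state to a directory, mirroring checkpoint-style APIs.
        os.makedirs(checkpoint_dir, exist_ok=True)
        with open(os.path.join(checkpoint_dir, "state.json"), "w") as f:
            json.dump({"weights": self.weights}, f)

    @classmethod
    def from_checkpoint(cls, checkpoint_dir):
        # Rebuild a module instance from the persisted state.
        with open(os.path.join(checkpoint_dir, "state.json")) as f:
            state = json.load(f)
        return cls(state["weights"])

with tempfile.TemporaryDirectory() as tmp:
    module = ToyModule(weights=[0.1, 0.2])
    module.save_to_checkpoint(tmp)
    restored = ToyModule.from_checkpoint(tmp)
assert restored.weights == [0.1, 0.2]
```

The same round-trip shape (write state to a directory, reconstruct from it) is what a general checkpointing guide could build on, whether the target is local disk or S3.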
+1 on a separate user-guide on checkpointing and resuming experiments.
Added to backlog #37515
General sketch that shows a couple of components in little space, at the cost of being somewhat imprecise.
This is on purpose: it should only convey the general idea, to form a mental model, rather than specifics.
@@ -44,6 +44,7 @@ The model that tries to maximize the expected sum over all future rewards is cal

The RL simulation feedback loop repeatedly collects data, for one (single-agent case) or multiple (multi-agent case) policies, trains the policies on the collected data, and makes sure the policies' weights are kept in sync. Thereby, the collected environment data contains observations, taken actions, received rewards and so-called **done** flags, indicating the boundaries of different episodes the agents play through in the simulation.

The simulation iteration of action -> reward -> next state -> train -> repeat, until the end state, is called an **episode**, or in RLlib, a **rollout**.
The most common API to define environments is the `Farama-Foundation Gymnasium <rllib-env.html#gymnasium>`__ API, which we also use in most of our examples.
This sentence was previously further down in the file, in the Policies section, where it doesn't belong as far as I can see.
@@ -115,40 +116,32 @@ You can `configure the parallelism <rllib-training.html#specifying-resources>`__
Check out our `scaling guide <rllib-training.html#scaling-guide>`__ for more details here.

Policies
We should leave Policies out of the concepts from now on.
They are only used in the sampling stack and are therefore transparent.
doc/source/rllib/rllib-catalogs.rst
Outdated
Catalog (Alpha)
Catalog (Beta)
With this PR, the API should have largely settled.
===============

Catalogs are where `RLModules <rllib-rlmodule.html>`__ primarily get their models and action distributions from.
Each :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` has its own default
:py:class:`~ray.rllib.core.models.catalog.Catalog`. For example,
:py:class:`~ray.rllib.algorithms.ppo.ppo_torch_rl_module.PPOTorchRLModule` has the
:py:class:`~ray.rllib.algorithms.ppo.ppo_catalog.PPOCatalog`.
You can override Catalogs’ methods to alter the behavior of existing RLModules.
This makes Catalogs a means of configuration for RLModules.
This is misleading.
Users shouldn't randomly land on this page, read the first few sentences, and attempt to configure an RL Module through Catalogs. This becomes clearer further down.
===============

Catalogs are where `RLModules <rllib-rlmodule.html>`__ primarily get their models and action distributions from.
Each :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` has its own default
:py:class:`~ray.rllib.core.models.catalog.Catalog`. For example,
:py:class:`~ray.rllib.algorithms.ppo.ppo_torch_rl_module.PPOTorchRLModule` has the
:py:class:`~ray.rllib.algorithms.ppo.ppo_catalog.PPOCatalog`.
You can override Catalogs’ methods to alter the behavior of existing RLModules.
This makes Catalogs a means of configuration for RLModules.
You interact with Catalogs when making deeper customization to what :py:class:`~ray.rllib.core.models.Model` and :py:class:`~ray.rllib.models.distributions.Distribution` RLlib creates by default.
This is part of the information directly below.
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
:language: python
:start-after: __write-custom-marlmodule-shared-enc-begin__
:end-before: __write-custom-marlmodule-shared-enc-end__
.. literalinclude:: doc_code/rlmodule_guide.py
This was in its own tab, but with no alternative tabs.
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
This is high quality right there. Awesome PR.
I left some comments and modifications to make things a bit clearer.
Also, re: RL Module vs. RLModule: feel free to ignore that comment; I think you had something different in mind. We just need consistent spelling across all the docs. Make sure everything is spelled "RL Modules" in text.
doc/source/rllib/rllib-catalogs.rst
Outdated
Catalogs can be extended to offer more or different models or distributions to RLModules (e.g. to PPOTorchRLModule).
Catalogs can be written to build models for new RLModules (for new algorithms).
Change the first sentence above to the following:
Catalog is a utility abstraction that modularizes the construction of `RLModules <rllib-rlmodule.html>`__. It includes information such as how input observation spaces should be encoded, which action distributions should be used, and so on. Each ...
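To make the "utility abstraction" framing concrete, here is a minimal sketch of the idea, using toy dict-based spaces. `SketchCatalog` and its method names are illustrative only, not RLlib's real signatures:

```python
# A minimal sketch of the Catalog idea: a class that, given observation and
# action spaces, decides which encoder config and which action distribution
# class an RL Module should use. All names here are hypothetical.
class SketchCatalog:
    def __init__(self, observation_space, action_space):
        # Decisions are made once, at construction time.
        self.encoder_config = self._get_encoder_config(observation_space)
        self.action_dist_cls = self._get_dist_cls_from_action_space(action_space)

    def _get_encoder_config(self, observation_space):
        # Real catalogs inspect the space type (Box, Discrete, Dict, ...);
        # this toy version just reads a dict.
        return {"type": "mlp", "input_dim": observation_space["dim"]}

    def _get_dist_cls_from_action_space(self, action_space):
        # Discrete spaces get a categorical distribution, continuous ones
        # a diagonal Gaussian (names are placeholders for real classes).
        return "Categorical" if action_space["discrete"] else "DiagGaussian"

catalog = SketchCatalog({"dim": 4}, {"discrete": True})
assert catalog.action_dist_cls == "Categorical"
assert catalog.encoder_config["input_dim"] == 4
```

The point of the abstraction is that an RL Module can ask the catalog for these pieces instead of hard-coding them, which is what makes the construction modular.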
Catalogs can be extended to offer more or different models or distributions to RLModules (e.g. to PPOTorchRLModule).
Catalogs can be written to build models for new RLModules (for new algorithms).
To customize existing RLModules, either change the RLModule directly by inheriting from the class and changing the `setup()` method, or alternatively extend the `Catalog` class attributed to that `RLModule`. Use Catalogs only if your customization fits the abstractions provided by Catalog.
Added this to a manual commit with proper doc links.
doc/source/rllib/rllib-catalogs.rst
Outdated
Whenever you create a Catalog, the decision tree is executed to find suitable configs for models and classes for distributions.
By default this happens in :py:meth:`~ray.rllib.core.models.catalog.Catalog.get_encoder_config` and :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_dist_cls_from_action_space`.
Whenever you build a model, the config is turned into a model.
Distributions are instantiated per forward pass of an RL Module and are therefore not built.
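A toy sketch of this lifecycle may help: a config is chosen when the catalog is created, the model is built from it later, and a distribution *class* is instantiated fresh per forward pass. All names below are hypothetical, not RLlib's API:

```python
from dataclasses import dataclass

# Configs are cheap descriptions chosen up front by the decision tree.
@dataclass
class MLPConfig:
    input_dim: int
    hidden_dim: int

    def build(self):
        # In RLlib this would return a framework model; here, a plain dict
        # stands in for the built network.
        return {"layers": [self.input_dim, self.hidden_dim]}

# Distribution classes are stored, not instances; an instance is created
# from fresh logits on every forward pass.
class Categorical:
    def __init__(self, logits):
        self.logits = logits

config = MLPConfig(input_dim=4, hidden_dim=32)  # decided at catalog creation
encoder = config.build()                        # built once, when needed
dist = Categorical(logits=[0.5, 0.5])           # instantiated per forward pass
assert encoder["layers"] == [4, 32]
```

This separation is why distributions are "not built": only their class is resolved ahead of time.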
RLModule
- Does the Algorithm need a special Encoder? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_encoder_config`.
- Does the Algorithm need an additional network? Write a method to build it. You can use RLlib's model configurations to build models from dimensions.
- Does the Algorithm need a custom distribution? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog._get_dist_cls_from_action_space`.
- Does the Algorithm need a special tokenizer? Overwrite :py:meth:`~ray.rllib.core.models.catalog.Catalog.get_tokenizer_config`.
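For the "additional network" item in the checklist above, the pattern is to give a catalog subclass a new builder method that reuses dimensions the catalog already knows. This is a sketch with made-up names, not RLlib's actual classes:

```python
# Hypothetical base catalog that knows the encoder's output dimension.
class SketchCatalog:
    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def build_encoder(self):
        return {"type": "encoder", "out": self.latent_dim}

# Subclass adds a builder for an extra head the new algorithm needs,
# constructed from dimensions the catalog already tracks.
class CatalogWithAuxHead(SketchCatalog):
    def build_aux_head(self):
        return {"type": "mlp_head", "in": self.latent_dim, "out": 1}

catalog = CatalogWithAuxHead(latent_dim=32)
assert catalog.build_aux_head() == {"type": "mlp_head", "in": 32, "out": 1}
```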
One thing that is not clear is the difference between a tokenizer and an encoder. We need a proper definition of tokenizer somewhere.
Added some more info above now!
doc/source/rllib/rllib-rlmodule.rst
Outdated
@@ -334,15 +330,21 @@ To construct this custom multi-agent RL module, pass the class to the :py:class:
Extending Existing RLlib RL Modules
-----------------------------------

RLlib provides a number of RL Modules for different frameworks (e.g., PyTorch, TensorFlow, etc.). Extend these modules by inheriting from them and overriding the methods you need to customize. For example, extend :py:class:`~ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module.PPOTorchRLModule` and augment it with your own customization. Then pass the new customized class into the algorithm configuration.
RLlib provides a number of RL Modules for different frameworks (e.g., PyTorch, TensorFlow, etc.).
Extend these modules by inheriting from them and overriding the methods you need to customize.
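The inherit-and-override pattern described here can be sketched in plain Python. `ParentModule` stands in for an existing module class such as `PPOTorchRLModule`, and overriding `setup()` (as suggested in review) is the typical customization point; all attribute names are illustrative:

```python
# Hypothetical parent module that builds its components in setup().
class ParentModule:
    def __init__(self):
        self.setup()

    def setup(self):
        self.encoder = "default_encoder"
        self.head = "default_head"

# Subclass keeps the parent's construction and replaces only one piece.
class MyCustomModule(ParentModule):
    def setup(self):
        super().setup()               # keep the defaults...
        self.head = "my_custom_head"  # ...and override only what you need

module = MyCustomModule()
assert module.encoder == "default_encoder"
assert module.head == "my_custom_head"
```

The customized class would then be passed into the algorithm configuration in place of the default module class.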
Say which method people should override: `setup()`?
There is also a lint error. |
Co-authored-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com> Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Kourosh's suggestions Co-authored-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com> Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
… improvements (ray-project#37245) Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
… improvements (ray-project#37245) Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: NripeshN <nn2012@hw.ac.uk>
… improvements (ray-project#37245) Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: harborn <gangsheng.wu@intel.com>
… improvements (ray-project#37245) Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
… improvements (ray-project#37245) Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
As discussed offline, our first iteration of docs and examples on RL Modules and Catalogs needs some tweaking to provide better guidance on the most common user journeys and to clean up some general rough edges we left.
This PR executes on a collection of action items we collected offline.
This PR also includes some changes to Catalog itself that do not change functionality but make adjustments to naming based on user feedback.