[RLlib; docs] Docs do-over (new API stack): Rewrite/enhance "getting started" rst page. #49950
…_redo_getting_started Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # doc/source/rllib/rllib-training.rst
…_redo_getting_started
Signed-off-by: sven1977 <svenmika1977@gmail.com>
simonsays1980 left a comment
LGTM. Some nits here and there. Great introduction for users into RLlib.
| In this tutorial, you learn how to design, customize, and run an end-to-end RLlib learning experiment
| from scratch. This includes picking and configuring an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`,
| running a couple of training iterations, saving the state of your
| :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` from time to time, running a separate
Awesome! This is what most people are looking for.
| Python API
| ~~~~~~~~~~
|
| RLlib's Python API provides all the flexibility required for applying the library to any
Do we have any other API than the Python one?
Nope :D We got rid of the CLI because of the maintenance burden, its stark limitations, and it being more or less a duplicate of a subset of what the Python API could do.
Well, we are working on the external access protocol for clients to connect to and communicate with RLlib, but that's still heavily WIP.
| )
|
|
| To scale your setup and define, how many EnvRunner actors you want to leverage,
Shall we put all class names into ``?
Also, we might want to add that these EnvRunners are used to roll out the policy and collect samples?
| .. testcode::
|
|     # Build the Algorithm (PPO).
|     ppo = config.build_algo()
Does build still work?
Yup, but you get a warning.
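The "still works, but warns" behavior described in this thread is commonly implemented as a thin alias that emits a deprecation warning and then delegates to the new method. The following is only a sketch of that general pattern, not RLlib's actual code: `SketchConfig` and its return value are hypothetical stand-ins, and only the method names `build` and `build_algo` come from the discussion above.

```python
import warnings


class SketchConfig:
    """Hypothetical stand-in for an algorithm config; not RLlib code."""

    def build_algo(self):
        # Preferred entry point: construct and return the algorithm object.
        # A plain dict stands in for the real algorithm here.
        return {"built": True}

    def build(self):
        # Old name kept as an alias that warns, then delegates to the new name.
        warnings.warn(
            "`build()` is deprecated; use `build_algo()` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.build_algo()


# Record warnings so the effect of calling the old name is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    algo = SketchConfig().build()

print(algo, [str(w.message) for w in caught])
```

Callers of the old name keep working and get a pointer to the replacement, which matches what the reply above reports for `build`.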
| from pprint import pprint
|
| for _ in range(5):
|     pprint(ppo.train())
| # Define your custom env class by subclassing gymnasium.Env:
|
| class ParrotEnv(gym.Env):
|     """Environment in which the agent learns to repeat the seen observations.
| # Point your config to your custom env class:
| config = (
|     PPOConfig()
|     .environment(ParrotEnv)  # add `env_config=[some Box space] to customize the env
Maybe a missing " ` "?
Done and clarified more. Also fixed the env to accept this suggested setting.
| class CustomTorchRLModule(TorchRLModule):
|     def setup(self):
|         # You have access here to the following already set attributes:
|         # self.observation_space
| :hide:
|
| At the end of your script, RLlib evaluates the trained Algorithm:
| algo.stop()
Haha. Yes that is needed.
We might however show it explicitly as otherwise users might run into problems.
... in their own code
Great idea. Will add a one-liner for this API.
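One way user code could guard the cleanup call discussed in this thread is a `try`/`finally` block, so `stop()` runs even when training raises. This is a sketch under assumptions: `StubAlgorithm` is a hypothetical stand-in so the snippet runs without Ray installed, and only the `train()` and `stop()` method names are taken from the diff.

```python
class StubAlgorithm:
    """Hypothetical stand-in for a built algorithm; not RLlib code."""

    def __init__(self):
        self.stopped = False
        self.iterations = 0

    def train(self):
        # Pretend to run one training iteration and report a result dict.
        self.iterations += 1
        return {"training_iteration": self.iterations}

    def stop(self):
        # Stand-in for the cleanup step; here it only flips a flag.
        self.stopped = True


algo = StubAlgorithm()
try:
    for _ in range(5):
        result = algo.train()
finally:
    # Runs whether or not the loop above raised, so cleanup is never skipped.
    algo.stop()

print(result, algo.stopped)
```

With a real algorithm object in place of the stub, the same structure keeps the explicit `stop()` call visible in the user's own script.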
|
| The `state` of an instantiated Algorithm can be retrieved by calling its
| `get_state` method. It contains all information necessary
| to create the Algorithm from scratch. No access to the original code (e.g.
Does this work now also with algorithms that had defined new attributes/methods? If the class is available it should imo.
Yeah, I think so. Users can decide to override the get_state/set_state APIs to add more stateful stuff to their state-dicts, but the basic functionality (restoring EnvRunners, RLModule, Learner optimizer states, connector pipelines, etc.) works across all algos.
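The override pattern mentioned in this reply can be sketched as follows. The classes are hypothetical and mirror only the `get_state`/`set_state` naming; the real RLlib methods have richer signatures and return more components, so treat this as an illustration of the idea rather than the library's API.

```python
class BaseAlgo:
    """Hypothetical base class; mirrors only the get_state/set_state naming."""

    def __init__(self):
        self.weights = [0.0, 0.0]

    def get_state(self):
        # Return everything needed to restore this object.
        return {"weights": list(self.weights)}

    def set_state(self, state):
        self.weights = list(state["weights"])


class CountingAlgo(BaseAlgo):
    """Subclass that adds one extra stateful attribute."""

    def __init__(self):
        super().__init__()
        self.num_updates = 0

    def get_state(self):
        # Extend the parent's state dict instead of replacing it.
        state = super().get_state()
        state["num_updates"] = self.num_updates
        return state

    def set_state(self, state):
        super().set_state(state)
        # Tolerate state dicts written before the attribute existed.
        self.num_updates = state.get("num_updates", 0)


source = CountingAlgo()
source.weights = [0.5, -1.0]
source.num_updates = 3

# Round-trip the state into a fresh instance.
restored = CountingAlgo()
restored.set_state(source.get_state())
print(restored.weights, restored.num_updates)
```

Calling `super()` in both methods keeps the base functionality intact, which is the part the reply says works across all algorithms.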
…_redo_getting_started
…_redo_metrics_logger Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # doc/source/rllib/package_ref/algorithm.rst
…_redo_getting_started
angelinalg left a comment
Just some style nits.
| - `[Course] Applied Reinforcement Learning with RLlib <https://applied-rl-course.netlify.app/>`_
| - `[Blog] Intro to RLlib: Example Environments <https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70>`_
| - :doc:`[Guide] Getting Started with RLlib </rllib/rllib-training>`
| - :doc:`[Guide] Getting Started with RLlib </rllib/getting-started>`
| - :doc:`[Guide] Getting Started with RLlib </rllib/getting-started>`
| - :doc:`[Guide] Getting started with RLlib </rllib/getting-started>`
|
| .. _rllib-getting-started:
|
| Getting Started
| Getting Started
| Getting started
| RLlib's Python API provides all the flexibility required for applying the library to any
| type of RL problem.
|
| You manage RLlib experiments through an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
| You manage RLlib experiments through an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
| Manage RLlib experiments using an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
| class. An :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` typically holds a neural
| network for computing actions, called ``policy``, the :ref:`RL environment <rllib-key-concepts-environments>`
| that you want to optimize against, a loss function, an optimizer, and some code describing the
| algorithm's execution logic, like determining when to collect samples, when to update your model, etc..
| algorithm's execution logic, like determining when to collect samples, when to update your model, etc..
| algorithm's execution logic, like determining when to collect samples, when to update your model, etc.
| In :ref:`multi-agent training <rllib-multi-agent-environments-doc>`,
| :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` manages the querying and optimization of multiple policies at once.
|
| Through the algorithm's interface, you can train the policy, compute actions, or store your
| Through the algorithm's interface, you can train the policy, compute actions, or store your
| Using the algorithm's interface, you can train the policy, compute actions, or store the
|
| pip install "gymnasium[atari,accept-rom-license,mujoco]"
|
| This is all, you can now start coding against RLlib. Here is an example for running the :ref:`PPO Algorithm <ppo>` on the
| This is all, you can now start coding against RLlib. Here is an example for running the :ref:`PPO Algorithm <ppo>` on the
| You are ready to start coding against RLlib. The following is an example for running the :ref:`PPO Algorithm <ppo>` on the
| `Taxi domain <https://gymnasium.farama.org/environments/toy_text/taxi/>`__.
| You first create a `config` for the algorithm, which defines the RL environment and
| any other needed settings and parameters.
| You first create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.
| You first create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.
| First create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.
| for _ in range(5):
|     pprint(algo.train())
|
| At the end of your script, you evaluate the trained Algorithm and release all its resources:
| At the end of your script, you evaluate the trained Algorithm and release all its resources:
| At the end of the script, evaluate the trained Algorithm and release all its resources:
| :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors through the ``config.evaluation()`` method.
|
| `See here <rllib-training.html#using-the-python-api>`_, if you want to learn more about the RLlib training APIs.
| :ref:`See here <rllib-python-api>`, if you want to learn more about the RLlib training APIs.
| :ref:`See here <rllib-python-api>`, if you want to learn more about the RLlib training APIs.
| See :ref:`rllib-python-api`, to learn more about the RLlib training APIs.
|
| The `state` of an instantiated Algorithm can be retrieved by calling its
| `get_state` method. It contains all information necessary
| to create the Algorithm from scratch. No access to the original code (e.g.
| to create the Algorithm from scratch. No access to the original code (e.g.
| to create the Algorithm from scratch. No access to the original code (e.g.,
…_redo_getting_started Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # rllib/algorithms/algorithm.py
…_redo_getting_started
…started" rst page. (#49950)
…started" rst page. (ray-project#49950)
Docs do-over (new API stack): Rewrite/enhance "getting started" rst page.
rllib-training.html to getting-started.html ... testcode blocks.
Why are these changes needed?
Related issue number
Checks
git commit -s) in this PR.
scripts/format.sh to lint the changes in this PR.
method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.