[RLlib; docs] Docs do-over (new API stack): Rewrite/enhance "getting started" rst page.#49950

Merged: sven1977 merged 36 commits into ray-project:master from sven1977:docs_redo_getting_started on Jan 24, 2025
@sven1977 sven1977 commented Jan 18, 2025

Docs do-over (new API stack): Rewrite/enhance "getting started" rst page.

  • Rename file from rllib-training.html to getting-started.html.
  • Translate everything to the new API stack and simplify a little.
  • Vale cleanup.
  • Move example code into `.. testcode::` blocks.


Signed-off-by: sven1977 <svenmika1977@gmail.com>
…_redo_getting_started

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	doc/source/rllib/rllib-training.rst
@sven1977 sven1977 added the rllib (RLlib related issues), rllib-docs-or-examples (Issues related to RLlib documentation or rllib/examples), and rllib-newstack labels Jan 18, 2025
@simonsays1980 simonsays1980 left a comment

LGTM. Some nits here and there. Great introduction for users into RLlib.

In this tutorial, you learn how to design, customize, and run an end-to-end RLlib learning experiment
from scratch. This includes picking and configuring an :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`,
running a couple of training iterations, saving the state of your
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` from time to time, running a separate

Contributor:

Awesome! This is what most people are looking for.

Python API
~~~~~~~~~~

RLlib's Python API provides all the flexibility required for applying the library to any

Contributor:

Do we have any other API than the Python one?

Contributor Author:

Nope :D We got rid of the CLI, b/c of the maintenance burden, its stark limitations, and it being more or less a duplicate of a subset of what the python API could do.

Contributor Author:

Well, we are working on the external access protocol for clients to connect to and communicate with RLlib, but that's heavily wip.

Comment thread doc/source/rllib/getting-started.rst Outdated
)


To scale your setup and define, how many EnvRunner actors you want to leverage,

Contributor:

Shall we put all class names into ``?

Contributor:

Also, we might want to add that these EnvRunners are used to roll out the policy and collect samples?

Contributor Author:

done

.. testcode::

    # Build the Algorithm (PPO).
    ppo = config.build_algo()

Contributor:

Does build still work?

Contributor Author:

Yup, but you get a warning.
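
For readers following this thread, here is a minimal sketch of the config-then-build flow being discussed. The environment name `CartPole-v1`, the helper name `build_ppo`, and the runner count are illustrative assumptions, not taken from the page under review. `build_algo()` is the method shown in the snippet above, and per the reply the older `build()` still works but emits a warning. The `env_runners(num_env_runners=...)` call sets how many `EnvRunner` actors roll out the policy and collect samples.

```python
def build_ppo(num_env_runners: int = 2):
    """Build a PPO Algorithm with a given number of EnvRunner actors (sketch)."""
    # Imported lazily so the heavy Ray import only happens when building.
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        # Assumption: any registered Gymnasium env ID works here.
        .environment("CartPole-v1")
        # EnvRunner actors roll out the policy and collect samples.
        .env_runners(num_env_runners=num_env_runners)
    )
    # `build_algo()` is the name used in the reviewed docs; the older
    # `build()` reportedly still works but warns.
    return config.build_algo()
```

Calling `build_ppo()` requires a Ray installation with RLlib; the function itself can be defined without it.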

from pprint import pprint

for _ in range(5):
    pprint(ppo.train())

Contributor:

Nice!

# Define your custom env class by subclassing gymnasium.Env:

class ParrotEnv(gym.Env):
    """Environment in which the agent learns to repeat the seen observations.

Contributor:

Haha! Awesome!

Comment thread doc/source/rllib/getting-started.rst Outdated
# Point your config to your custom env class:
config = (
    PPOConfig()
    .environment(ParrotEnv)  # add `env_config=[some Box space] to customize the env

Contributor:

Maybe a missing " ` "?

Contributor Author:

done and clarified more. Also fixed the env accepting this suggested setting.
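
To make the "parrot" idea concrete, the following is a dependency-free sketch of how such an environment's `reset`/`step` logic might look. It only mirrors Gymnasium's return shapes and does not subclass `gymnasium.Env`. The class name `ParrotEnvSketch`, the config keys `max_value`, `episode_len`, and `seed`, and the negative-distance reward are all assumptions for illustration; the actual `ParrotEnv` in the docs may differ.

```python
import random


class ParrotEnvSketch:
    """Sketch of a 'parrot' env: the best action repeats the last observation.

    Mirrors Gymnasium's reset/step return shapes without depending on it.
    """

    def __init__(self, config=None):
        config = config or {}
        # Hypothetical config keys, for illustration only.
        self.max_value = float(config.get("max_value", 1.0))
        self.episode_len = int(config.get("episode_len", 10))
        self._rng = random.Random(config.get("seed"))
        self._obs = 0.0
        self._t = 0

    def _sample_obs(self):
        return self._rng.uniform(-self.max_value, self.max_value)

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self._rng = random.Random(seed)
        self._t = 0
        self._obs = self._sample_obs()
        return self._obs, {}  # (observation, info)

    def step(self, action):
        # Reward is the negative distance between the action and the
        # observation the agent just saw; a perfect parrot scores 0.0.
        reward = -abs(float(action) - self._obs)
        self._t += 1
        truncated = self._t >= self.episode_len
        self._obs = self._sample_obs()
        # (observation, reward, terminated, truncated, info)
        return self._obs, reward, False, truncated, {}
```

In the reviewed page the env is a real `gym.Env` subclass handed to `.environment(ParrotEnv)`, with the `env_config` setting mentioned in the thread reaching the env constructor in place of the plain `config` dict used here.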

class CustomTorchRLModule(TorchRLModule):
    def setup(self):
        # You have access here to the following already set attributes:
        # self.observation_space

Contributor:

Great description!!

:hide:

At the end of your script, RLlib evaluates the trained Algorithm:
algo.stop()

Contributor:

Haha. Yes that is needed.

Contributor:

We might however show it explicitly as otherwise users might run into problems.

Contributor:

... in their own code

Contributor Author:

Great idea. Will add a one-liner for this API.

Contributor Author:

done
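
One way users could make the `stop()` call explicit in their own code, as suggested in this thread, is to wrap the training loop in `try`/`finally`. This is a sketch only: the helper name `train_then_stop` is hypothetical, and it relies on nothing beyond the `train()` and `stop()` methods shown in the reviewed snippets.

```python
def train_then_stop(algo, num_iterations=5):
    """Run training iterations and always release the algorithm's resources."""
    results = []
    try:
        for _ in range(num_iterations):
            results.append(algo.train())
    finally:
        # Runs even if train() raises, so the algorithm's resources are freed.
        algo.stop()
    return results
```

Any object exposing `train()` and `stop()` works here, which also makes the helper easy to exercise with a stub in tests.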


The `state` of an instantiated Algorithm can be retrieved by calling its
`get_state` method. It contains all information necessary
to create the Algorithm from scratch. No access to the original code (e.g.

Contributor:

Does this work now also with algorithms that had defined new attributes/methods? If the class is available it should imo.

Contributor Author:

Yeah, I think so. Users can decide to override the get_state/set_state APIs to add more stateful stuff to their state-dicts, but the basic functionality (restoring EnvRunners, RLModule, Learner optimizer states, connector pipelines, etc.) works across all algos.
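
The override pattern described in this reply might look roughly like the mixin below. It is a sketch under assumptions: the name `ExtraStateMixin` and the attribute `my_counter` are hypothetical, and it only assumes the base class offers `get_state()` returning a dict and `set_state(state)` accepting one, as the thread describes. It has not been checked against a specific Ray release.

```python
class ExtraStateMixin:
    """Adds one custom field to the get_state()/set_state() of a base class."""

    my_counter = 0  # hypothetical extra attribute to persist

    def get_state(self, *args, **kwargs):
        # Start from whatever the base class already stores.
        state = super().get_state(*args, **kwargs)
        state["my_counter"] = self.my_counter
        return state

    def set_state(self, state, *args, **kwargs):
        state = dict(state)  # shallow copy so the caller's dict is not mutated
        # Restore the custom field, then hand the rest to the base class.
        self.my_counter = state.pop("my_counter", 0)
        super().set_state(state, *args, **kwargs)
```

The intent is to list the mixin ahead of an `Algorithm` subclass in the class bases, so that the base implementation still restores everything it normally handles.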

…_redo_metrics_logger

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	doc/source/rllib/package_ref/algorithm.rst
@angelinalg angelinalg left a comment

Just some style nits.

- `[Course] Applied Reinforcement Learning with RLlib <https://applied-rl-course.netlify.app/>`_
- `[Blog] Intro to RLlib: Example Environments <https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70>`_
- :doc:`[Guide] Getting Started with RLlib </rllib/rllib-training>`
- :doc:`[Guide] Getting Started with RLlib </rllib/getting-started>`

Contributor:

Suggested change
- - :doc:`[Guide] Getting Started with RLlib </rllib/getting-started>`
+ - :doc:`[Guide] Getting started with RLlib </rllib/getting-started>`


.. _rllib-getting-started:

Getting Started

Contributor:

Suggested change
- Getting Started
+ Getting started

RLlib's Python API provides all the flexibility required for applying the library to any
type of RL problem.

You manage RLlib experiments through an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`

Contributor:

Suggested change
- You manage RLlib experiments through an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
+ Manage RLlib experiments using an instance of the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`

class. An :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` typically holds a neural
network for computing actions, called ``policy``, the :ref:`RL environment <rllib-key-concepts-environments>`
that you want to optimize against, a loss function, an optimizer, and some code describing the
algorithm's execution logic, like determining when to collect samples, when to update your model, etc..

Contributor:

Suggested change
- algorithm's execution logic, like determining when to collect samples, when to update your model, etc..
+ algorithm's execution logic, like determining when to collect samples, when to update your model, etc.

In :ref:`multi-agent training <rllib-multi-agent-environments-doc>`,
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` manages the querying and optimization of multiple policies at once.

Through the algorithm's interface, you can train the policy, compute actions, or store your

Contributor:

Suggested change
- Through the algorithm's interface, you can train the policy, compute actions, or store your
+ Using the algorithm's interface, you can train the policy, compute actions, or store the


pip install "gymnasium[atari,accept-rom-license,mujoco]"

This is all, you can now start coding against RLlib. Here is an example for running the :ref:`PPO Algorithm <ppo>` on the

Contributor:

Suggested change
- This is all, you can now start coding against RLlib. Here is an example for running the :ref:`PPO Algorithm <ppo>` on the
+ You are ready to start coding against RLlib. The following is an example for running the :ref:`PPO Algorithm <ppo>` on the

`Taxi domain <https://gymnasium.farama.org/environments/toy_text/taxi/>`__.
- You first create a `config` for the algorithm, which defines the RL environment and
- any other needed settings and parameters.
+ You first create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.

Contributor:

Suggested change
- You first create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.
+ First create a `config` for the algorithm, which defines the :ref:`RL environment <rllib-key-concepts-environments>` and any other needed settings and parameters.

for _ in range(5):
    pprint(algo.train())

At the end of your script, you evaluate the trained Algorithm and release all its resources:

Contributor:

Suggested change
- At the end of your script, you evaluate the trained Algorithm and release all its resources:
+ At the end of the script, evaluate the trained Algorithm and release all its resources:

:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors through the ``config.evaluation()`` method.

- `See here <rllib-training.html#using-the-python-api>`_, if you want to learn more about the RLlib training APIs.
+ :ref:`See here <rllib-python-api>`, if you want to learn more about the RLlib training APIs.

Contributor:

Suggested change
- :ref:`See here <rllib-python-api>`, if you want to learn more about the RLlib training APIs.
+ See :ref:`rllib-python-api`, to learn more about the RLlib training APIs.


The `state` of an instantiated Algorithm can be retrieved by calling its
`get_state` method. It contains all information necessary
to create the Algorithm from scratch. No access to the original code (e.g.

Contributor:

Suggested change
- to create the Algorithm from scratch. No access to the original code (e.g.
+ to create the Algorithm from scratch. No access to the original code (e.g.,

@sven1977 sven1977 enabled auto-merge (squash) January 23, 2025 10:20
@github-actions github-actions Bot added the go (add ONLY when ready to merge, run all tests) label Jan 23, 2025
…_redo_getting_started

Signed-off-by: sven1977 <svenmika1977@gmail.com>

# Conflicts:
#	rllib/algorithms/algorithm.py
@github-actions github-actions Bot disabled auto-merge January 23, 2025 10:22
@sven1977 sven1977 enabled auto-merge (squash) January 23, 2025 15:30
@sven1977 sven1977 merged commit 66602b1 into ray-project:master Jan 24, 2025
@sven1977 sven1977 deleted the docs_redo_getting_started branch January 24, 2025 09:39
srinathk10 pushed a commit that referenced this pull request Feb 2, 2025
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
park12sj pushed a commit to park12sj/ray that referenced this pull request Mar 18, 2025
Labels

community-backlog, go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-docs-or-examples (Issues related to RLlib documentation or rllib/examples)

4 participants