[RLlib]: Cleanup examples folder: Add example restoring 1 of n agents from a checkpoint. #45462

simonsays1980
Collaborator

@simonsays1980 simonsays1980 commented May 21, 2024

Why are these changes needed?

Restoring only certain agents from a checkpoint is a frequent use case, and we should provide an example for this scenario. This PR adds such an example on the new API stack. The example does the following:

  1. Trains n agents on a multi-agent Pendulum-v1 environment.
  2. Chooses the best checkpoint by return.
  3. Loads the module state for policy 0 from this checkpoint.
  4. Trains the agents again, with policy 0 restored from the checkpoint.

This example shows that continuing training from a restored checkpoint, even when only a single agent's module is restored, results in faster convergence.
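
The core idea of steps 2 and 3 can be sketched in plain Python. This is only an illustration and does not use RLlib's API: `Checkpoint`, `best_checkpoint`, `restore_one_policy`, the policy IDs `p0`/`p1`, and all numbers are hypothetical stand-ins, and a "module state" is reduced to a dict of named weights.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in: a module state is just a dict of named weights.
ModuleState = Dict[str, float]


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: one module state per policy plus its return."""
    mean_return: float
    module_states: Dict[str, ModuleState]


def best_checkpoint(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Pick the checkpoint with the highest mean episode return."""
    return max(checkpoints, key=lambda c: c.mean_return)


def restore_one_policy(
    fresh_states: Dict[str, ModuleState],
    checkpoint: Checkpoint,
    policy_id: str,
) -> Dict[str, ModuleState]:
    """Return new states in which only `policy_id` comes from the checkpoint."""
    restored = copy.deepcopy(fresh_states)
    restored[policy_id] = copy.deepcopy(checkpoint.module_states[policy_id])
    return restored


# Two saved checkpoints; Pendulum returns are negative, so higher is better.
checkpoints = [
    Checkpoint(-900.0, {"p0": {"w": 0.1}, "p1": {"w": 0.2}}),
    Checkpoint(-400.0, {"p0": {"w": 0.7}, "p1": {"w": 0.9}}),
]
# Freshly initialized states for the second training run.
fresh = {"p0": {"w": 0.0}, "p1": {"w": 0.0}}

restored = restore_one_policy(fresh, best_checkpoint(checkpoints), "p0")
print(restored)
```

In this sketch only `p0` starts from trained weights, while `p1` keeps its fresh initialization. That is the "1 of n" situation the example script sets up before the second training run.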

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… multi-agent environment.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 self-assigned this May 21, 2024
@simonsays1980 simonsays1980 added the rllib, rllib-docs-or-examples, and rllib-newstack labels May 21, 2024
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 changed the title [RLlib] - Example that restores 1 of n agents from checkpoint. [RLlib]: Cleanup examples folder: Add example restoring 1 of n agents from a checkpoint. May 21, 2024
@sven1977 sven1977 marked this pull request as ready for review May 21, 2024 12:27
@sven1977 sven1977 assigned sven1977 and unassigned simonsays1980 May 21, 2024
rllib/BUILD
@@ -0,0 +1,151 @@
"""Simple example of loading module weights for 1 of n agents from checkpoint.
Contributor

nit: Let's not use "simple".

"An example script showing how to load RLModule weights for 1 out of n agents from a checkpoint" ?

Collaborator Author

Yup, not simple for everyone lol. I know what you mean, it's actually quite complex to make this possible in MA scenarios, and so powerful.

@@ -0,0 +1,151 @@
"""Simple example of loading module weights for 1 of n agents from checkpoint.

Contributor

Can we add a tiny paragraph here saying the usual:

This example:
- runs a multi-agent Pendulum experiment with ... policies ... blabla
- saves a checkpoint of the used MultiAgentRLModule every blabla iterations
- stops the experiment after the agents reach a combined return of ...
- picks the best of both trained policies (based on episode return) and restores only the corresponding RLModule.
- runs a second experiment with the restored RLModule (single-agent) .... blabla

Contributor

@sven1977 sven1977 left a comment

Looks good to me! Just a few nits on comments/docstrings.

Awesome example! One more down. :)

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…he issues. In addition added 'no_main' tag to test in BUILD b/c linter errored out.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) May 22, 2024 10:21
@github-actions github-actions bot added the go label May 22, 2024
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@github-actions github-actions bot disabled auto-merge May 22, 2024 15:49
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
can-anyscale added a commit that referenced this pull request May 22, 2024
#45462 adds a new test by
changing a bazel rule instead of adding a new test file; this case can
only be covered by our previous logic for computing new tests; restore
this logic (in addition to the logic that computes new tests by looking
at changed test files)

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
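
One way to picture the two detection strategies named in the commit message above is the following sketch. It is not the CI code from that commit, which is not shown in this thread. It assumes that the "previous logic" means comparing the set of test targets before and after the change, and that "changed test files" can be recognized by a `test_*.py` naming convention. All function names, target labels, and file paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def new_tests_from_targets(
    base_targets: Iterable[str], head_targets: Iterable[str]
) -> Set[str]:
    """Targets present at head but not at base (catches BUILD-only changes)."""
    return set(head_targets) - set(base_targets)


def new_tests_from_files(changed_files: Iterable[str]) -> Set[str]:
    """Changed files that look like tests (assumed `test_*.py` convention)."""
    return {
        f
        for f in changed_files
        if PurePosixPath(f).name.startswith("test_") and f.endswith(".py")
    }


def new_tests(
    base_targets: Iterable[str],
    head_targets: Iterable[str],
    changed_files: Iterable[str],
) -> Set[str]:
    """Union of both strategies, so neither kind of new test is missed."""
    return new_tests_from_targets(base_targets, head_targets) | new_tests_from_files(
        changed_files
    )


# A PR that only edits a BUILD file and adds a non-test example script.
base = {"//pkg:existing_example"}
head = {"//pkg:existing_example", "//pkg:new_example"}
changed = ["pkg/BUILD", "pkg/examples/new_example.py"]

print(new_tests_from_files(changed))  # the file-based strategy finds nothing
print(new_tests(base, head, changed))  # the target-based strategy finds the new test
```

Under these assumptions, a PR that registers an existing script as a test through a build rule is invisible to the file-based check and is only picked up by comparing targets.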
can-anyscale added a commit that referenced this pull request May 23, 2024
#45462 adds a new test by
changing a bazel rule instead of adding a new test file; this case can
only be covered by our previous logic for computing new tests; restore
this logic (in addition to the logic that computes new tests by looking
at changed test files)

This is a redo of #45495 which
got reverted. The difference now is that we run the bazel command in a
container instead of on the current environment. bazel seems to have
issues sharing the cache when calling bazel within bazel
(https://buildkite.com/ray-project/microcheck/builds/444#018fa23a-6e31-435b-a0ea-412ca2d1017b/175-1476)

Test:
- CI
- full microcheck run:
https://buildkite.com/ray-project/microcheck/builds/464

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
@sven1977 sven1977 merged commit 5cb7c09 into ray-project:master May 24, 2024
6 checks passed
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 6, 2024
…45495)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 6, 2024
…45507)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

This is a redo of ray-project#45495 which
got reverted. The difference now is that we run the bazel command in a
container instead of on the current environment. bazel seems to have
issues sharing the cache when calling bazel within bazel
(https://buildkite.com/ray-project/microcheck/builds/444#018fa23a-6e31-435b-a0ea-412ca2d1017b/175-1476)

Test:
- CI
- full microcheck run:
https://buildkite.com/ray-project/microcheck/builds/464

Signed-off-by: can <can@anyscale.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 6, 2024
…from a checkpoint. (ray-project#45462)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024
…45495)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024
…45507)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

This is a redo of ray-project#45495 which
got reverted. The difference now is that we run the bazel command in a
container instead of on the current environment. bazel seems to have
issues sharing the cache when calling bazel within bazel
(https://buildkite.com/ray-project/microcheck/builds/444#018fa23a-6e31-435b-a0ea-412ca2d1017b/175-1476)

Test:
- CI
- full microcheck run:
https://buildkite.com/ray-project/microcheck/builds/464

Signed-off-by: can <can@anyscale.com>
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024
GabeChurch pushed a commit to GabeChurch/ray that referenced this pull request Jun 11, 2024
…45495)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

Test:
- CI

---------

Signed-off-by: can <can@anyscale.com>
Signed-off-by: gchurch <gabe1church@gmail.com>
GabeChurch pushed a commit to GabeChurch/ray that referenced this pull request Jun 11, 2024
…45507)

ray-project#45462 adds a new tests by
changing bazel rule instead of adding a new test file; this case can
only be covered by our previous logic of computing new tests; recover
this logic (in addition to the logic of computing new tests by looking
at changed test files)

This is a redo of ray-project#45495 which
got reverted. The difference now is that we run the bazel command in a
container instead of on the current environment. bazel seems to have
issues sharing the cache when calling bazel within bazel
(https://buildkite.com/ray-project/microcheck/builds/444#018fa23a-6e31-435b-a0ea-412ca2d1017b/175-1476)

Test:
- CI
- full microcheck run:
https://buildkite.com/ray-project/microcheck/builds/464

Signed-off-by: can <can@anyscale.com>
Signed-off-by: gchurch <gabe1church@gmail.com>
Labels
go add ONLY when ready to merge, run all tests rllib RLlib related issues rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples rllib-newstack
2 participants