[RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. #40706

Merged

sven1977
Contributor

@sven1977 sven1977 commented Oct 26, 2023

  1. Revert PPO back to old API stack (by default).
  • PPO on the new stack is NOT ready yet for several features, including LSTM, disabling exploration (e.g., on the eval workers), attention nets, and the trajectory view API.
  • We will re-activate PPO on the new stack by default once it has been fully moved to the EnvRunner APIs and supports multi-agent, connectors, and all of the currently missing functionality mentioned above.
  2. Merged the config args _enable_rl_module_api and _enable_learner_api into a single _enable_new_api_stack setting to remove confusion. These two settings already had to be either both switched on or both switched off anyway.
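The flag merge in item 2 can be illustrated with a minimal sketch. Only the setting name _enable_new_api_stack and the call form config.experimental(_enable_new_api_stack=True) come from this PR; ToyConfig and old_flags_consistent are hypothetical stand-ins, not RLlib's AlgorithmConfig.

```python
from typing import Optional


class ToyConfig:
    """Hypothetical stand-in for an algorithm config (not RLlib's class)."""

    def __init__(self) -> None:
        # A single switch replaces the former pair of flags
        # (_enable_rl_module_api and _enable_learner_api).
        self._enable_new_api_stack = False

    def experimental(
        self, *, _enable_new_api_stack: Optional[bool] = None
    ) -> "ToyConfig":
        # Builder-style setter: only change the flag if the caller passed it.
        if _enable_new_api_stack is not None:
            self._enable_new_api_stack = _enable_new_api_stack
        return self


def old_flags_consistent(enable_rl_module_api: bool, enable_learner_api: bool) -> bool:
    # Under the two-flag scheme, only "both on" or "both off" was valid.
    return enable_rl_module_api == enable_learner_api


config = ToyConfig().experimental(_enable_new_api_stack=True)
print(config._enable_new_api_stack)  # True
print(old_flags_consistent(True, False))  # False: an invalid mix the single flag rules out
```

With one flag, the "on/off mismatch" state simply cannot be expressed, so there is nothing left to validate.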

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: Sven Mika <svenmika1977@gmail.com>
"but have not enabled the new API stack. To enable it, call "
"`config.experimental(_enable_new_api_stack=True)`."
)
# LR-schedule checking.
Contributor Author

@sven1977 sven1977 Oct 26, 2023

Moved here for a better overview.

@@ -116,11 +115,6 @@ def get_default_learner_class(self) -> Union[Type[Learner], str]:

@override(MARWILConfig)
def validate(self) -> None:
# Can not use Tf with learner api.
Contributor Author

This is already checked in validate(). We should never(!) automatically change properties inside AlgorithmConfig (except for private ones that are exposed through public @properties).
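The "check, don't mutate" rule can be sketched as a validate() that raises instead of rewriting a user setting. ToyConfig is hypothetical, and the specific rule (framework "tf" cannot be combined with the new stack) is an assumption loosely based on the "Can not use Tf with learner api" comment in the hunk above.

```python
class ToyConfig:
    """Hypothetical config illustrating "validate() checks, never mutates"."""

    def __init__(self, framework: str = "torch", enable_new_api_stack: bool = False) -> None:
        self.framework_str = framework
        self._enable_new_api_stack = enable_new_api_stack

    def validate(self) -> None:
        # Assumed rule for illustration: "tf" cannot be combined with the new stack.
        # Raise instead of silently switching the user's framework.
        if self._enable_new_api_stack and self.framework_str == "tf":
            raise ValueError(
                "framework='tf' is not supported on the new API stack; "
                "pick another framework or disable the new API stack."
            )


cfg = ToyConfig(framework="tf", enable_new_api_stack=True)
rejected = False
try:
    cfg.validate()
except ValueError as err:
    rejected = True
    print("rejected:", err)
print(cfg.framework_str)  # tf  (validate() left the setting untouched)
```

Because validate() only reads the config, the object a user inspects afterwards is exactly the object they built.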

@@ -462,13 +461,13 @@ def __init__(self, algo_class=None):
self.worker_restore_timeout_s = 1800

# `self.rl_module()`
self.rl_module_spec = None
self._enable_rl_module_api = False
self._rl_module_spec = None
Contributor Author

Made this private (plus added a @property for rl_module_spec). We should never set any properties automatically (or inside validate()). This also improves many tests, as they no longer have to wait for things to magically change under the hood after calling validate().
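A private field behind a read-only property might look like the minimal sketch below. ToyConfig, _compute_default_spec, and the string specs are hypothetical; computing the default on read (rather than having validate() store it) is an assumption about how such a property could avoid hidden mutation, not a description of RLlib's code.

```python
from typing import Optional


class ToyConfig:
    """Hypothetical config: private storage plus a read-only public property."""

    def __init__(self) -> None:
        # Only explicit user calls write to this private field.
        self._rl_module_spec: Optional[str] = None

    @property
    def rl_module_spec(self) -> str:
        # Assumption: fall back to a default computed on read, instead of
        # having validate() write a default into the config.
        if self._rl_module_spec is not None:
            return self._rl_module_spec
        return self._compute_default_spec()

    def _compute_default_spec(self) -> str:
        # Hypothetical placeholder for whatever the algorithm's default would be.
        return "default-spec"

    def rl_module(self, *, rl_module_spec: Optional[str] = None) -> "ToyConfig":
        # The explicit, user-driven way to set the spec.
        if rl_module_spec is not None:
            self._rl_module_spec = rl_module_spec
        return self


cfg = ToyConfig()
print(cfg.rl_module_spec)  # default-spec  (no validate() call needed)
cfg.rl_module(rl_module_spec="custom-spec")
print(cfg.rl_module_spec)  # custom-spec
```

Since the property defines no setter, assigning cfg.rl_module_spec directly raises AttributeError, which keeps all writes going through rl_module().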

@sven1977 sven1977 changed the title [RLlib] Revert PPO back to old API stack (by default). Not ready yet on several features. [RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. Oct 26, 2023
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Oct 27, 2023
Contributor

@kouroshHakha kouroshHakha left a comment

LGTM

@@ -1784,9 +1768,6 @@ def training(
dashboard. If you're seeing that the object store is filling up,
turn down the number of remote requests in flight, or enable compression
in your experiment of timesteps.
_enable_learner_api: Whether to enable the LearnerGroup and Learner
Contributor

nit: don't remove the doc for it. Just add that it has been replaced with _enable_new_api_stack.

I see that you throw the deprecation warning right away.
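Keeping the docstring entry while warning eagerly could be sketched as follows. ToyConfig and its training() signature are hypothetical; the PR only states that a deprecation warning is thrown right away, so forwarding the old value to the new flag is an assumption made for illustration.

```python
import warnings
from typing import Optional


class ToyConfig:
    """Hypothetical config showing an eager deprecation warning."""

    def __init__(self) -> None:
        self._enable_new_api_stack = False

    def training(self, *, _enable_learner_api: Optional[bool] = None) -> "ToyConfig":
        """Hypothetical training() setter.

        Args:
            _enable_learner_api: Deprecated. Replaced by
                config.experimental(_enable_new_api_stack=...).
        """
        if _enable_learner_api is not None:
            # Warn immediately at call time ...
            warnings.warn(
                "_enable_learner_api is deprecated; use "
                "config.experimental(_enable_new_api_stack=...) instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # ... then (assumption) forward the value to the new flag.
            self._enable_new_api_stack = _enable_learner_api
        return self


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    cfg = ToyConfig().training(_enable_learner_api=True)

print(cfg._enable_new_api_stack)  # True
print(caught[0].category.__name__)  # DeprecationWarning
```

The docstring keeps the old argument documented but points readers at its replacement, which is what the review nit asks for.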

@sven1977 sven1977 merged commit eabd18e into ray-project:master Oct 27, 2023
45 of 52 checks passed
@sven1977 sven1977 deleted the revert_ppo_back_to_old_stack_by_default branch May 17, 2024 04:33
2 participants