[RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. #40706

sven1977 · 2023-10-26T13:07:41Z

Revert PPO back to old API stack (by default).

PPO on the new stack does NOT yet support several features, including LSTM, disabling exploration (e.g. on the eval workers), attention nets, and the trajectory view API.
We will re-activate PPO on the new stack by default once it has been fully moved to the EnvRunner APIs and supports multi-agent, connectors, and all of the currently missing functionality mentioned above.

Merged the config args _enable_rl_module_api and _enable_learner_api into a single _enable_new_api_stack setting to remove confusion. The two settings already had to be either both switched on or both switched off anyway.
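
As a rough illustration of the flag merge described above, here is a minimal sketch. `StackConfigSketch` is a hypothetical stand-in and not RLlib's actual `AlgorithmConfig`; only the three flag names come from this PR. Rejecting the old flags with a `ValueError` is an assumption made for the sketch.

```python
class StackConfigSketch:
    """Hypothetical stand-in for an algorithm config (not RLlib's real class)."""

    def __init__(self):
        # Old API stack by default, mirroring the intent of this PR.
        self._enable_new_api_stack = False

    def experimental(
        self,
        *,
        _enable_new_api_stack=None,
        _enable_rl_module_api=None,
        _enable_learner_api=None,
    ):
        # The two old flags always had to agree, so the sketch rejects them
        # and points the caller at the single merged flag instead.
        # (Raising here is an assumption of this sketch.)
        deprecated = {
            "_enable_rl_module_api": _enable_rl_module_api,
            "_enable_learner_api": _enable_learner_api,
        }
        for name, value in deprecated.items():
            if value is not None:
                raise ValueError(
                    f"`{name}` has been replaced by `_enable_new_api_stack`."
                )
        if _enable_new_api_stack is not None:
            self._enable_new_api_stack = bool(_enable_new_api_stack)
        return self


# Opting in to the new stack explicitly; the default stays on the old stack.
config = StackConfigSketch().experimental(_enable_new_api_stack=True)
```

With a single flag there is no longer a state where one half of the new stack is on and the other half is off.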

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Signed-off-by: Sven Mika <svenmika1977@gmail.com>

sven1977 · 2023-10-26T14:29:52Z

rllib/algorithms/algorithm_config.py

+                "but have not enabled the new API stack. To enable it, call "
+                "`config.experimental(_enable_new_api_stack=True)`."
+            )
+        # LR-schedule checking.


Moved here for better overview.

sven1977 · 2023-10-26T14:32:05Z

rllib/algorithms/bc/bc.py

@@ -116,11 +115,6 @@ def get_default_learner_class(self) -> Union[Type[Learner], str]:

    @override(MARWILConfig)
    def validate(self) -> None:
-        # Can not use Tf with learner api.


This is already checked in validate(). We should never(!) automatically change properties inside AlgorithmConfig (except private ones that are exposed through public @property accessors).
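
To illustrate the principle in the comment above, here is a minimal sketch of a `validate()` that reports a conflicting setting instead of silently rewriting it. `ValidatingConfigSketch`, its attributes, and the rule that the "tf" framework conflicts with the new stack are all assumptions made for illustration. This is not RLlib's actual validation code.

```python
class ValidatingConfigSketch:
    """Hypothetical config whose validate() only checks and never mutates."""

    def __init__(self, framework="torch", enable_new_api_stack=False):
        self.framework = framework
        self.enable_new_api_stack = enable_new_api_stack

    def validate(self):
        # Assumed rule for this sketch: "tf" cannot be combined with the
        # new API stack. The conflict is reported to the user; the
        # settings themselves are left exactly as the user wrote them.
        if self.enable_new_api_stack and self.framework == "tf":
            raise ValueError(
                "framework='tf' is not supported together with the new API "
                "stack in this sketch. Change one of the two settings."
            )


# A consistent config passes validation unchanged.
ok_config = ValidatingConfigSketch(framework="torch", enable_new_api_stack=True)
ok_config.validate()
```

Because `validate()` has no side effects, the config reads the same before and after the call, which is the behavior argued for here.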

sven1977 · 2023-10-26T16:51:24Z

rllib/algorithms/algorithm_config.py

@@ -462,13 +461,13 @@ def __init__(self, algo_class=None):
        self.worker_restore_timeout_s = 1800

        # `self.rl_module()`
-        self.rl_module_spec = None
-        self._enable_rl_module_api = False
+        self._rl_module_spec = None


Made this private (plus a @property for rl_module_spec). We should never set any properties automatically (or inside validate()). This improves a lot of tests, since they no longer have to wait for things to change magically under the hood after calling validate().
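
A minimal sketch of the private-attribute-plus-property pattern described above. `ModuleSpecConfigSketch`, the placeholder string specs, and the fallback to a default spec on read are assumptions for illustration. Only the names `_rl_module_spec` and `rl_module_spec` come from the diff.

```python
class ModuleSpecConfigSketch:
    """Hypothetical config: private storage, public read-only property."""

    def __init__(self):
        # Holds only what the user explicitly set; never filled in later
        # by validate() or any other hidden step.
        self._rl_module_spec = None

    def rl_module(self, *, rl_module_spec=None):
        if rl_module_spec is not None:
            self._rl_module_spec = rl_module_spec
        return self

    def get_default_rl_module_spec(self):
        # Placeholder default; a real config would build a spec object.
        return "default-spec"

    @property
    def rl_module_spec(self):
        # Resolve on every read: an explicit user spec wins, otherwise
        # fall back to the default. No stored state is changed.
        if self._rl_module_spec is not None:
            return self._rl_module_spec
        return self.get_default_rl_module_spec()


config = ModuleSpecConfigSketch()
default_spec = config.rl_module_spec  # available without calling validate()
```

Since the value is computed on read, a test can inspect `rl_module_spec` directly after building the config, without depending on a prior `validate()` call.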

kouroshHakha

LGTM

kouroshHakha · 2023-10-27T16:11:08Z

rllib/algorithms/algorithm_config.py

@@ -1784,9 +1768,6 @@ def training(
                dashboard. If you're seeing that the object store is filling up,
                turn down the number of remote requests in flight, or enable compression
                in your experiment of timesteps.
-            _enable_learner_api: Whether to enable the LearnerGroup and Learner


~~nit: don't remove the doc for it. Just add that it has been replaced with _enable_new_api_stack~~

I see that you throw the deprecation warning right away.

sven1977 added 2 commits October 26, 2023 13:14

wip

3db0d95

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

909cc7b

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from avnishn, ArturNiederfahrenhorst, smorad, maxpumperla, kouroshHakha and a team as code owners October 26, 2023 13:07

sven1977 added 3 commits October 26, 2023 15:57

wip

f95f72b

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

4215c8e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' into revert_ppo_back_to_old_stack_by_default

5cdb817

Signed-off-by: Sven Mika <svenmika1977@gmail.com>

sven1977 commented Oct 26, 2023

wip

9690ff3

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 assigned kouroshHakha Oct 26, 2023

wip

211d932

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 commented Oct 26, 2023

sven1977 changed the title ~~[RLlib] Revert PPO back to old API stack (by default). Not ready yet on several features.~~ [RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. Oct 26, 2023

sven1977 added 5 commits October 26, 2023 22:27

wip

d52d99b

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into reve…

1787185

…rt_ppo_back_to_old_stack_by_default

wip

e4dd7f4

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into reve…

5743fd4

…rt_ppo_back_to_old_stack_by_default

wip

31b1682

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Oct 27, 2023

kouroshHakha approved these changes Oct 27, 2023

sven1977 merged commit eabd18e into ray-project:master Oct 27, 2023
45 of 52 checks passed

sven1977 deleted the revert_ppo_back_to_old_stack_by_default branch May 17, 2024 04:33
