-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. #40706
[RLlib] Revert PPO back to old API stack (by default). New stack and PPO not ready yet on several features. #40706
Conversation
"but have not enabled the new API stack. To enable it, call " | ||
"`config.experimental(_enable_new_api_stack=True)`." | ||
) | ||
# LR-schedule checking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved here for better overview.
@@ -116,11 +115,6 @@ def get_default_learner_class(self) -> Union[Type[Learner], str]: | |||
|
|||
@override(MARWILConfig) | |||
def validate(self) -> None: | |||
# Can not use Tf with learner api. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already checked in validate()
. We should never(!) automatically change properties inside AlgorithmConfig (unless private ones that are covered by (public) @properties).
@@ -462,13 +461,13 @@ def __init__(self, algo_class=None): | |||
self.worker_restore_timeout_s = 1800 | |||
|
|||
# `self.rl_module()` | |||
self.rl_module_spec = None | |||
self._enable_rl_module_api = False | |||
self._rl_module_spec = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this private (plus a @Property for rl_module_spec
). We should never automatically (or inside validate()
) set any properties. This now makes a lot of tests better as they don't require to magically wait for things to change under the hood after calling validate()
.
…rt_ppo_back_to_old_stack_by_default
…rt_ppo_back_to_old_stack_by_default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1784,9 +1768,6 @@ def training( | |||
dashboard. If you're seeing that the object store is filling up, | |||
turn down the number of remote requests in flight, or enable compression | |||
in your experiment of timesteps. | |||
_enable_learner_api: Whether to enable the LearnerGroup and Learner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't remove the doc for it. Just add that it has been replaced with _enable_new_stack_api
I see that you throw the deprecation warning right away.
_enable_rl_module_api
and_enable_learner_api
into a single_enable_new_api_stack
setting to remove confusion. These two settings already had to be either both switch on or both switched off anyways.Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.