-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] SAC RLModule on new API stack. #42568
Conversation
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
|
||
@ExperimentalAPI | ||
class SACRLModule(RLModule, RLModuleWithTargetNetworksInterface): | ||
def setup(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add override
decorator.
Question: Should we call super
here or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to be super consistent and clear about the two decorators:
@OverrideToImplementCustomLogic
and
@OverrideToImplementCustomLogic_CallToSuperRecommended
Can you check, whether these already exist in the base: RLModule.setup()
and if not add them as applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are used in the super class. I think I remember that I ran into an error when calling super().setup()
. I have to recheck, when testing.
self.action_dist_cls = catalog.get_action_dist_cls(framework=self.framework) | ||
|
||
# Define the temperature. | ||
self.alpha = self.config.model_config_dict["initial_alpha"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question: How do we get this config.initial_alpha into the model config dict?
I had the same problem for DreamerV3 (and other algos) and I think we need to come up with a non-hacky solution for this problem.
Constraint here:
- We don't want to have to pass the entire AlgorithmConfig into RLModule constructors as we envision RLModules to be used completely outside of RLlib in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we actually do not need it in the model_config_dict
as it is now only used in the learner. In general we have to solve this more consistently.
It might play here a role as the old stack does use a model config for the poilicy and one for the q model. If we want to use something like this we should implement such a solution in the same breath.
"post_fcnet_activation" | ||
] | ||
|
||
# We don't have the exact (framework specific) action dist class yet and thus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, same in PPO. I feel like we should change the Catalog API here:
Provide the framework
already in the c'tor (imo, there is no good reason NOT to do this), then users can already construct the pi-config in the c'tor, which is much cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WOuld this also work with dynamic action spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we already support dynamic action spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that right now, we construct one module config in the c'tor and the other one during build()
, which is pretty bad imo. We should be consistent in where and when we do what. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we do not support them right now - at least not for the new stack I guess.
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…-rl-module Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…es in 'SACLearner'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thanks for the fixes @simonsays1980 !
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…-rl-module Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…project#42568) Signed-off-by: khluu <khluu000@gmail.com>
…project#42568) Signed-off-by: khluu <khluu000@gmail.com>
Why are these changes needed?
Transferring to the new stack
SAC
needs to be implemented with anRLModule
. This PR delivers the files needed to define and configure theSACRLModule
. Implementation is inPyTorch
.Related issue number
Closes #37778
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.