[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy #20061
Conversation
…deprecate_policy_template_torch_only
else:

    def value(*args, **kwargs):
        return 0.0
Is this the right logic? It feels like users may want to use just advantages, and maybe advantage whitening, which still depends on the output of the value function for different timesteps in the batch. @sven1977
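A small sketch of the reviewer's concern (illustrative code, not RLlib's actual implementation; all function names here are stand-ins): advantage whitening standardizes advantages across the batch, and plain (non-GAE) advantages are discounted returns minus per-timestep value predictions. If the dummy value() above always returns 0.0, the baseline vanishes and "advantages" silently degenerate to raw returns.

```python
def advantages(rewards, values, gamma=0.99):
    """Plain (non-GAE) advantages: discounted return minus value baseline."""
    adv, ret = [], 0.0
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        adv.append(ret - v)
    return list(reversed(adv))

def whiten(xs, eps=1e-8):
    """Standardize to zero mean / unit std -- uses every element of the batch."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / (std + eps) for x in xs]

rewards = [1.0, 0.0, 1.0]
# With the dummy value() returning 0.0 everywhere, the baseline drops out
# and the advantages are just the discounted returns:
print(whiten(advantages(rewards, [0.0, 0.0, 0.0])))
# With real value predictions, the standardized advantages differ:
print(whiten(advantages(rewards, [0.9, 0.5, 0.8])))
```

This is why the whitening step still depends on per-timestep value outputs even when GAE itself is disabled.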
@@ -25,7 +25,7 @@

    torch, _ = try_import_torch()

    # TODO: (sven) Unify this with `build_tf_policy` as well.
    # TODO: Deprecate in favor of directly sub-classing from TorchPolicy.
When you say subclassing, do you mean that we can build classes that directly inherit from TorchPolicy, as opposed to using the policy builder model?
Oh wait, that's the whole point of the PR.
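The two styles the thread contrasts can be sketched as follows (illustrative stand-ins only: the real RLlib build_policy_class and TorchPolicy take many more arguments; the toy loss bodies are invented for the example):

```python
class TorchPolicy:
    """Stand-in for RLlib's TorchPolicy base class."""
    def loss(self, batch):
        raise NotImplementedError

# Template/builder style (the pattern being deprecated): assemble a policy
# class dynamically out of loose functions.
def build_policy_class(name, loss_fn):
    def loss(self, batch):
        return loss_fn(self, batch)
    return type(name, (TorchPolicy,), {"loss": loss})

def my_loss(policy, batch):
    # Toy loss: negated sum of the batch.
    return -sum(batch)

BuiltPolicy = build_policy_class("BuiltPolicy", my_loss)

# Direct-subclassing style (what the PR moves toward): just override loss().
class MyTorchPolicy(TorchPolicy):
    def loss(self, batch):
        return -sum(batch)

# Both styles produce the same behavior; the subclass is easier to read,
# type-check, and debug than a class assembled via type().
print(BuiltPolicy().loss([1.0, 2.0]), MyTorchPolicy().loss([1.0, 2.0]))
```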
POC: Start deprecating build_policy() (policy_template.py) as a means to sub-class TorchPolicy to build custom policies. build_policy is still used by most algos in RLlib (this is only a POC). Moves the TorchPolicy.loss method to the Policy API.

TODOs:
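A minimal sketch of what moving loss() into the Policy API enables (simplified stand-ins, not RLlib imports; the (model, dist_class, train_batch) parameter list mirrors RLlib's Policy.loss signature, but the class bodies here are toy code):

```python
class Policy:
    """Stand-in for RLlib's Policy base; loss() is now part of this API."""
    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError

class MyPPOTorchPolicy(Policy):
    """A custom policy overrides loss() directly -- no template needed."""
    def loss(self, model, dist_class, train_batch):
        # Toy surrogate: negative mean "advantage" as the loss value.
        advs = train_batch["advantages"]
        return -sum(advs) / len(advs)

policy = MyPPOTorchPolicy()
print(policy.loss(None, None, {"advantages": [1.0, 2.0, 3.0]}))  # -2.0
```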
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.