Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy #20061

Merged

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Nov 4, 2021

POC: Start deprecating build_policy() (policy_template.py) as a means to sub-class TorchPolicy to build custom policies.

  • build_policy is still used by most algos in RLlib (this is only a POC).
  • This PR only implements the PPO torch policy as a direct sub-class of TorchPolicy.
  • Add loss method to Policy API.

TODOs:

  • Prove that sub-classing can solve right order of mix-in initializations + loss initializations.
  • Try to get rid of confusing mix-ins entirely.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

else:

def value(*args, **kwargs):
return 0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right logic? it feels like users may want to use just advantages, and maybe advantage whitening, which still depends on the output of the value function for different timesteps in the batch. @sven1977

@@ -25,7 +25,7 @@
torch, _ = try_import_torch()


# TODO: (sven) Unify this with `build_tf_policy` as well.
# TODO: Deprecate in favor of directly sub-classing from TorchPolicy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you say subclassing, do you mean that we can build classes that directly inherit from torch policy, as opposed to using the policy builder model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait thats the whole point of the PR

@sven1977 sven1977 merged commit 5b1c8e4 into ray-project:master Nov 15, 2021
amogkam added a commit that referenced this pull request Nov 16, 2021
…orch only; PPOTorchPolicy (#20061)"

This reverts commit 5b1c8e4.
amogkam added a commit that referenced this pull request Nov 16, 2021
sven1977 added a commit that referenced this pull request Nov 16, 2021
…e) for torch only; PPOTorchPolicy (#20061)" (#20399)"

This reverts commit 90dc546.
sven1977 added a commit that referenced this pull request Nov 16, 2021
…) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)

This reverts commit 90dc546.
wuisawesome pushed a commit that referenced this pull request Nov 20, 2021
…) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)

This reverts commit 90dc546.
wuisawesome pushed a commit that referenced this pull request Nov 21, 2021
…) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)

This reverts commit 90dc546.
@sven1977 sven1977 deleted the poc_deprecate_policy_template_torch_only branch June 2, 2023 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants