[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy #20061
Conversation
…deprecate_policy_template_torch_only
else:

    def value(*args, **kwargs):
        return 0.0
Is this the right logic? It feels like users may want to use just advantages, and maybe advantage whitening, which still depends on the output of the value function for different timesteps in the batch. @sven1977
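A small sketch of the reviewer's concern (illustrative code, not RLlib's actual implementation; all function names here are stand-ins): advantage whitening standardizes advantages across the batch, and plain (non-GAE) advantages are discounted returns minus per-timestep value predictions. If the dummy value() above always returns 0.0, the baseline vanishes and "advantages" silently degenerate to raw returns.

```python
def advantages(rewards, values, gamma=0.99):
    """Plain (non-GAE) advantages: discounted return minus value baseline."""
    adv, ret = [], 0.0
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        adv.append(ret - v)
    return list(reversed(adv))

def whiten(xs, eps=1e-8):
    """Standardize to zero mean / unit std -- uses every element of the batch."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / (std + eps) for x in xs]

rewards = [1.0, 0.0, 1.0]
# With the dummy value() returning 0.0 everywhere, the baseline drops out
# and the advantages are just the discounted returns:
print(whiten(advantages(rewards, [0.0, 0.0, 0.0])))
# With real value predictions, the standardized advantages differ:
print(whiten(advantages(rewards, [0.9, 0.5, 0.8])))
```

This is why the whitening step still depends on per-timestep value outputs even when GAE itself is disabled.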
@@ -25,7 +25,7 @@

    torch, _ = try_import_torch()

    # TODO: (sven) Unify this with `build_tf_policy` as well.
    # TODO: Deprecate in favor of directly sub-classing from TorchPolicy.
When you say subclassing, do you mean that we can build classes that directly inherit from TorchPolicy, as opposed to using the policy builder model?
Oh wait, that's the whole point of the PR.
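The two styles the thread contrasts can be sketched as follows (illustrative stand-ins only: the real RLlib build_policy_class and TorchPolicy take many more arguments; the toy loss bodies are invented for the example):

```python
class TorchPolicy:
    """Stand-in for RLlib's TorchPolicy base class."""
    def loss(self, batch):
        raise NotImplementedError

# Template/builder style (the pattern being deprecated): assemble a policy
# class dynamically out of loose functions.
def build_policy_class(name, loss_fn):
    def loss(self, batch):
        return loss_fn(self, batch)
    return type(name, (TorchPolicy,), {"loss": loss})

def my_loss(policy, batch):
    # Toy loss: negated sum of the batch.
    return -sum(batch)

BuiltPolicy = build_policy_class("BuiltPolicy", my_loss)

# Direct-subclassing style (what the PR moves toward): just override loss().
class MyTorchPolicy(TorchPolicy):
    def loss(self, batch):
        return -sum(batch)

# Both styles produce the same behavior; the subclass is easier to read,
# type-check, and debug than a class assembled via type().
print(BuiltPolicy().loss([1.0, 2.0]), MyTorchPolicy().loss([1.0, 2.0]))
```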
POC: Start deprecating build_policy() (policy_template.py) as a means to sub-class TorchPolicy to build custom policies. build_policy is still used by most algos in RLlib (this is only a POC). Moves the TorchPolicy.loss method to the Policy API.

TODOs:
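A minimal sketch of what moving loss() into the Policy API enables (simplified stand-ins, not RLlib imports; the (model, dist_class, train_batch) parameter list mirrors RLlib's Policy.loss signature, but the class bodies here are toy code):

```python
class Policy:
    """Stand-in for RLlib's Policy base; loss() is now part of this API."""
    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError

class MyPPOTorchPolicy(Policy):
    """A custom policy overrides loss() directly -- no template needed."""
    def loss(self, model, dist_class, train_batch):
        # Toy surrogate: negative mean "advantage" as the loss value.
        advs = train_batch["advantages"]
        return -sum(advs) / len(advs)

policy = MyPPOTorchPolicy()
print(policy.loss(None, None, {"advantages": [1.0, 2.0, 3.0]}))  # -2.0
```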
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.