
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). #11747

Conversation

sven1977 (Contributor) commented Nov 2, 2020

This PR is based on #11717 (which needs to be merged first!)

  • It enables the trajectory view API by default for PPO, IMPALA, PG, and A3C, for tf, tfe, tf2, and torch.
  • View requirements are stored in:
    a) the model (self.inference_view_requirements), for model forward passes;
    b) the policy (self.view_requirements), which holds a superset of its model's view requirements plus the policy's own (loss and postprocessing) view requirements. See the sketch below.
  • View requirements that turn out not to be needed (determined after a test model pass, postprocessing, and a loss pass) are automatically removed and will not be used for sample collection or SampleBatch creation. This lets us differentiate between postprocessing and training batches (some data may be needed for postprocessing, but no longer for training).
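A rough sketch of the two dicts described above. The names follow the PR description, but the exact ViewRequirement signature is an assumption about the RLlib API of this era, and the NEXT_OBS entry is purely illustrative:

import gym
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement

obs_space = gym.spaces.Box(-1.0, 1.0, (4,))
action_space = gym.spaces.Discrete(2)

# Model-level: what the model's forward pass consumes.
inference_view_requirements = {
    SampleBatch.OBS: ViewRequirement(space=obs_space),
    SampleBatch.PREV_ACTIONS: ViewRequirement(
        data_col=SampleBatch.ACTIONS, shift=-1, space=action_space),
}

# Policy-level superset: adds postprocessing/loss inputs, e.g. a shift=+1
# view of OBS (NEXT_OBS) that GAE postprocessing needs but forward() does not.
view_requirements = dict(
    inference_view_requirements,
    **{SampleBatch.NEXT_OBS: ViewRequirement(
        data_col=SampleBatch.OBS, shift=1, space=obs_space)})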

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 2, 2020
@sven1977 sven1977 changed the title [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). Nov 3, 2020
@sven1977 sven1977 changed the title [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 3, 2020
@sven1977 sven1977 requested a review from ericl November 3, 2020 19:10
@sven1977 sven1977 assigned ericl and unassigned ericl Nov 3, 2020
@sven1977 sven1977 removed the request for review from ericl November 3, 2020 19:11
@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 3, 2020
@sven1977 sven1977 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 3, 2020
episode=None):
# not used, so save some bandwidth
del sample_batch.data[SampleBatch.NEXT_OBS]
return sample_batch
Contributor:

Can we keep this? What would happen if a user had this in a custom copy of Impala?

Contributor Author:

Keeping it. However, in the test run this will be a TrackingDict, not a SampleBatch, so it won't have the data property. I had to add a key check.

Contributor:

How about adding this to TrackingDict for backwards compat?

@property
def data(self):
    return self  # backwards compat with SampleBatch

I really want to make sure there are zero lines of code change in the policy files.

Contributor Author:

sounds good.

Contributor Author:

done

Contributor:

This doesn't seem fixed
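For reference, a minimal sketch of the backwards-compat shim under discussion. The class and attribute names are assumptions about RLlib's internal usage-tracking dict of this era, not the actual implementation:

class UsageTrackingDict(dict):
    """Dict that records which keys a test pass accesses (sketch)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)  # remember that this view was needed
        return super().__getitem__(key)

    @property
    def data(self):
        # Backwards compat with SampleBatch, whose payload lives in `.data`:
        # old policy code like `del batch.data["new_obs"]` keeps working.
        return self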

T = policy.config["rollout_fragment_length"]
B = tensor.shape[0] // T
# Cover cases where we send a (small) test batch through this loss
# function.
if B == 0:
Contributor:

Can we retain compatibility here by sending a bigger batch?

Contributor Author:

ok, yeah, maybe the test batch should be large anyways.

Contributor Author:

done, removed.

Contributor Author:

made test batch large enough (32).
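To illustrate the arithmetic behind this thread (toy numbers only; T = 16 is a hypothetical rollout_fragment_length, not what the tests actually use): with a dummy batch shorter than the fragment length, integer division yields B == 0 and the time-major reshape breaks, whereas a 32-row test batch keeps B >= 1.

import numpy as np

T = 16                      # hypothetical rollout_fragment_length
batch = np.zeros((32, 5))   # 32-row test batch, 5 features per row

B = batch.shape[0] // T     # 32 // 16 == 2; a 4-row batch would give B == 0
# Chop into B fragments of length T, then swap to time-major [T, B, feat]:
time_major = batch[:B * T].reshape(B, T, -1).swapaxes(0, 1)
print(time_major.shape)     # (16, 2, 5)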

@ericl ericl left a comment

A few compatibility issues left, I think.

@ericl ericl self-assigned this Nov 5, 2020
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 5, 2020
ericl (Contributor) commented Nov 5, 2020

@sven1977 also, please make sure to assign PRs to reviewers; otherwise they will not show up on their dashboard.

@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, APPO, IMPALA, PG, A3C (tf and torch). [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). Nov 10, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 10, 2020
@ericl ericl left a comment

Very close, last round of comments


@@ -358,9 +358,6 @@ def postprocess_trajectory(
         use_critic=policy.config["use_critic"])
     else:
         batch = sample_batch
-    # TODO: (sven) remove this del once we have trajectory view API fully in
-    # place.
-    del batch.data["new_obs"]  # not used, so save some bandwidth
Contributor:

Can we keep this for now? Will it crash?

Contributor Author:

Reverted.

@@ -62,8 +63,12 @@ def build_tf_policy(
             Policy, ModelV2, TensorType, TensorType, TensorType
         ], Tuple[TensorType, type, List[TensorType]]]] = None,
         mixins: Optional[List[type]] = None,
+        view_requirements_fn: Optional[Callable[[Policy], Dict[
+            str, ViewRequirement]]] = None,
Contributor:

I thought the only way to specify view reqs would be through custom models. So we should remove this, right?

Contributor Author:

Yes, but I still wanted to leave the user some opportunity to add new ones. It's not needed by any algos right now, though.
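The custom-model route the reviewer refers to looks roughly like this. A hedged sketch against the TorchModelV2 API of this era; the class name and the extra prev-reward view are illustrative:

import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement


class PrevRewardModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # Declare an extra view: the previous timestep's reward. The
        # sample collector then builds this column automatically.
        self.inference_view_requirements[SampleBatch.PREV_REWARDS] = \
            ViewRequirement(data_col=SampleBatch.REWARDS, shift=-1)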

@@ -174,8 +174,8 @@ def build_torch_policy(
         mixins (Optional[List[type]]): Optional list of any class mixins for
             the returned policy class. These mixins will be applied in order
             and will have higher precedence than the TorchPolicy class.
-        view_requirements_fn (Callable[[],
-            Dict[str, ViewRequirement]]): An optional callable to retrieve
+        view_requirements_fn (Optional[Callable[[Policy],
Contributor:

Remove this arg?

Contributor Author:

Done.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 10, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 11, 2020
@ericl ericl left a comment

LGTM; please fix the 3 comments prior to merge


@@ -424,7 +424,7 @@ def postprocess_nstep_and_prio(policy: Policy,
         batch[SampleBatch.DONES], batch[PRIO_WEIGHTS])
     new_priorities = (np.abs(convert_to_numpy(td_errors)) +
                       policy.config["prioritized_replay_eps"])
-    batch.data[PRIO_WEIGHTS] = new_priorities
+    batch[PRIO_WEIGHTS] = new_priorities
Contributor:

Please revert this change.

Contributor Author:

+1
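For context on why the revert is harmless: SampleBatch of this era keeps its columns in a .data dict and delegates item access to it, so both spellings mutate the same storage. A minimal sketch, assuming that behavior:

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

batch = SampleBatch({"weights": np.ones(3)})
batch["weights"] = np.zeros(3)       # __setitem__ route
batch.data["weights"] = np.ones(3)   # direct .data route: same dict
assert batch["weights"] is batch.data["weights"]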

@@ -209,7 +208,7 @@ def make_time_major(policy, seq_lens, tensor, drop_last=False):
         T = tensor.shape[0] // B
     else:
         # Important: chop the tensor into batches at known episode cut
-        # boundaries. TODO(ekl) this is kind of a hack
+        # boundaries.
Contributor:

Isn't it still a hack? Please restore the comment.

Contributor Author:

I don't think it's a "hack": IMPALA's divisibility requirement explicitly guarantees that the batch size is divisible by rollout_fragment_length.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 11, 2020
@sven1977 sven1977 merged commit 62c7ab5 into ray-project:master Nov 12, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 12, 2020
@sven1977 sven1977 deleted the trajectory_view_api_enable_by_default_for_some_tf branch June 2, 2023 20:12