[RLlib] Trajectory view API (prep PR for switching on by default across all RLlib; plumbing only) #11717
Conversation
…ectory_view_api_plumbing_only
Did you mean to assign? Will review.
Yes, please review. Fixing the remaining tests right now. ...
…ectory_view_api_plumbing_only
Tests should be all fixed now, but I'll keep checking on 'em.
…ectory_view_api_plumbing_only (Conflicts: rllib/policy/eager_tf_policy.py)
rllib/policy/eager_tf_policy.py
Outdated
@@ -636,7 +637,8 @@ def _stats(self, outputs, samples, grads):
         })
         return fetches

-    def _initialize_loss_with_dummy_batch(self):
+    @override(Policy)
+    def _initialize_loss_dynamically(self):
Why the name change? The previous one seemed more clear.
done
rllib/models/modelv2.py
Outdated
@@ -318,6 +319,24 @@ def is_time_major(self) -> bool:
         """
         return self.time_major is True

+    @PublicAPI
+    def update_view_requirements_from_init_state(self):
I don't think we should be exposing this as a public API of model. Instead, why not have the policies add the state view requirements internally, without mutating or otherwise requiring the model to do anything.
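The suggestion above can be sketched as follows. This is a hypothetical illustration (the names `add_state_view_requirements`, `data_col`, and `shift` are assumptions, not RLlib's actual API): the policy derives the state view requirements from the model's initial state itself, so the model does not need to expose a public mutation method.

```python
# Hypothetical sketch: policy-side derivation of state view
# requirements from the model's init state (names are illustrative,
# not RLlib's actual API).

def add_state_view_requirements(view_requirements, initial_state):
    """Add state_in_i/state_out_i entries for each init-state tensor."""
    for i, _ in enumerate(initial_state):
        # state_in_i at step t is state_out_i from step t - 1.
        view_requirements["state_in_{}".format(i)] = {
            "data_col": "state_out_{}".format(i),
            "shift": -1,
        }
        view_requirements["state_out_{}".format(i)] = {}
    return view_requirements

# Two hidden-state tensors (e.g. an LSTM's h and c states).
reqs = add_state_view_requirements({}, initial_state=[[0.0] * 4, [0.0] * 4])
```

This keeps the model passive: the policy inspects `initial_state` and builds the shifted view entries itself.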
+1
done
Main comment is to move update_view_requirements_from_init_state into implementations instead of exposing it publicly.
rllib/policy/policy.py
Outdated
SampleBatch.AGENT_INDEX: ViewRequirement(),
SampleBatch.ACTION_DIST_INPUTS: ViewRequirement(),
SampleBatch.ACTION_LOGP: ViewRequirement(),
SampleBatch.VF_PREDS: ViewRequirement(),
It's kind of odd to see agent-specific keys like VF_PREDS here. Can't we infer these dynamically always, and omit them from this dict?
I can remove the extra-action-fetch keys. These can indeed be added (if required) after the model's test call.
The others need to stay because they may be required by the loss function. If any field is not required, it'll be removed automatically, so the user shouldn't really care. We always get the slimmest possible final view-requirements dict.
done
Note: This is the base/maximum requirement dict, from which later
some requirements will be subtracted again automatically to streamline
data collection, batch creation, and data transfer.
Why not keep this as the empty dict and instead infer columns to add?
It's not that easy:
- The model may require some unknown inputs (such as prev-actions, etc.), which are only visible from the model's self.inference_view_requirements.
- The policy may need some standard inputs, such as "t" or "episode_id", in its loss (or learn_on_batch) methods.
So we do need to fill in the standard values at first (then remove those that turn out not to be needed). The only exception is the model's extra-action fetches, which probably shouldn't be in the initial dict and can be added after the model's test call.
done.
So we are initially only adding the following base columns:
OBS
NEXT_OBS
ACTIONS
REWARDS
DONES
INFOS
EPS_ID
AGENT_INDEX
t (<- time step)
We then add the model's own inference requirements, including inferring some requirements from the model's init-state (done in the policy now, the model does not do this anymore).
Then we do the model forward test-pass and add the returned extra-action outs to the view reqs.
Then we call postprocessing and the loss, after which we erase all columns that are not needed, differentiating between postprocessing and loss (some columns are only needed for postprocessing).
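The pipeline above can be sketched roughly as follows. This is an illustration only, with hypothetical names (`build_view_requirements`, `BASE_COLUMNS`, `used_columns` are assumptions, not RLlib's actual implementation): start from the base columns, merge in the model's own inference requirements, then prune every column the dummy postprocessing/loss calls never touched.

```python
# Illustrative sketch of the view-requirements pipeline (hypothetical
# names, not RLlib's actual API).

BASE_COLUMNS = [
    "obs", "new_obs", "actions", "rewards", "dones",
    "infos", "eps_id", "agent_index", "t",
]

def build_view_requirements(model_requirements, used_columns):
    """Return the slimmest final view-requirements dict."""
    # Start from the base/maximum set of columns.
    requirements = {col: {} for col in BASE_COLUMNS}
    # Merge in model-declared inputs, e.g. prev-actions or state columns.
    requirements.update(model_requirements)
    # Keep only columns actually accessed during the dummy test calls.
    return {c: r for c, r in requirements.items() if c in used_columns}

reqs = build_view_requirements(
    model_requirements={"prev_actions": {"shift": -1}},
    used_columns={"obs", "actions", "rewards", "dones", "prev_actions"},
)
```

The key property is the final filter step: whatever the loss and postprocessing did not read is dropped, so data collection and transfer stay as slim as possible.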
""" | ||
sample_batch_size = max(self.batch_divisibility_req, 2) | ||
B = 2 # For RNNs, have B=2, T=[depends on sample_batch_size] | ||
self._dummy_batch = self._get_dummy_batch_from_view_requirements( |
Why not create them from a sample batch?
Not sure what you mean? We need to wrap the dummy batch into a tracking dict anyway (so it doesn't really matter what's underneath, a SampleBatch or a plain dict).
done, _dummy_batch is now a SampleBatch, which will be wrapped into a tracking dict prior to calling postprocessing_fn and loss.
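The "tracking dict" idea can be sketched minimally like this. The class name `AccessTrackingDict` and all details are illustrative assumptions, not RLlib's actual implementation: a plain dict (standing in for a SampleBatch) is wrapped so that every key read during the dummy postprocessing/loss calls is recorded, and requirements for keys that were never read can then be dropped.

```python
# Minimal sketch of a tracking dict (illustrative only, not RLlib's
# actual implementation).

class AccessTrackingDict(dict):
    """Dict that remembers which keys were read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        # Record every read access before returning the value.
        self.accessed_keys.add(key)
        return super().__getitem__(key)

tracked = AccessTrackingDict(obs=[0.0], actions=[1], vf_preds=[0.5])

# Simulate a loss function that only reads "obs" and "actions".
_ = tracked["obs"], tracked["actions"]

# "vf_preds" was never read; its view requirement could be removed.
unused = set(tracked) - tracked.accessed_keys
```

Because only read accesses matter for pruning, it's irrelevant whether the wrapped object is a SampleBatch or a plain dict, which is the point made in the comment above.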
…ectory_view_api_plumbing_only
Tue Nov 3 10:53:28 UTC 2020 Flake8....
Suggested renaming: I think this makes the supersetting a bit clearer (inference includes the model forward pass, as does training).
Trajectory view API: prep PR for switching on by default across all RLlib; plumbing only.
Why are these changes needed?
Related issue number
Checks
- I've run scripts/format.sh to lint the changes in this PR.