
[RLlib] Trajectory view API (prep PR for switching on by default across all RLlib; plumbing only) #11717

Merged
ericl merged 12 commits into ray-project:master from trajectory_view_api_plumbing_only on Nov 3, 2020


sven1977
Contributor

@sven1977 sven1977 commented Oct 30, 2020

Trajectory view API prep PR for switching the API on by default across all of RLlib:

  • plumbing only (new methods, not used yet, etc.).
  • some cleanup and minor fixes.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 requested a review from ericl October 30, 2020 07:46
@ericl
Contributor

ericl commented Oct 31, 2020

Did you mean to assign? Will review

@ericl ericl self-assigned this Oct 31, 2020
@sven1977
Contributor Author

Yes, please review. Fixing the remaining tests rn. ...

@sven1977
Contributor Author

Tests should be all fixed now, but I'll keep checking on 'em.

…ectory_view_api_plumbing_only

# Conflicts:
#	rllib/policy/eager_tf_policy.py
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Nov 2, 2020
@@ -636,7 +637,8 @@ def _stats(self, outputs, samples, grads):
})
return fetches

-def _initialize_loss_with_dummy_batch(self):
+@override(Policy)
+def _initialize_loss_dynamically(self):
Contributor

Why the name change? The previous one seemed more clear.

Contributor Author

done

@@ -318,6 +319,24 @@ def is_time_major(self) -> bool:
"""
return self.time_major is True

@PublicAPI
def update_view_requirements_from_init_state(self):
Contributor

I don't think we should be exposing this as a public API of model. Instead, why not have the policies add the state view requirements internally, without mutating or otherwise requiring the model to do anything.

Contributor Author

+1

Contributor Author

done
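The agreed approach is for the policy, not the model, to derive the RNN state view requirements from the model's initial state. A minimal sketch of that idea, assuming a hypothetical stand-in `ViewRequirement` dataclass (not RLlib's actual class) and `state_in_{i}` / `state_out_{i}` column names; `add_state_view_requirements` is likewise a hypothetical helper:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ViewRequirement:
    """Hypothetical stand-in for a view requirement (not RLlib's class)."""
    data_col: Optional[str] = None  # None: read the column with the same name
    shift: int = 0                  # -1: read the previous timestep's value


def add_state_view_requirements(
    view_reqs: Dict[str, ViewRequirement],
    initial_state: List[Any],
) -> Dict[str, ViewRequirement]:
    """Policy-side helper: derives RNN state columns from the initial state.

    Returns a new dict, so the model's own requirements are never mutated.
    """
    new_reqs = dict(view_reqs)
    for i in range(len(initial_state)):
        # The state fed in at step t is the state output at step t-1.
        new_reqs[f"state_in_{i}"] = ViewRequirement(
            data_col=f"state_out_{i}", shift=-1)
        new_reqs[f"state_out_{i}"] = ViewRequirement()
    return new_reqs


# Example: an LSTM-like model with two state tensors (h and c).
model_reqs = {"obs": ViewRequirement()}
policy_reqs = add_state_view_requirements(model_reqs, [[0.0] * 64, [0.0] * 64])
print(sorted(policy_reqs))
# ['obs', 'state_in_0', 'state_in_1', 'state_out_0', 'state_out_1']
```

Returning a copy rather than updating the model's dict in place matches the review point: the model does not need to expose or run anything for this.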

Contributor

@ericl ericl left a comment

Main comment is to move update_view_requirements_from_init_state into implementations instead of exposing it publicly.

SampleBatch.AGENT_INDEX: ViewRequirement(),
SampleBatch.ACTION_DIST_INPUTS: ViewRequirement(),
SampleBatch.ACTION_LOGP: ViewRequirement(),
SampleBatch.VF_PREDS: ViewRequirement(),
Contributor

It's kind of odd to see agent-specific keys like VF_PREDS here. Can't we infer these dynamically always, and omit them from this dict?

Contributor Author

I can remove the extra-action fetch keys. These can indeed be added (if required) after the model's test call.
The others need to stay because they may be required by the loss function. If a field turns out not to be required, it is removed automatically, so the user shouldn't really care. We always end up with the slimmest possible final view-requirements dict.

Contributor Author

done


Note: This is the base/maximum requirement dict, from which later
some requirements will be subtracted again automatically to streamline
data collection, batch creation, and data transfer.
Contributor

Why not keep this as the empty dict and instead infer columns to add?

Contributor Author

It's not that easy:

  1. The model may require some unknown inputs (such as prev-actions, etc.), which are only visible from the model's self.inference_view_requirements.
  2. The policy may need some standard inputs, such as "t" or "episode_id", in its loss (or learn_on_batch) methods.

So we do need to fill in the standard values first (and then remove them if not needed). The only exception is the model's extra-action fetches, which probably shouldn't be in the initial dict and can be added after the model test call.
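Point 1 refers to inputs such as prev-actions, which only the model's own view requirements reveal. In the trajectory view API such an input can be expressed as a time-shifted view of an existing column instead of a separately collected one. As an illustration only (`shifted_view` is a hypothetical helper, not RLlib code), here is how a shift of -1 over one episode's actions column yields prev-actions, with the first step zero-padded:

```python
from typing import List


def shifted_view(column: List[float], shift: int, fill: float = 0.0) -> List[float]:
    """Returns column[t + shift] for every t, using `fill` where t + shift is out of range.

    Hypothetical helper: resolves a shifted view over a single episode's column.
    """
    n = len(column)
    return [column[t + shift] if 0 <= t + shift < n else fill for t in range(n)]


actions = [3.0, 1.0, 2.0, 0.0]
prev_actions = shifted_view(actions, shift=-1)  # [0.0, 3.0, 1.0, 2.0]
```

Because the shifted column is derived from `actions`, nothing extra has to be stored during sampling; the view is only materialized when a batch is built.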

Contributor Author

done.
So we are initially only adding the following base columns:

  • OBS
  • NEXT_OBS
  • ACTIONS
  • REWARDS
  • DONES
  • INFOS
  • EPS_ID
  • AGENT_INDEX
  • t (the time step)

We then add the model's own inference requirements, including some inferred from the model's initial state (this is now done in the policy; the model no longer does it).
Then we do the model's forward test pass and add the returned extra-action outputs to the view requirements.
Then we call postprocessing and the loss, after which we erase all columns that are not needed. This distinguishes between postprocessing and the loss, since some columns are only needed for postprocessing.
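The four steps above can be sketched with plain sets of column names. Everything here is hypothetical: the lowercase column names are illustrative (not necessarily RLlib's exact SampleBatch strings), and `model_reqs`, `extra_action_outs`, and the two accessed-key sets stand in for what the policy would learn from the model's inference requirements, the forward test pass, and tracking the dummy batch. The sketch covers only the training side; the model's inference requirements are still needed for acting.

```python
from typing import Set, Tuple

# Illustrative names for the base columns listed above.
BASE_COLUMNS: Set[str] = {
    "obs", "next_obs", "actions", "rewards", "dones",
    "infos", "eps_id", "agent_index", "t",
}


def build_view_columns(
    model_reqs: Set[str],
    extra_action_outs: Set[str],
    accessed_by_postprocessing: Set[str],
    accessed_by_loss: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Returns (columns the loss needs, columns only postprocessing needs)."""
    # Steps 1 and 2: base columns plus the model's own inference requirements.
    candidates = BASE_COLUMNS | model_reqs
    # Step 3: extra-action outputs returned by the forward test pass.
    candidates |= extra_action_outs
    # Step 4: drop every collected column that nothing reads. Columns that
    # postprocessing itself creates (e.g. advantages) are not collected, so
    # they never appear in `candidates`.
    loss_cols = candidates & accessed_by_loss
    postprocessing_only = (candidates & accessed_by_postprocessing) - loss_cols
    return loss_cols, postprocessing_only


loss_cols, post_only = build_view_columns(
    model_reqs={"prev_actions"},
    extra_action_outs={"vf_preds", "action_logp"},
    accessed_by_postprocessing={"rewards", "dones", "vf_preds"},
    accessed_by_loss={"obs", "prev_actions", "actions", "action_logp", "advantages"},
)
print(sorted(loss_cols))  # ['action_logp', 'actions', 'obs', 'prev_actions']
print(sorted(post_only))  # ['dones', 'rewards', 'vf_preds']
```

Starting from a maximal dict and subtracting is what makes the final result the "slimmest possible" set: any candidate that neither postprocessing nor the loss touches (here `t`, `infos`, `eps_id`, and so on) is dropped.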

"""
sample_batch_size = max(self.batch_divisibility_req, 2)
B = 2 # For RNNs, have B=2, T=[depends on sample_batch_size]
self._dummy_batch = self._get_dummy_batch_from_view_requirements(
Contributor

Why not create them from a sample batch?

Contributor Author

Not sure what you mean. We need to wrap the dummy batch in a tracking dict anyway, so it doesn't really matter what's underneath (a SampleBatch or a plain dict).

Contributor Author

done. _dummy_batch is now a SampleBatch, which is wrapped in a tracking dict prior to calling postprocessing_fn and the loss.
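The tracking-dict idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: `TrackingDict`, `make_dummy_batch`, and `toy_loss` are hypothetical stand-ins (not RLlib's classes or functions), and the batch length follows the `max(self.batch_divisibility_req, 2)` rule from the diff excerpt above.

```python
from typing import Any, Dict, List, Set


class TrackingDict(dict):
    """Dict that records which keys were read (hypothetical stand-in)."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.accessed_keys: Set[str] = set()

    def __getitem__(self, key: str) -> Any:
        self.accessed_keys.add(key)
        return super().__getitem__(key)

    def get(self, key: str, default: Any = None) -> Any:
        # dict.get() does not route through __getitem__, so track it too.
        self.accessed_keys.add(key)
        return super().get(key, default)


def make_dummy_batch(columns: Set[str], batch_divisibility_req: int) -> Dict[str, List[float]]:
    # At least 2 rows, mirroring sample_batch_size = max(batch_divisibility_req, 2).
    size = max(batch_divisibility_req, 2)
    return {col: [0.0] * size for col in columns}


def toy_loss(batch: Dict[str, List[float]]) -> float:
    # Illustrative loss that only reads two of the columns.
    return sum(a * r for a, r in zip(batch["actions"], batch["rewards"]))


dummy = TrackingDict(make_dummy_batch({"obs", "actions", "rewards", "infos"}, 4))
toy_loss(dummy)
print(sorted(dummy.accessed_keys))  # ['actions', 'rewards']
```

Whatever sits underneath (a SampleBatch or a plain dict), only the wrapper's access log matters for deciding which columns to keep.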

@ericl ericl added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. and removed tests-ok The tagger certifies test failures are unrelated and assumes personal liability. labels Nov 2, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 3, 2020
@ericl
Contributor

ericl commented Nov 3, 2020

Tue Nov 3 10:53:28 UTC 2020 Flake8....
rllib/models/modelv2.py:4:1: F401 'gym.spaces.Box' imported but unused
rllib/policy/policy.py:553:9: F401 'ray.rllib.agents.dqn.dqn_tf_policy.PRIO_WEIGHTS' imported but unused

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 3, 2020
@ericl
Contributor

ericl commented Nov 3, 2020

Suggested renaming:
policy.view_req -> inference_view_req | action_view_req | acting_view_req
training_view_req -> training_view_req
model.inference_view_req -> model_view_req

I think this makes the supersetting a bit clearer (inference includes the model forward pass, as does training).
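The supersetting in this proposal can be stated as two subset checks. A minimal sketch with made-up column sets, using the proposed names as plain Python sets (the real view requirements are dicts of requirement objects, and the extra columns chosen here are hypothetical):

```python
# Hypothetical column sets using the proposed names.
model_view_req = {"obs", "prev_actions", "state_in_0"}
# Acting: the model's forward-pass inputs plus anything else needed to act.
inference_view_req = model_view_req | {"agent_index"}
# Training: the model's forward-pass inputs plus loss-only columns.
training_view_req = model_view_req | {"actions", "rewards", "dones"}

# Both views are supersets of the model's own requirements.
assert model_view_req <= inference_view_req
assert model_view_req <= training_view_req
```

Neither of the two policy-level views needs to contain the other; they only share the model's forward-pass requirements as a common subset.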

@ericl ericl merged commit 5b788cc into ray-project:master Nov 3, 2020
@sven1977 sven1977 deleted the trajectory_view_api_plumbing_only branch June 2, 2023 20:12
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.

2 participants