
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). #11747

Conversation

sven1977 (Contributor) commented Nov 2, 2020

This PR is based on #11717 (which needs to be merged first!)

  • It enables the trajectory view API by default for PPO, IMPALA, PG, and A3C, for tf, tfe, tf2, and torch.
  • View requirements are stored in:
    a) the model (self.inference_view_requirements), for model forward passes;
    b) the policy (self.view_requirements), which holds a superset of its model's view requirements plus the policy's own (loss and postprocessing) view requirements. See the sketch below.
  • View requirements that turn out not to be needed (determined after a test model pass, postprocessing, and a loss pass) are automatically removed and will not be used for sample collection or SampleBatch creation. This lets us differentiate between postprocessing and training batches (some data may be needed for postprocessing, but no longer for training).
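A rough sketch of the two dicts described above. The names follow the PR description, but the exact ViewRequirement signature is an assumption about the RLlib API of this era, and the NEXT_OBS entry is purely illustrative:

import gym
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement

obs_space = gym.spaces.Box(-1.0, 1.0, (4,))
action_space = gym.spaces.Discrete(2)

# Model-level: what the model's forward pass consumes.
inference_view_requirements = {
    SampleBatch.OBS: ViewRequirement(space=obs_space),
    SampleBatch.PREV_ACTIONS: ViewRequirement(
        data_col=SampleBatch.ACTIONS, shift=-1, space=action_space),
}

# Policy-level superset: adds postprocessing/loss inputs, e.g. a shift=+1
# view of OBS (NEXT_OBS) that GAE postprocessing needs but forward() does not.
view_requirements = dict(
    inference_view_requirements,
    **{SampleBatch.NEXT_OBS: ViewRequirement(
        data_col=SampleBatch.OBS, shift=1, space=obs_space)})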

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 2, 2020
@sven1977 sven1977 changed the title [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). Nov 3, 2020
@sven1977 sven1977 changed the title [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 3, 2020
@sven1977 sven1977 requested a review from ericl November 3, 2020 19:10
@sven1977 sven1977 assigned ericl and unassigned ericl Nov 3, 2020
@sven1977 sven1977 removed the request for review from ericl November 3, 2020 19:11
@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). [WIP RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C, ES, ARS (tf and torch). Nov 3, 2020
@sven1977 sven1977 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 3, 2020
episode=None):
# not used, so save some bandwidth
del sample_batch.data[SampleBatch.NEXT_OBS]
return sample_batch
Contributor:

Can we keep this? What would happen if a user had this in a custom copy of Impala?

Contributor Author:

Keeping it. However, in the test run this will be a TrackingDict, not a SampleBatch, so it won't have the data property. I had to add a key check.

Contributor:

How about adding this to TrackingDict for backwards compat?

@property
def data(self):
    return self  # backwards compat with SampleBatch

I really want to make sure there are zero lines of code change in the policy files.

Contributor Author:

sounds good.

Contributor Author:

done

Contributor:

This doesn't seem fixed
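For reference, a minimal sketch of the backwards-compat shim under discussion. The class and attribute names are assumptions about RLlib's internal usage-tracking dict of this era, not the actual implementation:

class UsageTrackingDict(dict):
    """Dict that records which keys a test pass accesses (sketch)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        self.accessed_keys.add(key)  # remember that this view was needed
        return super().__getitem__(key)

    @property
    def data(self):
        # Backwards compat with SampleBatch, whose payload lives in `.data`:
        # old policy code like `del batch.data["new_obs"]` keeps working.
        return self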

T = policy.config["rollout_fragment_length"]
B = tensor.shape[0] // T
# Cover cases where we send a (small) test batch through this loss
# function.
if B == 0:
Contributor:

Can we retain compatibility here by sending a bigger batch?

Contributor Author:

ok, yeah, maybe the test batch should be large anyways.

Contributor Author:

done, removed.

Contributor Author:

made test batch large enough (32).
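To illustrate the arithmetic behind this thread (toy numbers only; T = 16 is a hypothetical rollout_fragment_length, not what the tests actually use): with a dummy batch shorter than the fragment length, integer division yields B == 0 and the time-major reshape breaks, whereas a 32-row test batch keeps B >= 1.

import numpy as np

T = 16                      # hypothetical rollout_fragment_length
batch = np.zeros((32, 5))   # 32-row test batch, 5 features per row

B = batch.shape[0] // T     # 32 // 16 == 2; a 4-row batch would give B == 0
# Chop into B fragments of length T, then swap to time-major [T, B, feat]:
time_major = batch[:B * T].reshape(B, T, -1).swapaxes(0, 1)
print(time_major.shape)     # (16, 2, 5)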

@ericl ericl left a comment

A few compatibility issues left, I think.

@ericl ericl self-assigned this Nov 5, 2020
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 5, 2020
ericl (Contributor) commented Nov 5, 2020

@sven1977 also, please make sure to assign PRs to reviewers; otherwise they will not show up on their dashboard.

@sven1977 sven1977 changed the title [RLlib] Trajectory view API: Enable by default for PPO, APPO, IMPALA, PG, A3C (tf and torch). [RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). Nov 10, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 10, 2020
@ericl ericl left a comment

Very close, last round of comments


@@ -358,9 +358,6 @@ def postprocess_trajectory(
         use_critic=policy.config["use_critic"])
     else:
         batch = sample_batch
-    # TODO: (sven) remove this del once we have trajectory view API fully in
-    # place.
-    del batch.data["new_obs"]  # not used, so save some bandwidth
Contributor:

Can we keep this for now? Will it crash?

Contributor Author:

Reverted.

@@ -62,8 +63,12 @@ def build_tf_policy(
             Policy, ModelV2, TensorType, TensorType, TensorType
         ], Tuple[TensorType, type, List[TensorType]]]] = None,
         mixins: Optional[List[type]] = None,
+        view_requirements_fn: Optional[Callable[[Policy], Dict[
+            str, ViewRequirement]]] = None,
Contributor:

I thought the only way to specify view reqs would be through custom models. So we should remove this, right?

Contributor Author:

Yes, but I still wanted to leave the user some opportunity to add new ones. It's not needed by any algos right now, though.
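The custom-model route the reviewer refers to looks roughly like this. A hedged sketch against the TorchModelV2 API of this era; the class name and the extra prev-reward view are illustrative:

import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement


class PrevRewardModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs,
                 model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # Declare an extra view: the previous timestep's reward. The
        # sample collector then builds this column automatically.
        self.inference_view_requirements[SampleBatch.PREV_REWARDS] = \
            ViewRequirement(data_col=SampleBatch.REWARDS, shift=-1)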

@@ -174,8 +174,8 @@ def build_torch_policy(
         mixins (Optional[List[type]]): Optional list of any class mixins for
             the returned policy class. These mixins will be applied in order
             and will have higher precedence than the TorchPolicy class.
-        view_requirements_fn (Callable[[],
-            Dict[str, ViewRequirement]]): An optional callable to retrieve
+        view_requirements_fn (Optional[Callable[[Policy],
Contributor:

Remove this arg?

Contributor Author:

Done.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 10, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 11, 2020
@ericl ericl left a comment

LGTM; please fix the 3 comments prior to merge


@@ -424,7 +424,7 @@ def postprocess_nstep_and_prio(policy: Policy,
         batch[SampleBatch.DONES], batch[PRIO_WEIGHTS])
     new_priorities = (np.abs(convert_to_numpy(td_errors)) +
                       policy.config["prioritized_replay_eps"])
-    batch.data[PRIO_WEIGHTS] = new_priorities
+    batch[PRIO_WEIGHTS] = new_priorities
Contributor:

Please revert this change.

Contributor Author:

+1
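For context on why the revert is harmless: SampleBatch of this era keeps its columns in a .data dict and delegates item access to it, so both spellings mutate the same storage. A minimal sketch, assuming that behavior:

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

batch = SampleBatch({"weights": np.ones(3)})
batch["weights"] = np.zeros(3)       # __setitem__ route
batch.data["weights"] = np.ones(3)   # direct .data route: same dict
assert batch["weights"] is batch.data["weights"]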

@@ -209,7 +208,7 @@ def make_time_major(policy, seq_lens, tensor, drop_last=False):
         T = tensor.shape[0] // B
     else:
         # Important: chop the tensor into batches at known episode cut
-        # boundaries. TODO(ekl) this is kind of a hack
+        # boundaries.
Contributor:

Isn't it still a hack? Please restore the comment.

Contributor Author:

I don't think it's a "hack": IMPALA's divisibility requirement explicitly guarantees that the batch size is divisible by rollout_fragment_length.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 11, 2020
@sven1977 sven1977 merged commit 62c7ab5 into ray-project:master Nov 12, 2020
@sven1977 sven1977 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Nov 12, 2020
@sven1977 sven1977 deleted the trajectory_view_api_enable_by_default_for_some_tf branch June 2, 2023 20:12