Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Refactor the step to include reward and done in the 'next' tensordict #941

Merged
merged 147 commits into from
Mar 8, 2023

Conversation

vmoens
Copy link
Contributor

@vmoens vmoens commented Mar 1, 2023

Description

Moves "reward" and "done" to the "next" tensordict.

New features:

  • EnvBase._step now must return a TensorDict with a "next" key. This simplifies the downstream updates of the input tensordict as the t and t+1 assignments are clearly stated.
  • Deprecation of the Specs class
  • default values for reward_spec
  • New done_spec with default value
  • New output_spec that contains the reward_spec, the done_spec and the observation_spec
  • observation_spec does not need to be a CompositeSpec anymore but output_spec must. If observation_spec is a composite spec, then a nested tensordict is to be expected in "next". => I decided not to proceed with this since having observation_spec as a CompositeSpec allows us to keep track of the obs names (e.g. pixels or observation). Concretely, this means that the "next" entry cannot be sampled using output_spec.rand() but using TensorDict({"reward": env.reward_spec.rand(), "done": env.done_spec.rand(), **env.observation_spec.rand()})
  • Deprecates key selection in parallel envs (handled by env specs, keys must now match completely).

Things to sort out:

  • Should we move the default done and reward specs to gym_like? Somewhere else?
  • Should all envs have a done and reward? Maybe these things belong to gym-like? We could have envs that do not have rewards or done states (eg pure simulation)

cc @matteobettini @XuehaiPan @btx0424 @albertbou92 @BY571 @shagunsodhani

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 1, 2023
@matteobettini
Copy link
Contributor

@vmoens. One thing maybe to add is that we should check that the "_reset" flag, used for resetting environements allways follows the same spec as done. This will be fundamental for avoiding errors in vectorized envs

@vmoens vmoens added bc breaking backward compatibility breaking change Refactoring Refactoring of an existing feature labels Mar 2, 2023
@vmoens vmoens marked this pull request as ready for review March 8, 2023 14:41
@vmoens vmoens merged commit d02f5a1 into main Mar 8, 2023
@vmoens vmoens deleted the refactor_step branch March 8, 2023 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bc breaking backward compatibility breaking change CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Refactoring Refactoring of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants