[Refactor] Refactor the step to include reward and done in the 'next' tensordict #941

vmoens · 2023-03-01T11:11:41Z

Description

Moves "reward" and "done" to the "next" tensordict.

New features:

EnvBase._step now must return a TensorDict with a "next" key. This simplifies the downstream updates of the input tensordict as the t and t+1 assignments are clearly stated.
Deprecation of the Specs class
Default values for reward_spec
New done_spec with default value
New output_spec that contains the reward_spec, the done_spec and the observation_spec
~~observation_spec does not need to be a CompositeSpec anymore but output_spec must. If observation_spec is a composite spec, then a nested tensordict is to be expected in "next".~~ => I decided not to proceed with this, since keeping observation_spec as a CompositeSpec lets us keep track of the observation names (e.g. pixels or observation). Concretely, this means that the "next" entry cannot be sampled with output_spec.rand(); it has to be built as TensorDict({"reward": env.reward_spec.rand(), "done": env.done_spec.rand(), **env.observation_spec.rand()})
Deprecates key selection in parallel envs (handled by env specs, keys must now match completely).
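To illustrate the layout described in the list above, here is a minimal sketch that uses plain Python dicts as a stand-in for TensorDict. The toy dynamics and the names `toy_env_step`, `step` and `sample_next` are assumptions made for illustration; they are not TorchRL code.

```python
def toy_env_step(td):
    """Hypothetical counterpart of EnvBase._step: everything that belongs
    to time t+1 (observation, reward, done) is returned under "next"."""
    next_obs = td["observation"] + td["action"]
    return {
        "next": {
            "observation": next_obs,
            "reward": float(next_obs),
            "done": next_obs >= 3,
        }
    }


def step(td):
    """Keep the time-t entries at the root and attach the t+1 entries."""
    out = dict(td)
    out.update(toy_env_step(td))
    return out


def sample_next(reward_rand, done_rand, observation_rand):
    """Mirror the merge from the description: reward and done are sampled
    separately, then the observation entries are unpacked next to them."""
    return {"reward": reward_rand(), "done": done_rand(), **observation_rand()}


td = step({"observation": 1, "action": 1})
fake_next = sample_next(
    lambda: 0.5,
    lambda: False,
    lambda: {"pixels": [0], "observation": [1.0]},
)
```

In this sketch, `td["observation"]` still holds the value at time t, while the reward and done flag are only reachable through `td["next"]`.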

Things to sort out:

Should we move the default done and reward specs to gym_like? Somewhere else?
Should all envs have a done and reward? Maybe these things belong to gym-like? We could have envs that do not have rewards or done states (e.g. pure simulation).

cc @matteobettini @XuehaiPan @btx0424 @albertbou92 @BY571 @shagunsodhani

matteobettini · 2023-03-01T13:56:50Z

@vmoens. One thing maybe to add is that we should check that the "_reset" flag, used for resetting environments, always follows the same spec as done. This will be fundamental for avoiding errors in vectorized envs.
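A rough sketch of what such a check could look like, assuming a spec can be reduced to a shape and a dtype. `FlagSpec` and `check_reset_matches_done` are hypothetical names used only for illustration; they are not part of TorchRL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagSpec:
    """Hypothetical, simplified description of a boolean flag entry."""

    shape: tuple
    dtype: str = "bool"


def check_reset_matches_done(reset_spec: FlagSpec, done_spec: FlagSpec) -> None:
    """Raise if the "_reset" flag does not follow the same spec as done."""
    if reset_spec != done_spec:
        raise ValueError(
            f'"_reset" spec {reset_spec} does not match done spec {done_spec}'
        )


# Example: 4 vectorized envs, one done flag each.
done_spec = FlagSpec(shape=(4, 1))
check_reset_matches_done(FlagSpec(shape=(4, 1)), done_spec)  # passes silently
```

Running a check like this once, when the env is built, would surface a shape mismatch before any partial reset is attempted in a vectorized env.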

empty

8cec365

facebook-github-bot added the CLA Signed label Mar 1, 2023

vmoens added 9 commits March 1, 2023 15:30

init

cffb997

amend

fba633d

amend

96088b4

amend

0135ae2

amend

df2af45

amend

bdead29

amend

dfd76ff

lint

e519fb9

Merge branch 'main' into refactor_step

e5c440f

# Conflicts: # torchrl/envs/libs/vmas.py

vmoens added the bc breaking and Refactoring labels Mar 2, 2023

vmoens added 16 commits March 2, 2023 16:06

bf

f820fe7

bf

442ef1a

bf

7fb3284

bf

e19c90d

bf

2ff3aba

bf

3e8e4a3

bf

bd6c94f

amend

651537a

init

4d786bc

Merge branch 'dispatch_kwargs_ref' into refactor_step

a11a718

amend

4002cbe

amend

42a826b

amend

74cab28

amend

055250b

amend

dbf83f3

amend

6ed2bb1

vmoens added 26 commits March 8, 2023 12:00

Merge branch 'main' into refactor_keys_composite

df5a9c0

Merge branch 'refactor_keys_composite' into refactor_step

0219a8d

bf

ee3c752

Merge branch 'refactor_keys_composite' into refactor_step

72c3b53

Merge branch 'main' into refactor_step

dc21f27

# Conflicts: # torchrl/envs/common.py # torchrl/envs/transforms/transforms.py # torchrl/envs/vec_env.py

vmas and env pool

1306084

typo

98f522d

typo

baf5110

typo

2fd2cfc

typo

1beb065

typo

db7165c

typo

7db790b

typo

72c7d7a

typo

f9df31c

typo

e8f9881

typo

5832b0d

typo

44ca9f3

typo

7de0c3d

typo

c228c82

typo

c582b88

typo

a6f7f15

typo

46164fa

typo

e126837

typo

e543df3

typo

d59cbe4

typo

bed71b6

vmoens marked this pull request as ready for review March 8, 2023 14:41

bf

ea8bad1

vmoens merged commit d02f5a1 into main Mar 8, 2023

vmoens deleted the refactor_step branch March 8, 2023 18:04
