
Conversation

@tcbegley
Contributor

This PR deprecates the get_dist and get_dist_params methods of TensorDictSequential, which could previously be used when the last element of the sequence was a ProbabilisticTensorDictModule. Since this pattern is a little confusing, the recommended usage going forward is to construct the sequence first and then wrap it with ProbabilisticTensorDictModule.

I.e.

ProbabilisticModule(Sequential(Module, Module, Module))

instead of

Sequential(Module, Module, ProbabilisticModule(Module))
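The inversion above can be sketched with plain Python stand-ins. The classes below are illustrative only, not the real tensordict.nn API; they just show how the probabilistic step moves from the last slot of the sequence to a wrapper around the whole sequence.

```python
# Illustrative stand-ins for the tensordict classes; not the real API.

class Module:
    """Deterministic step: transforms the value under key "x"."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, td):
        td["x"] = self.fn(td["x"])
        return td

class Sequential:
    """Runs its sub-modules in order, like TensorDictSequential."""
    def __init__(self, *modules):
        self.modules = list(modules)
    def __call__(self, td):
        for m in self.modules:
            td = m(td)
        return td

class ProbabilisticModule:
    """Runs an inner module, then writes distribution parameters."""
    def __init__(self, inner):
        self.inner = inner
    def __call__(self, td):
        td = self.inner(td)
        td["loc"] = td["x"]   # the network output parametrises the distribution
        td["scale"] = 1.0
        return td

# Deprecated pattern: the probabilistic step is buried at the end of the sequence.
old = Sequential(Module(lambda x: x + 1), ProbabilisticModule(Module(lambda x: x * 2)))

# Recommended pattern: build the full sequence, then wrap it once.
new = ProbabilisticModule(Sequential(Module(lambda x: x + 1), Module(lambda x: x * 2)))

print(old({"x": 1}))  # {'x': 4, 'loc': 4, 'scale': 1.0}
print(new({"x": 1}))  # {'x': 4, 'loc': 4, 'scale': 1.0}
```

Both orderings compute the same thing here; the point is that in the recommended form the distribution logic sits in exactly one place, outside the sequence.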

Note that this change will require changes in TorchRL, and hence should not be merged until those can also be implemented.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 29, 2022
@vmoens
Collaborator

vmoens commented Nov 29, 2022

Thanks for this @tcbegley
Indeed we should wait until we have refactored the usages in torchrl.
Here are some guidelines if you'd like to take care of it.
In TorchRL, we were using this feature to build sequential models like

tdm = TensorDictSequential(common, actor, critic)

Then you could get a probabilistic actor by doing

tdm[:2]

and the critic by doing

tdm[[0, 2]]

Now this won't work anymore.
The approach will be

seq = TensorDictSequential(common, actor, critic)
tdm = ProbabilisticActor(seq, ...)
actor = ProbabilisticActor(seq[:2], ...)
critic = seq[[0, 2]]
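The indexing used above can be sketched with a toy class. This is an illustrative stand-in for TensorDictSequential, not the real API; `common`, `actor_net` and `critic_net` are hypothetical stages standing in for the shared trunk and the two heads.

```python
# Toy stand-in for TensorDictSequential showing slice and index-list selection.

class Sequential:
    def __init__(self, *modules):
        self.modules = list(modules)

    def __call__(self, td):
        for m in self.modules:
            td = m(td)
        return td

    def __getitem__(self, index):
        # A slice or list of indices returns a new sub-sequence.
        if isinstance(index, slice):
            return Sequential(*self.modules[index])
        if isinstance(index, (list, tuple)):
            return Sequential(*(self.modules[i] for i in index))
        return self.modules[index]

def common(td):      # shared trunk
    td["hidden"] = td["obs"] * 2
    return td

def actor_net(td):   # actor head
    td["action"] = td["hidden"] + 1
    return td

def critic_net(td):  # critic head
    td["value"] = td["hidden"] - 1
    return td

seq = Sequential(common, actor_net, critic_net)
actor_seq = seq[:2]     # trunk + actor head; this is what gets wrapped in ProbabilisticActor
critic = seq[[0, 2]]    # trunk + critic head

print(actor_seq({"obs": 3}))  # {'obs': 3, 'hidden': 6, 'action': 7}
print(critic({"obs": 3}))     # {'obs': 3, 'hidden': 6, 'value': 5}
```

The design point is that sub-model selection happens on the plain sequence, and the probabilistic wrapper is applied afterwards to whichever sub-sequence needs it.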

This concretely involves changes in ActorCriticWrapper, ActorCriticOperator and ActorValueOperator (or in parts of the code where those classes are used).

The algorithms to look at in TorchRL that use probabilistic actors are PPO, SAC, A2C, Dreamer and REDQ. DDPG also uses a ProbabilisticActor to map outputs to the right space (see the make_ddpg_model function).

Not all of them use the Sequential module, though.
A good first check would be to run test_modules.py and test_costs.py against this branch and see what works and what doesn't.

@tcbegley
Contributor Author

Thanks for the pointers! I made a start on this yesterday (the PR targets the functional-changes branch of my fork, so it doesn't show up in the upstream TorchRL repo): tcbegley/rl#5

Let me revisit and compare against what you've written, though, to make sure I've got things right.

Rather than constructing a sequence whose last element is a
ProbabilisticTensorDictModule, users should wrap a sequence with
ProbabilisticTensorDictModule.
@tcbegley
Contributor Author

Superseded by #109

@tcbegley tcbegley closed this Dec 16, 2022