
clip in PPOLoss #2334

Answered by albertbou92
majid5776 asked this question in Q&A

Hello!

As you mention, the action sampling is defined by the actor and is independent of the algorithm.

Essentially, TorchRL has a ProbabilisticActor class that can be used to handle the probabilistic sampling. You can add it at the end of your model and specify the distribution you want (e.g. TanhNormal). As long as the outputs of your model (the keys of the output TensorDict) match the inputs expected by the distribution, the ProbabilisticActor will automatically sample the action and add it to the output TensorDict.
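For illustration, here is a minimal sketch of such an actor. The observation/action dimensions, network sizes, and distribution bounds are placeholder assumptions, and the import locations and the `low`/`high` kwargs reflect recent TorchRL/tensordict releases (older versions export NormalParamExtractor from torchrl.modules and use `min`/`max` for the bounds):

```python
import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, NormalParamExtractor
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal

obs_dim, act_dim = 4, 1  # placeholder dimensions

# The network outputs 2 * act_dim values; NormalParamExtractor splits them
# into the "loc" and "scale" parameters expected by the distribution.
policy_net = torch.nn.Sequential(
    MLP(in_features=obs_dim, out_features=2 * act_dim, num_cells=[64, 64]),
    NormalParamExtractor(),
)
policy_module = TensorDictModule(
    policy_net, in_keys=["observation"], out_keys=["loc", "scale"]
)

actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],  # must match the distribution's expected kwargs
    distribution_class=TanhNormal,
    distribution_kwargs={"low": -1.0, "high": 1.0},
    return_log_prob=True,  # PPO needs the log-prob of the sampled action
)

# The actor samples an "action" (and "sample_log_prob") into the TensorDict.
td = TensorDict({"observation": torch.randn(obs_dim)}, batch_size=[])
td = actor(td)
print(td["action"], td["sample_log_prob"])
```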

An example of how to define your ProbabilisticActor can be found in the sota-implementation of PPO for MuJoCo environments: https://github.com/pytorch/rl/blob/main/sota-imple…
