
clip in PPOLoss #2334

Answered by albertbou92
majid5776 asked this question in Q&A

Hello!

As you mention, the action sampling is defined by the actor and is independent of the algorithm.

Essentially, TorchRL has a ProbabilisticActor class that can be used to handle the probabilistic sampling. You can add it at the end of your model and specify the distribution you want (e.g. TanhNormal). As long as the outputs of your model (the keys of the output TensorDict) match the inputs expected by the distribution, the ProbabilisticActor will automatically sample the action and add it to the output TensorDict.
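For illustration, here is a minimal sketch of such an actor. The observation/action dimensions, network sizes, and distribution bounds are placeholder assumptions, and the import locations and the `low`/`high` kwargs reflect recent TorchRL/tensordict releases (older versions export NormalParamExtractor from torchrl.modules and use `min`/`max` for the bounds):

```python
import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule, NormalParamExtractor
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal

obs_dim, act_dim = 4, 1  # placeholder dimensions

# The network outputs 2 * act_dim values; NormalParamExtractor splits them
# into the "loc" and "scale" parameters expected by the distribution.
policy_net = torch.nn.Sequential(
    MLP(in_features=obs_dim, out_features=2 * act_dim, num_cells=[64, 64]),
    NormalParamExtractor(),
)
policy_module = TensorDictModule(
    policy_net, in_keys=["observation"], out_keys=["loc", "scale"]
)

actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],  # must match the distribution's expected kwargs
    distribution_class=TanhNormal,
    distribution_kwargs={"low": -1.0, "high": 1.0},
    return_log_prob=True,  # PPO needs the log-prob of the sampled action
)

# The actor samples an "action" (and "sample_log_prob") into the TensorDict.
td = TensorDict({"observation": torch.randn(obs_dim)}, batch_size=[])
td = actor(td)
print(td["action"], td["sample_log_prob"])
```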

An example of how to define your ProbabilisticActor can be found in the sota-implementation of PPO for MuJoCo environments: https://github.com/pytorch/rl/blob/main/sota-imple…
