Value clipping for PPO loss #1977

@wbinventor

Motivation

It would be nice to add a new option to compute the clipped value loss as used in OpenAI's Baselines for PPO:

https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2/model.py#L66-L75

Currently, the PPOLoss().loss_critic method calls the torchrl.objectives.utils.distance_loss function which supports the "l1", "l2" and "smooth_l1" loss functions. Perhaps this new clipped value loss function could be implemented as a new loss type within distance_loss.
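For reference, here is a minimal sketch of the arithmetic in the linked Baselines snippet, as I read it. It is written in plain Python on lists so it can be checked by hand; it is not torchrl code, and the function and argument names (`clipped_value_loss`, `old_values`, `clip_range`) are made up for illustration. A tensor version would presumably do the same thing with `torch.clamp` and an elementwise maximum.

```python
def clipped_value_loss(values, old_values, returns, clip_range):
    """Mean over samples of 0.5 * max(unclipped, clipped) squared error.

    values:     current value predictions
    old_values: value predictions recorded when the data was collected
    returns:    value targets
    clip_range: maximum allowed change of a prediction (hypothetical name)
    """
    total = 0.0
    for v, v_old, ret in zip(values, old_values, returns):
        # Keep the new prediction within clip_range of the old one.
        delta = max(-clip_range, min(clip_range, v - v_old))
        v_clipped = v_old + delta
        loss_unclipped = (v - ret) ** 2
        loss_clipped = (v_clipped - ret) ** 2
        # Pessimistic choice: keep the larger of the two errors.
        total += max(loss_unclipped, loss_clipped)
    return 0.5 * total / len(values)


loss = clipped_value_loss(
    values=[1.0, 0.0],
    old_values=[0.0, 0.0],
    returns=[1.0, 0.0],
    clip_range=0.2,
)
```

In this example the first prediction hits its target exactly, but it moved by more than `clip_range`, so the clipped error is the one that counts and the loss is non-zero.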

The clipping fraction is also commonly reported as a metric by OpenAI's Baselines and this could be useful to report from PPOLoss for the clipped value loss, as well as for ClipPPOLoss for the loss_objective loss term.
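As a rough sketch of what the metric could look like: Baselines computes the policy clip fraction as the share of samples where the probability ratio deviates from 1 by more than the clip range. Applying the same idea to the value term (share of samples where the new prediction moved further than the clip range from the old one) is my assumption, not something taken from Baselines. Plain Python again, with a hypothetical helper name `clip_fraction`.

```python
def clip_fraction(deviations, clip_range):
    """Fraction of samples whose absolute deviation exceeds clip_range."""
    clipped = [abs(d) > clip_range for d in deviations]
    return sum(clipped) / len(clipped)


# Policy term: deviation of the probability ratio from 1.
ratios = [1.05, 0.7, 1.3, 0.95]
policy_frac = clip_fraction([r - 1.0 for r in ratios], clip_range=0.2)

# Value term (assumed analogue): change of the prediction since data collection.
values = [0.5, 0.1, -0.4]
old_values = [0.0, 0.0, 0.0]
value_frac = clip_fraction(
    [v - v_old for v, v_old in zip(values, old_values)], clip_range=0.2
)
```

Both numbers could then be returned alongside the loss terms for logging.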

Labels

enhancement: New feature or request
