Hello! I was documenting your PPO code `algo/ppo.py` to improve my understanding of the algorithm, and I got confused by `max_grad_norm` and `_use_clipped_value_loss`.
If I understand correctly, `max_grad_norm` is passed to `nn.utils.clip_grad_norm_()` to cap the gradient norm, and `_use_clipped_value_loss` toggles a clipped version of the value-function loss. However, I could not find the relevant details in the paper "Proximal Policy Optimization Algorithms". If they are mentioned there explicitly, would you please point them out for me?
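To make the first question concrete, here is my understanding of what `nn.utils.clip_grad_norm_()` does with `max_grad_norm`, sketched in numpy (the function name and exact epsilon handling in PyTorch may differ; this is just an illustration, not the actual implementation):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Sketch of global-norm gradient clipping.

    `grads` is a list of gradient arrays; if their combined L2 norm
    exceeds `max_norm`, every gradient is rescaled by the same factor
    so the combined norm equals `max_norm`.
    """
    # Global L2 norm over all gradient arrays together
    total_norm = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm
```

So with `max_grad_norm`, a single large mini-batch cannot produce an arbitrarily large update step, regardless of the loss value.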
For L^VF, the paper seems to use a simple squared loss, which corresponds to `use_clipped_value_loss=False`, but I could not find anything about the `use_clipped_value_loss=True` case. Is this a trick not mentioned in the paper?
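For reference, this is what I believe `use_clipped_value_loss=True` computes, sketched in numpy (variable names are my own; I am inferring the formula from similar PPO implementations, so please correct me if your code differs):

```python
import numpy as np

def clipped_value_loss(values, old_values, returns, clip=0.2):
    """Sketch of the PPO clipped value loss.

    The new value prediction is clipped to stay within `clip` of the
    value predicted at rollout time, and the pessimistic (larger) of
    the clipped and unclipped squared errors is taken elementwise.
    """
    # Value prediction constrained to a trust region around old_values
    values_clipped = old_values + np.clip(values - old_values, -clip, clip)
    unclipped_loss = (values - returns) ** 2
    clipped_loss = (values_clipped - returns) ** 2
    return 0.5 * np.maximum(unclipped_loss, clipped_loss).mean()
```

Taking the elementwise maximum means the value network gains nothing from moving its prediction far outside the clip range in one update, which mirrors the pessimistic clipping of the policy surrogate objective.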
Thank you in advance for your help. Happy holidays!