Hi,
Thank you for open-sourcing the repo. I am reading the code and want to understand how the loss is computed.
It looks like in the final loss,
ddpo/ddpo/training/policy_gradient.py, line 125 (commit f0b6ca7)
the 'ratio' is just $p_\theta / p_{\theta_{\text{old}}}$, meaning that if I want to compute the loss corresponding to the gradient in Eqn. (3), I only need the variable 'advantage' in
ddpo/ddpo/training/policy_gradient.py, line 123 (commit f0b6ca7)
which is essentially the Gaussian-normalized score of the original reward value?
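To make sure I'm reading it right, here is my understanding of the computation as a minimal sketch (the function and argument names here are mine, not the repo's, and the clip width is just a placeholder):

```python
import jax.numpy as jnp

def ppo_style_loss(log_prob, old_log_prob, advantage, clip_eps=0.1):
    # importance ratio p_theta / p_theta_old, computed in log space
    ratio = jnp.exp(log_prob - old_log_prob)
    # clipped surrogate objective; 'advantage' is a fixed constant here,
    # so no gradient flows through the reward
    unclipped = -advantage * ratio
    clipped = -advantage * jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return jnp.mean(jnp.maximum(unclipped, clipped))
```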
I would guess this loss is then non-differentiable if the reward is, say, the JPEG encoding length?
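For concreteness, by "JPEG encoding length" I mean a reward along these lines (a sketch using PIL, not taken from the repo):

```python
import io
from PIL import Image

def jpeg_length_reward(image: Image.Image) -> float:
    # reward = negative byte length of the JPEG encoding (compressibility);
    # this is not differentiable with respect to the pixels
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=95)
    return -float(buf.tell())
```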
I must be missing something, am I?
Thanks!