Hi, thanks for the great repo.
I have a question:
In the function `masked_kl_div` in `ppo.py`, shouldn't the calculation be `prob1 * (log(prob1) - log(prob2))`?
The calculation in the code is the negative KL divergence, which would have to be maximized rather than minimized (as the code assumes).
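
To make the sign question concrete, here is a minimal sketch of the proposed form in plain Python. It is not the code from `ppo.py`: the signature, the list-of-distributions layout, the boolean per-position mask, and the `eps` clamp are all assumptions made for illustration.

```python
import math

def masked_kl_div(prob1, prob2, mask=None, eps=1e-20):
    """Sketch of KL(prob1 || prob2), averaged over unmasked positions.

    prob1, prob2: lists of probability distributions, one per position.
    mask: optional list of booleans; False positions are skipped.
    All of these names and shapes are hypothetical, not the repo's API.
    """
    if mask is None:
        mask = [True] * len(prob1)
    per_position = []
    for p_row, q_row, keep in zip(prob1, prob2, mask):
        if not keep:
            continue
        # Proposed form: sum_i p_i * (log p_i - log q_i), which is >= 0.
        kl = sum(
            p * (math.log(max(p, eps)) - math.log(max(q, eps)))
            for p, q in zip(p_row, q_row)
        )
        per_position.append(kl)
    # Mean over the positions that were kept; 0.0 if everything is masked.
    return sum(per_position) / len(per_position) if per_position else 0.0

p = [[0.5, 0.5], [0.2, 0.8]]
q = [[0.9, 0.1], [0.2, 0.8]]
print(masked_kl_div(p, q))   # non-negative
print(-masked_kl_div(p, q))  # what p * (log q - log p) would give instead
```

With this ordering the value is zero when the two distributions match and grows as they diverge, so it can be added to a loss that is minimized. Swapping the two log terms flips the sign, which is the behaviour described above.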