advantages: Tell the actor how much better its actions were than the critic expected.
This is similar to the critic's loss term, but the state values predicted by the critic are detached during the actor's back-prop in order to decouple the actor and the critic.
clipping: Bounds how far each update can move the actor network by limiting how much the log probabilities generated by the actor in the previous episode can differ from the newly generated ones during the training updates.
critic: Tries to minimize the difference between the predicted and the actual discounted returns.
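
The three terms above can be sketched in plain Python. This is a minimal illustration, not the code from this repository: the function names, the defaults `gamma=0.99` and `eps=0.2`, the choice of advantage as return minus predicted value, and the mean-squared-error critic loss are all assumptions. Plain floats carry no gradients, so the "detach" step is only noted in a comment; in an autograd framework the predicted values would be detached before computing the advantages.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """A_t = G_t - V(s_t): how much better the outcome was than the critic expected.

    Assumption: `values` are treated as constants here (the "detached" critic
    predictions), so the actor loss does not push gradients into the critic.
    """
    return [g - v for g, v in zip(returns, values)]


def clipped_actor_loss(new_log_probs, old_log_probs, advs, eps=0.2):
    """Negative mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advs):
        # Probability ratio between the updated policy and the one that
        # collected the data, computed from log probabilities.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advs)


def critic_loss(returns, values):
    """Mean squared error between predicted values and discounted returns."""
    return sum((g - v) ** 2 for g, v in zip(returns, values)) / len(returns)


# Hypothetical episode: three rewards and three critic predictions.
rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
advs = advantages(rets, [1.0, 1.0, 1.0])
print(rets, advs, critic_loss(rets, [1.0, 1.0, 1.0]))
```

When the new and old log probabilities are equal, the ratio is 1 and the actor loss reduces to the negative mean advantage. Once the ratio leaves the `[1 - eps, 1 + eps]` band in the direction the advantage favours, the clipped term takes over and the objective stops rewarding further movement.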