
Calculation of alpha loss in SAC is different from the original paper #28

Open
CloudyDory opened this issue Jun 15, 2022 · 0 comments

CloudyDory commented Jun 15, 2022

Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications", the loss for the temperature parameter alpha is defined as:

J(alpha) = E[-alpha * (log(pi) + H)]

However, in your implementation (line 109 of "trainer.py"), the alpha loss is instead computed as:

J(alpha) = E[-log(alpha) * (log(pi) + H)]

I am curious why the loss is calculated in this way. I have looked at a couple of PyTorch-based SAC implementations on GitHub, and they all calculate the loss this way. However, TensorFlow-based SAC implementations calculate J(alpha) in the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still take the gradient with respect to log(alpha), but when computing the loss J(alpha) they use exp(log(alpha)) (which is alpha) rather than log(alpha) itself.
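
For concreteness, here is a minimal PyTorch sketch of the two variants (the variable names such as `target_entropy` are my own for illustration, not the repository's actual code):

```python
import torch

# Optimize log(alpha) so that alpha = exp(log_alpha) stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
target_entropy = -1.0                           # the H term, e.g. -dim(action space)
log_pi = torch.tensor([-1.2, -0.8, -1.5])       # example log-probabilities from the policy

# Variant used in several PyTorch implementations (the one this issue asks about):
#   J(alpha) = E[-log(alpha) * (log(pi) + H)]
loss_log_alpha = -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Variant matching the paper and softlearning (gradient still taken w.r.t. log(alpha)):
#   J(alpha) = E[-alpha * (log(pi) + H)], with alpha = exp(log(alpha))
loss_alpha = -(log_alpha.exp() * (log_pi + target_entropy).detach()).mean()
```

As far as I can tell, the gradients of the two losses with respect to log(alpha) differ only by the positive factor alpha, so both push alpha in the same direction, but only the second form matches the objective written in the paper.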
