
About target entropy in SAC #467

Open
lidongke opened this issue Nov 17, 2020 · 0 comments

lidongke commented Nov 17, 2020

Hi keng,
I have some questions about SAC-Discrete.
I found this implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch, which does not use Gumbel-Softmax. Its target entropy is set to a positive value, -np.log(1.0/action_space.size()) * 0.98, and log_alpha grows above 1.0 as updates proceed. However, the continuous-action SAC in that same repo uses a negative value, -np.prod(action_space.size()).
In your code, by contrast, you use Gumbel-Softmax and set the target entropy for both discrete and continuous actions to the negative value -np.prod(action_space.size()), so log_alpha decreases as updates proceed.
I really want to know how I should set the target entropy. Why is the target entropy in @p-christ's code different from yours?

https://stackoverflow.com/questions/56226133/soft-actor-critic-with-discrete-action-space
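For reference, here is a minimal sketch of the two conventions I am comparing (the helper names and the Gym-style action space are just for illustration, not the exact code from either repo):

```python
import numpy as np

# Convention used in p-christ's SAC-Discrete: a positive target equal to
# 98% of the maximum entropy of a uniform policy over n discrete actions.
def discrete_target_entropy(n_actions):
    return -np.log(1.0 / n_actions) * 0.98  # = 0.98 * log(n_actions) > 0

# Standard continuous SAC convention: the negative of the action
# dimensionality, so the target entropy is a negative number.
def continuous_target_entropy(action_shape):
    return -float(np.prod(action_shape))  # e.g. -6.0 for a 6-dim action

print(discrete_target_entropy(4))        # ~1.359 (positive)
print(continuous_target_entropy((6,)))   # -6.0   (negative)
```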

@kengz
