Hi keng,
I have some questions about SAC-discrete.
I found this implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch. It does not use Gumbel-softmax, and its target entropy for discrete actions is set to a positive value, `-np.log(1.0/action_space.size()) * 0.98`, so `log_alpha` grows above 1.0 over the update steps. However, the continuous SAC in that repo still uses a negative value, `-np.prod(action_space.size())`.
In your code, by contrast, you use Gumbel-softmax and set the target entropy for both discrete and continuous actions to the negative value `-np.prod(action_space.size())`, so `log_alpha` decreases over the update steps.
How should I set the target entropy? Why is the target entropy in @p-christ's code different from yours?
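For reference, here is a minimal sketch of the two conventions I am comparing (the helper names and the Gym-style `action_space` attributes are just illustrative, not taken from either repo; the temperature update follows the standard SAC alpha loss):

```python
import numpy as np
import torch

def discrete_target_entropy(action_space, scale=0.98):
    # p-christ-style convention for SAC-discrete: a positive target,
    # 0.98 * entropy of a uniform policy over n discrete actions.
    return -np.log(1.0 / action_space.n) * scale

def continuous_target_entropy(action_space):
    # Common convention for continuous SAC: a negative target,
    # minus the action dimensionality.
    return float(-np.prod(action_space.shape))

# Temperature update (same form in both cases): alpha is adjusted so the
# policy entropy tracks the target. With a large positive target the loss
# tends to push log_alpha up; with a negative target it usually drifts down.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_probs, target_entropy):
    # log_probs: log pi(a|s) for the sampled actions, shape (batch,)
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp()
```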
https://stackoverflow.com/questions/56226133/soft-actor-critic-with-discrete-action-space
@kengz