alpha convergence issues with discrete actions #75

Closed
wcarvalho opened this issue Aug 15, 2019 · 1 comment

Comments

@wcarvalho

Hello, great repo. It's well designed and easy to use.

I'm using a discrete action space and seem to be having convergence issues. I read #50 and saw that a target entropy x <= log(# actions) was a good choice. I've seen the following two behaviors:
(a) When x = log(# actions), alpha seems to diverge. It had reached the hundreds before I stopped training. I'm not sure if this is expected behavior, but it causes the policy loss to blow up.
(b) When x < log(# actions), alpha converges to 0. I believe this makes the policy behave deterministically. I'm also not sure if this is expected behavior.
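
For reference, the bound x <= log(# actions) comes from the fact that a categorical distribution over n actions has entropy at most log(n), reached only by the uniform distribution. A minimal sketch in plain Python (the helper name `categorical_entropy` and the example probabilities are made up for illustration, not taken from the repo):

```python
import math

def categorical_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    # Skip zero-probability actions: p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

num_actions = 4
max_entropy = math.log(num_actions)           # the ceiling, log(# actions)

uniform = [1.0 / num_actions] * num_actions   # the only policy that reaches the ceiling
peaked = [0.7, 0.1, 0.1, 0.1]                 # any preference for an action lowers entropy

print("max entropy:", max_entropy)
print("uniform    :", categorical_entropy(uniform))
print("peaked     :", categorical_entropy(peaked))
```

So a target of exactly log(# actions) can only be met by a uniformly random policy, while any smaller target leaves room for the policy to prefer some actions.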

I was wondering if you've come across these problems.

Cheers

@wcarvalho wcarvalho changed the title alpha blows up with discrete actions alpha convergence issues with discrete actions Aug 15, 2019
@vitchyr
Collaborator

vitchyr commented Aug 15, 2019

(a) Since this implements entropy-constrained SAC and log(# actions) is the maximum possible entropy, the only policy that satisfies the constraint is the uniform random one. Since there's another term (the returns) pushing the policy toward a non-uniform action distribution, alpha needs to keep increasing until the policy only pays attention to the entropy.
(b) Alpha isn't the entropy, but rather the weight on the entropy term. If alpha=0, this means the policy is already "random enough" that the entropy term can have zero weight without it mattering. You might want to plot "Log Pi Mean" to see if it's roughly equal to the (negative) target entropy.
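
Both behaviors can be reproduced in a toy setting. The sketch below is not this repo's training loop. It assumes a single state with fixed Q-values, a policy equal to softmax(Q / alpha), and the commonly used temperature loss -log_alpha * (E[log pi] + target_entropy), whose gradient with respect to log_alpha is (entropy - target_entropy). The function names, Q-values, and learning rate are all hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def simulate_alpha(q_values, target_entropy, steps, lr=0.05):
    """Gradient descent on log_alpha for a one-state toy problem (assumed setup)."""
    log_alpha = 0.0                            # alpha starts at 1
    for _ in range(steps):
        alpha = math.exp(log_alpha)
        pi = softmax([q / alpha for q in q_values])
        # d/d(log_alpha) of the assumed loss is (entropy - target_entropy):
        # alpha grows while the policy is less random than the target, shrinks otherwise.
        log_alpha -= lr * (entropy(pi) - target_entropy)
    alpha = math.exp(log_alpha)
    return alpha, entropy(softmax([q / alpha for q in q_values]))

n = 3
# (a) target = log(n): entropy is always below the target, so alpha never stops growing.
alpha_a_short, _ = simulate_alpha([1.0, 0.5, 0.0], math.log(n), steps=200)
alpha_a_long, _ = simulate_alpha([1.0, 0.5, 0.0], math.log(n), steps=2000)

# (b) target = 0.5 < log(2): two tied best actions already give entropy >= log(2),
# so the constraint is slack and alpha decays toward 0.
alpha_b, entropy_b = simulate_alpha([1.0, 1.0, 0.0], 0.5, steps=2000)

print("(a) alpha after 200 / 2000 steps:", alpha_a_short, alpha_a_long)
print("(b) alpha, entropy:", alpha_b, entropy_b)
```

The tie in case (b) is contrived to keep the toy policy stochastic without any entropy bonus. In real training, the "Log Pi Mean" plot is what tells you whether the entropy actually sits at or above the target.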

Feel free to re-open this issue if needed. Thanks for the kind words!

@vitchyr vitchyr closed this as completed Aug 15, 2019