
Can you elaborate on running SAC on discrete action space #22

Open
sandipan1 opened this issue Nov 11, 2018 · 6 comments

Comments

@sandipan1
The docs mention that an alternate version of SAC, with a slight change, can be used for discrete action spaces. Please elaborate with some more details.

@jachiam
Contributor

jachiam commented Nov 11, 2018

You're actually the second person to ask about this! The first person sent an email. I'll add a sub-section or a "You Should Know" box to the docs to cover this soon.

@sandipan1
Author

Thanks. Also, since this tutorial favors learning-by-doing rather than being purely theoretical, it would be nice to see explanations with some images of the neural network architectures, to get a quick overview of how to implement them. For example, SAC uses five networks: value, value_target, a Gaussian policy, and two Q networks. It would be easier to understand with a pictorial representation of the networks and how they relate to each other.
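In lieu of a diagram, here is a minimal numpy sketch of the five networks and what each consumes and produces. The layer sizes and `mlp` helper are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

def mlp(sizes, rng):
    """Return a list of (W, b) layers for a simple tanh MLP."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b  # linear output layer

obs_dim, act_dim, hidden = 4, 2, 32
rng = np.random.default_rng(0)

# The five networks in (original) SAC, with their inputs and outputs:
value        = mlp([obs_dim, hidden, 1], rng)            # V(s): state -> scalar
value_target = [(W.copy(), b.copy()) for W, b in value]  # slow copy of V, updated by Polyak averaging
q1           = mlp([obs_dim + act_dim, hidden, 1], rng)  # Q1(s, a): state + action -> scalar
q2           = mlp([obs_dim + act_dim, hidden, 1], rng)  # Q2(s, a): twin Q, min taken to reduce overestimation
policy       = mlp([obs_dim, hidden, 2 * act_dim], rng)  # Gaussian pi: state -> (mean, log_std) per action dim

s = rng.standard_normal(obs_dim)
a = rng.standard_normal(act_dim)
v = forward(value, s)                       # scalar state value
q = forward(q1, np.concatenate([s, a]))     # scalar action value
mu_logstd = forward(policy, s)              # 2 * act_dim policy outputs
```

The Q networks take the action as an extra input, the value networks take only the state, and the policy outputs distribution parameters rather than a single action.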

@etendue

etendue commented Jan 8, 2019

Count me as the third. For a discrete action space, the entropy can be computed directly from the distribution. The policy loss probably needs to maximize advantage * log_probability. What I'm confused about is: do we still need two Q networks and one value network?
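The entropy point can be made concrete. For a categorical policy, both the entropy and the policy objective are exact expectations over the finite action set, so no sampling or reparameterization trick is needed. A minimal numpy sketch, where the logits, Q-values, and temperature are made-up illustrative numbers:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical values for a 3-action discrete problem:
logits = np.array([2.0, 0.5, -1.0])   # output of a categorical policy network
q_vals = np.array([1.0, 0.8, 0.2])    # e.g. elementwise min of the two Q networks, Q(s, .)
alpha  = 0.2                          # entropy temperature

pi = softmax(logits)
log_pi = np.log(pi)

# Exact entropy of the categorical distribution:
# H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s)
entropy = -(pi * log_pi).sum()

# Policy loss as an exact expectation over all actions:
# E_{a ~ pi}[alpha * log pi(a|s) - Q(s, a)]
policy_loss = (pi * (alpha * log_pi - q_vals)).sum()
```

Minimizing this loss pushes probability mass toward high-Q actions while the entropy term keeps the distribution from collapsing, mirroring the continuous-action objective with the expectation taken in closed form.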

@Wei2Wakeup

Is it just an average over \pi(a|s) for all actions, since the distribution is already parameterized?

@redknightlois

+1
I am just learning RL and am looking to modify SAC for discrete action spaces. If you can elaborate on how to derive the equations, I can implement it and send a PR.

@GusHebblewhite

+1

6 participants