action_space.n and actions sampling #53

Closed
bionick87 opened this issue May 23, 2018 · 0 comments

Hey,

I have a question. Maybe I am wrong, so sorry to bother you.
I understand all of the code, but one thing is not clear to me.
The A3C model outputs a value and action probabilities, and there are 6 probabilities for Pong, right?

But when you compute:

action = prob.multinomial(num_samples=1).data

the number of samples is just 1, so you clearly get one action. However, the number of outputs for Pong is 6.

env.step(action.numpy())

is correct because it receives only one action.

The same question applies to the log_prob that you save in the list (do you save 6 probabilities or only one?). With Pong you have 6, not 1, and that could create a problem in the policy calculation.

Sorry, I am a bit confused because

num_outputs = action_space.n

in model.py is 6, not 1.
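
For reference, here is a minimal sketch of the distinction this question is about. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not code from the repository: the logits are made-up values, and `softmax` and `sample_action` are hypothetical helpers. It illustrates that a 6-entry distribution (one entry per action, matching action_space.n for Pong) yields a single sampled action index when one sample is requested, and that only the log-probability at that index would be kept if the loss uses the chosen action's log-probability, as is usual in policy-gradient methods.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Draw ONE action index, weighted by probs (analogous to num_samples=1)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)

# Hypothetical policy-head output: 6 scores, one per Pong action.
logits = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1]

probs = softmax(logits)                   # 6 probabilities, one per action
log_probs = [math.log(p) for p in probs]  # 6 log-probabilities

action = sample_action(probs, rng)        # a single index in 0..5
chosen_log_prob = log_probs[action]       # a single number for this step

print("number of probabilities:", len(probs))
print("sampled action index:", action)
print("log-probability of the sampled action:", chosen_log_prob)
```

So the 6 is the size of the distribution being sampled from, while the 1 is how many actions are drawn from it at each step.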
