action_space.n and actions sampling #53

bionick87 · 2018-05-23T21:08:16Z

Hey,

I have a question, and sorry to disturb you if I have misunderstood something.
I understand most of the code, but one thing is not clear to me.
The A3C model outputs a value and a set of action probabilities, and there are 6 probabilities for Pong, right?

But when you compute:

action = prob.multinomial(num_samples=1).data

the number of samples is just 1, so you clearly get a single action. However, Pong has 6 possible actions.

env.step(action.numpy())

is correct because it receives only one action.

The same question applies to log_prob, which you save in a list (do you save 6 probabilities or only one?). With Pong there are 6 outputs, not 1, and that could cause a problem in the policy calculation.

Sorry, I am a bit confused because

num_outputs = action_space.n

in model.py is 6, not 1.
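
To make the shapes in the question concrete, here is a minimal sketch, not taken from the repository. It assumes the policy head produces one logit per action (6 for Pong, as stated above) and that the stored log_prob is reduced to the sampled action with `gather`; the variable names `logit` and `chosen_log_prob` are illustrative. The point it tries to show is that `multinomial(num_samples=1)` draws one action index from a distribution over 6 actions, so the number of samples and the number of outputs are different quantities.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_actions = 6  # assumed value of action_space.n for Pong, as in the question

# Stand-in for the policy head output: one row, one logit per action.
logit = torch.randn(1, num_actions)

prob = F.softmax(logit, dim=-1)          # shape (1, 6), each row sums to 1
log_prob = F.log_softmax(logit, dim=-1)  # shape (1, 6)

# Draw ONE index from the 6-way distribution: the result is an action id in 0..5,
# not a vector of 6 values.
action = prob.multinomial(num_samples=1).detach()  # shape (1, 1)

# Keep only the log-probability of the action that was actually sampled.
chosen_log_prob = log_prob.gather(1, action)  # shape (1, 1)

print(prob.shape, action.shape, chosen_log_prob.shape)
```

Under these assumptions, only one log-probability per step would end up in the list, while the full 6-way distribution is still available for terms such as an entropy bonus.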

ikostrikov closed this as completed Sep 2, 2018
