Hey,
I have a question. Maybe I am wrong, and sorry to disturb you.
I understand all of the code, but one thing is not clear to me.
The A3C model outputs a value and action probabilities, and there are 6 probabilities for Pong, right?
But when you compute:
action = prob.multinomial(num_samples=1).data
the number of samples is just 1, so clearly you get one action. However, the number of actions for Pong is 6.
env.step(action.numpy())
is correct because it receives only one action.
The same applies to log_prob, which you save in the list (do you save 6 log-probabilities or only one?). With Pong you have 6, not 1, and that could create a problem in the policy calculation.
Sorry, I am a bit confused because
num_outputs = action_space.n
in model.py is 6, not 1.
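
To make the question concrete, here is a minimal sketch of the sampling step as I understand it. It assumes 6 actions as in Pong and uses only the Python standard library instead of PyTorch, so it is an illustration and not the repository's code. The names `softmax`, `sample_action`, `logits` and `chosen_log_prob` are hypothetical, and the logit values are made up. The point it tries to show: the policy has 6 probabilities, drawing one sample returns a single action index between 0 and 5, and only the log-probability at that index would need to be stored.

```python
import math
import random


def softmax(logits):
    # Turn raw policy outputs into probabilities that sum to 1.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(prob, rng):
    # Draw ONE index in [0, len(prob)), weighted by prob.
    # This plays the role of prob.multinomial(num_samples=1).
    return rng.choices(range(len(prob)), weights=prob, k=1)[0]


rng = random.Random(0)

# Assumption: 6 made-up policy outputs, one per Pong action.
logits = [0.1, -0.3, 0.7, 0.0, 0.2, -0.5]

prob = softmax(logits)                  # 6 probabilities
log_prob = [math.log(p) for p in prob]  # 6 log-probabilities

action = sample_action(prob, rng)       # a single integer index
chosen_log_prob = log_prob[action]      # a single number, not 6

print("number of probabilities:", len(prob))
print("sampled action index:", action)
print("stored log-probability:", chosen_log_prob)
```

If this reading is right, num_samples=1 controls how many actions are drawn per step, while action_space.n controls how many actions there are to choose from, so the two numbers would not need to match.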