Hi,
You are right: we apply random uniform actions. Given that the action bounds are finite and known, the maximum entropy policy is the uniform distribution over that interval. We also tried applying an N(0, I) action, and the results were essentially the same. It is unfortunate that the published code has a random walk that is not exactly "Brownian". Anyone interested in verifying that the results hold under Brownian motion can change the line of code you point out. Thanks for spotting this detail!
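For anyone who wants to try the comparison, here is a minimal sketch of the two sampling schemes. It is not the code from the repository: the function names, the 8-dimensional bounds of [-1, 1], and the clipping of the Gaussian sample to the action bounds are all assumptions made for illustration.

```python
import numpy as np


def sample_uniform_action(low, high, rng):
    """Draw an action uniformly within the box [low, high] (hypothetical helper)."""
    return rng.uniform(low, high)


def sample_gaussian_action(low, high, rng):
    """Draw an action from N(0, I), then clip it to [low, high].

    Clipping is an assumption here; the thread does not say how
    out-of-bounds Gaussian actions were handled.
    """
    return np.clip(rng.standard_normal(low.shape), low, high)


rng = np.random.default_rng(0)
# Hypothetical action bounds, standing in for env.action_space.bounds.
low, high = -np.ones(8), np.ones(8)

a_uniform = sample_uniform_action(low, high, rng)
a_gauss = sample_gaussian_action(low, high, rng)
```

Swapping one helper for the other at the point where the random-walk action is drawn should be enough to rerun the experiment with either noise model.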
Hi,
According to your paper, you apply Brownian motion to generate new seed states (normal with mean 0 and variance 1), but according to this line
https://github.com/florensacc/rllab-curriculum/blob/master/curriculum/envs/start_env.py#L233
it seems that you are applying a random uniform action with range env.action_space.bounds for the AntMaze environment.
Can you explain why the action is not sampled from N(0, I)?
Thank you very much.