I tried to quickly reproduce the cartpole balancing results reported in the paper.
I took the `examples/trpo_cartpole.py` script and adjusted the basic parameters to those given in the paper, keeping the default rllab objects otherwise (see below). Running this, I get an average summed reward of about 3200 over the run, quite a bit below the reported 4869.8 ± 37.6. It looks as if learning is somewhat unstable with the provided learning rate (see plot below).
Can you take a look at whether I'm missing something?
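For reference, a minimal sketch of the setup I mean, using the default rllab objects from `examples/trpo_cartpole.py`; the hyperparameter values shown here are placeholders rather than my exact settings or the paper's:

```python
# Minimal sketch: default rllab objects from examples/trpo_cartpole.py,
# with the TRPO hyperparameters exposed so they can be set to the
# paper's values. The numbers below are placeholders.
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalize import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(CartpoleEnv())

policy = GaussianMLPPolicy(
    env_spec=env.spec,
    hidden_sizes=(32, 32),  # default MLP policy architecture
)

baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,        # placeholder: samples collected per iteration
    max_path_length=500,    # placeholder: episode horizon
    n_itr=500,              # placeholder: number of TRPO iterations
    discount=0.99,
    step_size=0.01,         # placeholder: KL step size ("learning rate")
)
algo.train()
```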
*(plot: reward per episode)*