I tried to quickly reproduce the cartpole balancing results reported in the paper.
I took the `examples/trpo_cartpole.py` script and adjusted the basic parameters to those given in the paper, keeping the default rllab objects otherwise (see below). Running this, I get an average summed reward of about 3200 over the run, quite a bit below the reported 4869.8 ± 37.6. It looks as if learning is somewhat unstable with the provided learning rate (see plot below).
Can you take a look at whether I'm missing something?
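For reference, a minimal sketch of the setup I mean, using the default rllab objects from `examples/trpo_cartpole.py`; the hyperparameter values shown here are placeholders rather than my exact settings or the paper's:

```python
# Minimal sketch: default rllab objects from examples/trpo_cartpole.py,
# with the TRPO hyperparameters exposed so they can be set to the
# paper's values. The numbers below are placeholders.
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalize import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(CartpoleEnv())

policy = GaussianMLPPolicy(
    env_spec=env.spec,
    hidden_sizes=(32, 32),  # default MLP policy architecture
)

baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,        # placeholder: samples collected per iteration
    max_path_length=500,    # placeholder: episode horizon
    n_itr=500,              # placeholder: number of TRPO iterations
    discount=0.99,
    step_size=0.01,         # placeholder: KL step size ("learning rate")
)
algo.train()
```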
*(plot: reward per episode)*