Question about results in the paper #43

Open · lchenat opened this issue Oct 5, 2016 · 8 comments

@lchenat (Contributor) commented Oct 5, 2016

Hi, I recently tried to reproduce the experimental results in your paper, and I found that some of my results differ from those reported. Did you use the default parameters for all algorithms when you ran the experiments?

lchenat closed this as completed Oct 5, 2016
lchenat reopened this Oct 5, 2016

@dementrock (Member)

Hi @lchenat, all the parameters should be documented in the appendix of the paper. However, it is not guaranteed that you will get exactly the same results, due to differences in random seeds. I'd be happy to assist if you observe significant discrepancies.

@lchenat (Contributor, Author) commented Oct 10, 2016

I did not find the parameters for DDPG in the appendix of the paper. I ran the following code, and the maximum per-iteration average return was no more than 2400:

import re
import numpy
import sys
from subprocess import call
from rllab.algos.vpg import VPG
from rllab.algos.tnpg import TNPG
from rllab.algos.erwr import ERWR
from rllab.algos.reps import REPS
from rllab.algos.trpo import TRPO
from rllab.algos.cem import CEM
from rllab.algos.cma_es import CMAES
from rllab.algos.ddpg import DDPG
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
from rllab.misc.instrument import stub, run_experiment_lite
from rllab.exploration_strategies.ou_strategy import OUStrategy
from rllab.policies.deterministic_mlp_policy import DeterministicMLPPolicy
from rllab.q_functions.continuous_mlp_q_function import ContinuousMLPQFunction

path = "/home/data/lchenat/rllab-master/data/local/experiment/"
exp_name = "test_cartpole_again_ddpg"

stub(globals())

env = normalize(CartpoleEnv())

policy = DeterministicMLPPolicy(
    env_spec=env.spec,
    hidden_sizes=(400, 300)
)
es = OUStrategy(env_spec=env.spec)
qf = ContinuousMLPQFunction(env_spec=env.spec)
algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    n_epochs=600,
)

# delete the previous data
call(["rm", "-rf", path + exp_name])

run_experiment_lite(
    algo.train(),
    n_parallel=1,
    snapshot_mode="last",
    # seed=1,
    exp_name=exp_name,
    # plot=True,
)

@lchenat (Contributor, Author) commented Oct 10, 2016

By the way, is there a function that computes the metric defined in the paper (the average over all iterations and all trajectories)? The debug log only provides the average return for each iteration, and for some algorithms the number of trajectories per iteration is not logged.

@dementrock (Member) commented Oct 11, 2016

Also, as mentioned in the paper (we probably should have been clearer), we scaled all the rewards by 0.1 when running DDPG. Refer to https://github.com/openai/rllab/blob/master/rllab/algos/ddpg.py#L112

In general, we found this parameter to be very important, but due to time constraints we were not able to tune it extensively. You may try other values on other tasks, which may give you even better results.

Re the second question: I think we did a very crude approximation and simply averaged the results over all iterations (treating it as if all iterations had the same number of trajectories). Feel free to submit a pull request that adds additional logging.
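
For reference, a minimal sketch of that crude approximation, assuming the per-iteration average returns end up in rllab's progress.csv under an AverageReturn column (the column name and log location here are assumptions; adjust them to whatever your logger actually writes):

import csv

# Crude approximation of the paper's metric: average the per-iteration
# average returns, treating every iteration as if it contributed the same
# number of trajectories.
def approximate_metric(progress_csv_path):
    with open(progress_csv_path) as f:
        returns = [float(row["AverageReturn"]) for row in csv.DictReader(f)]
    return sum(returns) / len(returns)

print(approximate_metric(
    "/home/data/lchenat/rllab-master/data/local/experiment/"
    "test_cartpole_again_ddpg/progress.csv"))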

@lchenat (Contributor, Author) commented Oct 12, 2016

I have scaled the reward by 0.1, but I still get returns of around 2500. Are there any other parameters that I need to tune?

@dementrock (Member)

Oh, you should change the max path length in DDPG to 500. Otherwise, the optimal score is 2500!
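
For concreteness, a sketch of the DDPG construction from the script above with both suggestions applied (the 0.1 reward scaling and the 500 max path length); scale_reward and max_path_length are assumed here to be the constructor arguments that the linked ddpg.py exposes for these settings, so double-check the signature in your checkout:

algo = DDPG(
    env=env,
    policy=policy,
    es=es,
    qf=qf,
    n_epochs=600,
    max_path_length=500,  # raises the per-episode cap, so the optimal score becomes 5000
    scale_reward=0.1,     # reward scaling used for the paper's DDPG runs
)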

@lchenat (Contributor, Author) commented Oct 13, 2016

Yes, the optimal score increased to 5000 after I changed the max path length to 500, but the average over all iterations is around 3100. Here are the average returns extracted from debug.log:

[85.1877, 22.4833, 22.2935, 22.4445, 22.561, 22.3393, 22.8141, 22.2145, 22.2697, 22.3604, 100.441, 177.388, 196.363, 183.331, 223.452, 272.554, 293.124, 407.079, 535.813, 619.828, 695.468, 872.355, 1028.65, 952.744, 645.209, 846.002, 601.686, 607.632, 656.687, 697.427, 715.399, 646.103, 646.78, 621.531, 609.173, 629.381, 598.768, 633.524, 603.093, 692.313, 627.032, 665.51, 671.895, 678.046, 721.31, 670.6, 645.387, 603.164, 594.49, 617.101, 676.009, 634.184, 627.533, 658.008, 700.695, 684.835, 622.859, 596.207, 691.321, 615.621, 612.777, 573.243, 598.272, 611.166, 596.099, 598.044, 551.066, 636.267, 740.511, 599.541, 605.533, 615.751, 710.193, 662.288, 619.205, 661.016, 582.386, 582.968, 601.911, 653.29, 617.729, 651.414, 744.331, 714.654, 658.312, 804.903, 841.202, 925.207, 855.179, 1044.97, 895.128, 936.976, 1066.89, 1406.07, 2131.26, 4021.35, 1814.43, 1877.28, 1512.61, 1993.6, 1686.47, 1991.07, 3476.89, 4138.7, 2385.71, 3379.73, 2648.44, 2970.91, 4008.72, 4683.97, 3603.48, 4999.14, 4999.04, 4998.86, 2328.25, 4534.03, 4999.28, 4999.24, 4998.56, 4283.28, 4998.47, 4998.89, 4998.86, 2223.49, 4999.18, 2702.06, 4998.8, 4998.67, 4999.02, 4998.57, 4999.6, 4998.84, 4998.5, 4998.65, 2449.9, 2153.85, 2034.24, 1275.76, 1394.86, 2258.75, 4557.9, 4998.51, 4998.52, 4998.37, 4998.73, 4998.16, 4997.71, 4997.81, 4583.94, 4998.32, 4998.46, 4998.38, 4998.21, 4804.9, 4997.79, 4998.41, 4998.03, 4998.44, 4998.26, 4998.16, 4998.07, 4998.21, 4997.73, 4998.04, 4997.81, 4998.3, 4998.33, 4998.2, 4998.27, 4998.15, 4998.6, 4998.23, 4998.63, 4998.58, 4998.57, 4999.11, 4999.32, 4999.47, 4999.41, 4790.46, 4999.45, 4999.45, 4999.57, 4999.45, 4781.79, 4999.5, 4999.46, 2834.94, 2667.89, 4999.43, 4879.07, 4999.51, 4999.5, 4256.07, 4999.24, 3749.83, 3140.73, 2184.49, 3293.37, 4276.64, 4570.93, 4549.38, 4448.15, 4999.32, 4608.16, 4999.52, 4999.38, 4999.16, 4999.43, 4790.45, 4999.54, 4724.55, 4999.43, 4627.56, 4999.58, 4999.45, 4272.88, 4999.26, 4999.38, 4784.83, 4731.7, 4696.11, 4427.15, 4165.41, 4906.99, 4422.53, 3953.47, 3692.44, 4123.02, 4571.29, 4450.07, 4999.32, 4859.32, 4999.44, 4498.9, 4895.5, 4999.22, 4589.09, 4998.88, 4733.38, 4775.73, 4999.29, 4999.18, 4640.48, 4610.55, 4935.44, 4999.2, 4883.15, 4852.51, 4900.67, 4835.74, 4500.04, 4738.27, 4531.23, 4530.79, 4999.0, 4999.18, 3974.69, 4797.54, 4998.95, 4000.32, 3699.98, 3424.3, 4998.86, 4003.68, 4878.38, 4915.73, 4763.66, 4998.63, 4688.21, 4998.92, 4926.33, 3244.25, 4507.45, 4998.75, 4998.79, 4998.45, 3060.27, 2583.36, 2717.86, 2005.12, 4911.39, 4998.91, 4998.66, 4660.82, 4789.71, 4998.43, 4998.52, 4884.03, 4541.58, 4998.37]
291 results, average 3179.83032646

The average return drops from 5000 down to 2000-3000 from time to time; is that a normal phenomenon with DDPG?

@dementrock (Member)

@lchenat The benchmark results were run over 25 million samples, to match the sample complexity used by the other algorithms. This should correspond to roughly 2500 epochs. A good approximation would be to extrapolate the performance of the last few epochs out to the same number of samples, and to compute the average return over all of that data.

I have also observed that DDPG is sometimes unstable, even on cartpole. What you're getting seems about right. One thing we didn't try was batch normalization, which we could not get working before the paper deadline; that could be a good thing to try. You can also try other reward scalings (e.g. 0.01), which might stabilize learning further.
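
A minimal sketch of that extrapolation, assuming every epoch uses the same number of samples (so 25 million samples is roughly 2500 epochs) and taking "the last few epochs" to be an arbitrary tail of 10:

# Extend the observed per-epoch average returns out to ~2500 epochs by
# repeating the mean of the last few epochs, then average everything.
def extrapolated_average(returns, target_epochs=2500, tail=10):
    tail_mean = sum(returns[-tail:]) / min(tail, len(returns))
    extended = list(returns) + [tail_mean] * max(0, target_epochs - len(returns))
    return sum(extended) / len(extended)

# e.g. extrapolated_average(average_returns), where average_returns is the
# list of 291 per-epoch returns posted above.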
