
using the first_order_optimizer with TRPO gives error #5

Closed

abhishm opened this issue May 12, 2016 · 1 comment

abhishm commented May 12, 2016

I tried to run the Cart Pole experiment with the Adam update. The code is as follows:

from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_gru_policy import GaussianGRUPolicy
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
from rllab.optimizers.first_order_optimizer import FirstOrderOptimizer

env = normalize(CartpoleEnv())
policy = GaussianMLPPolicy(
    env_spec=env.spec,
    adaptive_std=True,
    # The neural network policy should have two hidden layers, each with 32 hidden units.
)
baseline = LinearFeatureBaseline(env_spec=env.spec)
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=100,
    n_itr=200,
    discount=0.99,
    step_size=0.01,
    optimizer=FirstOrderOptimizer(),
)
algo.train()

However, I was not able to run the code and I got the following error:

Traceback (most recent call last):
  File "/home/drl/rllab/examples/trpo_cartpole.py", line 30, in <module>
    algo.train()
  File "/home/drl/rllab/rllab/algos/batch_polopt.py", line 95, in train
    self.optimize_policy(itr, samples_data)
  File "/home/drl/rllab/rllab/algos/npo.py", line 110, in optimize_policy
    mean_kl = self.optimizer.constraint_val(all_input_values)
AttributeError: 'FirstOrderOptimizer' object has no attribute 'constraint_val'

I found that ConjugateGradientOptimizer has this attribute, but I got a different error when I copied the constraint_val method onto the FirstOrderOptimizer class.

I would appreciate it if you could tell me what the purpose of this constraint_val function is in the optimizer call.

Thank you

abhishm changed the title from "using the first_order_optimizer with TNPG gives error" to "using the first_order_optimizer with TRPO gives error" on May 12, 2016
dementrock (Member) commented:

Hi @abhishm, TRPO doesn't work with FirstOrderOptimizer, since it requires solving a constrained optimization problem, where the constraint is the KL divergence between the old policy and the new one. You can choose between ConjugateGradientOptimizer and PenaltyLbfgsOptimizer (the latter is what the PPO algorithm uses).
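[Editor's note: for reference, a minimal sketch of the corrected setup, reusing env, policy, and baseline from the snippet above. The ConjugateGradientOptimizer import path is assumed by analogy with the first_order_optimizer import in that snippet and may differ across rllab versions:

# Assumed import path; check your rllab version.
from rllab.optimizers.conjugate_gradient_optimizer import ConjugateGradientOptimizer

# Same experiment as in the original snippet, but with the constrained
# optimizer that TRPO expects (env, policy, baseline defined as above).
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=100,
    n_itr=200,
    discount=0.99,
    step_size=0.01,  # the KL-divergence bound the constrained optimizer enforces
    optimizer=ConjugateGradientOptimizer(),
)
algo.train()]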

If you are interested, you can also try to write a variant of the provided first-order optimizer that either uses a fixed penalty term or adaptively adjusts it (you can look at PenaltyLbfgsOptimizer for inspiration); a rough sketch of the fixed-penalty idea follows below.
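[Editor's note: as an illustration of that suggestion, here is a hypothetical, self-contained toy, not rllab code. It replaces TRPO's KL constraint with a fixed KL penalty so a plain first-order method can optimize the objective; surrogate_loss and kl are stand-ins for the real rllab quantities:

import numpy as np

step = 0.1       # gradient-descent step size
penalty = 10.0   # fixed penalty coefficient; PenaltyLbfgsOptimizer adapts this instead

old_mean = 0.0   # mean of the "old" 1-D Gaussian policy (unit variance)
theta = old_mean # new policy mean, initialized at the old one

def surrogate_loss(mean):
    # Stand-in objective: pretend the advantage estimates push the mean toward 1.0.
    return (mean - 1.0) ** 2

def kl(mean):
    # KL divergence between two unit-variance Gaussians differing only in mean.
    return 0.5 * (mean - old_mean) ** 2

for _ in range(100):
    # Finite-difference gradient of the penalized objective:
    # min  surrogate_loss(theta) + penalty * kl(theta)
    eps = 1e-5
    f = lambda m: surrogate_loss(m) + penalty * kl(m)
    grad = (f(theta + eps) - f(theta - eps)) / (2 * eps)
    theta -= step * grad

print(theta)  # settles between old_mean and 1.0; a larger penalty keeps it closer to old_mean]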

abhishm closed this as completed May 13, 2016