
using the first_order_optimizer with TRPO gives error #5

Closed

abhishm opened this issue May 12, 2016 · 1 comment

abhishm commented May 12, 2016

I tried to run the Cart Pole experiment with the Adam update. The code is as follows:

from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_gru_policy import GaussianGRUPolicy
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy
from rllab.optimizers.first_order_optimizer import FirstOrderOptimizer

env = normalize(CartpoleEnv())
policy = GaussianMLPPolicy(
    env_spec=env.spec,
    adaptive_std=True,
    # The neural network policy should have two hidden layers, each with 32 hidden units.
)
baseline = LinearFeatureBaseline(env_spec=env.spec)
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=100,
    n_itr=200,
    discount=0.99,
    step_size=0.01,
    optimizer=FirstOrderOptimizer(),
)
algo.train()

However, I was not able to run the code and I got the following error:

Traceback (most recent call last):
  File "/home/drl/rllab/examples/trpo_cartpole.py", line 30, in <module>
    algo.train()
  File "/home/drl/rllab/rllab/algos/batch_polopt.py", line 95, in train
    self.optimize_policy(itr, samples_data)
  File "/home/drl/rllab/rllab/algos/npo.py", line 110, in optimize_policy
    mean_kl = self.optimizer.constraint_val(all_input_values)
AttributeError: 'FirstOrderOptimizer' object has no attribute 'constraint_val'

I found that ConjugateGradientOptimizer has this attribute, but I got a different error when I copied the constraint_val method onto the FirstOrderOptimizer class.

I would appreciate it if you could tell me what the purpose of this constraint_val function is in the optimizer call.

Thank you

abhishm changed the title from "using the first_order_optimizer with TNPG gives error" to "using the first_order_optimizer with TRPO gives error" on May 12, 2016
dementrock (Member) commented:

Hi @abhishm, TRPO doesn't work with FirstOrderOptimizer, since it requires solving a constrained optimization problem, where the constraint is the KL divergence between the old policy and the new one. You can choose between ConjugateGradientOptimizer and PenaltyLbfgsOptimizer (the latter is what the PPO algorithm uses).
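[Editor's note: for reference, a minimal sketch of the corrected setup, reusing env, policy, and baseline from the snippet above. The ConjugateGradientOptimizer import path is assumed by analogy with the first_order_optimizer import in that snippet and may differ across rllab versions:

# Assumed import path; check your rllab version.
from rllab.optimizers.conjugate_gradient_optimizer import ConjugateGradientOptimizer

# Same experiment as in the original snippet, but with the constrained
# optimizer that TRPO expects (env, policy, baseline defined as above).
algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,
    max_path_length=100,
    n_itr=200,
    discount=0.99,
    step_size=0.01,  # the KL-divergence bound the constrained optimizer enforces
    optimizer=ConjugateGradientOptimizer(),
)
algo.train()]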

If you are interested, you can also try to write a variant of the provided first-order optimizer that either uses a fixed penalty term or adaptively adjusts it (you can look at PenaltyLbfgsOptimizer for inspiration); a rough sketch of the fixed-penalty idea follows below.
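[Editor's note: as an illustration of that suggestion, here is a hypothetical, self-contained toy, not rllab code. It replaces TRPO's KL constraint with a fixed KL penalty so a plain first-order method can optimize the objective; surrogate_loss and kl are stand-ins for the real rllab quantities:

import numpy as np

step = 0.1       # gradient-descent step size
penalty = 10.0   # fixed penalty coefficient; PenaltyLbfgsOptimizer adapts this instead

old_mean = 0.0   # mean of the "old" 1-D Gaussian policy (unit variance)
theta = old_mean # new policy mean, initialized at the old one

def surrogate_loss(mean):
    # Stand-in objective: pretend the advantage estimates push the mean toward 1.0.
    return (mean - 1.0) ** 2

def kl(mean):
    # KL divergence between two unit-variance Gaussians differing only in mean.
    return 0.5 * (mean - old_mean) ** 2

for _ in range(100):
    # Finite-difference gradient of the penalized objective:
    # min  surrogate_loss(theta) + penalty * kl(theta)
    eps = 1e-5
    f = lambda m: surrogate_loss(m) + penalty * kl(m)
    grad = (f(theta + eps) - f(theta - eps)) / (2 * eps)
    theta -= step * grad

print(theta)  # settles between old_mean and 1.0; a larger penalty keeps it closer to old_mean]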

abhishm closed this as completed May 13, 2016