Skip to content
Go to file

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

This repository implements several algorithms:

  • Trust Region Policy Optimization [1]
  • Proximal Policy Optimization (i.e., TRPO, but using a penalty instead of a constraint on KL divergence), where each subproblem is solved with either SGD or L-BFGS
  • Cross Entropy Method

TRPO and PPO are implemented with neural-network value functions and use GAE [2].

This library is written in a modular way to allow for sharing code between TRPO and PPO variants, and to write the same code for different kinds of action spaces.


  • keras (2.0.2)
  • theano (0.9.0)
  • tabulate
  • numpy
  • scipy

To run the algorithms implemented here, you should put modular_rl on your PYTHONPATH, or run the scripts (e.g. from this directory.

Good parameter settings can be found in the experiments directory.

You can learn about the various parameters by running one of the experiment scripts with the -h flag, but providing the (required) env and agent parameters. (Those parameters determine what other parameters are available.) For example, to see the parameters of TRPO,

./ --env CartPole-v0 --agent modular_rl.agentzoo.TrpoAgent -h

To the the parameters of CEM,

./ --env=Acrobot-v0 --agent=modular_rl.agentzoo.DeterministicAgent  --n_iter=2

[1] JS, S Levine, P Moritz, M Jordan, P Abbeel, "Trust region policy optimization." arXiv preprint arXiv:1502.05477 (2015).

[2] JS, P Moritz, S Levine, M Jordan, P Abbeel, "High-dimensional continuous control using generalized advantage estimation." arXiv preprint arXiv:1506.02438 (2015).

You can’t perform that action at this time.