Constrained Policy Optimization for rllab
Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. 
This module was designed for rllab, and includes implementations of the algorithms described in our paper.
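To make "behavioral constraints" concrete: in the constrained-RL setting, a constraint is usually stated as an upper limit on the expected discounted cost a policy accumulates. The snippet below is a minimal sketch of that check on sampled trajectories. It is not code from this module; the function names and the `cost_limit` parameter are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * values[t] for a single trajectory."""
    total = 0.0
    # Accumulate backwards using G_t = v_t + gamma * G_{t+1}.
    for v in reversed(values):
        total = v + gamma * total
    return total


def satisfies_constraint(
    cost_trajectories: Sequence[Sequence[float]],
    gamma: float,
    cost_limit: float,  # hypothetical name for the constraint threshold
) -> bool:
    """Check whether the mean discounted cost over trajectories is within the limit."""
    if not cost_trajectories:
        raise ValueError("need at least one trajectory")
    returns = [discounted_sum(costs, gamma) for costs in cost_trajectories]
    mean_cost = sum(returns) / len(returns)
    return mean_cost <= cost_limit
```

A constrained algorithm aims to keep this kind of estimate under the limit at every policy update, rather than only at convergence.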
To configure, run the following command in the root folder of rllab:
git submodule add -f https://github.com/jachiam/cpo sandbox/cpo
Run CPO in the Point-Gather environment with
- Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control". Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.