Constrained Policy Optimization
Latest commit 5c83925 ("Release") by Joshua Achiam, Jun 7, 2017
algos Release Jun 7, 2017
baselines Release Jun 7, 2017
envs Release Jun 7, 2017
experiments Release Jun 7, 2017
safety_constraints Release Jun 7, 2017
.gitignore Release Jun 7, 2017

Constrained Policy Optimization for rllab

Constrained Policy Optimization (CPO) is an algorithm for learning policies that should satisfy behavioral constraints throughout training. [1]
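
As a rough illustration of what "satisfying a constraint" can mean here, the sketch below assumes the constraint is a limit on the expected discounted cost of a policy, as in the constrained MDP setting of [1]. It only estimates that cost from sampled trajectories and compares it with the limit; it is not the CPO update, and the function names (`discounted_sum`, `estimate_constraint_cost`, `is_feasible`) are hypothetical and do not come from this repository.

```python
def discounted_sum(values, gamma):
    """Return sum over t of gamma**t * values[t] for a single trajectory."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


def estimate_constraint_cost(cost_trajectories, gamma):
    """Monte Carlo estimate of the expected discounted cost.

    cost_trajectories is a list of per-step cost sequences, one per
    sampled trajectory (a hypothetical data layout for illustration).
    """
    returns = [discounted_sum(costs, gamma) for costs in cost_trajectories]
    return sum(returns) / len(returns)


def is_feasible(cost_trajectories, gamma, limit):
    """True if the estimated discounted cost does not exceed the limit."""
    return estimate_constraint_cost(cost_trajectories, gamma) <= limit


if __name__ == "__main__":
    # Two toy trajectories of per-step costs.
    sampled_costs = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
    print(estimate_constraint_cost(sampled_costs, gamma=0.5))
    print(is_feasible(sampled_costs, gamma=0.5, limit=1.0))
```

An algorithm that satisfies constraints throughout training would aim to keep a check like this passing for every intermediate policy, not only for the final one.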

This module was designed for rllab [2] and includes implementations of the algorithms described in our paper [1].

To configure, run the following command in the root folder of rllab:

git submodule add -f sandbox/cpo

Run CPO in the Point-Gather environment with

python sandbox/cpo/experiments/ 

  1. Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. "Constrained Policy Optimization". Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.
  2. Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel. "Benchmarking Deep Reinforcement Learning for Continuous Control". Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.