Skip to content
Implementation of clipped action policy gradient (CAPG) with PPO and TRPO
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Clipped Action Policy Gradient

This repository contains the implementation of CAPG ( with PPO and TRPO.


  • Chainer v4.1.0
  • ChainerRL latest master
  • OpenAI Gym v0.9.4 with MuJoCo envs

Use requirements.txt to install dependencies.

pip install -r requirements.txt

How to run

# Run PPO with PG and CAPG for 1M steps
python --env Humanoid-v1
python --env Humanoid-v1 --use-clipped-gaussian

# Run TRPO with PG and CAPG for 10M steps
python --env Humanoid-v1 --steps 10000000
python --env Humanoid-v1 --steps 10000000 --use-clipped-gaussian

The figure below shows average returns of training episodes of TRPO with PG and CAPG, both of which are trained for 10M timesteps on Humanoid-v1. See the paper for more results.

BibTeX entry

  author = {Fujita, Yasuhiro and Maeda, Shin-ichi},
  booktitle = {ICML},
  title = {{Clipped Action Policy Gradient}}
  year = {2018}


MIT License.

You can’t perform that action at this time.