Guided policy search in Python and ROS Indigo.


This code is a reimplementation of the guided policy search algorithm, together with its iterative LQG-based trajectory optimization and supervised policy learning methods. It is meant to help others understand, reuse, and build upon existing work. For full documentation, see

The code base is a work in progress. See the FAQ for information on planned future additions to the code.

Mujoco dependency

Create a mujoco directory in your home folder and place the downloaded mjpro131 folder inside it, so that it sits at ~/mujoco/mjpro131. This matters because the activation function and the openscenegraph bindings look for mujoco in this path.
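
A quick way to confirm this layout before launching an experiment is to check the path from Python. This is only a minimal sketch: `mujoco_dir_ok` is a hypothetical helper that is not part of the gps code base, and it assumes nothing beyond the directory layout described above.

```python
import os


def mujoco_dir_ok(home=None, version="mjpro131"):
    """Return True if <home>/mujoco/<version> exists as a directory.

    Hypothetical helper for illustration; not part of the gps code base.
    """
    if home is None:
        home = os.path.expanduser("~")
    return os.path.isdir(os.path.join(home, "mujoco", version))


if __name__ == "__main__":
    if mujoco_dir_ok():
        print("Found mjpro131 under ~/mujoco")
    else:
        print("mjpro131 not found under ~/mujoco; see the instructions above")
```

The `home` argument exists only so the check can be pointed at a different location, for example inside a container.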


This fork of the code implements the iterative Dynamic Game that was proposed in the paper:

  • iDG: A Robust Zero-Sum, Two-Player Reinforcement Learning

For details of the algorithm, please see the paper on arXiv; it can be found by searching for the author name Olalekan Ogunmolu.

Running iDG

  • First train a protagonist agent by following the instructions on the page.

  • Go to the experiments directory and run the copy_gps executable. This will copy the learned policy for the original system into a new folder.

  • We will then make a few modifications in the hyperparams directory of the new folder as follows:

For box2d experiments, we will import the MDGPS class like so at the top of the hyperparams file:

from gps.algorithm.algorithm_mdgps import AlgorithmMDGPS # for new experiments

EXP_DIR: change this to point to the new experiment directory. Then set the following entries in the common dict:

  |--'experiment_name': 'name_of_new_experiment',
  |--'costs_filename': EXP_DIR + 'costs.csv',
  |--'mode': 'antagonist',  # whether we are running in block-alternating ascent mode
  |--'gamma': 1e8,          # the magnitude of the additive disturbance

A full common dict would then look like this, for example:

common = {
    'experiment_name': 'box2d_badmm_example' + '_' + \
            datetime.strftime(datetime.now(), '%m-%d-%y_%H-%M'),
    'experiment_dir': EXP_DIR,
    'data_files_dir': EXP_DIR + 'data_files/',
    'log_filename': EXP_DIR + 'log.txt',
    'costs_filename': EXP_DIR + 'costs.csv',
    'dists_filename': EXP_DIR + 'dist.txt',
    'conditions': 4,
    'mode': 'antagonist',
    'gamma': 1e8,
    'target_end_effector': np.array([0.0, 0.3, -0.5, 0.0, 0.3, -0.2]),
}

  • In the action_cost dict, add the gamma and mode terms as well, e.g.
action_cost = {
    'type': CostAction,
    'wu': np.array([1, 1]),
    'gamma': 1e8,
    'mode': 'antagonist',
}

  • Do the same for algorithm['cost'], e.g.
algorithm['cost'] = {
    'type': CostSum,
    'costs': [action_cost, state_cost],
    'weights': [1e-5, 1.0],
    'gamma': 1e8,
    'mode': 'antagonist',
}

  • Similarly, in the algorithm['init_traj_distr'] field, change the type of the LQR implementation to init_lqr_robust:
algorithm['init_traj_distr'] = {
    'type': init_lqr_robust,
    # ... remaining entries unchanged ...
}

This accounts for the new robust LQR algorithm in lin_gauss_init.

  • In agent, set the mode to 'robust':
agent = {
    # ... existing entries ...
    'mode': 'robust',
}


  • Set the same mode in algorithm['traj_opt']:
algorithm['traj_opt'] = {
    'type': TrajOptLQRPython,
    'mode': 'robust',
}

  • Add the following to algorithm['policy_opt'] to account for the robust policy:
algorithm['policy_opt'] = {
    # ... existing entries ...
    'robust_weights_file_prefix': EXP_DIR + 'robust_policy',
}
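
The cost-related edits above all add the same two keys (gamma and mode), so they can also be applied programmatically instead of by hand. The following is a sketch only: `add_idg_terms` is a hypothetical helper that does not exist in this repository, and plain dicts with string placeholders stand in for the real hyperparams entries (`CostAction`, `CostSum`, `CostState`) so that the example runs without gps installed.

```python
import copy


def add_idg_terms(cfg, gamma=1e8, mode="antagonist"):
    """Return a copy of cfg with the 'gamma' and 'mode' keys set.

    Hypothetical helper for illustration; the steps above edit the
    hyperparams file by hand instead.
    """
    out = copy.copy(cfg)  # shallow copy: nested values are shared, not cloned
    out["gamma"] = gamma
    out["mode"] = mode
    return out


# Placeholder dicts; in a real hyperparams file 'type' holds a class
# such as CostAction or CostSum rather than a string.
action_cost = add_idg_terms({"type": "CostAction", "wu": [1, 1]})
state_cost = {"type": "CostState"}  # left untouched, as in the steps above
cost = add_idg_terms({
    "type": "CostSum",
    "costs": [action_cost, state_cost],
    "weights": [1e-5, 1.0],
})

print(sorted(cost.keys()))
```

The agent and algorithm['traj_opt'] entries use 'mode': 'robust' and no gamma, so they are deliberately left out of this helper.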

Docker Image

The Docker image for the base gps code is located at lakehanne/gps/
