Reinforcement learning methods for controlling a unicycle

Using a unicycle as a control task, this repository has several aims:

  1. Benchmark different reinforcement learning models (Vanilla Policy Gradients, PPO, DDPG, TRPO, Recurrent Policy Gradients, Evolved Policy Gradients, and others) from a dynamical systems perspective.
  2. Analyse the efficacy of these systems as decoders. Given an initial state and a desired final state, to what extent does the model produce trajectories that solve this inverse problem, i.e. that steer the unicycle from the first state to the second?
  3. Evaluate, among other properties, sensitivity to perturbations, robustness, sample efficiency, intrinsic dimensionality and generalisation (a.k.a. transfer learning).
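
Aims 2 and 3 can be made concrete with a small evaluation harness. The Python below is a minimal sketch, not code from this repository. It assumes a policy is any function of (state, goal) that returns a scalar control. It also replaces the unicycle with a hypothetical `step` function: a linearised inverted pendulum with g/l normalised to 1. `rollout`, `terminal_error` and `perturbation_sensitivity` are illustrative names. The first score measures how close a rollout ends to the requested final state. The second measures how much that error changes when the initial state is jittered.

```python
import math
import random
from typing import Callable, List

State = List[float]                        # [angle, angular velocity] (assumed layout)
Policy = Callable[[State, State], float]   # (state, goal) -> scalar control

DT = 0.02  # integration step in seconds (illustrative)


def step(x: State, u: float, dt: float = DT) -> State:
    """Hypothetical stand-in for the unicycle simulator: a linearised inverted
    pendulum with g/l normalised to 1, theta'' = theta + u (explicit Euler)."""
    theta, omega = x
    return [theta + dt * omega, omega + dt * (theta + u)]


def rollout(policy: Policy, x0: State, goal: State, steps: int) -> List[State]:
    """Roll the goal-conditioned policy forward from x0; return every visited state."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        traj.append(step(x, policy(x, goal)))
    return traj


def terminal_error(policy: Policy, x0: State, goal: State, steps: int) -> float:
    """Aim 2: Euclidean distance between where the rollout ends and the goal."""
    return math.dist(rollout(policy, x0, goal, steps)[-1], goal)


def perturbation_sensitivity(policy: Policy, x0: State, goal: State, steps: int,
                             eps: float = 1e-2, trials: int = 20, seed: int = 0) -> float:
    """Aim 3: mean change in terminal error when x0 is jittered by up to +/- eps."""
    rng = random.Random(seed)
    base = terminal_error(policy, x0, goal, steps)
    total = 0.0
    for _ in range(trials):
        x_pert = [xi + rng.uniform(-eps, eps) for xi in x0]
        total += terminal_error(policy, x_pert, goal, steps) - base
    return total / trials
```

For example, consider the PD policy `lambda x, g: -10.0 * (x[0] - g[0]) - 3.0 * (x[1] - g[1])`. Driven from `[0.1, 0.0]` toward the upright goal `[0.0, 0.0]` for 500 steps, it ends very close to the goal, whereas the zero policy diverges. A learned policy would be scored the same way. The Euclidean metric and uniform jitter are arbitrary choices.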

Blog posts:

  1. Controlling a unicycle with Policy Gradients
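
The core of vanilla policy gradients is the likelihood-ratio (REINFORCE) estimator. It samples actions from a stochastic policy, weights the gradient of each action's log-probability by its baseline-subtracted reward, and then ascends. The sketch below is illustrative only. The one-step toy task (observe x, choose an action a that cancels it), the function name `reinforce_linear_gaussian` and all hyperparameters are assumptions, not anything used for the unicycle.

```python
import random


def reinforce_linear_gaussian(iters: int = 200, batch: int = 64, lr: float = 0.05,
                              sigma: float = 0.5, seed: int = 0) -> float:
    """Vanilla policy gradient (REINFORCE) for a one-parameter linear-Gaussian policy.

    Toy task (an assumption, not the unicycle): observe x ~ U(-1, 1), act
    a ~ N(w * x, sigma^2), receive reward r = -(x + a)^2. The optimum is w = -1.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iters):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        acts = [rng.gauss(w * x, sigma) for x in xs]
        rewards = [-(x + a) ** 2 for x, a in zip(xs, acts)]
        baseline = sum(rewards) / batch  # batch-mean baseline to reduce variance
        grad = 0.0
        for x, a, r in zip(xs, acts, rewards):
            # d/dw log N(a; w*x, sigma^2) = (a - w*x) * x / sigma^2
            score = (a - w * x) * x / sigma ** 2
            grad += (r - baseline) * score
        w += lr * grad / batch  # gradient ascent on expected reward
    return w
```

For this task E[r] = -(1 + w)^2 E[x^2] - sigma^2, so the expected update is proportional to -(1 + w). `reinforce_linear_gaussian()` should therefore return a value near -1. The batch-mean baseline does not change the expected gradient direction (it only rescales it by 1 - 1/N), but it does reduce variance.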

References:

  1. Passive Dynamic Walking. T. McGeer. 1990.
  2. Emergence of Locomotion Behaviours in Rich Environments. Nicolas Heess et al. 2017.
  3. Policy Gradients for Reinforcement Learning with Function Approximation. Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999.
  4. Unicycles and Bifurcations. R. C. Johnson. 2002.
  5. Modular Multitask Reinforcement Learning with Policy Sketches. Jacob Andreas, Dan Klein, and Sergey Levine. 2017.
  6. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. Sébastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. 2017.

Note: Because bipedal walkers and unicycles are dynamically similar to inverted pendulums, these investigations lead naturally to experiments on bipedal dynamics.
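
The premise of this note can be checked numerically. Measure the angle θ from upright and consider a point-mass pendulum on a frictionless pivot with torque input u. It obeys θ'' = (g/l) sin θ + u/(m l²). Linearising sin θ ≈ θ gives eigenvalues ±√(g/l), one of which is positive, so upright balance is unstable without feedback. The sketch below assumes these idealised dynamics and illustrative parameter values. It is not the unicycle model used in this repository.

```python
import math
from typing import Callable

G = 9.81  # gravitational acceleration, m/s^2
L = 1.0   # pendulum length, m (illustrative)
M = 1.0   # point mass, kg (illustrative)


def pendulum_accel(theta: float, omega: float, u: float) -> float:
    """Inverted pendulum, angle from upright: theta'' = (g/l) sin(theta) + u / (m l^2)."""
    return (G / L) * math.sin(theta) + u / (M * L ** 2)


def simulate(theta0: float, controller: Callable[[float, float], float],
             t_end: float = 3.0, dt: float = 1e-3) -> float:
    """Semi-implicit Euler integration from rest at theta0; returns the final angle (rad)."""
    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        omega += dt * pendulum_accel(theta, omega, controller(theta, omega))
        theta += dt * omega
    return theta


# Linearising sin(theta) ~ theta gives theta'' = (g/l) theta, with eigenvalues
# +/- sqrt(g/l); the positive one makes upright unstable without control.
unstable_rate = math.sqrt(G / L)
```

`simulate(0.01, lambda th, om: 0.0)` ends far from upright. A PD controller such as `lambda th, om: -30.0 * th - 8.0 * om` (gains chosen only for illustration) instead gives the closed-loop linearisation θ'' = (g/l − 30)θ − 8θ'. Its roots have negative real parts, so the same 0.01 rad perturbation decays.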