This repository uses a unicycle as a control task, with several aims:
- Benchmark different reinforcement learning algorithms (Vanilla Policy Gradients, PPO, DDPG, TRPO, Recurrent Policy Gradients, Evolved Policy Gradients, and others) from a dynamical systems perspective.
- Analyse the efficacy of these systems as decoders: i.e. given an initial state and a final state, to what extent does the model produce trajectories that satisfy the inverse problem?
- Evaluate these systems on several criteria: sensitivity to perturbations, robustness, sample efficiency, intrinsic dimensionality, and generalisation (i.e. transfer learning).
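To make the control task concrete, here is a minimal sketch of a planar unicycle balance environment. It is a hypothetical simplification (the tilt dynamics of an inverted pendulum under a control torque, integrated with explicit Euler steps); the class name, state variables, and reward are illustrative assumptions, not the repository's actual environment.

```python
import math

class UnicycleEnv:
    """Minimal planar unicycle balance model (illustrative sketch).

    State is (theta, omega): tilt angle from vertical and its rate.
    Dynamics follow an inverted pendulum, theta'' = (g/l) * sin(theta) + u,
    integrated with explicit Euler steps. This is a hypothetical
    simplification for exposition only.
    """

    def __init__(self, g=9.81, length=1.0, dt=0.01):
        self.g = g
        self.length = length
        self.dt = dt
        self.theta = 0.0
        self.omega = 0.0

    def reset(self, theta0=0.0, omega0=0.0):
        self.theta, self.omega = theta0, omega0
        return (self.theta, self.omega)

    def step(self, torque):
        # Inverted-pendulum tilt dynamics plus a control torque.
        alpha = (self.g / self.length) * math.sin(self.theta) + torque
        self.omega += alpha * self.dt
        self.theta += self.omega * self.dt
        # Reward upright balance; the episode fails past ~30 degrees of tilt.
        done = abs(self.theta) > math.pi / 6
        reward = math.cos(self.theta)
        return (self.theta, self.omega), reward, done
```

With zero torque the upright equilibrium is unstable, so any small initial tilt grows until the episode terminates; a simple proportional-derivative torque such as `u = -20 * sin(theta) - 5 * omega` stabilises it, which is the kind of behaviour a learned policy must reproduce.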
References:
- Passive Dynamic Walking. T. McGeer. 1990.
- Emergence of Locomotion Behaviours in Rich Environments. Nicolas Heess et al. 2017.
- Policy Gradient Methods for Reinforcement Learning with Function Approximation. Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999.
- Unicycles and Bifurcations. R. C. Johnson. 2002.
- Modular Multitask Reinforcement Learning with Policy Sketches. Jacob Andreas, Dan Klein, and Sergey Levine. 2017.
- Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning. Sébastien Forestier, Yoan Mollard, and Pierre-Yves Oudeyer. 2017.