HumanoidRobotWalk

Implementation of the Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms, applied to the task of teaching a humanoid robot to walk.

Programs and libraries required to run this project

  • OpenAI Gym : A toolkit for developing and comparing reinforcement learning algorithms
  • PyBullet Gym : PyBullet Robotics Environments fully compatible with Gym toolkit (uses the Bullet physics engine)
  • PyTorch : Open source machine learning library based on the Torch library
  • NumPy : Fundamental package for scientific computing with Python
  • matplotlib : Plotting library for the Python programming language and its numerical mathematics extension NumPy

Algorithm pseudocode

Trust Region Policy Optimization (TRPO) - implemented by Vasilije Pantić

(Image: TRPO pseudocode)
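As a rough illustration of one building block commonly found in TRPO, the sketch below solves a linear system A x = b with the conjugate gradient method. In TRPO as usually described, A would be the Fisher information matrix (accessed only through matrix-vector products) and b the policy gradient. The function name `conjugate_gradient` and its parameters are hypothetical and chosen for this example; this is not taken from the repository's code, which may be structured differently.

```python
def conjugate_gradient(mat_vec, b, iterations=10, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A.

    mat_vec is a function returning the product A @ v for a vector v,
    so A never has to be built explicitly. All names here are
    hypothetical and used for illustration only.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, which equals b because x starts at zero
    p = list(b)  # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(iterations):
        if rs_old < tol:
            break  # residual is small enough, stop early
        a_p = mat_vec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, a_p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, a_p)]
        rs_new = sum(ri * ri for ri in r)
        # Next direction: new residual plus a correction along the old direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Passing only a matrix-vector product keeps memory use low, which matters when the number of policy parameters is large.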

Proximal Policy Optimization (PPO) - implemented by Nikola Zubić

(Image: PPO pseudocode)
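For readers who want a concrete reference point, here is a minimal sketch of the clipped surrogate objective usually associated with PPO. It is written in plain Python for readability rather than in PyTorch. The function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are assumptions made for this example and are not taken from the repository's code.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration: each argument is a list with
    one entry per sampled action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Taking the minimum removes the incentive to push the ratio far from 1, which is how this objective limits the size of each policy update.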

How to run?

For TRPO: run trpo_main.py in root/code/trpo/.
For PPO: run ppo_main.py in root/code/ppo/.
In both cases, enter the absolute file path to the trained model.

Trained models are available at: root/code/trained_models/.
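
Because the scripts expect an absolute path, a small helper like the one below can list the absolute paths of the files in the trained models directory so they can be copied. This is only a convenience sketch: the function name `list_trained_models` is hypothetical, and it assumes that `root` in the paths above refers to the repository root.

```python
from pathlib import Path


def list_trained_models(models_dir):
    """Return the absolute paths of the files in models_dir, sorted by name.

    Hypothetical helper for illustration; it is not part of the project.
    """
    directory = Path(models_dir).resolve()
    return [str(p) for p in sorted(directory.iterdir()) if p.is_file()]
```

For example, calling `list_trained_models("code/trained_models")` from the repository root returns one absolute path per model file.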

In motion

TRPO

(Image: TRPO_in_motion)

PPO

(Image: PPO_in_motion)

Numerical results

TRPO: training time [h] 24 and 96
PPO: training time [h] 6.5 and 48
(Images: results for each training time listed above)

Copyright (c) 2021 Nikola Zubić, Vasilije Pantić