jjkke88/trpo

Note: the algorithm has recently been moved to https://github.com/jjkke88/RL_toolbox

trpo

Trust region policy optimization (TRPO) based on Gym and TensorFlow.

There are three versions of TRPO: one for discrete action spaces such as MountainCar, one for discrete action space tasks with images as input such as Atari games, and one for continuous action spaces such as Pendulum.
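The discrete and continuous versions differ mainly in the action distribution the policy outputs: a categorical distribution over a finite set of actions versus a diagonal Gaussian over real-valued actions. The NumPy sketch below is only an illustration of the quantities TRPO needs from each kind of distribution (log-probability and KL divergence). It is not taken from this repository, and the function names `categorical_kl`, `gaussian_log_prob`, and `gaussian_kl` are hypothetical.

```python
import numpy as np


def categorical_kl(p_old, p_new, eps=1e-8):
    """KL(old || new) between categorical distributions (discrete actions)."""
    return np.sum(p_old * np.log((p_old + eps) / (p_new + eps)), axis=-1)


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a diagonal Gaussian (continuous actions)."""
    z = (action - mean) / np.exp(log_std)
    dim = action.shape[-1]
    return (-0.5 * np.sum(z ** 2, axis=-1)
            - np.sum(log_std, axis=-1)
            - 0.5 * dim * np.log(2.0 * np.pi))


def gaussian_kl(mean_old, log_std_old, mean_new, log_std_new):
    """KL(old || new) between two diagonal Gaussians."""
    var_old = np.exp(2.0 * log_std_old)
    var_new = np.exp(2.0 * log_std_new)
    return np.sum(log_std_new - log_std_old
                  + (var_old + (mean_old - mean_new) ** 2) / (2.0 * var_new)
                  - 0.5, axis=-1)
```

In TRPO the KL divergence between the old and the new policy is the quantity that the trust region constrains, which is why both kinds of distribution need a KL function.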

The environments are based on OpenAI Gym.

Parts of the code refer to rllab.

dependencies

  • TensorFlow 0.10
  • PrettyTensor
  • latest OpenAI Gym

code structure

  • baseline: estimation of the baseline function
  • checkpoint: folder that stores model files; do not delete it, or errors will occur
  • distribution: distribution base class, used to calculate probabilities under a distribution, for example a Gaussian
  • logger: contains a Logger class for logging data to a .csv file
  • agent: agents for discrete and continuous action spaces
  • log: stores log files
  • experiment: contains several main files; run one of them to start training or testing
  • environment.py: environment
  • krylov.py: implementation of some math methods: conjugate gradient, computing the Hessian matrix
  • parameters.py: config file
  • utils.py: implementation of some basic functions: getFlat, setFlat, lineaSearch
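The two numerical routines named in the list above, conjugate gradient and line search, are what TRPO uses to compute a step direction and then scale it. The following is a minimal NumPy sketch under stated assumptions, not the code in krylov.py or utils.py: the function names `conjugate_gradient` and `line_search`, their parameters, and the default values are all hypothetical, and a plain function `f_Ax` stands in for the Hessian-vector (or Fisher-vector) product.

```python
import numpy as np


def conjugate_gradient(f_Ax, b, cg_iters=10, residual_tol=1e-10):
    """Approximately solve A x = b using only the product f_Ax(v) = A v.

    A is assumed symmetric positive definite; b must be a float array.
    """
    x = np.zeros_like(b)
    r = b.copy()      # residual b - A x (x starts at zero)
    p = b.copy()      # current search direction
    rdotr = r.dot(r)
    for _ in range(cg_iters):
        if rdotr < residual_tol:
            break
        z = f_Ax(p)
        alpha = rdotr / p.dot(z)
        x += alpha * p
        r -= alpha * z
        new_rdotr = r.dot(r)
        p = r + (new_rdotr / rdotr) * p
        rdotr = new_rdotr
    return x


def line_search(f, x, full_step, expected_improve_rate,
                max_backtracks=10, accept_ratio=0.1):
    """Backtracking line search that minimizes f.

    Halve the step until the actual decrease in f is at least
    `accept_ratio` times the decrease predicted by the linear model.
    """
    f_old = f(x)
    for step_frac in 0.5 ** np.arange(max_backtracks):
        x_new = x + step_frac * full_step
        actual_improve = f_old - f(x_new)
        expected_improve = expected_improve_rate * step_frac
        if actual_improve > 0 and actual_improve / expected_improve > accept_ratio:
            return x_new
    return x  # no acceptable step found; keep the old parameters


# Usage: solve a small linear system without forming its inverse.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A.dot(v), b)
```

Working only with matrix-vector products is the reason conjugate gradient suits TRPO: the full Hessian of the KL divergence never has to be stored or inverted.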

recent work

  • implemented multi-threaded TRPO; run python main_multi_thread.py to try it
  • implemented distributed TRPO with TensorFlow
  • implemented multi-process TRPO

future work

  • complete TRPO with images as input

About

Trust region policy optimization based on Gym and TensorFlow; can run in distributed mode.
