pzhokhov Fix ppo2 with MPI bug, other minor fixes (#735)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions

* fixing test failures

* ppo2 with microbatches (#168)

* pass microbatch_size to the model during construction

* microbatch fixes and test (#169)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* Peterz joshim5 subclass ppo2 model (#170)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint

* mpi-less ppo2 (resolving merge conflict)

* flake8 and mpi4py imports in ppo2/model.py

* more un-mpying

* merge master

* updates to the benchmark viewer code + autopep8 (#184)

* viz docs and syntactic sugar wip

* update viewer yaml to use persistent volume claims

* move plot_util to baselines.common, update links

* use 1Tb hard drive for results viewer

* small updates to benchmark vizualizer code

* autopep8

* autopep8

* any folder can be a benchmark

* massage games image a little bit

* fixed --preload option in app.py

* remove preload from run_viewer.sh

* remove pdb breakpoints

* update bench-viewer.yaml

* fixed bug (#185)

* fixed bug 

it's wrong to do the else statement, because no other nodes would start.

* changed the fix slightly
Latest commit 97e0391 Nov 27, 2018

README.md

PPO2

  • Original paper: https://arxiv.org/abs/1707.06347

  • Baselines blog post: https://blog.openai.com/openai-baselines-ppo/

  • python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (-h) for more options.

  • python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6 runs the algorithm for 1M frames on a Mujoco Ant environment.

  • also refer to the repo-wide README.md