
[PPO2] What’s the difference between PPO1 and PPO2 ? #485


Closed
stevenh-tw opened this issue Jul 28, 2018 · 3 comments

@stevenh-tw

After reading some of the code,
I still don't see why PPO2 is GPU-optimized compared to PPO1.

Does anyone have any high-level insight?

@andytwigg
Contributor

andytwigg commented Jul 28, 2018 via email

@pzhokhov
Collaborator

pzhokhov commented Aug 17, 2018

It is more that PPO1 is not GPU-optimized :) PPO1 uses one environment per MPI worker. In other words, when you run PPO1 with multiple MPI processes, each process creates its own copy of the environment and its own neural net (NN). The gradients for the NN updates are aggregated across the workers by virtue of using the MpiAdamOptimizer class.
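A rough sketch of the gradient-averaging idea behind that pattern (assuming mpi4py; this is not the actual MpiAdamOptimizer code, and `local_gradient` is a hypothetical placeholder):

```python
# PPO1-style parallelism sketch: every MPI worker owns its own environment
# and its own copy of the net, computes a local gradient, and the gradients
# are averaged across workers before each update.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient():
    """Placeholder for the gradient this worker computed on its own rollout."""
    return np.random.randn(10).astype(np.float64)

grad = local_gradient()
avg_grad = np.zeros_like(grad)
comm.Allreduce(grad, avg_grad, op=MPI.SUM)   # sum gradients from all workers...
avg_grad /= comm.Get_size()                  # ...then average them

# Every worker now applies the same averaged gradient to its copy of the net,
# so the copies stay in sync without a central parameter server.
```

Run with, e.g., `mpirun -np 4 python sketch.py`; each of the 4 processes would step its own environment and net.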
The PPO2 implementation (although, with recent updates, it can use MPI as well) uses a different kind of parallelism. A head process with a single neural net spawns a bunch of subprocesses that each run a separate environment (running an environment == taking actions and producing the next observations and rewards). As @andytwigg describes, the observations and rewards from these multiple environments in subprocesses are batched together in the head process. For visual observations, that creates a batch big enough that computing the NN gradients on a GPU starts to make sense (especially when using convnets); see the sketch below. Note that, by default, multiple environments in subprocesses are only used for Atari and Retro video games, but not, for instance, for MuJoCo (because the observations there are not visual).
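A minimal, self-contained sketch of that head-process batching (not the baselines SubprocVecEnv implementation; `ToyEnv` is a hypothetical stand-in for a real environment):

```python
# PPO2-style parallelism sketch: one head process batches observations from
# several worker subprocesses, so a single neural net can do one big
# forward/backward pass on the whole batch (e.g. on a GPU).
import numpy as np
from multiprocessing import Process, Pipe

class ToyEnv:
    """Stand-in for a gym environment: returns random 'image' observations."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
    def reset(self):
        return self.rng.random((84, 84, 4), dtype=np.float32)
    def step(self, action):
        obs = self.rng.random((84, 84, 4), dtype=np.float32)
        reward, done = float(action), False
        return obs, reward, done

def worker(conn, seed):
    """Runs one environment in its own process; the head process sends actions."""
    env = ToyEnv(seed)
    conn.send(env.reset())
    while True:
        action = conn.recv()
        if action is None:          # shutdown signal from the head process
            break
        conn.send(env.step(action))

if __name__ == "__main__":
    num_envs = 8
    pipes, procs = [], []
    for i in range(num_envs):
        parent, child = Pipe()
        p = Process(target=worker, args=(child, i), daemon=True)
        p.start()
        pipes.append(parent)
        procs.append(p)

    # Head process: gather the initial observations into one batch.
    obs_batch = np.stack([pipe.recv() for pipe in pipes])    # shape (8, 84, 84, 4)

    # One rollout step: a single (here fake) policy acts on the whole batch,
    # so a real NN forward pass would see num_envs observations at once.
    actions = np.zeros(num_envs)                              # policy(obs_batch)
    for pipe, a in zip(pipes, actions):
        pipe.send(a)
    obs_batch = np.stack([pipe.recv()[0] for pipe in pipes])  # next batched obs

    for pipe in pipes:
        pipe.send(None)
    for p in procs:
        p.join()
    print("batched observation shape:", obs_batch.shape)
```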
All that being said, ppo1 is now obsolete, and its functionality (including MPI) is fully covered by ppo2.
Closing this; please reopen if anything remains unclear.

@Bick95

Bick95 commented Dec 12, 2020

Hey!

@pzhokhov I saw your comment above, saying that PPO2's main advancement over PPO1 is that the former uses a more efficient form of parallelism than the latter.

However, on another website I read that there are supposedly more, undocumented, changes.

Therefore, I was wondering whether there exists a full list of all the small changes made in PPO2 compared to PPO1, since I would be very interested in the details of the algorithm. :)
