[PPO2] What’s the difference between PPO1 and PPO2? #485
After reading some of the code, I still don't see why PPO2 is GPU-optimized compared to PPO1. Does anyone have any high-level insight?
PPO2 uses vectorized environments to batch inference and training steps across multiple instances of an environment.
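For illustration, here is a minimal sketch of the vectorized-env idea (the class name `VecEnvSketch` and the `policy_step` stand-in are made up for this example; baselines' real interface lives under `baselines/common/vec_env`): N environment copies are stepped in lockstep, so the policy network does one batched forward pass per timestep instead of N single-observation passes, and that batching is what keeps a GPU busy.

```python
import numpy as np

class VecEnvSketch:
    """Steps N independent copies of an environment in lockstep,
    stacking per-env observations into one batch per timestep."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        # One observation per env, stacked into an (N, obs_dim) batch.
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rews, dones = [], [], []
        for env, action in zip(self.envs, actions):
            ob, rew, done, _info = env.step(action)
            if done:
                ob = env.reset()  # auto-reset so the batch never shrinks
            obs.append(ob)
            rews.append(rew)
            dones.append(done)
        return np.stack(obs), np.array(rews), np.array(dones)

# Rollout loop: `policy_step` stands in for the network's batched
# inference -- one forward pass serves all N environments per step.
# obs = venv.reset()
# for _ in range(nsteps):
#     actions = policy_step(obs)            # (N, obs_dim) in, (N,) out
#     obs, rews, dones = venv.step(actions)
```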
It is more that PPO1 is not GPU-optimized :) PPO1 uses one environment per MPI worker. In other words, when you run PPO1 with multiple MPI processes, each process creates its own copy of the environment and its own neural net (NN). The gradients for NN updates are aggregated across the workers by virtue of using the MpiAdamOptimizer class.
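As a hedged illustration of that aggregation step (this is not the actual MpiAdamOptimizer code, and `average_gradient` is a helper invented here), each worker computes a gradient on its own rollout, and the gradients are averaged across all workers before the optimizer update so the per-worker copies of the network stay in sync:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def average_gradient(local_grad):
    # local_grad: this worker's flat gradient (a float32 numpy array).
    # Allreduce sums the array across all MPI workers; dividing by the
    # worker count gives every process the same averaged gradient, so
    # each worker applies an identical update to its copy of the net.
    summed = np.zeros_like(local_grad)
    comm.Allreduce(local_grad, summed, op=MPI.SUM)
    return summed / comm.Get_size()
```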
Hey! @pzhokhov I saw your comment above saying that PPO2's main advancement over PPO1 is that the former uses a more efficient form of parallelism than the latter. However, on another website I read that there are more undocumented changes. I was therefore wondering whether a full list of all the small changes made in PPO2 compared to PPO1 exists, since I would be very interested in the details of the algorithm. :)