@philtabor,
This is an intriguing implementation of PPO2. It is simple, and it converges on CartPole faster than any other implementation I have seen. Taking a basic definition of "convergence" as 10 consecutive episodes at total reward = max reward (200), it converges in ~230 episodes.
I tested a version with all the PyTorch functions converted to the equivalent TensorFlow 2.3 functions, adding two gradient tapes to the .learn() function. It doesn't converge nearly as well. Do you have any idea why? Is there something about PyTorch that makes this implementation so successful?
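For context, here is a rough sketch of how I set up the two tapes in my TF 2.3 learn step, one for the actor's clipped surrogate loss and one for the critic's MSE loss. The network definitions, hyperparameters, and variable names below are just illustrative placeholders, not code from this repo:

```python
import tensorflow as tf

# Illustrative actor/critic networks for CartPole (2 discrete actions); not the repo's architecture.
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])
actor_opt = tf.keras.optimizers.Adam(3e-4)
critic_opt = tf.keras.optimizers.Adam(3e-4)

def learn_step(states, actions, old_log_probs, advantages, returns, clip=0.2):
    # Actor update: PPO clipped surrogate objective under its own tape.
    with tf.GradientTape() as tape_a:
        probs = actor(states)
        idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
        new_log_probs = tf.math.log(tf.gather_nd(probs, idx) + 1e-8)
        ratio = tf.exp(new_log_probs - old_log_probs)
        clipped = tf.clip_by_value(ratio, 1.0 - clip, 1.0 + clip)
        actor_loss = -tf.reduce_mean(
            tf.minimum(ratio * advantages, clipped * advantages))
    actor_grads = tape_a.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(actor_grads, actor.trainable_variables))

    # Critic update: MSE against the returns under a second tape.
    with tf.GradientTape() as tape_c:
        values = tf.squeeze(critic(states), axis=-1)
        critic_loss = tf.reduce_mean(tf.square(returns - values))
    critic_grads = tape_c.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(critic_grads, critic.trainable_variables))

    return actor_loss, critic_loss
```

(One caveat with this setup: the two tapes mean the actor and critic are optimized from separate losses, which may not behave identically to a single combined loss and one backward pass.)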
Apologies if I have posted this twice; I am new to GitHub.