
Pytorch - question. #31

Open
MM970 opened this issue Jan 15, 2021 · 1 comment

Comments


MM970 commented Jan 15, 2021

@philtabor,
This is an intriguing implementation of PPO2. It is simple, and it converges on CartPole faster than any other implementation I have seen. Taking a basic definition of "convergence" as 10 episodes in a row at total reward = max reward (200), this converges in ~230 episodes.
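For concreteness, here is a minimal sketch of that convergence check; the helper name and training-loop wiring are mine, not from the repo:

```python
# Hypothetical helper implementing the convergence definition above:
# 10 consecutive episodes at the maximum score of 200.
def has_converged(episode_returns, window=10, max_reward=200):
    """True once the last `window` episode returns all reach `max_reward`."""
    return (len(episode_returns) >= window
            and all(r >= max_reward for r in episode_returns[-window:]))

# Usage inside a training loop: append each episode's total reward,
# then stop once the criterion is met, e.g.
#   returns.append(episode_total)
#   if has_converged(returns): break
```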

I tested a version with all PyTorch functions converted to their TensorFlow 2.3 equivalents, adding two gradient tapes to the .learn() function. It doesn't converge nearly as well. Do you have any idea why? Is it a characteristic of PyTorch that makes this implementation so successful?
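For reference, here is a minimal sketch of what one such tape-based actor update can look like in TF 2.x; this is my own illustration, not the repo's code, and the function name, model interface, and batch tensors are assumptions:

```python
import tensorflow as tf

# Hypothetical tape-based PPO-clip actor update, NOT philtabor's code.
# Assumes `actor` is a tf.keras.Model mapping states -> action logits,
# and that the batch tensors (states, actions, old_log_probs, advantages)
# were collected under the old policy.
def actor_update(actor, optimizer, states, actions,
                 old_log_probs, advantages, clip=0.2):
    with tf.GradientTape() as tape:
        log_probs_all = tf.nn.log_softmax(actor(states))   # (batch, n_actions)
        new_log_probs = tf.gather(log_probs_all, actions,
                                  batch_dims=1)            # (batch,)
        # Probability ratio pi_new(a|s) / pi_old(a|s); the old terms
        # must be treated as constants, hence tf.stop_gradient.
        ratio = tf.exp(new_log_probs - tf.stop_gradient(old_log_probs))
        clipped = tf.clip_by_value(ratio, 1.0 - clip, 1.0 + clip)
        # Clipped surrogate objective, negated for gradient descent.
        loss = -tf.reduce_mean(
            tf.minimum(ratio * advantages, clipped * advantages))
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))
    return loss
```

One concrete difference worth ruling out in a port like this: the optimizer defaults are not identical across frameworks (torch.optim.Adam uses eps=1e-8 while tf.keras.optimizers.Adam defaults to epsilon=1e-7), and forgetting to stop gradients through the old log-probs or advantages inside the tape silently changes the objective.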

Apologies if I have posted this twice; I am new to GitHub.

@philtabor
Owner

Yeah, great question. I don't have much insight into the nuanced differences between the two frameworks.
