Need help understanding how grad-inv accelerates the learning process #4

Closed

sarvghotra opened this issue Jul 12, 2016 · 1 comment

sarvghotra commented Jul 12, 2016

I hope I am not troubling you too much by asking questions.

Could you please help me understand the recent changes made to accelerate learning?
BTW, is it converging on Reacher-v1?
Could you please also mention the training time and your system configuration?
Also, take a look at this paper on reward scaling; in case it is not converging, that could be a cause of divergence.

stevenpjg (Owner) commented Aug 14, 2016

The gradient inverter is actually meant to enforce bounds on the parameter space. Gradients are scaled down as the parameters approach a bound and are inverted once the parameters exceed the valid range. My interpretation of the speed-up: think of the gradient inverter as a soft threshold on the gradients, which reduces gradient noise. Empirically, I also found an increase in learning speed when I used the gradient inverter. For more details, you can refer to the paper: link
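For readers following along, here is a minimal NumPy sketch of the inverting-gradients rule from Hausknecht & Stone (2016); `invert_gradients` and its arguments are illustrative names, not the repo's actual `grad_inverter` API:

```python
import numpy as np

def invert_gradients(grads, params, p_min, p_max):
    # Inverting-gradients rule: a gradient that pushes a parameter
    # toward a bound is scaled by the remaining headroom, and the
    # scale factor turns negative (inverting the gradient) once the
    # parameter has crossed that bound.
    width = p_max - p_min
    increasing = grads > 0  # gradient suggests increasing the parameter
    scale_up = (p_max - params) / width
    scale_down = (params - p_min) / width
    return grads * np.where(increasing, scale_up, scale_down)
```

For example, with bounds [-1, 1], a parameter at 0.9 has its upward gradient scaled by 0.05 (nearly stopped), while a parameter at 1.2 gets a negative factor, so the update pulls it back inside the range.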

I used an NVIDIA GTX 960M to train; it took around 10 hours to generate the learning curve in https://github.com/stevenpjg/ddpg-aigym/blob/master/learning_curve.png

Yes, it works on Reacher-v1 now, but it takes a while. You can speed it up with a small wrapper that normalizes the environment and scales rewards, as mentioned here: link
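A minimal sketch of such a reward-scaling wrapper, assuming the Gym API of the time (4-tuple `step` return); `ScaledRewardEnv` and the `scale` value are illustrative, not the exact wrapper referenced above:

```python
import gym

class ScaledRewardEnv(gym.Wrapper):
    """Multiply every reward by a constant factor before it reaches
    the agent (hypothetical example wrapper)."""
    def __init__(self, env, scale=0.1):
        super(ScaledRewardEnv, self).__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Smaller reward magnitudes keep the critic's targets in a
        # range the network can fit, which can help avoid divergence.
        return obs, reward * self.scale, done, info

env = ScaledRewardEnv(gym.make('Reacher-v1'), scale=0.1)
```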
