I hope I am not troubling you too much by asking questions.
Could you please help me understand the idea behind the recent changes made to accelerate learning?
By the way, is it converging on Reacher-v1?
Could you also mention the time taken to learn and your system configuration?
Also, take a look at this paper on reward scaling; it could be a cause of divergence, in case it is not converging.
The gradient inverter is actually meant for enforcing bounds on the parameter space. Gradients are scaled down as a parameter approaches its bound and are inverted if the parameter exceeds its value range. My interpretation of the acceleration in learning speed: think of the grad inverter as a threshold on the gradients, which reduces gradient noise. Empirically, I also found an increase in learning speed when I used the grad inverter. For more details, you can refer to the paper link
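The gradient-inverting rule described above can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual implementation: for each parameter, the gradient is scaled by its remaining distance to the bound it is being pushed toward, so the scale shrinks to zero at the bound and becomes negative (inverting the gradient) once the parameter leaves the range. The function name and the gradient-ascent sign convention are assumptions for illustration.

```python
import numpy as np

def invert_gradients(grads, params, p_min, p_max):
    """Scale gradients down as params approach their bounds; invert
    them once params leave [p_min, p_max].

    Assumes a gradient-ascent convention: grad > 0 means the update
    would increase the parameter.
    """
    width = p_max - p_min
    increasing = grads > 0
    scale = np.where(
        increasing,
        (p_max - params) / width,  # -> 0 at upper bound, negative beyond it
        (params - p_min) / width,  # -> 0 at lower bound, negative beyond it
    )
    return grads * scale
```

For example, with bounds [-1, 1], a parameter at 0.5 being pushed upward gets its gradient damped by 0.25, while a parameter already at 1.5 gets its gradient inverted, pulling it back into range.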
Yes, it works for Reacher-v1 now, but it takes a while. You can accelerate it by using a small wrapper for normalized environments and reward scaling, as mentioned in the link
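A reward-scaling wrapper of the kind mentioned above can be as simple as the sketch below. This is a hypothetical minimal version (the class name and the constant scale factor are assumptions); it wraps any environment exposing `reset()`/`step()` and multiplies rewards by a fixed factor before the agent sees them.

```python
class RewardScalingWrapper:
    """Hypothetical minimal wrapper: rescales rewards by a constant
    factor. Works with any env exposing reset() and step()."""

    def __init__(self, env, scale=0.1):
        self.env = env
        self.scale = scale  # assumed constant; tune per environment

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info
```

Usage would look like `env = RewardScalingWrapper(gym.make("Reacher-v1"), scale=0.1)`; observation normalization can be layered the same way.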