-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conjugate Gradient Optimization sometimes fails (with NaN parameters) #24
Comments
Hi @alexbeloi, the step size is computed according to the TRPO paper: https://arxiv.org/pdf/1502.05477v4.pdf. You can find the formula in Appendix.C. How negative is the computed value of |
Hi @dementrock, it appears that mean KL is nonzero before taking the step because of something I'm doing. This issue came up when debugging the ISSampler with TRPO. What I'm doing is taking (off-policy) stored paths, computing the What I expected was that the on-policy Is there a difference between |
I feel there is some confusion on my part. Where does the NPO algorithm get values for |
Oh wow, super silly bug on my part. The last line of |
@alexbeloi Re difference between agent_infos and evaluating dist_info_vars: Does replacing |
@dementrock yes, that one line fix solves the NaN issue. I made a pull request with the patch and a (now working) example of TRPO with ISSampler. |
Awesome, thanks! |
In some of my experiments I sometimes get NaN parameters when training using TRPO and TNPG algorithms, this leads to the file containing ConjugateGradientOptimizer where it appears that under some circumstances the value passed to np.sqrt in line 168-170 is negative (specifically descent_direction.dot(Hx(descent_direction)) is negative), this defines initial_step_size which is then set to NaN.
Is there any citation available for this initial step size?
The variable naming for the terms in descent_direction.dot(Hx(descent_direction)) suggests that this is an inner product with respect to a Hessian (which would be positive semi-definite), but I'm not sure that's the case.
The text was updated successfully, but these errors were encountered: