Conjugate Gradient Optimization sometimes fails (with NaN parameters) #24

Closed · alexbeloi opened this issue Jun 30, 2016 · 7 comments

@alexbeloi
Contributor

In some of my experiments I sometimes get NaN parameters when training with the TRPO and TNPG algorithms. I traced this to the file containing ConjugateGradientOptimizer, where under some circumstances the value passed to np.sqrt at lines 168-170 is negative (specifically, descent_direction.dot(Hx(descent_direction)) is negative). That value defines initial_step_size, which then becomes NaN.

Is there any citation available for this initial step size?

The variable naming for the terms in descent_direction.dot(Hx(descent_direction)) suggests that this is an inner product with respect to a Hessian (which would be positive semi-definite), but I'm not sure that's the case.
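To illustrate the failure mode, here is a toy sketch (the matrix and numbers are made up; only the names mirror the optimizer code above):

```python
import numpy as np

max_constraint_val = 0.01  # the KL trust-region radius (delta)

def Hx(v):
    # Stand-in for the Hessian-vector product. A genuinely positive semi-definite
    # Hx would never yield a negative quadratic form; this one is deliberately
    # indefinite to reproduce the symptom.
    H = np.array([[2.0, 0.0],
                  [0.0, -3.0]])
    return H.dot(v)

descent_direction = np.array([0.1, 1.0])
sHs = descent_direction.dot(Hx(descent_direction))           # negative here
initial_step_size = np.sqrt(2.0 * max_constraint_val / sHs)  # sqrt of a negative -> nan
print(sHs, initial_step_size)                                # the step size is nan
```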

@dementrock
Member

Hi @alexbeloi, the step size is computed according to the TRPO paper: https://arxiv.org/pdf/1502.05477v4.pdf. You can find the formula in Appendix C.
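For reference, the maximal step length given there (with s the search direction, A the Hessian of the average KL used for the Hx products, and delta the KL constraint) is:

```latex
\beta = \sqrt{\frac{2\delta}{s^\top A s}}
```

The expression under the square root is negative exactly when the quadratic form s^T A s is, which is the symptom reported above.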

How negative is the computed value of descent_direction.dot(Hx(descent_direction)), and could you describe your setup in more detail? This can happen if there is a bug such that the mean KL is nonzero (or not sufficiently close to zero) before taking the step. We've also observed it occasionally with recurrent networks, although adjusting the nonlinearity seems to have solved it.

@alexbeloi
Contributor Author

Hi @dementrock, it appears that the mean KL is nonzero before taking the step, due to something I'm doing. This issue came up while debugging the ISSampler with TRPO.

What I'm doing is taking (off-policy) stored paths, computing the agent_infos for those paths with respect to the current policy using _, agent_infos = policy.get_action(observations), and then passing those agent_infos to old_dist_info_vars_list in the optimizer.

I expected the agent_infos computed this way to be identical to dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars) as evaluated by the optimizer before taking the step, so that kl = dist.kl_sym(old_dist_info_vars, dist_info_vars) would be zero before the step, but this isn't the case.

Is there a difference between the agent_infos computed from _, agent_infos = policy.get_action(observations) and the evaluation of dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars) with obs_var set to observations?
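Here is the kind of comparison I mean, as a sketch only (the cartpole setup is purely illustrative, and the compiled Theano function just mimics how the optimizer evaluates dist_info_sym; none of this is code from the repo):

```python
import numpy as np
import theano

from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

env = normalize(CartpoleEnv())
policy = GaussianMLPPolicy(env_spec=env.spec)
observations = np.array([env.reset() for _ in range(8)])

# Numeric distribution infos, the way they get stored in the sampled paths.
_, agent_infos = policy.get_actions(observations)

# Symbolic distribution infos, evaluated the way the optimizer sees them.
obs_var = env.observation_space.new_tensor_variable('obs', extra_dims=1)
keys = policy.distribution.dist_info_keys  # e.g. ["mean", "log_std"]
dist_info_sym = policy.dist_info_sym(obs_var, dict())
f_dist = theano.function([obs_var], [dist_info_sym[k] for k in keys],
                         allow_input_downcast=True)

# For the keys the distribution actually uses, the two should agree elementwise,
# so the mean KL fed to the optimizer is ~0 before the step is taken.
for key, value in zip(keys, f_dist(observations)):
    print(key, np.max(np.abs(np.asarray(agent_infos[key]) - value)))
```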

@alexbeloi
Contributor Author

I feel there is some confusion on my part. Where does the NPO algorithm get values for old_dist_info_vars and dist_info_vars from?

@alexbeloi
Contributor Author

Oh wow, super silly bug on my part. The last line of is_sampler.py should be return samples, not return paths. This was the root of the issue.
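That is, paraphrasing the end of is_sampler.py (surrounding code omitted):

```diff
-    return paths
+    return samples
```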

@dementrock
Member

@alexbeloi Re difference between agent_infos and evaluating dist_info_vars: agent_infos may contain more entries than dist_info_vars, but for the common keys their values should be the same. Otherwise there is a bug somewhere.

Does replacing return paths with return samples solve the NaN issue?

@alexbeloi
Contributor Author

@dementrock yes, that one-line fix solves the NaN issue. I made a pull request with the patch and a (now working) example of TRPO with the ISSampler.

@dementrock
Member

Awesome, thanks!
