Conjugate Gradient Method in TRPO #79
The implementation of the Fisher-vector product follows this source: note that the Fisher-vector product also adds damping, so you will not get the same results as with a plain matrix-vector multiplication. However, we did test the behavior of the algorithm on deep neural networks, comparing it with the results of OpenAI Baselines. With the increased number of iterations, we get performance in line with the baselines, as you can see from the benchmark. However, this detailed test was performed a long time ago, and something may have changed in the torch implementation since then. If you find a discrepancy, let us know.
NB: the damping term is really important in the neural-network scenario, due to the bad conditioning of the Fisher matrix there.
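To illustrate the point above: a damped Fisher-vector product computes `(F + damping * I) v` rather than `F v`, so comparing it against a plain matrix-vector multiplication will show a systematic offset of `damping * v`. A minimal sketch (the function name `damped_mvp` and the toy matrix are hypothetical, not from the codebase):

```python
import numpy as np

def damped_mvp(A, v, damping=1e-2):
    # Damped product: (A + damping * I) v. This is what a damped
    # Fisher-vector product returns instead of the plain A @ v.
    return A @ v + damping * v

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
v = np.array([1.0, 1.0])

plain = A @ v
damped = damped_mvp(A, v, damping=0.1)
print(damped - plain)  # offset equals damping * v
```

The damping shifts every eigenvalue of the matrix up by the damping constant, which is exactly why it helps when the Fisher matrix is badly conditioned.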
In my experiments, the conjugate gradient method in TRPO does not seem to work quite right.
The residual decreases, if at all, only very slowly, and is nowhere near the tolerance (1e-2 instead of 1e-10) after the default number of iterations (10). Thus, the CG method never stops early, nor does it return an exact solution.
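For reference, here is a minimal sketch of the conjugate gradient scheme being described, written against an abstract matrix-vector product (as TRPO uses the Fisher-vector product in place of an explicit matrix); the function name and defaults are illustrative, not the project's code:

```python
import numpy as np

def conjugate_gradient(mvp, b, n_iters=10, tol=1e-10):
    """Solve A x = b given only the product mvp(v) = A v, for SPD A."""
    x = np.zeros_like(b)
    r = b.copy()            # residual r = b - A x (x starts at zero)
    p = r.copy()            # initial search direction
    rs_old = r.dot(r)
    for _ in range(n_iters):
        Ap = mvp(p)
        alpha = rs_old / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r.dot(r)
        if np.sqrt(rs_new) < tol:   # early stop once the residual is small
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

With a well-conditioned SPD operator, the residual norm drops below the tolerance well within the iteration budget, triggering the early stop; the behavior reported here (residual stuck around 1e-2) is what one would expect from a badly conditioned or inconsistent operator.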
Furthermore, I checked the progress of the objective function of the related minimization problem, f(x) = 1/2 * x.T * A * x - b.T * x.
Interestingly, the values seem to increase instead of decreasing monotonically.
I added the following code at line 158 in trpo.py to monitor the progress:

```python
Fx = self._fisher_vector_product(x, obs, old_pol_dist).detach().cpu().numpy()
print(0.5 * x.dot(Fx) - b.detach().cpu().numpy().dot(x))
```
After some debugging, I think the problem lies in the Fisher-vector product. The general CG method worked in examples where I replaced the FVP with a normal matrix-vector multiplication.
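The sanity check described above can be reproduced with a plain SPD matrix standing in for the FVP: along CG iterates, f(x) = 1/2 * x.T * A * x - b.T * x must decrease monotonically, since CG minimizes f over expanding Krylov subspaces. A self-contained sketch (the random test matrix is illustrative):

```python
import numpy as np

# Hypothetical stand-in for the Fisher-vector product: an explicit SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # SPD by construction
b = rng.standard_normal(5)

f = lambda x: 0.5 * x @ (A @ x) - b @ x

# Standard CG, recording the objective value at every iterate.
x = np.zeros(5)
r = b.copy()
p = r.copy()
rs = r @ r
vals = [f(x)]
for _ in range(10):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = r @ r
    vals.append(f(x))
    p = r + (rs_new / rs) * p
    rs = rs_new

# The objective must never increase from one iterate to the next.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

If the same monitoring applied to the FVP-based CG shows increasing values, the operator is effectively not behaving as a symmetric positive-definite matrix, which is consistent with a bug (or severe ill-conditioning) in the Fisher-vector product.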
I am not sure whether this problem affects the overall performance of TRPO. Nevertheless, the result of the conjugate gradient defines the search direction of the line search and is therefore a central part of the optimization.