Conjugate Gradient Method in TRPO #79

Closed

JannisHal opened this issue Dec 6, 2021 · 2 comments
@JannisHal

In my experiments, the conjugate gradient method in TRPO does not seem to work quite right.
The residual decreases very slowly, if at all, and after the default number of iterations (10) it is nowhere near the tolerance (1e-2 instead of 1e-10). Thus, the CG method never stops early, nor does it return an exact solution.

Furthermore, I tracked the objective function of the related minimization problem, f(x) = 1/2 * xᵀAx - bᵀx, which CG should decrease monotonically.
Interestingly, the values seem to increase instead.

I added the following code at line 158 in trpo.py to monitor the progress:

```python
Fx = self._fisher_vector_product(x, obs, old_pol_dist).detach().cpu().numpy()
print(0.5 * x.dot(Fx) - b.detach().cpu().numpy().dot(x))
```

After some debugging, I think the problem lies in the Fisher-vector product. The general CG method worked in examples where I replaced the FVP with an ordinary matrix-vector multiplication.
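For reference, here is a minimal sketch of the kind of standalone check I used (the CG loop and the random SPD test matrix are my own, not the mushroom-rl code): CG with residual-based early stopping, driven only by a matrix-vector product callback.

```python
import numpy as np

def conjugate_gradient(mvp, b, n_iter=10, tol=1e-10):
    """Solve A x = b given only a matrix-vector product mvp(v) = A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual r = b - A x (x starts at 0)
    p = r.copy()          # first search direction
    r_norm2 = r.dot(r)
    for _ in range(n_iter):
        Ap = mvp(p)
        alpha = r_norm2 / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        new_r_norm2 = r.dot(r)
        if new_r_norm2 < tol:   # early stopping on the squared residual
            break
        p = r + (new_r_norm2 / r_norm2) * p
        r_norm2 = new_r_norm2
    return x

# Sanity check with a random SPD matrix instead of the FVP.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 20))
A = M @ M.T + 20 * np.eye(20)   # SPD and reasonably well conditioned
b = rng.normal(size=20)
x = conjugate_gradient(lambda v: A @ v, b, n_iter=50)
print(np.linalg.norm(A @ x - b))  # small: the loop exits once r·r < tol
```

With this plain matrix-vector product, the residual drops below the tolerance well within the iteration budget, which is exactly what I do not see when the FVP is plugged in.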

I am not sure if this problem has any effect on the overall performance of TRPO. Nevetheless, the result of the conjugate gradient defines the search direction of the line search and is therefore a main part of the optimization.

@boris-il-forte
Collaborator

The implementation of the Fisher-vector product follows this source:
https://www.telesens.co/2018/06/09/efficiently-computing-the-fisher-vector-product-in-trpo/

Notice that the Fisher-vector product also adds damping; for this reason, you will not get the same results as with a plain matrix-vector multiplication.
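For clarity, the idea from that post looks roughly like the following (a sketch under my own naming, not the exact mushroom-rl code): the product is obtained by a double backward pass through the mean KL divergence, with damping added at the end.

```python
import torch

def fisher_vector_product(kl_mean, params, v, damping=1e-2):
    """Compute (F + damping * I) v, where F is the Fisher matrix,
    via the Hessian of the mean KL divergence (double backprop)."""
    # First backward pass: g = d(KL)/d(params), keeping the graph alive.
    grads = torch.autograd.grad(kl_mean, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    # Second backward pass: H v = d(g · v)/d(params).
    grad_v = (flat_grad * v).sum()
    hvp = torch.autograd.grad(grad_v, params)
    flat_hvp = torch.cat([h.reshape(-1) for h in hvp])
    return flat_hvp + damping * v  # damping regularizes an ill-conditioned F
```

Because of the `damping * v` term, comparing against `A @ v` with the undamped matrix will necessarily give different results.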

That said, we did test the behavior of the algorithm on deep neural networks, comparing it with the results of OpenAI Baselines.
Both implementations compute very similar values for the Fisher-vector product and the damping term. Probably due to numerical issues, we noticed that the first CG updates in the torch implementation tend to drive the iterate away from the solution (as you might have noticed). However, by increasing the number of conjugate gradient iterations, we obtain a direction that is extremely close to the one provided by the OpenAI Baselines implementation.

With this increased number of iterations, we get performance in line with the Baselines results, as you can see from the benchmarks:
https://mushroom-rl-benchmark.readthedocs.io/en/latest/source/benchmarks/actor_critic/mujoco.html
https://mushroom-rl-benchmark.readthedocs.io/en/latest/source/benchmarks/actor_critic/bullet.html#

However, this detailed test was performed a long time ago, and it might be that something has changed in the torch implementation. If you find a discrepancy, let us know.

@boris-il-forte
Collaborator

NB: the damping is really important in the neural network case, due to the bad conditioning of the Fisher matrix in that setting.
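To see why, consider a toy example (made-up numbers, not taken from the library): adding λI shifts every eigenvalue of the Fisher matrix up by λ, which caps the condition number that CG has to fight.

```python
import numpy as np

# Toy ill-conditioned SPD matrix: eigenvalues from 1e-8 to 1.
eigs = np.logspace(-8, 0, 10)
F = np.diag(eigs)

damping = 1e-2
F_damped = F + damping * np.eye(10)

print(np.linalg.cond(F))         # ~1e8: CG converges very slowly
print(np.linalg.cond(F_damped))  # ~1e2: much friendlier for CG
```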
