Conjugate Gradient Method in TRPO #79
The implementation of the Fisher-vector product follows this source: note that the Fisher-vector product also adds damping, so you will not get the same results as with a plain matrix-vector multiplication. However, we did test the behavior of the algorithm on deep neural networks, comparing it with the results of OpenAI Baselines. With the increased number of iterations, we get performance in line with the baselines, as you can see from the benchmark. However, this detailed test was performed a long time ago, and something may have changed in the torch implementation since then. If you find a discrepancy, let us know.
NB: the damping term is really important in the neural-network scenario, due to the bad conditioning of the Fisher matrix there.
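To illustrate the point above: a damped Fisher-vector product computes `(F + damping * I) v` rather than `F v`, so comparing it against a plain matrix-vector multiplication will show a systematic offset of `damping * v`. A minimal sketch (the function name `damped_mvp` and the toy matrix are hypothetical, not from the codebase):

```python
import numpy as np

def damped_mvp(A, v, damping=1e-2):
    # Damped product: (A + damping * I) v. This is what a damped
    # Fisher-vector product returns instead of the plain A @ v.
    return A @ v + damping * v

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
v = np.array([1.0, 1.0])

plain = A @ v
damped = damped_mvp(A, v, damping=0.1)
print(damped - plain)  # offset equals damping * v
```

The damping shifts every eigenvalue of the matrix up by the damping constant, which is exactly why it helps when the Fisher matrix is badly conditioned.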
In my experiments, the conjugate gradient method in TRPO does not seem to work quite right.
The residual decreases, if at all, only very slowly, and is nowhere near the tolerance (1e-2 instead of 1e-10) after the default number of iterations (10). Thus, the CG method never stops early, nor does it return an exact solution.
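For reference, here is a minimal sketch of the conjugate gradient scheme being described, written against an abstract matrix-vector product (as TRPO uses the Fisher-vector product in place of an explicit matrix); the function name and defaults are illustrative, not the project's code:

```python
import numpy as np

def conjugate_gradient(mvp, b, n_iters=10, tol=1e-10):
    """Solve A x = b given only the product mvp(v) = A v, for SPD A."""
    x = np.zeros_like(b)
    r = b.copy()            # residual r = b - A x (x starts at zero)
    p = r.copy()            # initial search direction
    rs_old = r.dot(r)
    for _ in range(n_iters):
        Ap = mvp(p)
        alpha = rs_old / p.dot(Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r.dot(r)
        if np.sqrt(rs_new) < tol:   # early stop once the residual is small
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

With a well-conditioned SPD operator, the residual norm drops below the tolerance well within the iteration budget, triggering the early stop; the behavior reported here (residual stuck around 1e-2) is what one would expect from a badly conditioned or inconsistent operator.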
Furthermore, I checked the progress of the objective function of the related minimization problem, f(x) = 1/2 * x.T * A * x - b.T * x.
Interestingly, the values seem to increase instead of decreasing monotonically.
I added the following code at line 158 in trpo.py to monitor the progress:

```python
Fx = self._fisher_vector_product(x, obs, old_pol_dist).detach().cpu().numpy()
print(0.5 * x.dot(Fx) - b.detach().cpu().numpy().dot(x))
```
After some debugging, I think the problem lies in the Fisher-vector product. The general CG method worked in examples where I replaced the FVP with a normal matrix-vector multiplication.
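The sanity check described above can be reproduced with a plain SPD matrix standing in for the FVP: along CG iterates, f(x) = 1/2 * x.T * A * x - b.T * x must decrease monotonically, since CG minimizes f over expanding Krylov subspaces. A self-contained sketch (the random test matrix is illustrative):

```python
import numpy as np

# Hypothetical stand-in for the Fisher-vector product: an explicit SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # SPD by construction
b = rng.standard_normal(5)

f = lambda x: 0.5 * x @ (A @ x) - b @ x

# Standard CG, recording the objective value at every iterate.
x = np.zeros(5)
r = b.copy()
p = r.copy()
rs = r @ r
vals = [f(x)]
for _ in range(10):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = r @ r
    vals.append(f(x))
    p = r + (rs_new / rs) * p
    rs = rs_new

# The objective must never increase from one iterate to the next.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
```

If the same monitoring applied to the FVP-based CG shows increasing values, the operator is effectively not behaving as a symmetric positive-definite matrix, which is consistent with a bug (or severe ill-conditioning) in the Fisher-vector product.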
I am not sure whether this problem affects the overall performance of TRPO. Nevertheless, the result of the conjugate gradient defines the search direction of the line search and is therefore a central part of the optimization.