
Gradient based policy optimisation. #41

Closed
patxikuku opened this issue Apr 2, 2020 · 4 comments

@patxikuku

Hello,

If I understood correctly, the authors of PILCO use a gradient-based method to
optimise the policy. In the current implementation this doesn't seem to be the
case: you use L-BFGS-B without supplying the computation of the Jacobian.

Did you run any experiments using a gradient-based method?

@nrontsis
Owner

nrontsis commented Apr 2, 2020

Gradients are computed and used in L-BFGS-B. This is the whole point of using TensorFlow. Perhaps this is not immediately obvious when examining the code, because the gradient computation is handled via GPflow.
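For later readers, here is a minimal sketch of that mechanism (not the repository's actual code; `predicted_cost` is a hypothetical stand-in for PILCO's predicted-return objective). TensorFlow evaluates the objective and its gradient by automatic differentiation, and the pair is handed to SciPy's L-BFGS-B through `jac=True`, which is essentially what `gpflow.optimizers.Scipy` does under the hood:

```python
import numpy as np
import scipy.optimize
import tensorflow as tf

def predicted_cost(params):
    # Hypothetical stand-in for the policy objective; only the plumbing matters here.
    return tf.reduce_sum(tf.sin(params) + 0.1 * params ** 2)

def value_and_grad(x):
    # Both the value and the gradient come from TensorFlow; the gradient is
    # obtained by automatic differentiation, not by finite differences.
    params = tf.Variable(x, dtype=tf.float64)
    with tf.GradientTape() as tape:
        loss = predicted_cost(params)
    grad = tape.gradient(loss, params)
    return loss.numpy(), grad.numpy()

x0 = np.zeros(5)
# jac=True tells SciPy that the callable returns (value, gradient), so
# L-BFGS-B uses the supplied gradient instead of a 2-point estimate.
result = scipy.optimize.minimize(value_and_grad, x0, jac=True, method="L-BFGS-B")
print(result.x, result.fun)
```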

@nrontsis nrontsis closed this as completed Apr 2, 2020
@fuku10

fuku10 commented May 18, 2022

Hello,
Does this mean a numerically calculated gradient rather than an analytically calculated one?

@nrontsis
Owner

It’s neither; it’s computed via automatic differentiation.
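To illustrate the distinction with a toy example (not code from this repository): automatic differentiation applies the chain rule through the operations that build the objective, so the result matches the hand-derived analytic derivative to machine precision, whereas a finite-difference gradient is only a numerical approximation:

```python
import math
import tensorflow as tf

x = tf.Variable(1.5, dtype=tf.float64)
with tf.GradientTape() as tape:
    y = tf.sin(x) * tf.exp(x)                  # f(x) = sin(x) * exp(x)
grad_autodiff = tape.gradient(y, x)            # chain rule: cos(x)*exp(x) + sin(x)*exp(x)

# Central finite-difference approximation of the same derivative.
eps = 1e-6
f = lambda v: math.sin(v) * math.exp(v)
grad_numeric = (f(1.5 + eps) - f(1.5 - eps)) / (2 * eps)

grad_exact = math.cos(1.5) * math.exp(1.5) + math.sin(1.5) * math.exp(1.5)
print(grad_autodiff.numpy(), grad_numeric, grad_exact)
# The autodiff value matches the analytic derivative to machine precision;
# the finite-difference value carries step-size error.
```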

@fuku10

fuku10 commented May 19, 2022

I thought minimize() automatically calculates the gradient using the finite-difference method.
(In the case of scipy.optimize.minimize:
"If None or False, the gradient will be estimated using 2-point finite difference estimation with an absolute step size.")
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

Anyway, I'll study TensorFlow and GPflow.
Thanks!
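On the finite-difference fallback quoted above: that default only applies when minimize() is given no Jacobian. GPflow's wrapper around SciPy supplies one, built by automatic differentiation. A minimal standalone example of the pattern (toy data and model, not this repository's code):

```python
import numpy as np
import gpflow

# Toy data; any GPflow model's training loss is optimised the same way.
X = np.random.rand(20, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(20, 1)

model = gpflow.models.GPR(data=(X, Y), kernel=gpflow.kernels.SquaredExponential())

opt = gpflow.optimizers.Scipy()
# GPflow builds a closure that returns (loss, gradients) via tf.GradientTape
# and hands it to scipy.optimize.minimize with the gradient attached, so the
# 2-point finite-difference estimate quoted above is never used.
opt.minimize(model.training_loss, model.trainable_variables, method="L-BFGS-B")
gpflow.utilities.print_summary(model)
```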
