Cost for trajectory following #44

Pengxiao-Gao · 2020-04-27T14:09:15Z

Hi, I'm trying to use PILCO for path tracking for my graduation thesis, but so far the control results are not ideal.
I think they could be improved with a reward designed for trajectory following.
Do you know an easy way to do this?
Thanks a lot for the help.

Stefan

maxvanmeer · 2020-06-04T11:50:23Z

I'm wondering the same thing. Did you ever figure this out?

nrontsis · 2020-06-04T13:04:47Z

The reward is calculated here, so perhaps you could modify it to add what you need?

maxvanmeer · 2020-06-05T11:57:38Z

I think I managed to implement it in the original Matlab version. What you can do is:

Change the linear policy from M = Wm + b to M = Wm + b * r(t), where r(t) is the reference at the current timestep t (make sure t is passed to the function). Change the policy gradient dMdp as well: its gradient w.r.t. b used to be 1, but is now r(t). I do not believe the gradient w.r.t. the variance changes.
Alternatively, use another parametrization, as long as it uses r(t).
Pass the current time t to the cost function as well, and use r(t) for the immediate reward instead of a fixed x_target.
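
For anyone who wants to see the idea in code, here is a rough Python sketch of the steps above. It is not the Matlab implementation and not the API of this repository: the names `reference`, `linear_policy_mean`, `dM_db` and `tracking_cost` are made up for illustration, r(t) is assumed to be a scalar sine reference, and only the state mean is handled (PILCO itself propagates a Gaussian state and takes the expected cost, which this sketch ignores).

```python
import math

def reference(t):
    """Hypothetical reference trajectory r(t): a slow sine wave (assumption)."""
    return math.sin(0.1 * t)

def linear_policy_mean(W, b, m, t):
    """Mean control M = W m + b * r(t) for state mean m at timestep t."""
    r = reference(t)
    return [sum(w_ij * m_j for w_ij, m_j in zip(row, m)) + b_i * r
            for row, b_i in zip(W, b)]

def dM_db(b, t):
    """Jacobian of M w.r.t. b: r(t) on the diagonal (identity for M = W m + b)."""
    r = reference(t)
    n = len(b)
    return [[r if i == j else 0.0 for j in range(n)] for i in range(n)]

def tracking_cost(x, t, width=1.0, tracked_index=0):
    """Saturating cost on the mean only, with r(t) replacing a fixed x_target."""
    err = x[tracked_index] - reference(t)
    return 1.0 - math.exp(-0.5 * err * err / (width * width))

if __name__ == "__main__":
    W = [[0.5, -0.2]]   # one control input, two state dimensions
    b = [1.0]
    m = [0.3, 0.1]      # state mean; index 0 is the tracked quantity
    for t in range(3):
        print(t, linear_policy_mean(W, b, m, t), tracking_cost(m, t))
```

The cost is zero when the tracked state equals r(t) and saturates towards 1 as the tracking error grows, so the target moves with time while the cost shape stays the same.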
