# Infinite time-horizon LQR problem solved with Deep Learning and Dynamic Programming

See `lqr_infinite_horizon.ipynb`.

TODO: add an entropy term to the value function to encourage exploration. (We need to know the whole distribution!)
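
The learned solution can be checked against the classical closed-form answer. A minimal sketch, assuming a discounted cost E[ ∫_0^∞ e^{-rho t} (X_t^2 + a_t^2) dt ] and the dynamics dX_t = a_t dt + dW_t (the exact cost is not stated above, so this is an assumption): the HJB equation then admits a quadratic value function V(x) = p x^2 + c.

```python
# Closed-form baseline for the scalar infinite-horizon LQR (ASSUMED cost:
# E[ int_0^inf e^{-rho t} (X_t^2 + a_t^2) dt ], dynamics dX_t = a_t dt + dW_t).
# The HJB equation gives V(x) = p x^2 + c with rho*p = 1 - p^2 and rho*c = p,
# and the optimal feedback is a*(x) = -p x.
import math

def riccati_infinite_horizon(rho=0.1):
    p = (-rho + math.sqrt(rho**2 + 4.0)) / 2.0  # positive root of p^2 + rho*p - 1 = 0
    c = p / rho                                 # constant term of V
    return (lambda x: p * x**2 + c), (lambda x: -p * x)

V, a_star = riccati_infinite_horizon()
print(V(1.0), a_star(1.0))  # values the learned value function / policy should approach
```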

# Finite time-horizon Stochastic Control solved using Deep Learning

See `lqr.py`.

We solve the stochastic control problem by minimising the expected cost J = E[ ∫_0^T f(X_t, a_t) dt + g(X_T) ], where the final cost g is convex. The policy alpha is parametrised by a neural network, and we use the method of successive approximations (MSA) based on the Pontryagin maximum principle. The algorithm (a structural sketch in code follows the list):

  1. Start with an initial policy.
  2. Solve the adjoint BSDE for the processes (Y_t, Z_t) using deep learning.
  3. Update the policy by maximising the Hamiltonian (analogous to Q-learning in model-free RL).
  4. Go back to 2.
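
The sketch below shows the shape of this loop; `policy` and `bsde_solver` are hypothetical stand-ins (the actual implementation is in `lqr.py`), and PyTorch plus the Hamiltonian convention H(x, a, y) = y·a − f(x, a) for the dynamics dX_t = a_t dt + dW_t are assumptions.

```python
# Structural sketch of the MSA loop above, NOT the code in lqr.py.
import torch

def hamiltonian(x, a, y):
    return y * a - a**2          # b(x, a) = a, running cost f(x, a) = a^2

def msa(policy, bsde_solver, x0, n_msa_iters=50, lr=1e-3):
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(n_msa_iters):                   # step 4: iterate
        X, Y, Z = bsde_solver(policy, x0)          # step 2: paths and adjoint (Y_t, Z_t)
        X = X.detach()                             # freeze simulated paths for this update
        H = hamiltonian(X, policy(X), Y.detach())  # step 3: Hamiltonian along the paths
        loss = -H.mean()                           # maximise H <=> minimise -H
        opt.zero_grad(); loss.backward(); opt.step()
    return policy
```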

## Example

Drunk agents trying to reach the origin (aka LQR: dX_t = a_t dt + dW_t, with running cost f(x, a) = a^2 and final cost g(x) = x^2).
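
This example has a closed-form solution, which makes a handy sanity check for the learned policy. Solving the HJB equation with these costs gives V(t, x) = x^2/(1 + T − t) + log(1 + T − t) and the feedback control a*(t, x) = −x/(1 + T − t) (derived here from the HJB equation, not taken from `lqr.py`). A minimal NumPy simulation:

```python
# Simulate the drunk agents under the closed-form optimal feedback
# a*(t, x) = -x / (1 + T - t), derived from the HJB equation for
# dX_t = a_t dt + dW_t, f(x, a) = a^2, g(x) = x^2.
import numpy as np

T, n_steps, n_agents = 1.0, 100, 1000
dt = T / n_steps
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=n_agents)   # agents' starting positions
for k in range(n_steps):
    t = k * dt
    a = -x / (1.0 + T - t)                # optimal feedback control
    x = x + a * dt + np.sqrt(dt) * rng.normal(size=n_agents)  # Euler-Maruyama step
print(np.mean(x**2))  # terminal spread a learned policy should roughly match
```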

[Figure: LQR]

## TODO

The code is loopy: the BSDE solver and the Hamiltonian evaluation should be vectorised across time, as in the sketch below.
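
For instance (a sketch; the (n_steps, n_paths) tensor layout is an assumption), the Hamiltonian can be evaluated on the whole trajectory in one call instead of a Python loop over time steps:

```python
# Vectorising the Hamiltonian across time: stack the trajectory as a
# (n_steps, n_paths) tensor and evaluate H(x, a, y) = y*a - a**2 elementwise.
import torch

X = torch.randn(100, 1000)   # placeholder trajectories: (n_steps, n_paths)
Y = torch.randn(100, 1000)   # adjoint process on the same grid
A = torch.randn(100, 1000)   # controls on the same grid

H = Y * A - A**2             # all time steps at once, no Python loop
loss = -H.mean()             # single reduction for the policy update
```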