<a href="https://colab.research.google.com/github/SherbyRobotics/pyro/blob/colab/examples/notebooks/lqr_vs_dp_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### DP vs LQR for a pendulum swing-up

This page gives a quick demo of how DP (dynamic programming) can be used to find a globally optimal control policy for a non-linear system.

<img src="https://alexandregirardca.files.wordpress.com/2021/12/lqr_vs_dp.jpg" alt="DP vs LQR" width="600" height="400">

**Importing Libraries**

This page uses the toolbox *pyro*.

In [None]:
!git clone -b dev-alex https://github.com/SherbyRobotics/pyro
import sys
sys.path.append('/content/pyro')

In [None]:
import pyro
import numpy as np
import matplotlib.pyplot as plt

from IPython import display
!apt install ffmpeg

from pyro.dynamic  import pendulum
from pyro.planning import discretizer
from pyro.analysis import costfunction
from pyro.planning import valueiteration
from pyro.control  import lqr

**Defining a dynamic system model**

Here we load an already defined class that includes all the dynamic equations.

In [None]:
sys  = pendulum.SinglePendulum()

sys.xbar  = np.array([ -3.14 , 0 ]) # target and linearization point (upright position)

**Defining the cost function**

Here both controllers are synthesized using a standard quadratic cost function of the type:

$J = \int  ( x' Q x + u' R u ) dt$
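
As a minimal sketch, the integrand of $J$ can be evaluated by hand with NumPy. The helper `running_cost` is hypothetical (it is not part of *pyro*), and the sketch assumes that the state $x$ in the cost is measured as an error relative to the target $\bar x$.

```python
import numpy as np

def running_cost(x, u, x_bar, Q, R):
    """Integrand of J, with the state taken as an error from x_bar (assumption)."""
    dx = x - x_bar
    return float(dx @ Q @ dx + u @ R @ u)

Q = np.eye(2)                       # identity weights on both states
R = np.eye(1)                       # identity weight on the torque
x_bar = np.array([-np.pi, 0.0])     # upright target

# Pendulum at rest at theta = 0 with a torque of 2 applied
dJ = running_cost(np.array([0.0, 0.0]), np.array([2.0]), x_bar, Q, R)
print(dJ)                           # equals pi**2 + 4
```

Larger entries in `Q` penalize state error more heavily, while larger entries in `R` penalize torque.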

In [None]:
# Cost Function
qcf = costfunction.QuadraticCostFunction.from_sys( sys ) 

qcf.INF  = 10000     # The value iteration algo needs this parameter

qcf.Q[0,0] = 1
qcf.Q[1,1] = 1
qcf.R[0,0] = 1

print('Q=\n',qcf.Q)
print('R=\n',qcf.R)

**Synthesizing the "optimal" controllers**

*LQR controller*

Here we use a library function that: \\
1) linearizes the pendulum equations at the nominal state $\bar x$ \\
2) uses the resulting linearized equations and the defined cost function to compute the LQR controller solution
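
The two steps above can be sketched independently of *pyro* with SciPy. This is only an illustration under stated assumptions: the frictionless model $\ddot\theta = -(g/l)\sin\theta + u/(ml^2)$ and the values of `m`, `l` and `g` are hypothetical and are not taken from the toolbox.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical pendulum parameters (not taken from pyro)
m, l, g = 1.0, 1.0, 9.81
theta_bar = -np.pi                      # upright position in the assumed model

# 1) Linearize theta_ddot = -(g/l) sin(theta) + u / (m l^2) around [theta_bar, 0]
A = np.array([[0.0, 1.0],
              [-(g / l) * np.cos(theta_bar), 0.0]])
B = np.array([[0.0],
              [1.0 / (m * l**2)]])

# 2) Solve the continuous-time algebraic Riccati equation for the quadratic cost
Q = np.eye(2)
R = np.eye(1)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)         # control law: u = -K (x - x_bar)

closed_loop_poles = np.linalg.eigvals(A - B @ K)
print("K =", K)
print("closed-loop poles =", closed_loop_poles)
```

The open-loop matrix `A` has an unstable pole because the upright position is an unstable equilibrium; the gain `K` moves both closed-loop poles into the left half-plane, but only the linearized model is guaranteed to be stabilized.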

In [None]:
lqr_ctl = lqr.linearize_and_synthesize_lqr_controller(sys, qcf)

*VI controller*

Here we use a library function that: \\
1) discretizes the domain of the states and control inputs of the system \\
2a) *Commented line:* uses value iteration to compute the optimal cost-to-go and control actions \\
2b) Alternatively, loads the results of a previous computation \\
3) generates a continuous control law by interpolating the computed discrete solution
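
The steps above can be sketched in plain NumPy/SciPy. This is a simplified stand-in, not *pyro*'s implementation: the pendulum model, the grid sizes, the time step and the out-of-grid penalty `INF` are all assumptions chosen to keep the example small.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical model and grid (not pyro's defaults)
g_over_l, dt = 9.81, 0.05
theta = np.linspace(-2 * np.pi, 0.0, 41)      # the target -pi is a grid node
omega = np.linspace(-6.0, 6.0, 41)
torque = np.linspace(-5.0, 5.0, 11)
x_bar = np.array([-np.pi, 0.0])
INF = 1e4                                     # cost assigned when leaving the grid

# 1) Discretize: all (state, action) pairs and their forward-Euler successors
TH, OM, U = np.meshgrid(theta, omega, torque, indexing="ij")
th_next = TH + dt * OM
om_next = OM + dt * (-g_over_l * np.sin(TH) + U)
inside = ((th_next >= theta[0]) & (th_next <= theta[-1]) &
          (om_next >= omega[0]) & (om_next <= omega[-1]))
stage = dt * ((TH - x_bar[0])**2 + (OM - x_bar[1])**2 + U**2)   # Q = I, R = I
pts = np.stack([th_next.ravel(), om_next.ravel()], axis=1)

# 2) Value iteration: J(x) <- min_u [ stage cost + J(next state) ]
J = np.zeros((theta.size, omega.size))
for _ in range(200):
    J_fun = RegularGridInterpolator((theta, omega), J,
                                    bounds_error=False, fill_value=INF)
    J_next = J_fun(pts).reshape(TH.shape)
    cost_to_go = np.where(inside, stage + J_next, INF)
    J = np.minimum(cost_to_go.min(axis=2), INF)

# 3) Continuous control law: interpolate the look-up table of best actions
u_table = torque[cost_to_go.argmin(axis=2)]
policy = RegularGridInterpolator((theta, omega), u_table)
print(policy([[-3.0, 0.5]]))                  # torque for an off-grid state
```

The cost of this approach grows quickly with the number of grid points per state dimension, which is why loading a pre-computed solution is convenient in a demo.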

In [None]:

vi = valueiteration.ValueIteration_2D( discretizer.GridDynamicSystem( sys ) , qcf )

vi.initialize()
#vi.compute_steps(200,True) # To compute from scratch instead of loading the solution
vi.load_data('/content/pyro/examples/demo/simple_pendulum_vi') # Loading a pre-computed solution
vi.assign_interpol_controller()

vi_ctl = vi.ctl

**Showing the computed control laws**

The next lines generate two figures showing a map illustrating the computed optimal torque to use as a function of the two system states: \\
$\tau = f(\theta, \dot\theta)$

LQR

In [None]:
lqr_ctl.plot_control_law(sys=sys, n=100)

VI

In [None]:
vi_ctl.plot_control_law(sys=sys, n=100)

We can see that the LQR solution (first figure) is a linear map, while the VI solution (second figure) is a non-linear map that follows the natural dynamics. Also note that the range of required torques is much larger with the LQR solution. The LQR solution is optimal only locally, in the linear range around the target state, while the VI solution is the globally optimal solution (up to the resolution of the grid). If we zoom in around the target state $[\theta=-\pi,\dot\theta=0]$, both solutions tend to the same linear solution.

**Simulations**

Here we show both control laws in action, with a trajectory starting at the state $[\theta=1,\dot\theta=0]$ and converging to the target $[\theta=-\pi,\dot\theta=0]$.

In [None]:
x0 = np.array([ 1 ,0])

LQR

In [None]:
cl_sys_lqr      =   lqr_ctl + sys 
cl_sys_lqr.x0   = x0
cl_sys_lqr.plot_trajectory('xu')


VI

In [None]:
cl_sys_vi      =   vi_ctl + sys 
cl_sys_vi.x0   = x0
cl_sys_vi.plot_trajectory('xu')

We can see that both solutions converge to the target. With the LQR controller, the pendulum goes directly toward the goal, while the VI solution performs a "pumping action" before swinging up toward the goal in order to minimize the required torque. The VI solution reaches the same goal here with 12x less maximum torque than the LQR solution. Note that the torque in the VI simulation is "noisy" because the VI algorithm outputs a discrete look-up table, which leads to this type of imperfection when it is converted back to a continuous domain.

**Animation of the simulations**

The following function generates and shows animations of the same trajectories.

LQR

In [None]:
video_lqr = cl_sys_lqr.generate_simulation_html_video()
html_lqr  = display.HTML(video_lqr)
display.display(html_lqr)

VI

In [None]:
video_vi = cl_sys_vi.generate_simulation_html_video()
html_vi  = display.HTML(video_vi)
display.display(html_vi)

Here we see the pumping action of the VI solution. This is one advantage of the VI algorithm: it finds globally optimal solutions for non-linear systems.

**Phase-plane trajectory**

Here the same trajectories are shown on the phase plane of the pendulum. The vector field illustrates the natural dynamics along which the pendulum would evolve if no torque were applied to the system.
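
As a rough illustration of what the vector field represents, the unforced dynamics can be evaluated on a grid of states. The frictionless model and the value of `g_over_l` are assumptions, and `natural_dynamics` is a hypothetical helper rather than a *pyro* function.

```python
import numpy as np

g_over_l = 9.81   # hypothetical value of g / l

def natural_dynamics(theta, omega):
    """Unforced, frictionless pendulum (u = 0): returns (theta_dot, omega_dot)."""
    return omega, -g_over_l * np.sin(theta)

# Grid of states covering two full turns of the pendulum
theta = np.linspace(-2 * np.pi, 2 * np.pi, 25)
omega = np.linspace(-8.0, 8.0, 17)
TH, OM = np.meshgrid(theta, omega, indexing="ij")
d_theta, d_omega = natural_dynamics(TH, OM)

# Equilibria are the grid points where the field vanishes
speed = np.hypot(d_theta, d_omega)
equilibria = TH[speed < 1e-9]
print(equilibria)
```

Passing `TH`, `OM`, `d_theta` and `d_omega` to `matplotlib.pyplot.quiver` would draw the arrows. The equilibria found on this grid are the multiples of $\pi$ at zero velocity, which include the upright target $\theta=-\pi$.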

LQR

In [None]:
cl_sys_lqr.plot_phase_plane_trajectory_closed_loop()

VI

In [None]:
cl_sys_vi.plot_phase_plane_trajectory_closed_loop()

We can see here why the VI solution requires less torque for the swing-up: it leverages the natural dynamics instead of fighting them with large torques.

**Performance**

Here the performance, in terms of the defined cost function $J = \int  ( x' Q x + u' R u ) dt$, is compared. Note that $dJ =  x' Q x + u' R u $ is the cost increment at each instant and $J$ is the cumulative cost.
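
The relation between $dJ$ and $J$ can be sketched with a cumulative trapezoidal integration. The signal `dJ` below is made-up data (a decaying exponential standing in for a converging trajectory), not the output of the simulations above.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1001)
dJ = np.exp(-t)     # made-up running cost that decays as the state converges

# Cumulative trapezoidal rule: J[k] approximates the integral of dJ from t[0] to t[k]
increments = 0.5 * (dJ[1:] + dJ[:-1]) * np.diff(t)
J = np.concatenate(([0.0], np.cumsum(increments)))

print(J[-1])        # close to the exact value 1 - exp(-10)
```

Because $dJ$ is never negative for positive semi-definite $Q$ and $R$, the cumulative cost $J$ can only increase or plateau, and it flattens once the trajectory has converged to the target.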

LQR

In [None]:
cl_sys_lqr.plot_trajectory('j')

VI

In [None]:
cl_sys_vi.plot_trajectory('j')

We see that the cost of the LQR solution (first figure) is about 4x higher than that of the VI solution (second figure), based on the cost function.

**Take home message**

Globally optimal solutions are often orders of magnitude better than local solutions when the system to control is highly non-linear, like the pendulum here (12x on the maximum required torque and 4x on the performance). Dynamic programming is one technique for finding globally optimal solutions.