# Reference tracking LQR


This live script discusses how to solve a reference tracking Linear Quadratic 
Regulator (LQR) problem. The cost function is the following

$$ \sum_{k=0}^{h-1}\left(\|Mx_k-r_k\|^2 + u_k^T Ru_k\right) +\beta_h\|Mx_h-r_h\|^2 $$

where the state is governed by a linear system

$$ x_{k+1} = Ax_k+Bu_k, \ \ k \in \mathbb{N}_0$$

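for a given initial condition $x_0$, where $r_k$ is the reference at time $k$ for 
the linear output $Mx_k$, $R$ is a positive definite input weight, and $\beta_h \geq 0$ 
weights the terminal tracking error. We use dynamic programming to show that: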

(i) the optimal control input policy is

$$ u_k = K_kx_k + L_k$$

where the gains $K_k$ coincide with the ones for the stabilization problem 
(when $r_k=0$);

(ii) the costs-to-go can be written as 

$$ J_k(x_k) = x_k^T P_k x_k + N_k x_k+\alpha_k$$

where the $P_k$ matrices will be shown to coincide with the ones for the 
stabilization problem (when $r_k=0$) and the $N_k$ matrices are a linear 
function of the reference signal.
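
To make the cost function above concrete, the following minimal sketch evaluates it for a given input sequence by simulating the linear system forward. The helper name `tracking_cost`, the array layout (`r[:, k]` = $r_k$, `u[:, k]` = $u_k$) and the numerical values are assumptions chosen for illustration only; the matrices are the zero-order-hold discretization of a double integrator with sampling period `tau`.

```python
import numpy as np

def tracking_cost(A, B, M, R, r, x0, u, beta_h):
    """Sum over k<h of ||M x_k - r_k||^2 + u_k' R u_k, plus beta_h ||M x_h - r_h||^2."""
    h = u.shape[1]                      # horizon length
    x = x0.copy()
    cost = 0.0
    for k in range(h):
        e = M @ x - r[:, k]             # tracking error at time k
        cost += e @ e + u[:, k] @ R @ u[:, k]
        x = A @ x + B @ u[:, k]         # x_{k+1} = A x_k + B u_k
    e = M @ x - r[:, h]                 # terminal tracking error
    return cost + beta_h * (e @ e)

# Illustrative data (assumed values): discretized double integrator
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
M = np.array([[1.0, 0.0]])              # track the position only
R = np.array([[1.0]])
h = 3
r = np.ones((1, h + 1))                 # constant reference r_0..r_h
x0 = np.zeros(2)
u = np.zeros((1, h))                    # zero input: the state stays at the origin

cost_zero_input = tracking_cost(A, B, M, R, r, x0, u, beta_h=10.0)
print(cost_zero_input)
```

With zero input and zero initial state the output never moves, so each of the three stage errors contributes 1 and the terminal error contributes $\beta_h = 10$.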

Let us start with (ii), from which we will indirectly obtain (i). The statement 
holds for $k=h$, where $J_h(x_h)$ is the terminal cost, since

$$\beta_h\|Mx_h-r_h\|^2 = x_h^T P_h x_h + N_h x_h+\alpha_h$$

for

$$ P_h = \beta_hM^T M, \ \ \ N_h = -2\beta_h r_h^T M, \ \ \ \ \alpha_h=\beta_h\|r_h\|^2,$$

and, assuming it is valid for $k+1$, we have

$$\begin{aligned}
J_k(x_k) &= \min_{u_k}  \|Mx_k-r_k\|^2 + u_k^T Ru_k  + J_{k+1}(\underbrace{x_{k+1}}_{Ax_k+Bu_k}) \\
&= \min_{u_k}  x_k^T M^T Mx_k-2r_k^T M x_k+ \|r_k\|^2 + u_k^T Ru_k \\
&\quad +  (Ax_k+Bu_k)^T P_{k+1} (Ax_k+Bu_k) + N_{k+1} (Ax_k+Bu_k)+\alpha_{k+1} \\
&= x_k^T (M^TM+A^T P_{k+1}A)x_k-2r_k^T Mx_k + \|r_k\|^2+N_{k+1}Ax_k \\
&\quad +\min_{u_k} u_k^T (R+B^T P_{k+1}B)u_k + 2u_k^T (B^T (P_{k+1}Ax_k+\frac{1}{2}N_{k+1}^T)) +\alpha_{k+1} \\
&= x_k^T (M^T M+A^T P_{k+1}A-A^T P_{k+1}B(R+B^T P_{k+1}B)^{-1}B^T P_{k+1}A)x_k + \|r_k\|^2+\alpha_{k+1} \\
&\quad -\frac{1}{4}N_{k+1}B(R+B^T P_{k+1}B)^{-1}B^T N_{k+1}^T \\
&\quad +(-2r_k^T M+N_{k+1}A-N_{k+1}B(R+B^T P_{k+1}B)^{-1}B^T  P_{k+1}A)x_k
\end{aligned}$$

from which we conclude that

$$P_k  = (M^T M+A^T P_{k+1}A-A^T P_{k+1}B(R+B^T P_{k+1}B)^{-1}B^T P_{k+1}A)$$

$$N_k = -2r_k^T M+N_{k+1}(A+BK_k), \ \ K_k =-(R+B^T P_{k+1}B)^{-1}B^T  P_{k+1}A$$

$$\alpha_k =\|r_k\|^2 -\frac{1}{4}N_{k+1}B(R+B^T P_{k+1}B)^{-1}B^T N_{k+1}^T + \alpha_{k+1}$$

and 

$$ u_k = K_kx_k + L_k$$

where 

$$ L_k = -\frac{1}{2}(R+B^T P_{k+1}B)^{-1}B^T N_{k+1}^T.$$
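
The minimization over $u_k$ in the derivation above can be spelled out by completing the square. This is a sketch in which the shorthand $S_k$ and $g_k$ is introduced here only for illustration, assuming $S_k$ is positive definite (which holds when $R$ is positive definite and $P_{k+1}$ is positive semidefinite):

$$ S_k = R+B^T P_{k+1}B, \ \ \ g_k = B^T (P_{k+1}Ax_k+\frac{1}{2}N_{k+1}^T) $$

$$ u_k^T S_k u_k + 2u_k^T g_k = (u_k+S_k^{-1}g_k)^T S_k (u_k+S_k^{-1}g_k) - g_k^T S_k^{-1} g_k $$

The first term on the right is nonnegative and vanishes only at

$$ u_k = -S_k^{-1}g_k = -S_k^{-1}B^T P_{k+1}Ax_k - \frac{1}{2}S_k^{-1}B^T N_{k+1}^T, $$

whose two terms are the feedback term $K_kx_k$ and the feedforward term $L_k$. The minimum value is $-g_k^T S_k^{-1} g_k$, which, using the symmetry of $P_{k+1}$, expands to

$$ -x_k^T A^T P_{k+1}BS_k^{-1}B^T P_{k+1}Ax_k - N_{k+1}BS_k^{-1}B^T P_{k+1}Ax_k - \frac{1}{4}N_{k+1}BS_k^{-1}B^T N_{k+1}^T, $$

that is, the quadratic, linear and constant contributions that appear in $P_k$, $N_k$ and $\alpha_k$, respectively.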

Moreover, unrolling the recursion for $N_k$ backward from $N_h$, we can conclude that

$$ N_k = -2\sum_{j=k}^h \gamma_j r_j^T M\Phi(j,k), $$

where $\gamma_j = 1$ if $j<h$, $\gamma_h=\beta_h$, and

$\Phi(j,k) = I \text{ if } j=k$, $\Phi(j,k) = (A+BK_{j-1})(A+BK_{j-2})\cdots(A+BK_k) 
\text{ if } j>k$.
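
As a numerical sanity check, the sketch below compares the recursion $N_k = -2r_k^T M+N_{k+1}(A+BK_k)$, started from $N_h = -2\beta_h r_h^T M$, with its unrolled form, a weighted sum of $r_j^T M$ multiplied by products of the closed-loop matrices $A+BK_\ell$. The system matrices, the horizon, the random reference and the helper name `unrolled_N` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, h, beta_h = 0.2, 6, 10.0
A = np.array([[1.0, tau], [0.0, 1.0]])    # discretized double integrator (assumed example)
B = np.array([[tau**2 / 2], [tau]])
M = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
r = rng.standard_normal((1, h + 1))       # arbitrary reference r_0..r_h

# Backward recursion for P_k, K_k and N_k
P = beta_h * M.T @ M                      # P_h
N = [None] * (h + 1)
K = [None] * h
N[h] = -2 * beta_h * r[:, [h]].T @ M      # N_h
for k in range(h - 1, -1, -1):
    S = R + B.T @ P @ B                   # uses P_{k+1}
    K[k] = -np.linalg.solve(S, B.T @ P @ A)
    N[k] = -2 * r[:, [k]].T @ M + N[k + 1] @ (A + B @ K[k])
    P = M.T @ M + A.T @ P @ A + A.T @ P @ B @ K[k]   # P_k

def unrolled_N(k):
    """-2 * sum_{j=k}^{h} w_j r_j' M Phi(j,k), with w_h = beta_h and w_j = 1 otherwise."""
    Phi = np.eye(A.shape[0])              # Phi(k,k) = I
    total = np.zeros((1, A.shape[0]))
    for j in range(k, h + 1):
        w = beta_h if j == h else 1.0
        total += w * r[:, [j]].T @ M @ Phi
        if j < h:
            Phi = (A + B @ K[j]) @ Phi    # Phi(j+1,k) = (A + B K_j) Phi(j,k)
    return -2 * total

max_gap = max(np.abs(N[k] - unrolled_N(k)).max() for k in range(h + 1))
print(max_gap)
```

The two computations should agree up to floating-point rounding, which also makes explicit that $N_k$ depends linearly on the reference samples $r_k,\dots,r_h$.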

The function `lqrreftrack` returns the gain matrices $K_k$ and $L_k$ as `K[:,:,k]` and `L[:,:,k]`, together with the optimal control input $u_k$ as `u[:,k]` and the corresponding state $x_k$ as `x[:,k]`, given the input parameters $A$, $B$, $M$, $R$, $r$ (the scalar reference, with `r[k]` = $r_k$), $x_0$ and $\beta_h$. An example is provided below for a double integrator (this is similar to the live script _Finite-horizon Linear Quadratic Control of a Double Integrator_ except that now a reference is introduced).

In [None]:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

In [None]:
def lqrreftrack(A, B, M, R, r, x0, betah):
    """Finite-horizon reference-tracking LQR for a scalar reference r[0..h]."""

    # initializations
    h = max(r.shape) - 1          # horizon: r holds r_0, ..., r_h
    n = A.shape[0]                # number of states
    m = B.shape[1]                # number of inputs
    P = np.zeros((n, n, h+1))
    N = np.zeros((1, n, h+1))
    K = np.zeros((m, n, h))
    L = np.zeros((m, 1, h))
    u = np.zeros((m, h))
    x = np.zeros((n, h+1))

    # terminal conditions P_h and N_h, see above
    P[:,:,h] = betah*(M.T@M)
    N[:,:,h] = -2*betah*r[h]*M

    # backward recursion for the gains, see above
    for k in range(h-1, -1, -1):
        S = R + B.T@P[:,:,k+1]@B                  # R + B' P_{k+1} B
        K[:,:,k] = -np.linalg.solve(S, B.T@P[:,:,k+1]@A)
        L[:,:,k] = -0.5*np.linalg.solve(S, B.T@N[:,:,k+1].T)
        P[:,:,k] = M.T@M + A.T@P[:,:,k+1]@A + A.T@P[:,:,k+1]@B@K[:,:,k]
        N[:,:,k] = -2*r[k]*M + N[:,:,k+1]@(A + B@K[:,:,k])

    # compute optimal trajectory (uk and xk)
    x[:,[0]] = x0
    for k in range(h):
        u[:,k] = K[:,:,k]@x[:,k] + L[:,0,k]
        x[:,k+1] = A@x[:,k] + B@u[:,k]

    return u, x, K, L

In [None]:
# define parameters of the problem
Ac = np.array([[0, 1],[0, 0]])
Bc = np.array([[0],[1]])
tau = 0.2
# zero-order-hold discretization; the C and D matrices are placeholders and are not used
A, B = signal.cont2discrete((Ac, Bc, np.zeros((1,2)), np.zeros((1,1))), tau)[:2]
M = np.array([[1, 0]])
betah = 10
R = 1
h = 100
x0 = np.array([[0],[0]])

# define reference (h+1 samples: r_0, ..., r_h)
t  = np.arange(0,h+1)/h 
r  = t # ramp

# call the function which will provide the reference tracking input u and
# state x as well as the gains K and L
u, x, K, L = lqrreftrack(A, B, M, R, r, x0, betah)
f = plt.figure()
ax = plt.gca()
ax.plot(t,x[0,:], label="output")
ax.plot(t,r, label="reference")
ax.legend();
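
One way to check statement (ii) numerically is to compare the predicted optimal cost $x_0^T P_0 x_0 + N_0 x_0+\alpha_0$ with the cost accumulated along the closed-loop trajectory. The sketch below is self-contained and re-implements the backward recursion in compact form rather than calling `lqrreftrack`; the parameter values, the nonzero initial state and the variable names are assumptions for illustration, and the terminal constant is taken as $\alpha_h=\beta_h\|r_h\|^2$.

```python
import numpy as np

tau, h, beta_h = 0.2, 50, 10.0
A = np.array([[1.0, tau], [0.0, 1.0]])    # discretized double integrator (assumed example)
B = np.array([[tau**2 / 2], [tau]])
M = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
r = np.arange(h + 1) / h                  # scalar ramp reference r_0..r_h
x0 = np.array([0.5, 0.0])

# Backward pass: P_k, N_k, alpha_k and the gains K_k, L_k
P = beta_h * M.T @ M                      # P_h
N = -2 * beta_h * r[h] * M                # N_h
alpha = beta_h * r[h] ** 2                # alpha_h
K, L = [None] * h, [None] * h
for k in range(h - 1, -1, -1):
    S = R + B.T @ P @ B                   # uses P_{k+1}
    K[k] = -np.linalg.solve(S, B.T @ P @ A)
    L[k] = -0.5 * np.linalg.solve(S, B.T @ N.T)
    alpha = r[k] ** 2 - 0.25 * (N @ B @ np.linalg.solve(S, B.T @ N.T)).item() + alpha
    N = -2 * r[k] * M + N @ (A + B @ K[k])           # N_k from N_{k+1}
    P = M.T @ M + A.T @ P @ A + A.T @ P @ B @ K[k]   # P_k from P_{k+1}

predicted = x0 @ P @ x0 + (N @ x0).item() + alpha    # J_0(x_0)

# Forward pass: apply u_k = K_k x_k + L_k and accumulate the actual cost
x, simulated = x0.copy(), 0.0
for k in range(h):
    u = K[k] @ x + L[k][:, 0]
    e = M @ x - r[k]
    simulated += e @ e + u @ R @ u
    x = A @ x + B @ u
e = M @ x - r[h]
simulated += beta_h * (e @ e)

print(predicted, simulated)
```

The two printed values should coincide up to rounding; a mismatch would point to an error in one of the recursions for $P_k$, $N_k$ or $\alpha_k$.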