# Linear quadratic discounted cost problem
In this live script we consider the following system

$$\dot{x}(t)=A x(t)+B u(t)+w(t), \quad t \in \mathbb{R}_{\geq 0},$$

where $w$ is a zero-mean Gaussian white noise process with $\mathbb{E}[w(t) 
w(t+\tau)^{\top}]=W \delta(\tau)$. We assume that the initial state is unknown and 
follows a Gaussian distribution with mean $\bar{x}_0$ and covariance $\mathbb{E}\left[\left(x_{0}-\bar{x}_{0}\right)\left(x_{0}-\bar{x}_{0}\right)^{\top}\right]=\Theta_{0}$.  
Moreover, the following discounted cost is considered

$$\int_{0}^{\infty} e^{-\alpha t}\left(x(t)^{\top} Q x(t)+u(t)^{\top} R u(t)\right) 
d t$$

with a discount factor $\alpha \ \ge \ 0$. Note that when $\alpha\ = \ 0$ 
and $W\ =\ 0$,  this boils down to a standard LQR problem. As we shall see shortly, 
it can be argued that the optimal policy for the discounted cost problem is

$$u(t)\ =\ K x(t)$$

where $K$ can again be obtained by solving an algebraic Riccati equation (ARE). 
However, when $\alpha \ > \ 0$, this ARE differs from the one for the 
case $\alpha\ = \ 0$. The goal of this live script is to compute $K$, given 
$A$, $B$, $Q$, $R$, and $\alpha$.

We now justify that the optimal control policy takes a simple static linear 
state feedback form and explain how to compute the optimal gain $K$. We start 
with a finite horizon discounted cost problem with no disturbances ($W = 0$), 
described by

$$\min \int_{0}^{T} e^{-\alpha t}\left(x(t)^{\top} Q x(t)+u(t)^{\top} R u(t)\right) 
d t$$

for a given horizon $T$, for the system

$$\dot{x}(t)=A x(t)+B u(t), \quad t \in \mathbb{R}_{\geq 0}$$

The corresponding HJB equation is

$$\frac{\partial}{\partial t} V(t, x)+\min _{u}\left[\frac{\partial}{\partial x} 
V(t, x)(A x+B u)+e^{-\alpha t} x^{\top} Q x+e^{-\alpha t} u^{\top} R u\right]=0$$

where $V(T, x)=0$, for every $x$, is the terminal condition. 

Differentiating the expression above with respect to the control input 
$u$ and setting the derivative to zero, i.e. $\frac{\partial}{\partial x} V(t, x) B+2 e^{-\alpha t} u^{\top} R=0$, 
we obtain the control input $u$ that minimizes the HJB equation: 

$$u=-\frac{1}{2} e^{\alpha t} R^{-1} B^{\top}\left[\frac{\partial}{\partial x} V(t, x)\right]^{\top}.$$

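We now show that $V(t, x)=e^{-\alpha t} x^{\top} P(t) x$, with $P(t)$ symmetric, satisfies the HJB 
equation. Substituting the optimal $u$ and this $V(t,x)$, for which $\frac{\partial}{\partial x} V(t, x)=2 e^{-\alpha t} x^{\top} P(t)$, 
into the HJB equation, we obtain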

$$-\alpha e^{-\alpha t} x^{\top} P(t) x+e^{-\alpha t} x^{\top} \dot{P}(t) 
x+e^{-\alpha t} x^{\top} P(t) A x+e^{-\alpha t} x^{\top} A^{\top} P(t) x$$

$$+e^{-\alpha t} x^{\top} Q x-e^{-\alpha t} x^{\top}\left(P(t) B R^{-1} B^{\top} 
P(t)\right) x=0$$

which holds, together with the terminal condition, if $P(t)$ satisfies

$$\dot{P}(t)=-\left(-\alpha P(t)+A^{\top} P(t)+P(t) A+Q-P(t) B R^{-1} B^{\top} 
P(t)\right), \quad P(T)=0.$$

Then the optimal policy is $u=K(t)x$, where $K(t)=-R^{-1} B^{\top} P(t)$. 
As $T$ tends to infinity, as in the usual case $\alpha=0$, $P(0)$ converges 
to $P$, where $P$ is the solution 
of the following ARE: $-\alpha P+A^{\top} P+P A+Q-P B R^{-1} B^{\top} P=0$. 
Then the control law converges to 

$$u=Kx,$$

where $K=-R^{-1}B^{\top}P$.
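The convergence of $P(0)$ as $T$ grows can be checked numerically. The sketch below is illustrative only: it assumes SciPy is available, and the example system, the discount $\alpha=0.5$ and the horizon $T=40$ are choices made here, not values prescribed by the derivation. It integrates the Riccati differential equation in the time-to-go $s=T-t$ (so that the terminal condition becomes an initial condition) and evaluates the residual of the ARE at the resulting $P(0)$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (assumed for this sketch only)
A = np.array([[0.0, 1.0], [-3.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[1.0]])
alpha = 0.5   # discount rate
T = 40.0      # horizon, long enough for P(0) to settle

n = A.shape[0]
Rinv = np.linalg.inv(R)

def riccati_rhs(s, p_flat):
    # s = T - t is the time-to-go, hence dP/ds = -dP/dt
    P = p_flat.reshape(n, n)
    dP = -alpha * P + A.T @ P + P @ A + Q - P @ B @ Rinv @ B.T @ P
    return dP.ravel()

# P(T) = 0 corresponds to s = 0; integrating up to s = T yields P at t = 0
sol = solve_ivp(riccati_rhs, (0.0, T), np.zeros(n * n), rtol=1e-10, atol=1e-12)
P0 = sol.y[:, -1].reshape(n, n)

# Residual of the ARE evaluated at P(0); it should be close to zero
residual = -alpha * P0 + A.T @ P0 + P0 @ A + Q - P0 @ B @ Rinv @ B.T @ P0
residual_norm = np.max(np.abs(residual))

# Gain at t = 0 of the finite-horizon policy u = K(t) x
K0 = -Rinv @ B.T @ P0
print(residual_norm)
print(K0)
```

Shortening `T` in this sketch makes the residual visibly larger, which is one way to see how long a horizon is needed before the static gain is a good approximation.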

Note that if we consider disturbances in the finite horizon case, a reasoning 
similar to the one discussed in the lectures for the case $\alpha=0$ allows us 
to conclude that certainty equivalence holds and the policy is the same. Since, 
for $\alpha>0$, the exponentially decaying weighting factor keeps the cost bounded even in the 
presence of disturbances, we do not need to consider the average cost; simply 
taking the expected cost is enough. The policy for the infinite horizon case with 
disturbances, as in the case $\alpha=0$, is the simple state feedback policy described 
above.
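To make the boundedness claim concrete, the expected cost of the policy $u=Kx$ can be written in closed form. The following is a sketch, assuming $\alpha>0$, that $P$ is the stabilizing solution of the ARE above, and a value function of the form $e^{-\alpha t}\left(x^{\top} P x+c\right)$ with a constant $c$ (an ansatz that is not part of the derivation above):

$$\mathbb{E}\left[\int_{0}^{\infty} e^{-\alpha t}\left(x(t)^{\top} Q x(t)+u(t)^{\top} R u(t)\right) d t\right]=\bar{x}_{0}^{\top} P \bar{x}_{0}+\operatorname{tr}\left(P \Theta_{0}\right)+\frac{1}{\alpha} \operatorname{tr}(P W).$$

The last term is finite for every $\alpha>0$ but grows without bound as $\alpha \rightarrow 0$, which is consistent with the undiscounted problem with disturbances requiring the average cost instead.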

For convenience, the ARE is rearranged as $\left(A-\frac{\alpha}{2} I\right)^{\top} 
P+P\left(A-\frac{\alpha}{2} I\right)+Q-P B R^{-1} B^{\top} P=0$. This is the ARE 
of an undiscounted problem in which the matrix $A$ is replaced by $A\ -\ \frac{\alpha}{2}I$, 
so the discounted cost problem can be solved with the `lqr` function. 
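The equivalence can be verified numerically. The sketch below is an illustration under stated assumptions: it uses SciPy's `solve_continuous_are` instead of the `lqr` function, and the example matrices and $\alpha=0.5$ are chosen here for demonstration. It solves the undiscounted ARE for the shifted matrix and then checks that the result satisfies the discounted ARE. Note that the shift is a multiple of the identity; subtracting the scalar $\frac{\alpha}{2}$ from a NumPy array would change every entry of $A$, not only its diagonal.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data (assumed for this sketch only)
A = np.array([[0.0, 1.0], [-3.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[1.0]])
alpha = 0.5

n = A.shape[0]
A_shift = A - 0.5 * alpha * np.eye(n)   # A - (alpha/2) I

# Undiscounted ARE for the shifted system
P = solve_continuous_are(A_shift, B, Q, R)

# Residual of the discounted ARE for the original A
BRB = B @ np.linalg.solve(R, B.T)
residual = -alpha * P + A.T @ P + P @ A + Q - P @ BRB @ P

# Gain in the convention u = K x used in the text
K = -np.linalg.solve(R, B.T @ P)

# The closed loop of the shifted system is stable
eigs = np.linalg.eigvals(A_shift + B @ K)

# Elementwise scalar subtraction is NOT the same as the identity shift
wrong_shift = A - alpha / 2
print(np.max(np.abs(residual)), K, eigs)
```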

The problem described above is implemented in the function `lqdiscounted`, 
which takes $A$, $B$, $Q$, $R$, and $\alpha$ as inputs and returns the optimal 
gain $K$.

In [None]:
import numpy as np
import control
control.use_numpy_matrix(False)

In [None]:
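def lqdiscounted(A,B,Q,R,alpha):
    # Shift by (alpha/2)*I; a bare scalar would be subtracted from every entry of A.
    A_shifted = A - alpha/2*np.eye(A.shape[0])
    # control.lqr assumes u = -Kx, so negate to match u = Kx used in the text.
    return -control.lqr(A_shifted,B,Q,R)[0]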

In [None]:
# Define inputs and run the function
A = np.array([[0, 1],[ -3, 0]])
B = np.array([[0],[1]])
Q = np.array([[1, 0],[0, 0]])
R = 1
alpha = 0
K = lqdiscounted(A,B,Q,R,alpha)

In [None]:
K