# Min-max quadratic game


In this notebook, we wish to control a linear plant

$$x_{k+1}=A x_{k}+B u_{k}+w_{k}$$

for $k \in \mathcal{K}:=\{0,1, \ldots, h-1\}$, where $x_k$, $u_k$, and $w_k$ are the state, the control input, and the disturbance at time $k$. To this end, consider the cost

$$J:=\sum_{k=0}^{h-1} x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}$$

In contrast to the standard linear quadratic optimal control framework, 
where $\{w_k \mid k\in \mathcal{K}\}$ is a sequence of independent and identically 
distributed Gaussian random variables, we assume here that $\{w_k \mid k\in \mathcal{K}\}$ is 
a sequence of worst-case disturbances (in the sense that it maximizes the cost) 
with bounded energy

$$c:=\sum_{k=0}^{h-1} w_{k}^{\top} w_{k}$$

Since the larger the energy of the disturbance signal, the larger the 
cost $J$ can be, we formulate the problem as finding 
a control policy that guarantees, for every disturbance input,

\begin{equation*}
\frac{J}{c} \leq \gamma^{2}
\label{eq:control_pol} \tag{1}
\end{equation*}

(or equivalently $J-\gamma^2 c \leq 0$) for the smallest value of $\gamma$. 
It can be shown that if, for a given positive $\gamma$, there exists a control 
policy for $u_k$ such that the solution to the following problem (in general 
for a non-zero initial condition) is bounded

$$\min _{\left\{u_{k}=\mu_{k}\left(x_{k}\right) | k \in \mathcal{K}\right\}} \max _{\left\{w_{k}=\nu_{k}\left(x_{k}\right) 
| k \in \mathcal{K}\right\}} \sum_{k=0}^{h-1} x_{k}^{\top} Q x_{k}+u_{k}^{\top} 
R u_{k}-\gamma^{2} w_{k}^{\top} w_{k}$$

then $\eqref{eq:control_pol}$ holds for every $\left\{w_{k} | k \in \mathcal{K}\right\}$ 
when $x_{0}=0$ and this control policy is used. The notation $u_{k}=\mu_{k}\left(x_{k}\right)$, 
$w_{k}=\nu_{k}\left(x_{k}\right)$ indicates that the control input and the disturbance 
at time $k$ are allowed to be functions of the state $x_{k}$ at time $k$. A procedure 
similar to dynamic programming, called the min-max algorithm, provides the optimal policies 
for both the control input and the disturbance (seen here as a decision-making agent that wants to 
maximize the cost):

$$J_{k}\left(x_{k}\right)=\min _{u_{k}=\mu_{k}\left(x_{k}\right)} \max _{w_{k}=\nu_{k}\left(x_{k}\right)} 
x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}-\gamma^{2} w_{k}^{\top} w_{k}+J_{k+1}\left(A 
x_{k}+B u_{k}+w_{k}\right)$$

for $k \in\{h-1, h-2, \ldots, 0\}$ and $J_{h}\left(x_{h}\right)=x_{h}^{\top} P_{h} x_{h}$, with 
$P_{h}=0$. The minimizer and maximizer at each state $x_{k}$ define the 
optimal policies, which are linear in the state:

$$u_{k}=K_{k} x_{k}$$

$$w_{k}=L_{k} x_{k}$$

for some gains $K_k$ and $L_k$. Note that the gains $K_k$ differ in general 
from the optimal LQR gains. Their derivation can be found in the appendix below.
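
As a quick illustration of how the two sets of gains relate, the sketch below compares the min-max control gains with finite-horizon LQR gains. It is a compact restatement of the recursion derived in the appendix, not the notebook's own solver: the helper names `minmax_control_gains` and `lqr_gains`, the example matrices, and the chosen values of $\gamma$ are assumptions made for this example. For a very large $\gamma$ the term $P_{k+1}(\gamma^2 I-P_{k+1})^{-1}P_{k+1}$ becomes negligible, so the gains should approach the LQR ones, while for a moderate $\gamma$ they differ.

```python
import numpy as np

def minmax_control_gains(A, B, Q, R, h, gamma):
    """Control gains K_0..K_{h-1} of the min-max recursion (illustrative helper)."""
    n, m = B.shape
    P = np.zeros((n, n))                      # terminal condition P_h = 0
    K = np.zeros((h, m, n))
    for k in range(h - 1, -1, -1):
        # P_bar = P + P (gamma^2 I - P)^{-1} P
        P_bar = P + P @ np.linalg.solve(gamma**2 * np.eye(n) - P, P)
        G = np.linalg.solve(R + B.T @ P_bar @ B, B.T @ P_bar @ A)
        K[k] = -G
        P = Q + A.T @ P_bar @ A - A.T @ P_bar @ B @ G
    return K

def lqr_gains(A, B, Q, R, h):
    """Standard finite-horizon LQR gains via the Riccati recursion."""
    n, m = B.shape
    P = np.zeros((n, n))
    K = np.zeros((h, m, n))
    for k in range(h - 1, -1, -1):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        K[k] = -G
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return K

# Example data (assumed for illustration): a discrete-time double integrator.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
h = 10

K_lqr = lqr_gains(A, B, Q, R, h)
K_large_gamma = minmax_control_gains(A, B, Q, R, h, gamma=1e6)
K_moderate_gamma = minmax_control_gains(A, B, Q, R, h, gamma=10.0)

print("max |K_minmax - K_lqr|, gamma=1e6:", np.abs(K_large_gamma - K_lqr).max())
print("max |K_minmax - K_lqr|, gamma=10 :", np.abs(K_moderate_gamma - K_lqr).max())
```

The gains are stored with the time index first (`K[k]` is $K_k$), which is a layout choice of this sketch only.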



The Python function `lqminmax` below returns the optimal gains 
$K_k$ and $L_k$ for this problem. The user supplies the matrices $A, B, 
Q, R$, the horizon $h$, and a positive value of $\gamma$. 
An example is given next.



In [None]:
import numpy as np

In [None]:
def lqminmax(A, B, Q, R, h, gamma):
    """Return the min-max gains K (m x n x h) and L (n x n x h)."""
    n, m = B.shape                   # state and input dimensions
    I = np.identity(n)
    P = np.zeros((n, n, h + 1))      # P[:, :, k] holds P_k; the terminal P_h is zero
    Phi = np.zeros((n, n, h + 1))    # Phi[:, :, k] holds P_bar_k
    K = np.zeros((m, n, h))          # control gains, u_k = K_k x_k
    L = np.zeros((n, n, h))          # disturbance gains, w_k = L_k x_k

    for k in range(h, 0, -1):
        # W = (gamma^2 I - P_k)^{-1} P_k; requires gamma^2 I - P_k to be positive definite
        W = np.linalg.solve(gamma**2 * I - P[:, :, k], P[:, :, k])
        Phi[:, :, k] = P[:, :, k] + P[:, :, k] @ W
        # G = (R + B' Phi_k B)^{-1} B' Phi_k A
        G = np.linalg.solve(R + B.T @ Phi[:, :, k] @ B, B.T @ Phi[:, :, k] @ A)
        K[:, :, k - 1] = -G
        P[:, :, k - 1] = Q + A.T @ Phi[:, :, k] @ A - A.T @ Phi[:, :, k] @ B @ G
        L[:, :, k - 1] = W @ (A + B @ K[:, :, k - 1])

    return K, L

In [None]:
# input your parameters for the min-max problem
A = np.array([[1, 1], [0, 1]])
B = np.array([[0], [1]])
Q = np.eye(2)
R = np.array([[1]])
h = 10
gamma = 10
# call the min-max solver; K[:, :, k] and L[:, :, k] are the gains at time k
K, L = lqminmax(A, B, Q, R, h, gamma)

In [None]:
K

In [None]:
L
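
The guarantee $J \leq \gamma^2 c$ for $x_0=0$ can be probed numerically. The following is a minimal sketch, not part of the original example: it recomputes the control gains with a compact helper (`minmax_gains`, a hypothetical name that restates the recursion from the appendix so that the block runs on its own), simulates the closed loop under randomly drawn disturbance sequences, and records the largest observed ratio $J/c$. Random disturbances are an assumption of this sketch; they are generally not worst-case, so the observed ratio only gives a lower estimate of the true worst-case ratio.

```python
import numpy as np

def minmax_gains(A, B, Q, R, h, gamma):
    """Control gains K_0..K_{h-1} of the min-max recursion (illustrative helper)."""
    n, m = B.shape
    P = np.zeros((n, n))                      # terminal condition P_h = 0
    K = np.zeros((h, m, n))
    for k in range(h - 1, -1, -1):
        P_bar = P + P @ np.linalg.solve(gamma**2 * np.eye(n) - P, P)
        G = np.linalg.solve(R + B.T @ P_bar @ B, B.T @ P_bar @ A)
        K[k] = -G
        P = Q + A.T @ P_bar @ A - A.T @ P_bar @ B @ G
    return K

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
h, gamma = 10, 10.0
n = A.shape[0]

K = minmax_gains(A, B, Q, R, h, gamma)

rng = np.random.default_rng(0)
worst_ratio = 0.0
for _ in range(200):
    w = rng.standard_normal((h, n))           # one random disturbance sequence
    x = np.zeros(n)                            # zero initial condition
    J = 0.0
    for k in range(h):
        u = K[k] @ x
        J += x @ Q @ x + u @ R @ u             # stage cost at time k
        x = A @ x + B @ u + w[k]               # plant update
    c = np.sum(w**2)                           # disturbance energy
    worst_ratio = max(worst_ratio, J / c)

print("largest observed J/c:", worst_ratio, " bound gamma^2:", gamma**2)
```
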

### Appendix

We can show by induction that the iteration

$$J_{k}\left(x_{k}\right)=\min _{u_{k}=\mu_{k}\left(x_{k}\right)} \max _{w_{k}=\nu_{k}\left(x_{k}\right)} 
x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}-\gamma^{2} w_{k}^{\top} w_{k}+J_{k+1}\left(A 
x_{k}+B u_{k}+w_{k}\right)$$

results in a quadratic function $J_k(x_k)=x_k^{\top} P_k x_k$. Let us first maximize 
with respect to $w_k$ and then minimize with respect to $u_k$. 

Assuming $J_{k+1}(x_{k+1})=x_{k+1}^{\top} P_{k+1} x_{k+1}$ and setting the gradient of the right-hand side with respect to $w_k$ to zero gives the following maximizer

$$w_{k}=\left(\gamma^{2} I-P_{k+1}\right)^{-1} P_{k+1}\left(A x_{k}+B u_{k}\right)$$

provided that $\gamma^{2} I-P_{k+1}$ is positive definite. This is always 
the case for sufficiently large $\gamma$, which will be assumed hereafter. 
Substituting this $w_k$ into $J_k(x_k)$, we obtain 

$$\begin{aligned}
J_{k}\left(x_{k}\right) &=\min _{u_{k}=\mu_{k}\left(x_{k}\right)} x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}+\left(A x_{k}+B u_{k}\right)^{\top} P_{k+1}\left(A x_{k}+B u_{k}\right) \\
&\qquad +\left(A x_{k}+B u_{k}\right)^{\top} P_{k+1}\left(\gamma^{2} I-P_{k+1}\right)^{-1} P_{k+1}\left(A x_{k}+B u_{k}\right) \\
&=\min _{u_{k}=\mu_{k}\left(x_{k}\right)} x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}+\left(A x_{k}+B u_{k}\right)^{\top} \bar{P}_{k+1}\left(A x_{k}+B u_{k}\right)
\end{aligned}$$

where $\bar{P}_{k+1}=P_{k+1}+P_{k+1}\left(\gamma^{2} I-P_{k+1}\right)^{-1} 
P_{k+1}$. Setting the gradient with respect to $u_k$ to zero, we obtain 

$$u_{k}=-\left(R+B^{\top} \bar{P}_{k+1} B\right)^{-1} B^{\top} \bar{P}_{k+1} A x_{k},$$

and by substituting it back into $J_k(x_k)$, we obtain 

$$J_{k}\left(x_{k}\right)=x_{k}^{\top} P_{k} x_{k},$$

where

$$P_{k}=Q+A^{\top} \bar{P}_{k+1} A-A^{\top} \bar{P}_{k+1} B\left(R+B^{\top} 
\bar{P}_{k+1} B\right)^{-1} B^{\top} \bar{P}_{k+1} A.$$
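
One step of this derivation can be checked numerically. The sketch below is an illustration under assumed data: the matrices, the value of $\gamma$, and the choice $P_{k+1}=I$ are hypothetical, and `stage_value` is a helper introduced only here. It evaluates the stage expression at the stationary pair $(u_k, w_k)$ and compares it with $x_k^{\top} P_k x_k$, then perturbs $w_k$ to confirm that the value does not increase.

```python
import numpy as np

# Assumed example data (not prescribed by the derivation).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 10.0
P_next = np.eye(2)   # stands in for P_{k+1}; needs gamma^2 I - P_{k+1} positive definite

I = np.eye(2)
W = np.linalg.solve(gamma**2 * I - P_next, P_next)     # (gamma^2 I - P)^{-1} P
P_bar = P_next + P_next @ W                            # P_bar_{k+1}
G = np.linalg.solve(R + B.T @ P_bar @ B, B.T @ P_bar @ A)
K = -G                                                 # control gain K_k
P_k = Q + A.T @ P_bar @ A - A.T @ P_bar @ B @ G        # Riccati-like update

def stage_value(x, u, w):
    """Stage cost minus gamma^2 |w|^2 plus the cost-to-go x_next' P_next x_next."""
    x_next = A @ x + B @ u + w
    return x @ Q @ x + u @ R @ u - gamma**2 * (w @ w) + x_next @ P_next @ x_next

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
u_star = K @ x                                         # minimizing input
w_star = W @ (A @ x + B @ u_star)                      # maximizing disturbance

value_at_saddle = stage_value(x, u_star, w_star)
closed_form = x @ P_k @ x
perturbed = stage_value(x, u_star, w_star + 0.1 * rng.standard_normal(2))

print(value_at_saddle, closed_form, perturbed)
```
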

Starting from $P_h=0$, we can iterate the following set of equations for $k=h-1, h-2, \ldots, 0$ 

$$\bar{P}_{k+1} =P_{k+1}+P_{k+1}\left(\gamma^{2} I-P_{k+1}\right)^{-1} P_{k+1}$$

$$P_{k} =Q+A^{\top} \bar{P}_{k+1} A-A^{\top} \bar{P}_{k+1} B\left(R+B^{\top} 
\bar{P}_{k+1} B\right)^{-1} B^{\top} \bar{P}_{k+1} A$$

and conclude that the optimal policies for the control input and disturbances 
are

$$u_{k}=K_k x_{k}, \ \ \ \ K_k:=-\left(R+B^{\top} \bar{P}_{k+1} B\right)^{-1} B^{\top} \bar{P}_{k+1}A,$$

and

$$w_{k}=L_k x_{k}, \ \ \ \ L_k:=(\gamma^{2} I-P_{k+1})^{-1} P_{k+1}(A+BK_k).$$
