# System Level Synthesis (SLS) for Linear Systems

References:
* Section 3.2 of [On the Sample Complexity of the Linear Quadratic Regulator](https://arxiv.org/abs/1710.01688)
* [Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator](https://arxiv.org/abs/1805.09388)

In [RL_LQR.ipynb](RL_LQR.ipynb) we considered a quadratic objective together with the following linear system.

$$
\begin{aligned}
&\lim _{T \rightarrow \infty} \frac{1}{T} \sum_{t=1}^T \mathbb{E}\left[x_t^* Q x_t+u_t^* R u_t\right]\\
&x_{t+1}=A x_t+B u_t+w_t, \quad x_0 \sim \mathcal{D}, w_t \sim \mathcal{N}\left(0, \sigma_w^2 I\right)
\end{aligned}
$$

This problem can be solved with the Riccati equation or with SLS.
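As a reference point for the known-model case, the following is a minimal sketch of the Riccati route. It simply iterates the discrete-time Riccati recursion until it settles. The function name `dlqr_by_iteration` and the toy double-integrator system are assumptions made for illustration; they are not taken from the earlier notebook.

```python
import numpy as np


def dlqr_by_iteration(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return (K, P) with u = K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = -(R + B' P B)^{-1} B' P A
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # P = Q + A' P A - A' P B (R + B' P B)^{-1} B' P A
        P = Q + A.T @ P @ (A + B @ K)
    return K, P


# Hypothetical toy system (a discretized double integrator), not the pendulum used later.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

K, P = dlqr_by_iteration(A, B, Q, R)
rho = max(abs(np.linalg.eigvals(A + B @ K)))
print("closed-loop spectral radius:", rho)
```

A spectral radius below one indicates that the gain stabilizes the known system. The rest of this notebook asks what can still be guaranteed when only estimates of the system are available.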
In practice, however, the true values of $A$ and $B$ in the dynamics may be unknown.
Let us consider the following problem.

---

Suppose the true $A, B$ are unknown, but estimates $\hat{A}, \hat{B}$ are available.

Notation:
* Estimation errors: $\Delta_A:=\widehat{A}-A$ and $\Delta_B:=\widehat{B}-B$
* $\widehat{\boldsymbol{\Delta}}:=\left[\begin{array}{ll}\Delta_A & \Delta_B\end{array}\right]\left[\begin{array}{l}\boldsymbol{\Phi}_x \\ \boldsymbol{\Phi}_u\end{array}\right]$


Recalling [RL_LQR_system_level_synthesis](RL_LQR_system_level_synthesis.ipynb), we only need to consider $\Phi_x, \Phi_u$ that satisfy the following constraint:
$$
\left[\begin{array}{lll}
z I-\widehat{A} & -\widehat{B}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]=I
$$

For this equation, the following equivalence holds:

$$
\left[\begin{array}{lll}
z I-\widehat{A} & -\widehat{B}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]=I \quad \text { if and only if } \quad\left[\begin{array}{ll}
z I-A & -B
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]=I+\widehat{\boldsymbol{\Delta}}
$$
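The equivalence is purely algebraic, because $\left[\begin{array}{ll} zI-A & -B\end{array}\right]=\left[\begin{array}{ll} zI-\widehat{A} & -\widehat{B}\end{array}\right]+\left[\begin{array}{ll}\Delta_A & \Delta_B\end{array}\right]$. The sketch below checks it numerically at a single point on the unit circle. The random system, the static gain `K`, and the evaluation point `z` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2

# Hypothetical true system and a perturbed estimate (random, for illustration only).
A = 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
A_hat = A + 0.05 * rng.standard_normal((n, n))
B_hat = B + 0.05 * rng.standard_normal((n, p))
Delta_A, Delta_B = A_hat - A, B_hat - B

# A static gain K induces the following responses on the estimated system:
#   Phi_x(z) = (zI - A_hat - B_hat K)^{-1},  Phi_u(z) = K Phi_x(z).
K = 0.1 * rng.standard_normal((p, n))

z = np.exp(1j * 0.7)  # one point on the unit circle
I = np.eye(n)
Phi_x = np.linalg.inv(z * I - A_hat - B_hat @ K)
Phi_u = K @ Phi_x
Phi = np.vstack([Phi_x, Phi_u])

lhs_hat = np.hstack([z * I - A_hat, -B_hat]) @ Phi   # should equal I
lhs_true = np.hstack([z * I - A, -B]) @ Phi          # should equal I + Delta_hat
Delta_hat = np.hstack([Delta_A, Delta_B]) @ Phi

print(np.allclose(lhs_hat, I), np.allclose(lhs_true, I + Delta_hat))
```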

Here, if $(I+\widehat{\boldsymbol{\Delta}})^{-1}$ exists, then $\Phi_x, \Phi_u$ produce the following response on the true system $(A, B)$ (Theorem A.2 of [Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator](https://arxiv.org/abs/1805.09388)):

$$
\left[\begin{array}{l}
\mathbf{x} \\
\mathbf{u}
\end{array}\right]=\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right](I+\widehat{\boldsymbol{\Delta}})^{-1} \mathbf{w}
$$

Furthermore, the following holds.

---

**Lemma 3.4 of [On the Sample Complexity of the Linear Quadratic Regulator](https://arxiv.org/abs/1710.01688)**

Let $\boldsymbol{K}$ be a controller that stabilizes the system $(\widehat{A}, \widehat{B})$, and let $\left(\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u\right)$ be the system response of $\boldsymbol{K}$ on that system.

Then, if $\boldsymbol{K}$ also stabilizes $(A, B)$, it achieves the following cost on the system $(A, B)$.

$$
J(A, B, \mathbf{K}):=\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\left(I+\left[\begin{array}{ll}
\Delta_A & \Delta_B
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]\right)^{-1}\right\|_{\mathcal{H}_2}
$$

Moreover, a sufficient condition for $\boldsymbol{K}$ to stabilize $(A, B)$ is $\|\hat{\boldsymbol{\Delta}}\|_{\mathcal{H}_{\infty}}<1$.



---

**Optimization**

Suppose the estimation errors are bounded as

$$
\left\|\Delta_A\right\|_2 \leq \epsilon_A, \quad\left\|\Delta_B\right\|_2 \leq \epsilon_B
$$

By Lemma 3.4, we can then pose the following robust LQR problem.

$$
\begin{aligned}
& \min _{\mathbf{\Phi}_x, \mathbf{\Phi}_u} \sup _{\substack{\left\|\Delta_A\right\|_2 \leq \epsilon_A \\
\left\|\Delta_B\right\|_2 \leq \epsilon_B}} 
\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\left(I+\left[\begin{array}{ll}
\Delta_A & \Delta_B
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]\right)^{-1}\right\|_{\mathcal{H}_2}\\
& \text { s.t. }\left[\begin{array}{ll}
z I-\widehat{A} & -\widehat{B}
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]=I, \quad \mathbf{\Phi}_x, \mathbf{\Phi}_u \in \frac{1}{z} \mathcal{R} \mathcal{H}_{\infty} .
\end{aligned}
$$

This min-max problem is hard to solve in general, so we consider a relaxation instead. First, the following upper bound holds.

$$
\begin{aligned}
J(A, B, \mathbf{K})
&=\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\left(I+\left[\begin{array}{ll}
\Delta_A & \Delta_B
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]\right)^{-1}\right\|_{\mathcal{H}_2}\\
&\leq
\left\|\left(I+ \left[\begin{array}{ll}
\Delta_A & \Delta_B
\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\mathbf{\Phi}_u
\end{array}\right]
\right)^{-1}\right\|_{\mathcal{H}_{\infty}} 
\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\right\|_{\mathcal{H}_2}\\
&=\left\|(I+\hat{\boldsymbol{\Delta}})^{-1}\right\|_{\mathcal{H}_{\infty}} J(\widehat{A}, \widehat{B}, \mathbf{K})
\end{aligned}
$$

Moreover, if $\|\hat{\boldsymbol{\Delta}}\|_{\mathcal{H}_{\infty}}<1$, we have

$$
J(A, B, \mathbf{K}) \leq\left\|(I+\hat{\boldsymbol{\Delta}})^{-1}\right\|_{\mathcal{H}_{\infty}} J(\widehat{A}, \widehat{B}, \mathbf{K}) \leq \frac{1}{1-\|\hat{\boldsymbol{\Delta}}\|_{\mathcal{H}_{\infty}}} J(\widehat{A}, \widehat{B}, \mathbf{K})
$$

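The second inequality follows from the Neumann series $(I+\hat{\boldsymbol{\Delta}})^{-1}=\sum_{k \geq 0}(-\hat{\boldsymbol{\Delta}})^k$ together with the submultiplicativity of the $\mathcal{H}_{\infty}$ norm. Next, let us bound $\|\hat{\boldsymbol{\Delta}}\|_{\mathcal{H}_{\infty}}$. The following holds.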

---

**Proposition 3.5**

For any $\alpha \in (0, 1)$, we have

$$
\|\hat{\boldsymbol{\Delta}}\|_{\mathcal{H}_{\infty}} \leq\left\|\left[\begin{array}{c}
\frac{\epsilon_A}{\sqrt{\alpha}} \boldsymbol{\Phi}_x \\
\frac{\epsilon_B}{\sqrt{1-\alpha}} \boldsymbol{\Phi}_u
\end{array}\right]\right\|_{\mathcal{H}_{\infty}}=: H_\alpha\left(\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u\right)
$$

The full proof is omitted here.
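For intuition, here is a short sketch of why the bound holds. It is a standard rescaling argument, written as an aid rather than as a reproduction of the paper's proof. Factor $\hat{\boldsymbol{\Delta}}$ as

$$
\hat{\boldsymbol{\Delta}}=\left[\begin{array}{ll}
\frac{\sqrt{\alpha}}{\epsilon_A} \Delta_A & \frac{\sqrt{1-\alpha}}{\epsilon_B} \Delta_B
\end{array}\right]\left[\begin{array}{c}
\frac{\epsilon_A}{\sqrt{\alpha}} \boldsymbol{\Phi}_x \\
\frac{\epsilon_B}{\sqrt{1-\alpha}} \boldsymbol{\Phi}_u
\end{array}\right]
$$

Since $\left\|\left[\begin{array}{ll}X & Y\end{array}\right]\right\|_2^2 \leq\|X\|_2^2+\|Y\|_2^2$, the first factor has spectral norm at most $\sqrt{\alpha+(1-\alpha)}=1$ whenever $\left\|\Delta_A\right\|_2 \leq \epsilon_A$ and $\left\|\Delta_B\right\|_2 \leq \epsilon_B$. Submultiplicativity of the $\mathcal{H}_{\infty}$ norm then gives the claim.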

---


Putting these together, we obtain

$$
\sup _{\substack{\left\|\Delta_A\right\|_2 \leq \epsilon_A \\
\left\|\Delta_B\right\|_2 \leq \epsilon_B}} J(A, B, \mathbf{K}) \leq\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\right\|_{\mathcal{H}_2} \frac{1}{1-H_\alpha\left(\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u\right)}=\frac{J(\widehat{A}, \widehat{B}, \mathbf{K})}{1-H_\alpha\left(\mathbf{\Phi}_x, \boldsymbol{\Phi}_u\right)}
$$

This bound is valid only when $H_\alpha\left(\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u\right)<1$.
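The chain of bounds can be checked numerically on a frequency grid. The sketch below uses a hypothetical two-state system with a hand-picked static gain `K`; every number in it is an assumption for illustration. The $\mathcal{H}_{\infty}$ norms are approximated by the maximum singular value over the grid and the $\mathcal{H}_2$ norms by the root mean square of the Frobenius norm over the grid.

```python
import numpy as np

rng = np.random.default_rng(0)


def unit_spectral(shape):
    """Random matrix scaled to spectral norm 1 (illustrative helper)."""
    M = rng.standard_normal(shape)
    return M / np.linalg.norm(M, ord=2)


# Hypothetical toy problem; all numbers are assumptions for illustration.
A = np.array([[0.5, 0.2], [0.0, 0.4]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.2]])  # static gain chosen by hand
eps_A = eps_B = 0.05
alpha = 0.5
Delta_A = eps_A * unit_spectral(A.shape)
Delta_B = eps_B * unit_spectral(B.shape)
A_hat, B_hat = A + Delta_A, B + Delta_B
# Cost weights diag(Q^{1/2}, R^{1/2}) with Q = I and R = I.
W = np.block([[np.eye(2), np.zeros((2, 1))], [np.zeros((1, 2)), np.eye(1)]])

h2_true_sq, h2_hat_sq = [], []
hinf_delta, H_alpha = 0.0, 0.0
for w in np.linspace(0.0, 2 * np.pi, 512, endpoint=False):
    z = np.exp(1j * w)
    Phi_x = np.linalg.inv(z * np.eye(2) - A_hat - B_hat @ K)
    Phi_u = K @ Phi_x
    Phi = np.vstack([Phi_x, Phi_u])
    Delta_hat = np.hstack([Delta_A, Delta_B]) @ Phi
    true_resp = W @ Phi @ np.linalg.inv(np.eye(2) + Delta_hat)
    h2_true_sq.append(np.linalg.norm(true_resp, "fro") ** 2)
    h2_hat_sq.append(np.linalg.norm(W @ Phi, "fro") ** 2)
    hinf_delta = max(hinf_delta, np.linalg.norm(Delta_hat, 2))
    scaled = np.vstack([eps_A / np.sqrt(alpha) * Phi_x,
                        eps_B / np.sqrt(1 - alpha) * Phi_u])
    H_alpha = max(H_alpha, np.linalg.norm(scaled, 2))

J_true = np.sqrt(np.mean(h2_true_sq))  # gridded H2 norm on the true system
J_hat = np.sqrt(np.mean(h2_hat_sq))    # gridded H2 norm on the estimated system
print(hinf_delta, H_alpha, J_true, J_hat / (1 - H_alpha))
```

The printed values should be ordered as `hinf_delta <= H_alpha < 1` and `J_true <= J_hat / (1 - H_alpha)`, mirroring Proposition 3.5 and the bound above.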

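Introducing an upper bound $\gamma$ on $H_\alpha\left(\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u\right)$ and rearranging, we arrive at the following problem for a fixed $\alpha$. Solving it yields a robust LQR controller.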

$$
\begin{gathered}
\operatorname{minimize}_{\gamma \in[0,1)} \frac{1}{1-\gamma} \min _{\mathbf{\Phi}_x, \mathbf{\Phi}_u}\left\|\left[\begin{array}{cc}
Q^{\frac{1}{2}} & 0 \\
0 & R^{\frac{1}{2}}
\end{array}\right]\left[\begin{array}{l}
\boldsymbol{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]\right\|_{\mathcal{H}_2} \\
\text { s.t. }\left[\begin{array}{ll} z I-\widehat{A} & -\widehat{B}\end{array}\right]\left[\begin{array}{l}
\mathbf{\Phi}_x \\
\boldsymbol{\Phi}_u
\end{array}\right]=I,\left\|\left[\begin{array}{c}
\frac{\epsilon_A}{\sqrt{\alpha}} \boldsymbol{\Phi}_x \\
\frac{\epsilon_B}{\sqrt{1-\alpha}} \boldsymbol{\Phi}_u
\end{array}\right]\right\|_{\mathcal{H}_{\infty}} \leq \gamma \\
\boldsymbol{\Phi}_x, \boldsymbol{\Phi}_u \in \frac{1}{z} \mathcal{R} \mathcal{H}_{\infty} .
\end{gathered}
$$
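For a fixed $\gamma$ the inner problem is convex, so one simple way to handle the outer minimization is a one-dimensional search over $\gamma$. The following is a minimal sketch that uses a plain grid search. The names `outer_gamma_search` and `inner_cost`, and the toy stand-in for the inner problem, are hypothetical. In this notebook, `inner_cost` could be a thin wrapper that calls `solve_robust_sls` and returns `None` when the solver reports failure.

```python
import numpy as np


def outer_gamma_search(inner_cost, gammas):
    """Grid search over gamma for the minimum of inner_cost(gamma) / (1 - gamma).

    inner_cost(gamma) is a hypothetical callable that returns the optimal H2
    cost of the inner problem for that gamma, or None if it is infeasible.
    """
    best_gamma, best_bound = None, np.inf
    for gamma in gammas:
        cost = inner_cost(gamma)
        if cost is None:
            continue  # skip infeasible values of gamma
        bound = cost / (1.0 - gamma)
        if bound < best_bound:
            best_gamma, best_bound = gamma, bound
    return best_gamma, best_bound


def toy_inner_cost(gamma):
    """Toy stand-in: infeasible for small gamma, cost decreasing in gamma."""
    return None if gamma < 0.2 else 1.0 + 0.5 / gamma


gammas = np.linspace(0.05, 0.95, 19)
best_gamma, best_bound = outer_gamma_search(toy_inner_cost, gammas)
print(best_gamma, best_bound)
```

A larger $\gamma$ loosens the $\mathcal{H}_{\infty}$ constraint and lowers the inner cost, but it inflates the factor $1/(1-\gamma)$, so the search balances the two effects.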

Let us run an experiment.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from celluloid import Camera


def inverted_pendulum_dynamics(dt: float = 1.0):
    ''' Return the dynamics of an inverted pendulum.
    Reference (Eq. 20): https://ctms.engin.umich.edu/CTMS/index.php?example=InvertedPendulum&section=SystemModeling
    State: [x, x', phi, phi']
    '''
    M = 0.5  # mass of the cart
    m = 0.2  # mass of the pendulum

    I = 0.006   # mass moment of inertia
    l = 0.3  # length to pendulum center of mass

    b = 0.1  # coefficient of friction for cart
    g = 9.8   # gravity

    X = I * (M + m) + M * m * (l ** 2)
    A = np.array([
        [0, 1, 0, 0],
        [0, -(I + m*(l**2))*b / X, (m**2) * g * (l**2) / X, 0],
        [0, 0, 0, 1],
        [0, -m*l*b / X, m*g*l*(M+m) / X, 0] 
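    ]) * dt
    # NOTE: A is only scaled by dt and B is left unscaled; a forward-Euler
    # discretization would instead use I + A * dt and B * dt.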

    B = np.array([0, (I + m*(l**2)) / X, 0, m*l / X]).reshape((4, 1))
    return A, B


def draw_inverted_pendulum(camera: Camera, states: np.ndarray):
    '''Draw the inverted pendulum.
    Args:
        states (np.ndarray): array of shape (horizon, state dim)
    '''

    def _draw(x):
        cart_pos = x[0]
        l = 0.3
        pendulum_x = cart_pos + 1 * np.sin(x[2])
        pendulum_y = 1 * np.cos(x[2])

        plt.hlines(0, -2, 2, color="black")
        plt.plot([cart_pos, pendulum_x], [0, pendulum_y], color="black")

    plt.xlim([-2, 2])
    plt.ylim([-2, 2])
    plt.xlabel("x [m]")
    plt.ylabel("y [m]")

    for x in states:
        _draw(x)
        camera.snap()


In [2]:
import cvxpy as cp


def psd_sqrt(P):
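    '''
    Compute the square root of a symmetric positive semidefinite matrix P.
    See https://kagamiz.hatenablog.com/entry/2020/05/16/212214.
    '''
    assert len(P.shape) == 2
    assert P.shape[0] == P.shape[1]
    w, v = np.linalg.eigh(P)
    # Allow tiny negative eigenvalues caused by round-off, then clip them to zero.
    assert (w >= -1e-10).all()
    return v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T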


def solve_robust_sls(Ahat, Bhat, eps_A, eps_B, Q, R, T, gamma, alpha):
    """Solve the robust SLS synthesis problem over FIR filters of length T."""

    assert T >= 1
    n, p = Bhat.shape

    Q_sqrt = psd_sqrt(Q)
    R_sqrt = psd_sqrt(R)

    # Phi_x = \sum_{k=1}^{T} Phi_x[k] z^{-k}
    Phi_x = cp.Variable((T*n, n), name="Phi_x")

    # Phi_u = \sum_{k=1}^{T} Phi_u[k] z^{-k}
    Phi_u = cp.Variable((T*p, n), name="Phi_u")

    # htwo_cost
    htwo_cost = cp.Variable(name="htwo_cost")

    # subspace constraint:
    # [zI - A, -B] * [Phi_x; Phi_u] = I
    #
    # Note that:
    # z Phi_x = \sum_{k=0}^{T-1} Phi_x[k+1] z^{-k}
    #
    # This means that:
    # 1) Phi_x[1] = I
    # 2) Phi_x[k+1] = A@Phi_x[k] + B@Phi_u[k] for k=1, ..., T-1
    # 3) A@Phi_x[T] + B@Phi_u[T] = 0

    constr = []

    constr.append(Phi_x[:n, :] == np.eye(n))
    for k in range(T-1):
        constr.append(Phi_x[n*(k+1):n*(k+1+1), :] == Ahat@Phi_x[n*k:n*(k+1), :] + Bhat@Phi_u[p*k:p*(k+1), :])
    constr.append(Ahat@Phi_x[n*(T-1):, :] + Bhat@Phi_u[p*(T-1):, :] == 0)

    # H2 constraint:
    # By Parseval's identity, this is equal (up to constants) to
    #
    # frobenius_norm(
    #   [ Q_sqrt@Phi_x[1] ;
    #     ...
    #     Q_sqrt@Phi_x[T] ;
    #     R_sqrt@Phi_u[1] ;
    #     ...
    #     R_sqrt@Phi_u[T]
    #   ]
    # ) <= htwo_cost
    # TODO: what is the best way to implement this in cvxpy?
    constr.append(
        cp.norm(
            cp.bmat(
                [[Q_sqrt@Phi_x[n*k:n*(k+1), :]] for k in range(T)] +
                [[R_sqrt@Phi_u[p*k:p*(k+1), :]] for k in range(T)]),
            'fro') <= htwo_cost)


    # H-infinity constraint
    #
    # We want to enforce ||H(z)||_inf <= gamma, where
    #
    #   H(z) = \sum_{k=1}^{T} [ mult_x * Phi_x[k] ; mult_u * Phi_u[k] ] z^{-k}.
    #
    # Here, each of the FIR coefficients has size (n+p) x n. Since n+p>n, we enforce
    # the constraint on the transpose system H^T(z). The LMI constraint
    # for this comes from Theorem 5.8 of
    # Positive trigonometric polynomials and signal processing applications (2007) by
    # B. Dumitrescu.
    #
    # Here is a table to map the variable names in the text to this program
    #
    #       Text          Program                   Comment
    # -------------------------------------------------------------
    #         p             n                   Output dim
    #         m             n+p                 Input dim
    #         n             T                   FIR horizon
    #       p(n+1)          n(T+1)              SDP variable size
    #      p(n+1) x m       n(T+1) x (n+p)

    mult_x = eps_A/np.sqrt(alpha)
    mult_u = eps_B/np.sqrt(1-alpha)

    # Hbar has size (T+1)*n x (n+p)
    Hbar = cp.bmat(
        [[np.zeros((n, n)), np.zeros((n, p))]] +
        [[mult_x*Phi_x[n*k:n*(k+1), :].T, mult_u*Phi_u[p*k:p*(k+1), :].T] for k in range(T)])

    Q = cp.Variable((n*(T+1), n*(T+1)), name="Q", PSD=True)

    # Case k==0: the block diag of Q has to sum to gamma^2 * eye(n)
    gamma_sq = gamma ** 2
    constr.append(
        sum([Q[n*t:n*(t+1), n*t:n*(t+1)] for t in range(T+1)]) == gamma_sq*np.eye(n))

    # Case k>0: the block off-diag of Q has to sum to zero
    for k in range(1, T+1):
        constr.append(
            sum([Q[n*t:n*(t+1), n*(t+k):n*(t+1+k)] for t in range(T+1-k)]) == np.zeros((n, n)))

    # Constraint (5.45)
    constr.append(
        cp.bmat([
            [Q, Hbar],
            [Hbar.T, np.eye(n+p)]]) == cp.Variable((n*(T+1) + (n+p), n*(T+1) + (n+p)), PSD=True))

    prob = cp.Problem(cp.Minimize(htwo_cost), constr)
    prob.solve(solver=cp.SCS)

    if prob.status == cp.OPTIMAL:
        Phi_x = np.array(Phi_x.value)
        Phi_u = np.array(Phi_u.value)
        return (True, prob.value, Phi_x, Phi_u)
    else:
        return (False, None, None, None)


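def robust_sls_roll_forward(A, B, Phi_u, x0, sigma_w, horizon):
    """Roll the system (A, B) forward using the input response Phi_u computed by SLS.

    Phi_u stacks the FIR coefficients Phi_u[1], ..., Phi_u[T] vertically, shape (T*p, n).
    """
    n, p = B.shape
    # Split the stacked array into T blocks of shape (p, n).
    Phi_u = np.asarray(Phi_u).reshape(-1, p, n)
    T = Phi_u.shape[0]

    states = []
    inputs = []
    x = x0.copy()
    noises = []

    for h in range(horizon):
        u = np.zeros(p)
        # h=0: None
        # h=1: Phi_u[0] @ noises[0]
        # h=2: Phi_u[1] @ noises[0] + Phi_u[0] @ noises[1]
        # h=3: Phi_u[2] @ noises[0] + Phi_u[1] @ noises[1] + Phi_u[0] @ noises[2]
        for t, noise in enumerate(noises):
            if h - t - 1 < T:  # the FIR response is zero beyond T taps
                u = u + Phi_u[h - t - 1] @ noise
        states.append(x)
        inputs.append(u)

        # The controller cannot observe this noise directly.
        noise = np.random.normal(size=x.shape) * sigma_w
        nx = A @ x + B @ u + noise

        # Reconstruct the disturbance from the observed transition,
        # using the model passed in as (A, B).
        noises.append(nx - (A @ x + B @ u))
        x = nx

    states.append(x)
    inputs.append(u)
    return np.array(states), np.array(inputs)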


def cost_LQR_samples(states, inputs, Q, R):
    """
    Estimate the average LQR cost per step from a sampled trajectory.
    """
    assert states.shape[1] == Q.shape[0]
    assert inputs.shape[1] == R.shape[0]
    horizon = states.shape[0]

    cost = np.trace(states @ Q @ states.T) + np.trace(inputs @ R @ inputs.T)
    return cost / horizon



In [3]:
def sample_2_to_2_ball(A, eps):
    """Sample a matrix at spectral-norm (2->2) distance eps from A.

    The sample lies on the boundary of the 2->2 ball of radius eps around A.
    """
    Asample = np.array(A)
    Delta_A = np.random.normal(size=Asample.shape)
    Delta_A = Delta_A / np.linalg.norm(Delta_A, ord=2) * eps
    Asample += Delta_A
    TOL = 1e-7
    assert (np.linalg.norm(Asample - A, ord=2) <= eps + TOL), str(np.linalg.norm(Asample - A, ord=2))
    return Asample


In [8]:
# Test
A, B = inverted_pendulum_dynamics(0.1)
Q = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])
R = np.eye(1)
eps = 0.1

Ahat = sample_2_to_2_ball(A, eps)
Bhat = sample_2_to_2_ball(B, eps)

solved, cost, Phi_x, Phi_u = solve_robust_sls(Ahat, Bhat, eps, eps, Q, R, 50, 0.98, 0.3)
print(cost)


1.4715945879178167
