# Linear Quadratic Regulator

The previous section showed that the eigenvalues of the stable system associated with the pole's angle and angular velocity resemble those of a continuous-time system rather than a discrete-time one.

Regardless, we build linear quadratic regulators (LQR) by solving both the continuous and the discrete algebraic Riccati equations, and compare the two.

In [1]:
import gym
import tensorflow as tf
import numpy as np
import pickle
import control

In [2]:
model = tf.keras.models.load_model(
    './cartpole_system_model', custom_objects=None, compile=True, options=None
)
np_weights = model.get_weights()

A = np_weights[0]
B = np_weights[1].T
print("A Matrix")
print(A)
print("B Matrix")
print(B)

A Matrix
[[ 1.0000035e+00 -1.2942681e-05 -2.3801964e-05  1.6305943e-03]
 [ 2.0008639e-02  9.9992085e-01 -2.6311722e-05 -9.5092575e-04]
 [-8.2056704e-07 -1.3424688e-02  1.0000260e+00  3.1254122e-01]
 [-4.4493249e-06 -5.0352475e-05  2.0024499e-02  9.9951327e-01]]
B Matrix
[[ 6.6651064e-06]
 [ 1.9508155e-01]
 [-1.1186228e-05]
 [-2.9142728e-01]]
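
Before using a learned weight matrix as a state-transition matrix, it can help to check which orientation it is stored in. The following is a minimal sketch, assuming the state order `[x, x_dot, theta, theta_dot]` and the Euler update `x <- x + 0.02 * x_dot`, `theta <- theta + 0.02 * theta_dot` used by Gym's CartPole. The helper name `kinematic_coupling` and its tolerance are hypothetical and not part of the notebook.

```python
import numpy as np

def kinematic_coupling(M, dt=0.02, atol=5e-3):
    """Report whether M or M.T carries the Euler position updates.

    Assumption: state order [x, x_dot, theta, theta_dot], so the matrix that
    maps x_k to x_{k+1} should have entries close to dt at [0, 1] and [2, 3].
    """
    M = np.asarray(M, dtype=float)
    as_given = np.isclose(M[0, 1], dt, atol=atol) and np.isclose(M[2, 3], dt, atol=atol)
    transposed = np.isclose(M[1, 0], dt, atol=atol) and np.isclose(M[3, 2], dt, atol=atol)
    if as_given and not transposed:
        return "x_next = M @ x"
    if transposed and not as_given:
        return "x_next = M.T @ x"
    return "inconclusive"

# Values copied from the A matrix printed above.
A_printed = np.array([
    [ 1.0000035e+00, -1.2942681e-05, -2.3801964e-05,  1.6305943e-03],
    [ 2.0008639e-02,  9.9992085e-01, -2.6311722e-05, -9.5092575e-04],
    [-8.2056704e-07, -1.3424688e-02,  1.0000260e+00,  3.1254122e-01],
    [-4.4493249e-06, -5.0352475e-05,  2.0024499e-02,  9.9951327e-01],
])
orientation = kinematic_coupling(A_printed)
print(orientation)  # x_next = M.T @ x
```

For the matrix printed above, the check points to the transposed orientation. That would be expected if the weight came from a layer that computes `inputs @ kernel` (as Keras `Dense` does); whether this applies here depends on how the model in the previous section was built.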




In [3]:
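def cartpole_simulation(K):
    """Run one CartPole-v1 episode under u = -Kx and return the cumulative reward."""
    # Uses the pre-0.26 Gym API: reset() returns the observation, step() returns 4 values.
    env = gym.make('CartPole-v1')
    x = env.reset()
    cumul_reward = 0
    for _ in range(500):
        # LQR control law; K has shape (1, 4), so u is a one-element array.
        u = np.matmul(-K, x)
        # CartPole only accepts discrete actions: 0 pushes left, 1 pushes right.
        if u < 0:
            u = 0
        else:
            u = 1
        x, reward, done, _ = env.step(u)
        cumul_reward += reward
        if done:
            break
    env.close()
    return cumul_reward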

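We define a quadratic cost on the state (observation) $x$ and the control input $u$,

$x^T Qx + u^T Ru$,

with $Q$ as the state weight matrix and $R$ as the control input weight matrix. The regulator minimizes this cost accumulated over time (integrated in the continuous case, summed in the discrete case).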
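
As a worked example, the cost of a single state and input pair can be evaluated directly. This is a minimal sketch; the helper name `stage_cost` and the sample state and input are illustrative assumptions.

```python
import numpy as np

def stage_cost(x, u, Q, R):
    """Quadratic cost x^T Q x + u^T R u for one state/input pair."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return float(x @ Q @ x + u @ R @ u)

Q = np.diag([1.0, 1.0, 10.0, 100.0])   # illustrative state weights
R = np.eye(1)                          # identity weight on the single input
x = np.array([0.1, 0.0, 0.05, 0.0])    # hypothetical state
u = np.array([1.0])                    # hypothetical input

# 1 * 0.1**2 + 10 * 0.05**2 + 1 * 1**2 = 0.01 + 0.025 + 1
cost = stage_cost(x, u, Q, R)
print(cost)
```

A larger diagonal entry in $Q$ penalizes deviations in the corresponding state component more heavily, so the resulting gain works harder to keep that component small.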

Infinite-horizon continuous algebraic Riccati equation (CARE), whose solution $S$ is constant in time:

$Q - SB R^{-1} B^T S + SA + A^T S = 0$

The gain $K$ is then

$K = R^{-1} B^T S$,

which gives the control input $u = -Kx$.

In [4]:
Q = np.diag([1, 1, 10, 100])

# R (control input weight) is set as the identity matrix (R=None)
S, Lambda_S, K_gain = control.care(A, B, Q, R=None)

print("Solution")
print(S, "\n")

print("Eigenvalues")
print(Lambda_S, "\n")

print("K (Gain)")
print(K_gain, "\n")

Solution
[[ 1.10519999e+17  2.01569998e+14 -5.62671191e+14  1.37481907e+14]
 [ 2.01569998e+14  3.67714030e+11 -1.02621701e+12  2.50800322e+11]
 [-5.62671191e+14 -1.02621701e+12  2.86462969e+12 -6.99936869e+11]
 [ 1.37481907e+14  2.50800322e+11 -6.99936869e+11  1.71059032e+11]] 

Eigenvalues
[-3.08798405+0.j         -1.00275954+0.01339002j -1.00275954-0.01339002j
 -0.99619599+0.j        ] 

K (Gain)
[[-4.69942858e+08 -8.70061097e+05  2.39227217e+06 -5.93293568e+05]] 



In [5]:
cartpole_simulation(K_gain)

10.0
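
The CARE expects continuous-time matrices, as in $\dot{x} = A_c x + B_c u$. If the learned $A$ and $B$ instead describe one-step transitions (an assumption, although their near-identity diagonal is consistent with it), a rough continuous-time counterpart can be recovered by inverting a forward-Euler step. The sketch below is illustrative only: the function name is hypothetical, and the step size `dt = 0.02` is Gym's CartPole timestep rather than a value taken from this notebook.

```python
import numpy as np

def euler_continuous_from_discrete(A_d, B_d, dt):
    """Invert the forward-Euler discretization A_d = I + dt * A_c, B_d = dt * B_c."""
    A_d = np.asarray(A_d, dtype=float)
    B_d = np.asarray(B_d, dtype=float)
    A_c = (A_d - np.eye(A_d.shape[0])) / dt
    B_c = B_d / dt
    return A_c, B_c

# Toy double integrator used only to check the round trip.
A_c_true = np.array([[0.0, 1.0], [0.0, 0.0]])
B_c_true = np.array([[0.0], [1.0]])
dt = 0.02
A_d = np.eye(2) + dt * A_c_true
B_d = dt * B_c_true

A_c, B_c = euler_continuous_from_discrete(A_d, B_d, dt)
print(A_c)
print(B_c)
```

The recovered pair could then be passed to `control.care` in place of the discrete-time matrices. Forward Euler is only a first-order approximation, so this is a sanity check rather than an exact conversion.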

Infinite-horizon discrete algebraic Riccati equation (DARE):

$S = Q + A^T SA - (A^T S B) {(R + B^T S B)}^{-1} (B^T SA)$,

where the gain $K$ is

$K = {(R + B^T SB)}^{-1} B^T S A$.
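
The DARE can be read as a fixed-point equation in $S$, so one simple way to approximate its solution is to apply the right-hand side repeatedly until $S$ stops changing. The following is a minimal sketch of that idea, not a description of what `control.dare` does internally. The function name, the tolerance, and the toy double-integrator system are assumptions.

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Iterate S <- Q + A'SA - A'SB (R + B'SB)^{-1} B'SA until convergence."""
    A, B, Q, R = (np.asarray(M, dtype=float) for M in (A, B, Q, R))
    S = Q.copy()
    for _ in range(max_iter):
        BtS = B.T @ S
        K = np.linalg.solve(R + BtS @ B, BtS @ A)      # (R + B'SB)^{-1} B'SA
        S_next = Q + A.T @ S @ A - (A.T @ S @ B) @ K
        converged = np.max(np.abs(S_next - S)) <= tol * max(1.0, np.max(np.abs(S_next)))
        S = S_next
        if converged:
            break
    BtS = B.T @ S
    K = np.linalg.solve(R + BtS @ B, BtS @ A)           # gain for the final S
    return S, K

# Toy discrete-time double integrator with a 0.1 s step (illustrative only).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.eye(1)

S, K = dare_by_iteration(A, B, Q, R)
closed_loop_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print(K)
print(closed_loop_radius)
```

A spectral radius of $A - BK$ below 1 indicates that the closed loop is stable for the linear model. Convergence can be slow when a closed-loop eigenvalue lies close to the unit circle, which is one reason dedicated solvers are preferred in practice.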

In [6]:
Q = np.diag([1, 1, 10, 100])

# R (control input weight) is set as the identity matrix (R=None)
S, Lambda_S, K_gain = control.dare(A, B, Q, R=None)

print("Solution")
print(S, "\n")

print("Eigenvalues")
print(Lambda_S, "\n")

print("K (Gain)")
print(K_gain, "\n")

Solution
[[ 1.25364207e+05  2.75759689e+03 -2.43295843e+02  1.97238912e+03]
 [ 2.75759689e+03  2.85394942e+02  1.65239983e+01  1.99380007e+02]
 [-2.43295843e+02  1.65239983e+01  1.18872284e+02  4.90372483e+01]
 [ 1.97238912e+03  1.99380007e+02  4.90372483e+01  2.63603442e+02]] 

Eigenvalues
[0.09526306+0.j 0.90502869+0.j 0.99519253+0.j 0.9997579 +0.j] 

K (Gain)
[[-3.1145489  -0.19519992 -1.02162026 -3.57657182]] 



In [7]:
cartpole_simulation(K_gain)

137.0

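Despite the earlier indication of continuous-system characteristics, the DARE-based controller performs much better: it reaches a cumulative reward of 137, compared with 10 for the CARE-based controller. A likely reason is that the learned $A$ and $B$ describe the transition from one time step to the next, that is, a discrete-time model $x_{k+1} = Ax_k + Bu_k$, which is the form the DARE assumes.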

In [9]:
save_dare_controller_dict = {"K": K_gain, "S": S, "A": A, "B": B}
print(save_dare_controller_dict)
with open('./cartpole_system_model/dare_controller.pkl', 'wb') as f:
    pickle.dump(save_dare_controller_dict, f, protocol=pickle.HIGHEST_PROTOCOL)

{'K': array([[-3.1145489 , -0.19519992, -1.02162026, -3.57657182]]), 'S': array([[ 1.25364207e+05,  2.75759689e+03, -2.43295843e+02,
         1.97238912e+03],
       [ 2.75759689e+03,  2.85394942e+02,  1.65239983e+01,
         1.99380007e+02],
       [-2.43295843e+02,  1.65239983e+01,  1.18872284e+02,
         4.90372483e+01],
       [ 1.97238912e+03,  1.99380007e+02,  4.90372483e+01,
         2.63603442e+02]]), 'A': array([[ 1.0000035e+00, -1.2942681e-05, -2.3801964e-05,  1.6305943e-03],
       [ 2.0008639e-02,  9.9992085e-01, -2.6311722e-05, -9.5092575e-04],
       [-8.2056704e-07, -1.3424688e-02,  1.0000260e+00,  3.1254122e-01],
       [-4.4493249e-06, -5.0352475e-05,  2.0024499e-02,  9.9951327e-01]],
      dtype=float32), 'B': array([[ 6.6651064e-06],
       [ 1.9508155e-01],
       [-1.1186228e-05],
       [-2.9142728e-01]], dtype=float32)}
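
The saved controller can later be reloaded and turned back into a CartPole action. The sketch below is a minimal round trip under stated assumptions: the helper names `load_controller` and `lqr_action` are hypothetical, a temporary directory stands in for `./cartpole_system_model`, and only the `K` entry is stored to keep the example short.

```python
import os
import pickle
import tempfile

import numpy as np

def load_controller(path):
    """Load a pickled controller dictionary from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

def lqr_action(K, x):
    """Map u = -Kx to CartPole's discrete actions (0: push left, 1: push right)."""
    u = float(-(np.asarray(K) @ np.asarray(x, dtype=float))[0])
    return 0 if u < 0 else 1

# DARE gain copied from the output above.
K = np.array([[-3.1145489, -0.19519992, -1.02162026, -3.57657182]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dare_controller.pkl")
    with open(path, "wb") as f:
        pickle.dump({"K": K}, f, protocol=pickle.HIGHEST_PROTOCOL)
    controller = load_controller(path)

# Pole tilted slightly to the right, everything else at rest.
action = lqr_action(controller["K"], [0.0, 0.0, 0.05, 0.0])
print(action)  # 1
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.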
