# Setup

**Run the following cell before doing anything else**. Alternatively, you can select **Run - Run All Cells** in the menu.

In [3]:
import numpy as np

# Computations in slides

## Slide 15 - Computation of Value Function

In this example, we need to solve the system:

\begin{align*}
V(1)&=0.3(2 + V(2)) + 0.6(2 + V(4)) + 0.1(-2 + V(5))\\
V(2)&=0.4(4 + V(1)) + 0.5(1 + V(3)) + 0.1(-3 + V(5))\\
V(3)&=0.8(2 + V(2)) + 0.1(1 + V(4)) + 0.1(-1 + V(5))\\
V(4)&=0.2(2 + V(1)) + 0.7(1 + V(3)) + 0.1(-1 + V(5))\\
V(5)&=0
\end{align*}

We first rewrite the system in the form $AV=b$:

\begin{align*}
V(1) - 0.3V(2)-0.6V(4)&=0.3\cdot2 + 0.6\cdot 2 + 0.1\cdot(-2)\\
-0.4V(1) + V(2) - 0.5V(3) &=0.4\cdot4 + 0.5\cdot1 + 0.1\cdot(-3)\\
-0.8V(2) + V(3) - 0.1V(4)&=0.8\cdot2 + 0.1\cdot1 + 0.1\cdot(-1)\\
-0.2V(1) - 0.7V(3) + V(4)&=0.2\cdot2 + 0.7\cdot1 + 0.1\cdot(-1)
\end{align*}


In [18]:
A = np.array([
    [ 1.0, -0.3,  0.0, -0.6],
    [-0.4,  1.0, -0.5,  0.0],
    [ 0.0, -0.8,  1.0, -0.1],
    [-0.2,  0.0, -0.7,  1.0],
], dtype=np.float64)

print(A)

b = np.array([
    0.3 * 2 + 0.6 * 2 + 0.1 * (-2),
    0.4 * 4 + 0.5 * 1 + 0.1 * (-3),
    0.8 * 2 + 0.1 * 1 + 0.1 * (-1),
    0.2 * 2 + 0.7 * 1 + 0.1 * (-1),
], dtype=np.float64)

print(b)

[[ 1.  -0.3  0.  -0.6]
 [-0.4  1.  -0.5  0. ]
 [ 0.  -0.8  1.  -0.1]
 [-0.2  0.  -0.7  1. ]]
[1.6 1.8 1.6 1. ]


In [10]:
V = np.linalg.solve(A, b)
print(V)

[15.51196172 15.94258373 15.87559809 15.215311  ]


In [13]:
for i, v in enumerate(V, start=1):
    print(f'V({i}) = {v:6.3f}')
print(f'V(5) = {0:6.3f}')

V(1) = 15.512
V(2) = 15.943
V(3) = 15.876
V(4) = 15.215
V(5) =  0.000
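
As a sanity check, the solution can be substituted back into the original equations. The sketch below is self-contained and assumes the names `P` (transition probabilities among states 1 to 4) and `r` (expected one-step rewards); neither name is in the original. Both are read off the coefficients of the system above, and the terminal state contributes nothing because $V(5)=0$.

```python
import numpy as np

# Transition probabilities among the non-terminal states 1-4.
# The probability 0.1 of jumping to the terminal state 5 is dropped since V(5) = 0.
P = np.array([
    [0.0, 0.3, 0.0, 0.6],
    [0.4, 0.0, 0.5, 0.0],
    [0.0, 0.8, 0.0, 0.1],
    [0.2, 0.0, 0.7, 0.0],
])

# Expected one-step reward in each state (the right-hand side b of AV = b).
r = np.array([
    0.3 * 2 + 0.6 * 2 + 0.1 * (-2),
    0.4 * 4 + 0.5 * 1 + 0.1 * (-3),
    0.8 * 2 + 0.1 * 1 + 0.1 * (-1),
    0.2 * 2 + 0.7 * 1 + 0.1 * (-1),
])

# Solve (I - P) V = r, then check that V = r + P V holds up to rounding.
V_check = np.linalg.solve(np.eye(4) - P, r)
residual = np.max(np.abs(r + P @ V_check - V_check))
print(f'max residual = {residual:.2e}')
```

A residual near machine precision indicates that the computed values satisfy all four equations simultaneously.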


## Slide 44 - Solution by Jacobi iteration

We want to solve the system:

\begin{align*}
V(\mathtt{h})&=\frac{2}{3}r_{\mathtt{s}}+\frac{1}{3}r_{\mathtt{w}}
+\gamma\left[\left(\frac{2}{3}\alpha+\frac{1}{3}\right)V(\mathtt{h})+\frac{2}{3}(1-\alpha)V(\mathtt{l})\right]\\
V(\mathtt{l})&=-\frac{3}{5}(1-\beta)+\frac{1}{5}\beta r_{\mathtt{s}}+\frac{1}{5}r_{\mathtt{w}}\\
&+\gamma\left[\left(\frac{1}{5}(1-\beta)+\frac{3}{5}\right)V(\mathtt{h})+
\left(\frac{1}{5}\beta+\frac{1}{5}\right)V(\mathtt{l})\right]
\end{align*}

To use a Jacobi iteration, we first write this system in the form:
$$
V = b + \gamma QV
$$
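
Reading the coefficients off the two equations, with $V=(V(\mathtt{h}),V(\mathtt{l}))^{\top}$, the matrix $Q$ and the vector $b$ should be:

$$
Q=\begin{pmatrix}
\frac{2}{3}\alpha+\frac{1}{3} & \frac{2}{3}(1-\alpha)\\
\frac{1}{5}(1-\beta)+\frac{3}{5} & \frac{1}{5}\beta+\frac{1}{5}
\end{pmatrix},
\qquad
b=\begin{pmatrix}
\frac{2}{3}r_{\mathtt{s}}+\frac{1}{3}r_{\mathtt{w}}\\
-\frac{3}{5}(1-\beta)+\frac{1}{5}\beta r_{\mathtt{s}}+\frac{1}{5}r_{\mathtt{w}}
\end{pmatrix}
$$

Each row of $Q$ sums to 1, so $Q$ is a stochastic matrix and the map $V\mapsto b+\gamma QV$ contracts distances in the maximum norm by a factor $\gamma$.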

Let's first define the parameters in the model:

In [16]:
alpha = 0.8
beta = 0.3
r_search = 15
r_wait = 10
gamma = 0.9

Now define $Q$ and $b$. We adopt the convention that $\mathtt{V[0]}=V(\mathtt{h})$ and $\mathtt{V[1]}=V(\mathtt{l})$.

In [22]:
Q = np.array([
    [2/3 * alpha + 1/3, 2/3 * (1 - alpha)],
    [1/5 * (1 - beta) + 3/5, 1/5 * beta + 1/5],
], dtype=np.float64)


print(f'Matrix Q:\n{Q}')

b = np.array([
    2/3 * r_search + 1/3 * r_wait,
    -3/5 * (1 - beta) + 1/5 * beta * r_search + 1/5 * r_wait
])

print(f'Vector b:\n{b}')

Matrix Q:
[[0.86666667 0.13333333]
 [0.74       0.26      ]]
Vector b:
[13.33333333  2.48      ]


In [24]:
# Jacobi iteration: V_{n+1} = b + gamma * Q V_n, starting from V_0 = 0.
V = np.zeros(2, dtype=np.float64)
n_iterations = 180
n_report = 30  # print every n_report-th iterate
for n in range(1, n_iterations + 1):
    V = b + gamma * Q @ V
    if n % n_report == 0:
        print(f'n={n:3d}, V(high)={V[0]:10.8f}, V(low)={V[1]:10.8f}')

n= 30, V(high)=113.68382504, V(low)=101.43401316
n= 60, V(high)=118.42373411, V(low)=106.17392222
n= 90, V(high)=118.62466434, V(low)=106.37485246
n=120, V(high)=118.63318201, V(low)=106.38337012
n=150, V(high)=118.63354308, V(low)=106.38373119
n=180, V(high)=118.63355839, V(low)=106.38374650


Let's check against a direct solve of $(I-\gamma Q)V=b$ (`np.linalg.solve` uses an LU factorization):

In [26]:
A = np.eye(2, dtype=np.float64) - gamma * Q
V_lu = np.linalg.solve(A, b)
print(f' V(high)={V_lu[0]:10.8f}, V(low)={V_lu[1]:10.8f}')

 V(high)=118.63355907, V(low)=106.38374718
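
The loop above runs a fixed number of iterations. A common alternative is to stop once successive iterates agree to within a tolerance. The following is a minimal sketch of that idea, not the method used in the slides; the function name `fixed_point_solve` and the parameters `tol` and `max_iter` are assumptions introduced for illustration.

```python
import numpy as np

alpha, beta, r_search, r_wait, gamma = 0.8, 0.3, 15, 10, 0.9

Q = np.array([
    [2/3 * alpha + 1/3, 2/3 * (1 - alpha)],
    [1/5 * (1 - beta) + 3/5, 1/5 * beta + 1/5],
])
b = np.array([
    2/3 * r_search + 1/3 * r_wait,
    -3/5 * (1 - beta) + 1/5 * beta * r_search + 1/5 * r_wait,
])

def fixed_point_solve(Q, b, gamma, tol=1e-8, max_iter=10_000):
    """Iterate V <- b + gamma * Q V until the max-norm change is below tol.

    Hypothetical helper: returns the final iterate and the number of steps taken.
    """
    V = np.zeros_like(b)
    for n in range(1, max_iter + 1):
        V_new = b + gamma * Q @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, n
        V = V_new
    raise RuntimeError('no convergence within max_iter iterations')

V_tol, n_steps = fixed_point_solve(Q, b, gamma)
print(f'stopped after {n_steps} steps: V(high)={V_tol[0]:.6f}, V(low)={V_tol[1]:.6f}')
```

Because the iteration contracts by a factor $\gamma$, a change smaller than `tol` between iterates implies an error of at most $\gamma\,\mathtt{tol}/(1-\gamma)$ in the returned value.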


## Slide 46 - Solution by Gauss-Seidel iteration

In [27]:
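# Gauss-Seidel iteration: V is updated in place, so the update of V[1]
# already uses the new value of V[0] computed in the same sweep.
V = np.zeros(2, dtype=np.float64)
n_iterations = 180
n_report = 30  # print every n_report-th iterate
for n in range(1, n_iterations + 1):
    V[0] = b[0] + gamma * Q[0, :] @ V
    V[1] = b[1] + gamma * Q[1, :] @ V
    if n % n_report == 0:
        print(f'n={n:3d}, V(high)={V[0]:10.8f}, V(low)={V[1]:10.8f}')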

n= 30, V(high)=115.21947040, V(low)=103.29702236
n= 60, V(high)=118.53517952, V(low)=106.29480087
n= 90, V(high)=118.63072419, V(low)=106.38118412
n=120, V(high)=118.63347738, V(low)=106.38367332
n=150, V(high)=118.63355671, V(low)=106.38374505
n=180, V(high)=118.63355900, V(low)=106.38374712
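
The printed iterates suggest that Gauss-Seidel approaches the solution somewhat faster than Jacobi on this problem. One way to quantify this is to count the sweeps each scheme needs to reach the same tolerance. The sketch below does so; the helper `count_sweeps` and its parameters `in_place`, `tol`, and `max_iter` are hypothetical names, not part of the original notebook.

```python
import numpy as np

alpha, beta, r_search, r_wait, gamma = 0.8, 0.3, 15, 10, 0.9

Q = np.array([
    [2/3 * alpha + 1/3, 2/3 * (1 - alpha)],
    [1/5 * (1 - beta) + 3/5, 1/5 * beta + 1/5],
])
b = np.array([
    2/3 * r_search + 1/3 * r_wait,
    -3/5 * (1 - beta) + 1/5 * beta * r_search + 1/5 * r_wait,
])

def count_sweeps(in_place, tol=1e-8, max_iter=10_000):
    """Iterate until successive sweeps differ by less than tol; return (V, n)."""
    V = np.zeros(2)
    for n in range(1, max_iter + 1):
        V_old = V.copy()
        # Gauss-Seidel reads from V itself (sees fresh values);
        # Jacobi reads from the frozen copy of the previous sweep.
        src = V if in_place else V_old
        for i in range(2):
            V[i] = b[i] + gamma * Q[i, :] @ src
        if np.max(np.abs(V - V_old)) < tol:
            return V, n
    raise RuntimeError('no convergence within max_iter sweeps')

V_jacobi, n_jacobi = count_sweeps(in_place=False)
V_gs, n_gs = count_sweeps(in_place=True)
print(f'Jacobi: {n_jacobi} sweeps, Gauss-Seidel: {n_gs} sweeps')
```

Both schemes converge to the same fixed point; only the number of sweeps differs.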


# Deterministic policy evaluation

In [4]:
alpha = 0.8
beta = 0.3
rsearch = 15
rwait = 10
rempty = -3.0
gamma = 0.9
print(f'Model parameters:\n{alpha=}, {beta=}, {rsearch=}, {rwait=}, {rempty=}, {gamma=}')

Model parameters:
alpha=0.8, beta=0.3, rsearch=15, rwait=10, rempty=-3.0, gamma=0.9


Here are the transition matrix $P$, the expected-reward vector $b$, and the system matrix $A=I-\gamma P$ for the policy $\pi(\mathtt{high})=\mathtt{search}$, $\pi(\mathtt{low})=\mathtt{search}$:

In [5]:
P = np.array([[alpha, 1 - alpha], [1 - beta, beta]], dtype=np.float64)
print(P)

[[0.8 0.2]
 [0.7 0.3]]


In [6]:
b = np.array([rsearch, rempty * (1 - beta) + rsearch * beta], dtype=np.float64)
print(b)

[15.   2.4]


In [7]:
A = np.eye(2, dtype=np.float64) - gamma * P
print(A)

[[ 0.28 -0.18]
 [-0.63  0.73]]


In [8]:
V = np.linalg.solve(A,b)
print(V)

[125.07692308 111.23076923]


## Jacobi
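
Since $P$ is a stochastic matrix, each step of $V\leftarrow b+\gamma PV$ shrinks the maximum-norm error by at least a factor $\gamma$. Starting from $V_0=0$, the error after $n$ steps is therefore at most $\gamma^n\lVert V\rVert_\infty$, which gives a rough estimate of how many steps are needed. The sketch below works this out for an assumed tolerance `tol = 1e-6`; the tolerance and the variable names `V_exact`, `n_bound`, and `err` are illustrative choices, not values from the original.

```python
import math
import numpy as np

gamma = 0.9
P = np.array([[0.8, 0.2], [0.7, 0.3]])  # policy: search in both states
b = np.array([15.0, 2.4])               # expected one-step rewards

# Reference solution of (I - gamma P) V = b.
V_exact = np.linalg.solve(np.eye(2) - gamma * P, b)

# Smallest n with gamma**n * ||V_exact||_inf <= tol (assumed tolerance).
tol = 1e-6
e0 = np.max(np.abs(V_exact))
n_bound = math.ceil(math.log(tol / e0) / math.log(gamma))

# Run exactly n_bound steps from V = 0 and measure the actual error.
V = np.zeros(2)
for _ in range(n_bound):
    V = b + gamma * P @ V
err = np.max(np.abs(V - V_exact))
print(f'n_bound = {n_bound}, error after n_bound steps = {err:.2e}')
```

The bound is a worst-case estimate, so the measured error should come out at or below the tolerance.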

In [12]:
# Jacobi iteration: both components are updated from the previous iterate.
n_iterations = 100
V = np.zeros(2, dtype=np.float64)
print(f'Initial approximation: {V}')
for i in range(n_iterations):
    V = b + gamma * P @ V
    print(f'Step {i + 1}: {V}')

Initial approximation: [0. 0.]
Step 1: [15.   2.4]
Step 2: [26.232 12.498]
Step 3: [36.13668 22.30062]
Step 4: [45.0325212 31.1872758]
Step 5: [53.03712491 39.19105282]
Step 6: [60.24111944 46.39497295]
Step 7: [66.72470113 52.87854795]
Step 8: [72.55992344 58.71376966]
Step 9: [77.81162342 63.96546958]
Step 10: [82.53815338 68.69199954]
Step 11: [86.79203035 72.94587651]
Step 12: [90.62051963 76.77436578]
Step 13: [94.06615997 80.22000613]
Step 14: [97.16723628 83.32108244]
Step 15: [99.95820496 86.11205112]
Step 16: [102.47007677  88.62392293]
Step 17: [104.7307614   90.88460756]
Step 18: [106.76537757  92.91922372]
Step 19: [108.59653212  94.75037828]
Step 20: [110.24457122  96.39841737]
Step 21: [111.7278064   97.88165256]
Step 22: [113.06271807  99.21656422]
Step 23: [114.26413857 100.41798472]
Step 24: [115.34541702 101.49926318]
Step 25: [116.31856763 102.47241378]
Step 26: [117.19440317 103.34824933]
Step 27: [117.98265516 104.13650132]
Step 28: [118.69208195 104.84592811]
Step

In [13]:
# Gauss-Seidel iteration: V is updated in place, so the update of V[1]
# already uses the new value of V[0] from the same step.
n_iterations = 100
V = np.zeros(2, dtype=np.float64)
print(f'Initial approximation: {V}')
for i in range(n_iterations):
    V[0] = b[0] + gamma * (P[0, 0] * V[0] + P[0, 1] * V[1])
    V[1] = b[1] + gamma * (P[1, 0] * V[0] + P[1, 1] * V[1])
    print(f'Step {i + 1}: {V}')

Initial approximation: [0. 0.]
Step 1: [15.   11.85]
Step 2: [27.933   23.19729]
Step 3: [39.2872722  33.41424979]
Step 4: [49.30140095 42.48173004]
Step 5: [58.14372009 50.50061077]
Step 6: [65.9535884 57.5859256]
Step 7: [72.85205026 63.84499157]
Step 8: [78.94557467 69.37385977]
Step 9: [84.32810852 74.2576505 ]
Step 10: [89.08261522 78.57161323]
Step 11: [93.28237334 82.38223078]
Step 12: [96.99211035 85.74823183]
Step 13: [100.26900118  88.72149334]
Step 14: [103.16354965  91.34783948]
Step 15: [105.72036685  93.66774778]
Step 16: [107.97885873  95.7169729 ]
Step 17: [109.97383341  97.52709773]
Step 18: [111.73603765  99.12602011]
Step 19: [113.29263073 100.53838279]
Step 20: [114.66760302 101.78595326]
Step 21: [115.88214576 102.88795921]
Step 22: [116.95497761 103.86138488]
Step 23: [117.90263316 104.72123281]
Step 24: [118.73971778 105.48075506]
Step 25: [119.47913271 106.15165747]
Step 26: [120.1322739  106.74428007]
Step 27: [120.70920762 107.26775642]
Step 28: [121.21882564 