### Assignment 7
_Jiajie Lu_

In [1]:
import numpy as np
import pandas as pd

#### Problem 1
>A machine may be in two states: good or bad. It produces an item at the end of each time period. If the machine is bad, the item is bad/defective as well. If the machine is good, then the item is good. A machine, which is good at stage $t$ may become bad at stage $t+1$ with probability $p$. A bad machine remains bad, unless replaced. The state of the machine is not visible and can be identified only by inspecting the produced items. An item produced in period $t$ may be inspected immediately at cost $I$. The inspection is perfect, that is, it distinguishes between good and defective items without mistakes. If the inspection finds the item bad, the machine may be replaced at a cost $R$. The cost of producing a bad item is $C$.

**(a)** Formulate a finite horizon dynamic programming problem to minimize the cost of operating the machine.

**Solution:**<br>
* State Space: <br>
The underlying state of the machine, which is not observable:
$$
\mathcal{X}=\{0:\text{"Good"}, 1:\text{"Bad"}\},
$$
Because the state is hidden, we work with the belief state, where $\alpha$ is the probability that the machine is good:
$$
\xi=\{\alpha, 1-\alpha\}.
$$
* Control Space:
$$
\mathcal{U}=\{0:\text{"do nothing"}, 1:\text{"inspect, and replace the machine if the item is bad"}\}.
$$
* Feasible Mapping:
$$
U(x)=\{0,1\}.
$$
* Transition Probability:
$$
\alpha_{t+1}=\left\{
\begin{array}{ll}
(1-p)\alpha_t, &u=0,\\
1-p, &u=1.
\end{array}
\right.
$$
If we do nothing, the machine is good at the next stage with probability $(1-p)\alpha_t$, where $\alpha_t$ is the probability that it is good at the current stage. If we inspect and replace a bad machine, the machine is certainly good after the action, so it goes bad at the next stage with probability $p$; equivalently, it stays good with probability $1-p$.
* Cost Function:
$$
c(\alpha_t, u_t)=(1-\alpha_t)C+u_t[I+(1-\alpha_t)R].
$$
The first term $(1-\alpha_t)C$ is the expected cost of producing a bad item. The bracketed term is incurred only when we inspect: it is the inspection cost $I$ plus the expected replacement cost $(1-\alpha_t)R$, since the machine is bad with probability $1-\alpha_t$.
* Dynamic Programming Equation:
$$
\begin{eqnarray}
v_t(\alpha_t)&=&\min_{u\in\mathcal{U}}\{c(\alpha_t,u)+v_{t+1}(\alpha_{t+1})\},\\
&=&\min\{(1-\alpha_t)C+v_{t+1}[(1-p)\alpha_t], I+(1-\alpha_t)(R+C)+v_{t+1}(1-p)\}, \quad t=1,\ldots,N, \alpha\in[0,1].\\
v_{N+1}(\alpha)&=&0.
\end{eqnarray}
$$
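
The belief update and the stage cost above translate directly into two small functions. The following is a minimal sketch, not part of the original solution: the names `next_belief` and `stage_cost` are hypothetical, and the numbers in the usage lines are illustrative.

```python
def next_belief(alpha, u, p):
    """Probability that the machine is good at the next stage."""
    if u == 0:
        # do nothing: the machine must be good now and stay good
        return (1 - p) * alpha
    # inspect (and replace if bad): the machine is good after the action
    return 1 - p


def stage_cost(alpha, u, I, R, C):
    """Expected one-stage cost c(alpha, u)."""
    return (1 - alpha) * C + u * (I + (1 - alpha) * R)


# illustrative values (assumed here only for the example)
p, I, R, C = 0.2, 1, 3, 2

alpha = 1.0                      # machine known to be good
alpha = next_belief(alpha, 0, p) # one period without inspection
print(alpha, stage_cost(alpha, 0, I, R, C), stage_cost(alpha, 1, I, R, C))
```

Keeping the two pieces separate mirrors the formulation: the belief update depends only on $p$, while the cost depends only on $I$, $R$, and $C$.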

**(b)** The initial state of the machine is good. Solve the problem for $p=0.2$, $I=1$, $R=3$, $C=2$, and $N=18$.

**Solution:**<br>
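Since the machine starts good and each inspection leaves a machine that is known to be good, the transition rule implies that the belief always has the form $\alpha_t=(1-p)^k$, where $k$ counts the periods elapsed since the machine was last known to be good, $k=1,2,\ldots,N-1$. At $t=1$ the machine is known to be good ($k=0$), so doing nothing incurs no cost and $v_1=v_2(1)$. We can therefore rewrite the dynamic programming equation as
$$
\begin{eqnarray}
v_t(k)&=&\min\{[1-(1-p)^k]C+v_{t+1}(k+1), I+[1-(1-p)^k](R+C)+v_{t+1}(1)\}\\
&=&\min\{2(1-0.8^k)+v_{t+1}(k+1), 1+5(1-0.8^k)+v_{t+1}(1)\}, \quad k=1,2,\ldots,t-1,\quad t=2,\ldots,18,\\
v_{19}(k)&=&0,\quad k=1,\ldots,18.
\end{eqnarray}
$$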
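
The recursion in $k$ can also be evaluated top-down with memoization, which gives an independent check on a tabulated solution. This is a minimal sketch under the parameters of part (b); the function name `cost_to_go` is hypothetical, and $k=0$ is used here to encode the known-good machine at $t=1$.

```python
from functools import lru_cache

# parameters from part (b)
p, I, R, C, N = 0.2, 1, 3, 2, 18


@lru_cache(maxsize=None)
def cost_to_go(t, k):
    """v_t(k): minimal expected cost from stage t on, with belief (1-p)**k."""
    if t == N + 1:            # terminal condition v_{N+1} = 0
        return 0.0
    bad = 1 - (1 - p) ** k    # probability that the machine is bad
    do_nothing = bad * C + cost_to_go(t + 1, k + 1)
    inspect = I + bad * (R + C) + cost_to_go(t + 1, 1)
    return min(do_nothing, inspect)


# at the last stage inspecting costs I + bad * R more than doing nothing,
# so v_N(k) reduces to C * (1 - (1-p)**k)
print(cost_to_go(N, 3))
# total expected cost starting from a machine that is known to be good
print(cost_to_go(1, 0))
```

The cache keeps the number of evaluated states small, because only pairs $(t,k)$ with $k<t$ are ever reached from the initial state.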

In [2]:
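# problem parameters
p = 0.2              # probability that a good machine goes bad in one period
I, R, C = 1, 3, 2    # inspection cost, replacement cost, cost of a bad item
N = 18               # horizon

# value and policy tables
# row t holds stage t+1; column k holds belief alpha = (1-p)**(k+1)
v_mat = np.zeros((N, N-1))
p_mat = np.zeros((N, N-1))

# last stage (v_N): the terminal value v_{N+1} is zero
for k in range(N-1):
    bad = 1 - (1-p)**(k+1)         # probability that the machine is bad
    vs = [bad*C, I + bad*(R+C)]    # [do nothing, inspect]
    v_mat[N-1, k], p_mat[N-1, k] = np.min(vs), np.argmin(vs)

# backward recursion for v_{N-1}, ..., v_2
for t in range(N-2, 0, -1):
    # only the first t belief states are reachable at this stage
    for k in range(t):
        bad = 1 - (1-p)**(k+1)
        vs = [bad*C + v_mat[t+1, k+1], I + bad*(R+C) + v_mat[t+1, 0]]
        v_mat[t, k], p_mat[t, k] = np.min(vs), np.argmin(vs)

# first stage (v_1): the machine is known to be good, so doing nothing is free
# and inspecting would only add the cost I; the belief then moves to 1-p
v_mat[0, 0], p_mat[0, 0] = v_mat[1, 0], 0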

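The resulting optimal policy is printed below, one row per time period (indexed from 0). The $k$-th entry of a row is the optimal action for belief $(1-p)^k$, with 0 for "do nothing" and 1 for "inspect"; the single entry for period 00 corresponds to the initial, known-good machine.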

In [3]:
print(f"Time Period 00: {p_mat[0,0:1]}")
for idx in range(1, N):
    print(f"Time Period {str(idx).zfill(2)}: {p_mat[idx, 0:idx]}")

Time Period 00: [0.]
Time Period 01: [0.]
Time Period 02: [0. 0.]
Time Period 03: [0. 0. 0.]
Time Period 04: [0. 0. 0. 1.]
Time Period 05: [0. 0. 0. 1. 1.]
Time Period 06: [0. 0. 0. 1. 1. 1.]
Time Period 07: [0. 0. 0. 1. 1. 1. 1.]
Time Period 08: [0. 0. 1. 1. 1. 1. 1. 1.]
Time Period 09: [0. 0. 0. 1. 1. 1. 1. 1. 1.]
Time Period 10: [0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
Time Period 11: [0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Time Period 12: [0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Time Period 13: [0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Time Period 14: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Time Period 15: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Time Period 16: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Time Period 17: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
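
Each printed row is either all zeros or a run of zeros followed by a run of ones, so a row can be summarized by the smallest $k$ at which the policy inspects. The helper below is a minimal sketch for reading the table that way; the name `first_inspection_k` is hypothetical and the sample rows are copied from the output above.

```python
def first_inspection_k(row):
    """Smallest k (1-indexed) whose action is 1, or None if the row never inspects."""
    for k, action in enumerate(row, start=1):
        if action == 1:
            return k
    return None


# rows copied from the printed policy
print(first_inspection_k([0., 0., 0., 1.]))                  # period 04 -> 4
print(first_inspection_k([0., 0., 1., 1., 1., 1., 1., 1.]))  # period 08 -> 3
print(first_inspection_k([0., 0.]))                          # period 02 -> None
```

Applied to a row of `p_mat`, the same helper should be given only the reachable entries, for example `p_mat[idx, 0:idx]`, since the unreachable columns are left at their initial value of zero.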
