title | date |
---|---|
Class 8 |
January 31, 2023 |
- Times
$t = 0, 1, \ldots,T$ -
$\mathcal{S}$ : State Space -
$\mathcal{A}$ : Action Space $S_{t+1} = f_t(S_t, A_t, W_t)$
Here Expectation is not over revenue as we have assumed it to be deterministic (over the state). (This is when MDP is a Markov Reward Process).
Two state MDP
Notation: (reward, probability)
- State
$S_1$ - Action
$a_{1,1}$ -
$\bc{5, .5}$ : come back to$S_1$ -
$\bc{5, .5}$ : go to$S_2$
-
- Action
$a_{1,2}$ -
$\bc{10, 1}$ : goto$S_2$
-
- Action
- State
$S_2$ - Action
$a_{2,1}$ -
$\bc{-1, 1}$ : goto$S_1$
-
- Action
Write the Bellman Optimality Equation and find the optimal policy.
(Assume the N for the finite horizon problem and solve for that).