## CHEME 5660 Lab 8: Formulation and Solution of the Branched Tiger Problem as a Markov Decision Process (MDP)

<img src="./figs/Fig-Branched-MDP-Schematic-no-a-labels.png" style="margin:auto; width:60%"/>

## Introduction
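In this lab, we formulate the branched tiger problem shown in the schematic above as a Markov decision process (MDP) and solve it for a policy. The world has 15 states: state 1 is safety, state 15 is the tiger, and both are absorbing. In the remaining states the agent chooses among up to four actions (move left, right, up, or down), and a move goes in the intended direction with probability α = 0.80. We build the rewards array `R` and the transition array `T`, compute the utility `U` of each state with a discount factor γ = 0.95, and then read the policy off the `Q` array.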

## Lab 8 setup

In [1]:
import Pkg; Pkg.activate("."); Pkg.resolve(); Pkg.instantiate();

[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/labs/lab-8-MDP-Tiger-Problem`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/labs/lab-8-MDP-Tiger-Problem/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5660-Markets-Mayhem-Example-Notebooks/labs/lab-8-MDP-Tiger-Problem/Manifest.toml`


In [2]:
# load required packages -
using PrettyTables

In [3]:
include("CHEME-5660-Lab-8-CodeLib.jl");

In [4]:
# setup some global constants -
α = 0.80; # probability of moving in the direction we expect

In [5]:
# setup the states and actions -
safety = 1;
tiger = 15;

states = range(safety,stop=tiger, step=1) |> collect;
actions = [1,2,3,4]; # a₁ = move left, a₂ = move right, a₃ = move up, a₄ = move down
γ = 0.95;

#### Configure the rewards array

In [6]:
# setup the rewards -
R = Array{Float64,2}(undef,length(states), length(actions));

# most of the rewards are zero -
fill!(R,0.0) # fill R w/zeros

# set the rewards for the ends -
R[safety + 1,1] = 10; # if in state 2, and we take action 1 = we live, get married, our kids are all doctors, and we are generally content
R[tiger-1, 2] = -100; # if in state N - 1, and we take action 2 = we get eaten. Bad.

# rewards for the by-passes. 
R[2,3] = -10.0;
R[2,4] = -1.0;

R[9,3] = -1.0;
R[9,4] = -1.0;


R[8,1] = 0.0;

#### Configure the transition array

In [7]:
# Setup the transitions
T = Array{Float64,3}(undef, length(states), length(states), length(actions));
fill!(T,0.0);

# We need to put values into the transition array (these are probabilities, so each row T[s,:,a] must sum to 1;
# rows for actions that are not available in a state are left as zeros)
T[safety, 1, 1:length(actions)] .= 1.0; # if we are in state 1 (safety), we stay in state 1 ∀a ∈ 𝒜
T[tiger, tiger, 1:length(actions)] .= 1.0; # if we are in state 15 (tiger), we stay in state 15 ∀a ∈ 𝒜

# left actions -
for s ∈ 2:(tiger - 1)
    T[s,s-1,1] = α;
    T[s,s+1,1] = (1-α);
end

# right actions -
for s ∈ 2:(tiger - 1)
    T[s,s-1,2] = (1-α);
    T[s,s+1,2] = α; 
end

# Node 2 -
T[2,:,2] .= 0.0
T[2,3,3] = α;
T[2,6,3] = (1-α);
T[2,6,4] = α;
T[2,3,4] = (1-α);

# Node 3 -
T[3,2,4] = α;
T[3,4,4] = (1-α);
T[3,2,1] = 0.0;
T[3,4,1] = 0.0;

# Node 5 -
T[5,:,2] .= 0.0;
T[5,9,4] = α;
T[5,4,4] = (1-α);

# Node 6 -
T[6,2,3] = α;
T[6,7,3] = (1-α);

# Node 8 -
T[8,:,2] .= 0.0
T[8,9,3] = α;
T[8,7,3] = (1-α);

# Node 9 -
T[9,:,1] .= 0.0
T[9,8,4] = α;
T[9,5,4] = (1-α);
T[9,5,3] = α;
T[9,8,3] = (1-α);
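
Because the transition array is filled in by hand, a quick row-sum check can catch typos before solving. The sketch below is a minimal, self-contained illustration on a toy 3-state array; `check_transition_rows`, `p_move`, and `T_toy` are hypothetical names that are not part of `CHEME-5660-Lab-8-CodeLib.jl`. It assumes the convention used above, where an action that is not available in a state is stored as an all-zero row.

```julia
# Hypothetical helper (not from the course CodeLib): classify each (state, action)
# row of a transition array T[s, s′, a] by its row sum.
function check_transition_rows(T::Array{Float64,3}; atol::Float64 = 1e-8)
    nstates, _, nactions = size(T)
    bad = Tuple{Int,Int}[]       # rows that sum to neither 0 nor 1
    disabled = Tuple{Int,Int}[]  # all-zero rows, i.e., actions with no transitions
    for s in 1:nstates, a in 1:nactions
        total = sum(T[s, :, a])
        if isapprox(total, 0.0; atol = atol)
            push!(disabled, (s, a))
        elseif !isapprox(total, 1.0; atol = atol)
            push!(bad, (s, a))
        end
    end
    return (bad = bad, disabled = disabled)
end

# Toy 3-state, 2-action example (illustrative values only)
p_move = 0.8
T_toy = zeros(3, 3, 2)
T_toy[1, 1, :] .= 1.0          # state 1 is absorbing under both actions
T_toy[3, 3, :] .= 1.0          # state 3 is absorbing under both actions
T_toy[2, 1, 1] = p_move        # action 1 from state 2: intended move to state 1
T_toy[2, 3, 1] = 1 - p_move    # otherwise the move slips to state 3
# action 2 from state 2 is left as an all-zero (disabled) row

report = check_transition_rows(T_toy)
println("invalid rows:  ", report.bad)
println("disabled rows: ", report.disabled)
```

Calling the same helper on the lab's `T` would list every `(s, a)` pair whose row sums to something other than 0 or 1, which is where a mistyped index would show up.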

In [8]:
mdp_problem = build(MDPProblem; 𝒮 = states, 𝒜 = actions, T = T, R = R, γ = γ);

In [9]:
U = solve(mdp_problem,1000)

15-element Vector{Float64}:
  0.0
 12.127173992305273
 11.195652591080394
 10.415791352254672
 10.037344121229339
 11.164961544819933
 10.25425952983119
  9.309940819831795
  7.982650406105737
  7.351565491190357
  6.7618483292105065
  6.182413452135977
  5.491624852294697
  4.17363488774397
  0.0
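
The `solve` routine lives in `CHEME-5660-Lab-8-CodeLib.jl` and is not shown in this notebook. Assuming it performs value iteration, with the second argument acting as the number of sweeps, each sweep would apply the Bellman update written below. The reported utilities are consistent with that form: for state 2 under action 1 (left), using `R[2,1] = 10`, `U[1] = 0`, and `U[3] ≈ 11.1957`, the second line reproduces `U[2] ≈ 12.127`.

```latex
\begin{aligned}
U_{k+1}(s) &= \max_{a\in\mathcal{A}}\left[R(s,a) + \gamma\sum_{s^{\prime}\in\mathcal{S}} T(s,s^{\prime},a)\,U_{k}(s^{\prime})\right] \\
U(2) &= R(2,1) + \gamma\left[\alpha\,U(1) + (1-\alpha)\,U(3)\right]
      = 10 + 0.95\left[0.8\,(0) + 0.2\,(11.1957)\right] \approx 12.127
\end{aligned}
```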

In [10]:
Q_array = Q(mdp_problem, U)[2:end-1,:];

In [11]:
# compute the policy -
policy = π(Q_array);

# display the policy -
policy_data_array = Array{Any,2}(undef, length(states)-2, 2);

for s = 1:length(states)-2
    
    policy_keyword = "left"
    policy_index = policy[s];
    if policy_index == 2
       policy_keyword = "right" 
    elseif policy_index == 3
        policy_keyword = "up" 
    elseif policy_index == 4
        policy_keyword = "down" 
    end
    
    policy_data_array[s,1] = s+1;
    policy_data_array[s,2] = policy_keyword;    
end

policy_table_header = (["State s", "Action"])

pretty_table(policy_data_array; header=policy_table_header)

┌─────────┬────────┐
│ State s │ Action │
├─────────┼────────┤
│       2 │   left │
│       3 │   down │
│       4 │   left │
│       5 │   left │
│       6 │     up │
│       7 │   left │
│       8 │   left │
│       9 │   down │
│      10 │   left │
│      11 │   left │
│      12 │   left │
│      13 │   left │
│      14 │   left │
└─────────┴────────┘
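
The `solve`, `Q`, and `π` calls above come from the course code library, whose source is not shown here. As a rough sketch of what such a pipeline might look like, the block below runs value iteration, builds the Q array, and extracts a greedy policy on a toy 3-state chain. The function names `q_values`, `value_iteration`, and `greedy_policy`, and all toy values, are assumptions for illustration and are not the library's API.

```julia
# Hypothetical stand-ins for the CodeLib routines (names and signatures are
# assumptions for illustration, not the course library's API).

# Q[s, a] = R[s, a] + γ * Σ_s′ T[s, s′, a] * U[s′]
function q_values(T::Array{Float64,3}, R::Array{Float64,2}, U::Vector{Float64}, γ::Float64)
    nstates, nactions = size(R)
    Qm = zeros(nstates, nactions)
    for s in 1:nstates, a in 1:nactions
        Qm[s, a] = R[s, a] + γ * sum(T[s, :, a] .* U)
    end
    return Qm
end

# Apply the Bellman update for a fixed number of sweeps, starting from U = 0
function value_iteration(T::Array{Float64,3}, R::Array{Float64,2}, γ::Float64, sweeps::Int)
    U = zeros(size(R, 1))
    for _ in 1:sweeps
        U = vec(maximum(q_values(T, R, U, γ); dims = 2))
    end
    return U
end

# Greedy policy: index of the largest Q value in each row
greedy_policy(Qm::Array{Float64,2}) = [argmax(Qm[s, :]) for s in 1:size(Qm, 1)]

# Toy chain: state 1 = safety (absorbing), state 3 = tiger (absorbing),
# state 2 is the only decision state; action 1 = left, action 2 = right.
p_move = 0.8                      # chance a move goes the intended way
T_toy = zeros(3, 3, 2)
T_toy[1, 1, :] .= 1.0
T_toy[3, 3, :] .= 1.0
T_toy[2, 1, 1] = p_move;      T_toy[2, 3, 1] = 1 - p_move
T_toy[2, 1, 2] = 1 - p_move;  T_toy[2, 3, 2] = p_move

R_toy = zeros(3, 2)
R_toy[2, 1] = 10.0                # moving left from state 2 pays off
R_toy[2, 2] = -100.0              # moving right from state 2 is penalized

U_toy = value_iteration(T_toy, R_toy, 0.95, 100)
Q_toy = q_values(T_toy, R_toy, U_toy, 0.95)
policy_toy = greedy_policy(Q_toy)

println("U = ", U_toy)
println("policy at state 2 = ", policy_toy[2])
```

In this toy problem both neighbors of state 2 are absorbing with zero utility, so the utility of state 2 equals its best immediate reward and the greedy action is left. In the lab problem the neighbors have nonzero utility, which is why `U[2]` exceeds the immediate reward of 10.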
