# POMDPModia.jl: Getting Started

The POMDPModia.jl package integrates with most other packages in the JuliaPOMDP ecosystem, including POMDPs.jl, POMDPPolicies.jl, BeliefUpdaters.jl, POMDPSimulators.jl, and both online and offline solvers.

## Multiple grid worlds example

In this notebook, we create a MODIA for an example that contains multiple instances of several different grid world MDPs. The objective is to pick the safest action with respect to the individual states of all the MDPs.




```julia
include("../src/main.jl")  # load the POMDPModia.jl source code
```

```julia
using POMDPs
using POMDPModels: SimpleGridWorld, GWPos
using DiscreteValueIteration   # provides ValueIterationSolver
using POMDPSimulators          # provides RolloutSimulator, used below
using Random
```

A MODIA object containing only MDPs consists of four components:
1. **Decision Problems (DPs)**: The minimal set of unique POMDPs (or MDPs) required to describe the problem at hand.
2. **Decision Components (DCs)**: The number of instances of each DP.
3. **Safety Sort Function (SSF)**: The function used to sort the actions suggested by the policies of the individual MPs and pick the "safest" one among them.
4. **Markov Processes (MPs)**: The full stack of MDPs being tracked, `sum(DCs)` in total.
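To make the relationship between DPs, DCs, and MPs concrete, here is a minimal sketch in plain Julia that expands each DP into as many MPs as its DC count. It uses symbols as stand-ins for the grid worlds and assumes MPs are stored DP by DP, in the order of `DPs` (this is consistent with the `collect(modia)` output shown later). `expand_mps` is a hypothetical helper, not part of POMDPModia.jl.

```julia
# Hypothetical helper (not part of POMDPModia.jl): build the stack of MPs
# by repeating each DP as many times as its decision-component count.
function expand_mps(dps::AbstractVector, dcs::AbstractVector{<:Integer})
    length(dps) == length(dcs) || throw(ArgumentError("need one DC count per DP"))
    mps = eltype(dps)[]
    for (dp, dc) in zip(dps, dcs)
        append!(mps, fill(dp, dc))   # dc copies of this DP
    end
    return mps
end

# Symbols stand in for the three grid worlds defined below.
dps = [:grid1, :grid2, :grid3]
dcs = [4, 1, 2]
mps = expand_mps(dps, dcs)
length(mps) == sum(dcs)   # true: 7 MPs in total
```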

In this example, we define three grid worlds that differ in where their rewards and penalties are located.

```julia
grid_problem1 = SimpleGridWorld(size=(4,4), rewards=Dict(GWPos(4,1)=>-10.0, GWPos(4,3)=>30.0));
grid_problem2 = SimpleGridWorld(size=(4,4), rewards=Dict(GWPos(2,2)=>-20.0, GWPos(4,4)=>20.0));
grid_problem3 = SimpleGridWorld(size=(4,4), rewards=Dict(GWPos(3,4)=>-30.0, GWPos(1,4)=>10.0));
```

The first three components of MODIA can then be created:

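```julia
DPs = [grid_problem1, grid_problem2, grid_problem3];  # unique decision problems
DCs = [4, 1, 2];                                      # number of instances of each DP
# Safety sort function: we prefer :up the most, then :down, then :left, and lastly :right.
function SSF(acts::AbstractArray)
    priority = (:up, :down, :left, :right)                 # safest action first
    rank = Dict(a => i for (i, a) in enumerate(priority))  # action => rank (1 = safest)
    return priority[minimum(rank[a] for a in acts)]        # most preferred suggested action
end;
```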
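Because an SSF is just a function from the suggested actions to a single action, the same priority logic can be packaged as a reusable factory. The sketch below is plain Julia; `make_ssf` is a hypothetical helper, and it assumes every suggested action appears in the priority tuple (otherwise the dictionary lookup throws a `KeyError`).

```julia
# Hypothetical factory: returns an SSF that picks the highest-priority action
# among those suggested by the individual policies.
function make_ssf(priority::Tuple)
    rank = Dict(a => i for (i, a) in enumerate(priority))   # action => rank (1 = safest)
    return acts -> priority[minimum(rank[a] for a in acts)]
end

ssf = make_ssf((:up, :down, :left, :right))
ssf([:right, :left, :down])   # :down, the most preferred action in this list
```

Capturing the lookup table in a closure builds it once, rather than on every call.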

We can instantiate a MODIA object in one of two ways: either construct the object first and then initialize the state of each MP it contains (from a distribution of initial states), or do both in a single call.

```julia
## Method 1 ##
modia = MODIA(DPs, DCs, SSF)
initialize_states!(modia, POMDPs.initialstate);

## Method 2 ##
modia = MODIA(DPs, DCs, SSF, POMDPs.initialstate);
```
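Conceptually, initializing states means drawing one initial state per MP from that MP's initial-state distribution. The sketch below mimics this in plain Julia under two stated assumptions: each distribution is represented as a vector of equally likely grid cells, and a seeded `MersenneTwister` (from the standard `Random` library) makes the draw reproducible. `sample_initial_states` is hypothetical and is not how POMDPModia.jl implements `initialize_states!`.

```julia
using Random

# Hypothetical sketch: draw one initial state per MP, where each MP's
# initial-state distribution is a vector of equally likely candidate states.
function sample_initial_states(dists::AbstractVector, rng::AbstractRNG)
    return [rand(rng, d) for d in dists]
end

# All 16 cells of a 4x4 grid, as (x, y) tuples (an assumed uniform distribution).
cells = [(x, y) for x in 1:4 for y in 1:4]
dists = fill(cells, 7)                     # one distribution per MP (7 MPs)
rng = MersenneTwister(1)                   # fixed seed for reproducibility
states = sample_initial_states(dists, rng)
```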

Let's inspect this modia object.

```julia
typeof(modia)   # automatically determined that all MPs are MDPs
```

```
MODIA_of_MDPs
```

```julia
actions(modia)  # set of unique action spaces across all MPs
```

```
Set{NTuple{4,Symbol}} with 1 element:
  (:up, :down, :left, :right)
```

```julia
collect(modia)  # all MPs inside modia
```

```
7-element Array{SimpleGridWorld,1}:
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([4, 1] => -10.0,[4, 3] => 30.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[4, 1], [4, 3]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([4, 1] => -10.0,[4, 3] => 30.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[4, 1], [4, 3]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([4, 1] => -10.0,[4, 3] => 30.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[4, 1], [4, 3]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([4, 1] => -10.0,[4, 3] => 30.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[4, 1], [4, 3]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([4, 4] => 20.0,[2, 2] => -20.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[4, 4], [2, 2]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArr
```

```julia
collect_states(modia)  # current (initial) states of all MPs inside modia
```

```
7-element Array{StaticArrays.SArray{Tuple{2},Int64,1,2},1}:
 [1, 3]
 [3, 3]
 [4, 2]
 [1, 4]
 [2, 2]
 [1, 3]
 [4, 2]
```

Now, let's choose an offline solver to compute an optimal policy for each MP individually.

```julia
solver = ValueIterationSolver();
policies = POMDPs.solve(solver, modia);  # one policy per MP (7 in total)
act = safest_action(policies, modia)     # safest action, according to our SSF
```

```
:up
```

```julia
# action suggested by each MP's policy at that MP's current state
POMDPs.action.(policies.policies, modia.states.states)
```

```
7-element Array{Symbol,1}:
 :right
 :right
 :up
 :right
 :up
 :up
 :left
```
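Applying our SSF to these seven suggestions (`:right`, `:up`, and `:left`) yields `:up`, matching the result of `safest_action` above. The plain-Julia sketch below spells out that composition: policies are stood in for by ordinary functions from state to action, and `safest_action_sketch`, `toy_policies`, and `toy_ssf` are hypothetical names. We have not verified that POMDPModia.jl's `safest_action` is implemented this way. The `toy_` prefix avoids overwriting the notebook's `policies` variable.

```julia
# Hypothetical sketch of combining per-MP policies with an SSF.
# Each "policy" here is just a function from a state to an action.
function safest_action_sketch(pols, sts, select)
    acts = [p(s) for (p, s) in zip(pols, sts)]   # one suggested action per MP
    return select(acts)                           # the SSF picks the safest one
end

# Same preference order as the SSF defined earlier (:up is safest).
toy_priority = [:up, :down, :left, :right]
toy_ssf(acts) = toy_priority[minimum(findfirst(==(a), toy_priority) for a in acts)]

# Toy policies: move :right until x reaches 4, then move :up.
toy_policies = [s -> s[1] < 4 ? :right : :up for _ in 1:3]
toy_states = [(1, 3), (4, 2), (2, 2)]
safest_action_sketch(toy_policies, toy_states, toy_ssf)   # :up
```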

We can simulate the MODIA for a desired number of steps and obtain the total discounted reward received over that horizon. To do so, we use the following inputs:
1. **Simulator properties**: An object specifying properties such as the RNG and the maximum number of steps.
2. **The MODIA object**: A MODIA instantiated using DPs, DCs, an SSF, and initial states.
3. **Policies (online or offline)**: Policies that compute optimal actions for each MP.
4. **MODIA Modification Function (MMF)**: [Optional] A function that specifies when new DPs or DCs are instantiated or terminated within the MODIA.

```julia
sim = POMDPSimulators.RolloutSimulator(max_steps=5)
r_totals = POMDPs.simulate(sim, modia, policies)
```

```
7-element Array{Float64,1}:
   0.0
   0.0
   0.0
   0.0
 -20.0
   0.0
   0.0
```

The result is a vector with one entry per MP: the total discounted reward that MP received over the 5-step horizon.
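For reference, the total discounted reward of a single rollout is r_1 + γ r_2 + γ² r_3 + …, where r_t is the reward received at step t and γ is the discount factor (the last field in the `SimpleGridWorld` printouts above, which appears to be 0.95). A minimal plain-Julia sketch, where `discounted_return` is a hypothetical helper:

```julia
# Hypothetical helper: total discounted reward of one rollout,
# r_1 + γ r_2 + γ^2 r_3 + ... (the first reward is undiscounted).
function discounted_return(rewards::AbstractVector{<:Real}, γ::Real)
    total = 0.0
    disc = 1.0
    for r in rewards
        total += disc * r
        disc *= γ   # each later step is weighted by one more factor of γ
    end
    return total
end

discounted_return([-20.0], 0.95)            # -20.0: a first-step reward is not discounted
discounted_return([0.0, 0.0, 10.0], 0.95)   # approximately 9.025, i.e. 10 * 0.95^2
```

Each entry of `r_totals` aggregates one MP's rewards in this way over the 5-step horizon.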

A custom MMF can be built from the following functions, which add and remove DPs and DCs.

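```julia
grid_problem4 = SimpleGridWorld(size=(4,4), rewards=Dict(GWPos(1,2)=>-5.0, GWPos(3,4)=>5.0));
push_DP!(modia, grid_problem4);               # add grid_problem4 as DP 4
push_DCs!(modia, 4, 6, POMDPs.initialstate);  # add 6 DCs (instances) of DP 4, with initial states
delete_DPs!(modia, [1,2])                     # delete DPs 1 and 2; the remaining DPs are re-indexed from 1
delete_DC!(modia, 1, 1)                       # delete DC 1 of DP 1, which is now grid_problem3
```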

These operations should leave one `grid_problem3` MP and six `grid_problem4` MPs in our MODIA object, which we can verify below.

```julia
modia.DPs
```

```
2-element Array{SimpleGridWorld,1}:
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([1, 4] => 10.0,[3, 4] => -30.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[1, 4], [3, 4]]), 0.7, 0.95)
 SimpleGridWorld((4, 4), Dict{StaticArrays.SArray{Tuple{2},Int64,1,2},Float64}([1, 2] => -5.0,[3, 4] => 5.0), Set(StaticArrays.SArray{Tuple{2},Int64,1,2}[[1, 2], [3, 4]]), 0.7, 0.95)
```

```julia
modia.DCs
```

```
2-element Array{Int64,1}:
 1
 6
```
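The index bookkeeping behind this result can be mirrored with plain vectors. The sketch below is a hypothetical model, not POMDPModia.jl's implementation: it assumes `push_DP!` appends a DP with zero DCs (consistent with DP 4 ending up with exactly six), and it shows why `delete_DC!(modia, 1, 1)` affects `grid_problem3`: once DPs 1 and 2 are deleted, the remaining DPs are re-indexed from 1.

```julia
# Plain-vector model of the DP/DC bookkeeping (hypothetical, for illustration).
dp_names = [:grid_problem1, :grid_problem2, :grid_problem3]
dc_counts = [4, 1, 2]

push!(dp_names, :grid_problem4); push!(dc_counts, 0)       # like push_DP!: new DP, no DCs yet
dc_counts[4] += 6                                          # like push_DCs!(modia, 4, 6, ...)
deleteat!(dp_names, [1, 2]); deleteat!(dc_counts, [1, 2])  # like delete_DPs!(modia, [1,2])
dc_counts[1] -= 1                                          # like delete_DC!(modia, 1, 1): index 1 is now grid_problem3

dp_names    # [:grid_problem3, :grid_problem4]
dc_counts   # [1, 6]
```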