# POMDPModia.jl: Getting Started

The POMDPModia.jl package integrates with most other packages in the JuliaPOMDP ecosystem, including POMDPs.jl, POMDPPolicies.jl, BeliefUpdaters.jl, POMDPSimulators.jl, and both online and offline solvers.

## Multiple tigers example

In this notebook, we create a MODIA for an example in which several TigerPOMDPs, of different types and in different quantities, sit behind doors. The objective is to pick the safest action with respect to the individual beliefs tracked for each POMDP.




In [1]:
include("../src/main.jl")

In [2]:
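using POMDPs
using POMDPModels: TigerPOMDP
using QMDP: QMDPSolver
using Random
import BeliefUpdaters     # for the qualified BeliefUpdaters.* calls below (harmless if main.jl already loads it)
import POMDPSimulators    # for POMDPSimulators.RolloutSimulator below (harmless if main.jl already loads it)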

A MODIA object consists of five components:
1. **Decision Problems (DPs)**: The minimum number of unique POMDPs required to describe the problem at hand.
2. **Decision Components (DCs)**: The number of instances of each DP.
3. **Safety Sort Function (SSF)**: The function used to sort the actions suggested by the policies of the individual POMDPs and pick the "safest" one among them.
4. **Markov Processes (MPs)**: The entire stack of MDPs and/or POMDPs being tracked, sum(DCs) in total.
5. **Beliefs**: The individual belief tracked for each POMDP in MPs.
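
The relationship between DPs, DCs, and MPs can be sketched in plain Julia. This is only an illustration, not the package's implementation: the string labels are hypothetical stand-ins for POMDP objects, and the expansion rule (repeat each DP as many times as its DC count) is taken from the description above.

```julia
# Hypothetical stand-ins for three decision problems.
DPs = ["tiger1", "tiger2", "tiger3"]
DCs = [4, 1, 2]   # number of instances of each DP

# Repeat each DP according to its DC count to build the stack of MPs.
MPs = reduce(vcat, [fill(dp, n) for (dp, n) in zip(DPs, DCs)])

println(length(MPs))   # prints 7, i.e. sum(DCs)
```

One belief would then be tracked per entry of `MPs`.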

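In this example, we define three tiger problems with different characteristics. The positional arguments of TigerPOMDP are, in order, the reward for listening, the reward for finding the tiger, the reward for escaping the tiger, the probability of listening correctly, and the discount factor.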

In [3]:
tiger_problem1 = TigerPOMDP(-1.0, -100.0, 10.0, 0.90, 0.90);
tiger_problem2 = TigerPOMDP(-2.0, -50.0, 17.0, 0.80, 0.60);
tiger_problem3 = TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75);

The first three components of MODIA can then be created:

In [4]:
DPs = [tiger_problem1, tiger_problem2, tiger_problem3];
DCs = [4, 1, 2];
SSF = Base.minimum;

We can instantiate a MODIA object using either of the two methods below. Notice that we initialize the beliefs of the Markov processes through the BeliefUpdaters.jl package.

In [5]:
## Method 1 ##
modia = MODIA(DPs, DCs, SSF)
initialize_beliefs!(modia, BeliefUpdaters.uniform_belief);

## Method 2 ##
modia = MODIA(DPs, DCs, SSF, BeliefUpdaters.uniform_belief);

Let's inspect this modia object.

In [6]:
typeof(modia)   # automatically determined that all MPs are POMDPs.

MODIA_of_POMDPs

In [7]:
actions(modia)  # set of all possible actions in MPs.

Set{Int64} with 3 elements:
  2
  0
  1

In [8]:
collect(modia)  # all MPs inside modia.

7-element Array{TigerPOMDP,1}:
 TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9)
 TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9)
 TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9)
 TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9)
 TigerPOMDP(-2.0, -50.0, 17.0, 0.8, 0.6)
 TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75)
 TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75)

In [9]:
collect_beliefs(modia)  # all beliefs inside modia.

7-element Array{DiscreteBelief{TigerPOMDP,Bool},1}:
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-2.0, -50.0, 17.0, 0.8, 0.6), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.5, 0.5])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.5, 0.5])

Now, let's choose an offline solver to compute a policy for each MP individually. Note that QMDP is an approximate solver, so the resulting policies are not guaranteed to be optimal.

In [10]:
solver = QMDPSolver();
policies = POMDPs.solve(solver, modia);  # contains 7 policies, one for each MP.
act = safest_action(policies, modia);  # safest action, according to our SSF.
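
As a rough sketch of what the SSF does, the snippet below applies `Base.minimum` to a vector of suggested actions. The suggested actions are made up for illustration, and the action encoding (0 = listen, 1 = open left door, 2 = open right door) is an assumption about TigerPOMDP; the internals of `safest_action` are not shown here.

```julia
# Hypothetical actions suggested by seven individual policies.
# Assumed encoding: 0 = listen, 1 = open left door, 2 = open right door.
suggested_actions = [0, 2, 0, 1, 0, 0, 2]

SSF = Base.minimum                # same safety sort function as in the notebook
safest = SSF(suggested_actions)

println(safest)   # prints 0
```

Under this encoding, `minimum` picks listening whenever at least one policy suggests it, which is the most conservative choice.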

Let's define a belief updater and compute an updated belief for some random observations.

In [11]:
bu = BeliefUpdaters.DiscreteUpdater(modia);    # can also use updater(policies) to retrieve an appropriate belief updater.
random_obs = [Bool(rand(0:1)) for _ in modia.markov_prcs]  # create a random observation for each POMDP.
new_belief = POMDPs.update(bu, modia, act, random_obs);   # computes the new belief, but does not store it within modia.

We can see that the belief of each POMDP has shifted, according to the random observations we have received:

In [12]:
new_belief.beliefs

7-element Array{DiscreteBelief{TigerPOMDP,Bool},1}:
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.09999999999999998, 0.9])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-2.0, -50.0, 17.0, 0.8, 0.6), Bool[0, 1], [0.8, 0.19999999999999996])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.15000000000000002, 0.85])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.85, 0.15000000000000002])
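
The shift from [0.5, 0.5] to roughly [0.1, 0.9] is what a single Bayes update gives for a listening accuracy of 0.9. The sketch below reproduces that arithmetic in plain Julia. The function name `listen_update` is hypothetical, and the sketch assumes the hidden state does not change while listening; it is not the `DiscreteUpdater` implementation.

```julia
# Bayes update of a discrete belief after one observation.
# `matching_state` is the index of the state the observation points to;
# `p_correct` is the probability that the observation matches the true state.
function listen_update(prior::Vector{Float64}, matching_state::Int, p_correct::Float64)
    likelihood = [s == matching_state ? p_correct : 1 - p_correct for s in eachindex(prior)]
    unnormalized = prior .* likelihood
    return unnormalized ./ sum(unnormalized)
end

posterior = listen_update([0.5, 0.5], 2, 0.9)
println(posterior)   # approximately [0.1, 0.9]
```

Replacing 0.9 with 0.8 or 0.85 gives the values seen for the other two tiger problems.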

To explicitly update the belief within a MODIA object, we can use the following command.

In [13]:
update!(bu, modia, act, random_obs);   # calculates new belief, and updates within modia
collect_beliefs(modia)

7-element Array{DiscreteBelief{TigerPOMDP,Bool},1}:
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.09999999999999998, 0.9])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.9, 0.9), Bool[0, 1], [0.9, 0.09999999999999998])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-2.0, -50.0, 17.0, 0.8, 0.6), Bool[0, 1], [0.8, 0.19999999999999996])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.15000000000000002, 0.85])
 DiscreteBelief{TigerPOMDP,Bool}(TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75), Bool[0, 1], [0.85, 0.15000000000000002])

Instead of keeping track of observations and belief updates by hand, we can simulate the MODIA for a desired number of steps and receive the total discounted reward at the horizon. To do so, we use the following inputs:
1. **Simulator properties**: An object that prescribes properties such as the RNG, the maximum number of steps, etc.
2. **The MODIA object**: A MODIA instantiated using DPs, DCs, SSF, and initial beliefs.
3. **Policies (online or offline)**: Policies that compute an action for each MP.
4. **Belief updater**: The belief updater object used to update POMDP beliefs as new observations are received.
5. **Initial belief**: Beliefs used to initialize the simulation (defaults to modia.beliefs).
6. **Initial states**: States used to initialize the simulation.
7. **MODIA Modification Function (MMF)**: [Optional] A function that specifies when new DPs or DCs are instantiated or terminated within MODIA.

In [14]:
sim = POMDPSimulators.RolloutSimulator(max_steps=5)
initial_beliefs = modia.beliefs
initial_states = convert(Array{Bool},(rand(initialstate(modia))))
r_totals = POMDPs.simulate(sim, modia, policies, bu, initial_beliefs, initial_states)

7-element Array{Float64,1}:
   4.814900000000001
   4.814900000000001
   4.814900000000001
 -84.2851
   2.2288
 -49.65234375
  -2.96484375

The result above contains the total discounted reward of each MP over the specified 5-step horizon.
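
For reference, a total discounted reward is the sum of the per-step rewards weighted by increasing powers of the discount factor. The minimal sketch below uses a hypothetical helper, `discounted_return`, and a hypothetical reward sequence (listen twice, open the correct door, listen twice) with the rewards and discount factor of `tiger_problem1`. This is one sequence consistent with the first entry of the output, not a trace recovered from the simulator.

```julia
# Total discounted reward: sum over t of discount^(t-1) * r_t.
discounted_return(rewards, discount) =
    sum(discount^(t - 1) * r for (t, r) in enumerate(rewards))

# Hypothetical 5-step reward sequence for tiger_problem1
# (listening reward -1.0, escape reward 10.0, discount factor 0.9).
rewards = [-1.0, -1.0, 10.0, -1.0, -1.0]
total = discounted_return(rewards, 0.9)

println(round(total, digits=4))   # prints 4.8149
```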

We can define our custom MMF using the following functions to add/remove DPs and DCs.

In [15]:
tiger_problem4 = TigerPOMDP(-4.0, -80.0, 3.0, 0.55, 0.35);   # a new DP.
add_DP!(modia, tiger_problem4);
add_DCs!(modia, 4, 6, BeliefUpdaters.uniform_belief);  # DP: 4, DCs: 6
remove_DPs!(modia, [1,2])
remove_DC!(modia, 1, 1)  # DP: 1, DC: 1

These operations result in the following DPs and DCs in our MODIA object.

In [16]:
modia.DPs

2-element Array{TigerPOMDP,1}:
 TigerPOMDP(-3.0, -75.0, 8.0, 0.85, 0.75)
 TigerPOMDP(-4.0, -80.0, 3.0, 0.55, 0.35)

In [17]:
modia.DCs

2-element Array{Int64,1}:
 1
 6
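
The bookkeeping behind these results can be traced with plain vectors. The sketch below is only an illustration under stated assumptions: the string labels are hypothetical stand-ins for the POMDP objects, a newly added DP is assumed to start with zero DCs, and indices are assumed to shift down after a removal. It does not call or reproduce the package's own functions.

```julia
# Plain-vector stand-ins for modia.DPs and modia.DCs (labels are hypothetical).
DPs = ["tiger1", "tiger2", "tiger3"]
DCs = [4, 1, 2]

push!(DPs, "tiger4"); push!(DCs, 0)              # add a new DP (assumed to start with 0 DCs)
DCs[4] += 6                                      # add 6 DCs to DP 4
deleteat!(DPs, [1, 2]); deleteat!(DCs, [1, 2])   # remove DPs 1 and 2; the rest shift down
DCs[1] -= 1                                      # remove one DC from the DP now at index 1

println(DPs)   # ["tiger3", "tiger4"]
println(DCs)   # [1, 6]
```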