# Introduction

This notebook walks you through the components of using the Hybrid Stochastic Planning (HSP) framework to solve Dynamic Multimodal Stochastic Shortest Path (DMSSP) problems (called CMSSPs in the code). You will see how to obtain policies for the local-layer MDPs, and how to use them with the overall HSP solver (called HHPC in the code).

We will use a simple toy domain for illustrative purposes, different from the domain in the paper. There are 4 gridworlds, numbered 1 through 4 and connected to each other as follows: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4. Each gridworld is a 1 x 1 continuous grid (see [ContinuumWorld.jl](https://github.com/JuliaPOMDP/ContinuumWorld.jl) for how it works). The agent begins in grid 1 and succeeds when it reaches any state in grid 4.
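
To make the connectivity concrete, here is a minimal sketch that encodes the four modes as an adjacency dictionary and enumerates the switch sequences from grid 1 to grid 4. The names `mode_graph` and `mode_paths` are hypothetical and are not part of CMSSPs.jl.

```julia
# Hypothetical encoding of the mode graph: mode => modes reachable by one switch
mode_graph = Dict(1 => [2, 3], 2 => [4], 3 => [4], 4 => Int[])

# Enumerate every switch sequence from `start` to `goal` by depth-first search.
# The toy graph is acyclic, so no visited set is needed.
function mode_paths(graph, start, goal)
    start == goal && return [[goal]]
    paths = Vector{Vector{Int}}()
    for next in graph[start]
        for tail in mode_paths(graph, next, goal)
            push!(paths, vcat(start, tail))
        end
    end
    return paths
end

paths = mode_paths(mode_graph, 1, 4)
```

There are exactly two ways to reach the goal mode, via grid 2 or via grid 3, which is the choice the global layer has to make in this domain.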

Each grid world that is the source of a possible switch to another grid has a sub-region where the mode switch is valid. At any timestep, the context comprises a single point in each switch region, around which the switch is currently possible, e.g. (1,2) -> [0.34,0.56] (in grid 1); (1,3) -> [0.67,0.25] (in grid 1); (2,4) -> [0.4,0.7] (in grid 2) and so on. The future context is the estimated positions of the switch points over the next few timesteps, up to the context horizon. When the agent attempts a mode switch, it must be close to the valid switch point for that timestep.
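
The following sketch illustrates one way to represent such a context and test whether a switch attempt is valid. It is an assumption for illustration only: the function `switch_is_valid`, the tolerance `tol`, and all predicted points after the first in each list are hypothetical, and the actual check lives in `grids_continuum.jl`.

```julia
using LinearAlgebra: norm

# Hypothetical context: for each (source, target) mode pair, the switch point
# at the current timestep followed by its predicted positions, up to the
# context horizon (3 steps here).
context = Dict(
    (1, 2) => [[0.34, 0.56], [0.36, 0.55], [0.38, 0.54]],
    (1, 3) => [[0.67, 0.25], [0.66, 0.27], [0.65, 0.29]],
)

# In this sketch, a switch attempted at step `k` (k = 1 is the current
# timestep) is valid only if the agent is within `tol` of the switch point
# for that step.
function switch_is_valid(context, edge, k, agent_pos; tol = 0.05)
    points = context[edge]
    k <= length(points) || return false   # beyond the context horizon
    return norm(agent_pos - points[k]) <= tol
end

switch_is_valid(context, (1, 2), 1, [0.35, 0.56])   # close to the current point
switch_is_valid(context, (1, 2), 1, [0.90, 0.90])   # far from it
```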

## DMSSP Definition

There are several intersecting parts to a DMSSP definition, so, for this tutorial, we have implemented the DMSSP definition for you in [grids_continuum.jl](https://github.com/sisl/CMSSPs.jl/blob/master/src/domains/grids_continuum/grids_continuum.jl). Feel free to review it before proceeding with this notebook, but it won't be necessary. One thing worth noting is the definition of the inter-modal transition rules, through a [Tabular MDP](https://github.com/JuliaPOMDP/POMDPModels.jl/blob/master/src/Tabular.jl) model (see the `get_grids_continuum_mdp()` method).
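
As a rough illustration of what tabular inter-modal transition rules can look like, the sketch below builds a deterministic transition array for the four modes using plain Julia arrays. The action encoding (action `a` means "attempt a switch to mode `a`") is a hypothetical choice, and the index order `T[s', a, s]` is our understanding of the layout used by the Tabular MDP model; check `Tabular.jl` and `get_grids_continuum_mdp()` for the real definition.

```julia
n_modes = 4
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

# T[sp, a, s]: probability of landing in mode sp after taking action a in mode s.
# Action a means "attempt a switch to mode a"; a == s means "stay".
T = zeros(n_modes, n_modes, n_modes)
for s in 1:n_modes, a in 1:n_modes
    if a == s || (s, a) in edges
        T[a, a, s] = 1.0      # valid action: the agent ends up in mode a
    else
        T[s, a, s] = 1.0      # invalid switch: the agent stays in mode s
    end
end
```

Each `(a, s)` slice of `T` sums to one, so the array is a well-formed transition model even for switches that the mode graph does not allow.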

## Generating Local Layer Policies

With the DMSSP model defined, the next major step is the pre-processing to generate the control policies for the local layer of the planning framework. As mentioned in the README, the general local layer logic is implemented in [local_layer.jl](https://github.com/sisl/CMSSPs.jl/blob/master/src/hhpc/local_layer.jl). This logic works with the domain-specific definitions we have implemented already.
For our toy example, the 4 grid worlds share the same underlying continuum world dynamics, so the control policy needs to be computed only once and can then be shared across modes. The next few code snippets demonstrate how to do so. The code is also available as a [single script](https://github.com/sisl/CMSSPs.jl/blob/master/scripts/grids_continuum/grids_continuum_policies.jl).

In [1]:
## Load a bunch of helper packages needed for the local layer script
using GridInterpolations # For approximating the value function locally
using LocalFunctionApproximation # The wrapper around GridInterpolation needed by the value iteration solver
using StaticArrays
using JLD2, FileIO
using Random
using ContinuumWorld # To define the underlying continuum grid world models
using CMSSPs # Our library


In [2]:
## Set up the continuum world after reading in the problem parameters
rng = MersenneTwister(10) # Fixed seed for reproducibility

params_fn = "../scripts/grids_continuum/grids-continuum-params1.toml" # File comments describe parameters
params = continuum_parse_params(params_fn) # Create a struct of params that is passed around

# Create the possible control (movement) actions for the agent in the Continuum World
grid_cworld_actions = [Vec2(params.move_amt, 0.0), Vec2(-params.move_amt, 0.0),
                       Vec2(0.0, params.move_amt), Vec2(0.0, -params.move_amt), Vec2(0.0, 0.0)]

# Define the Continuum World that will be the local layer MDP
# No rewards, undiscounted
cworld = CWorld(xlim = (0.0, 1.0), ylim = (0.0, 1.0),
                reward_regions = [], rewards = [], terminal = [],
                stdev = params.move_std, actions = grid_cworld_actions, discount = 1.0)

# Use the same CWorld for all 4 modes
cworlds = Dict(1=>cworld, 2=>cworld, 3=>cworld, 4=>cworld)
params.cworlds = cworlds

# Only need to find policy for one mode, since shared across modes
modal_mdp = GridsContinuumMDPType(1, params, grid_cworld_actions, params.beta, params.horizon_limit);

In [3]:
## Create the interpolator to be used for value function approximation
xy_spacing = polyspace_symmetric(1.0, params.vals_along_axis, 3)
xy_grid = RectangleGrid(xy_spacing, xy_spacing)
lfa = LocalGIFunctionApproximator(xy_grid);
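
The approximator above stores value estimates at the grid vertices and interpolates between them for arbitrary continuous states. As a minimal sketch of the idea in two dimensions, here is a plain-Julia bilinear interpolation that does not depend on GridInterpolations. The function `bilinear`, the evenly spaced axes, and the vertex values are all hypothetical.

```julia
# Bilinear interpolation of vertex values V (size length(xs) x length(ys))
# at the query point (x, y). xs and ys must be sorted in increasing order.
function bilinear(xs, ys, V, x, y)
    # Index of the lower-left vertex of the cell containing the query point
    i = clamp(searchsortedlast(xs, x), 1, length(xs) - 1)
    j = clamp(searchsortedlast(ys, y), 1, length(ys) - 1)
    # Fractional position of the query point inside that cell
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return (1 - tx) * (1 - ty) * V[i, j] + tx * (1 - ty) * V[i + 1, j] +
           (1 - tx) * ty * V[i, j + 1] + tx * ty * V[i + 1, j + 1]
end

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = copy(xs)
V = [x + 2y for x in xs, y in ys]   # hypothetical values at the grid vertices

v = bilinear(xs, ys, V, 0.3, 0.6)
```

Because the vertex values here come from a function that is linear in `x` and `y`, the interpolated value matches the function exactly. A real value function is only approximated, with accuracy depending on how finely the grid is spaced.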

In [4]:
# Compute and update the terminal cost
compute_terminalcost_localapprox!(modal_mdp, lfa)

# Compute policy with localApproxVI
# There will be one iteration for out-of-horizon and one for in-horizon
modal_policy = finite_horizon_VI_localapprox!(modal_mdp, lfa, true,
                                              params.num_generative_samples, rng)

# Compute min_cost_per_horizon
compute_min_value_per_horizon_localapprox!(modal_policy)

# Save horizon policy to file
inhor_fn = "grids-continuum-params1-inhor.jld2"
outhor_fn = "grids-continuum-params1-outhor.jld2"
save_modal_horizon_policy_localapprox(modal_policy, inhor_fn, outhor_fn, modal_mdp)

max_contr_cost = 0.09179802036198285
┌ Info: Getting max control cost
└ @ CMSSPs.HHPC /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/hhpc/local_layer.jl:188
[Iteration 1   ] residual:       3.15 | iteration runtime:   5462.287 ms, (      5.46 s total)
[Iteration 1   ] residual:       7.46 | iteration runtime:   5460.957 ms, (      5.46 s total)


You now have two JLD2 files that encode the in-horizon and out-of-horizon policies for the continuum world. These are the results of the local pre-processing step. With them in place, we can run the HSP solver on the full problem.
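
To give some intuition for a finite-horizon policy with a terminal cost, here is a minimal backward-induction sketch on a toy 1-D chain. It is not the algorithm in `local_layer.jl`: the chain, the unit step cost, the constant `terminal_cost`, and the function `finite_horizon_costs` are all hypothetical simplifications.

```julia
# Finite-horizon value iteration (backward induction) on a toy 1-D chain.
# States 1..n with the goal at state n; actions move one step left or right
# deterministically; every step costs 1.
function finite_horizon_costs(n, horizon; terminal_cost = 10.0)
    # J[s, k + 1]: minimum cost from state s with k steps remaining
    J = zeros(n, horizon + 1)
    J[:, 1] .= terminal_cost      # penalty for running out of time off-goal
    J[n, 1] = 0.0                 # no penalty at the goal
    for k in 2:horizon + 1, s in 1:n
        if s == n
            J[s, k] = 0.0         # the goal is absorbing and cost-free
        else
            left, right = max(s - 1, 1), min(s + 1, n)
            J[s, k] = 1.0 + min(J[left, k - 1], J[right, k - 1])
        end
    end
    return J
end

J = finite_horizon_costs(5, 6)
```

With 6 steps remaining, state 1 can reach the goal in 4 steps, so `J[1, end]` is 4. With only 2 steps remaining it cannot, and `J[1, 3]` is the 2 step costs plus the terminal penalty.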

## Running the Solver

We will now briefly demonstrate how to use the HSP solver to solve DMSSP problems. The general domain-agnostic solver logic is defined in [hhpc_framework.jl](https://github.com/sisl/CMSSPs.jl/blob/master/src/hhpc/hhpc_framework.jl); it uses both the global planning layer from [global_layer.jl](https://github.com/sisl/CMSSPs.jl/blob/master/src/hhpc/global_layer.jl) and the policies obtained from local pre-processing. As before, all the code below is available in a [single script](https://github.com/sisl/CMSSPs.jl/blob/master/scripts/grids_continuum/grids_continuum_simulator.jl). The episode terminates successfully as soon as the agent makes a successful mode switch to grid 4.

In [5]:
rng = MersenneTwister(1234) # Different seed for simulation
const TRIALS = 5 # Total number of episodes to try

# Load the policies
inhor_fn = "grids-continuum-params1-inhor.jld2"
outhor_fn = "grids-continuum-params1-outhor.jld2"
modal_horizon_policy = load_modal_horizon_policy_localapprox(inhor_fn, outhor_fn)

# Create dictionaries of modal MDPs and policies - all modes share the same ones
modal_policies = Dict{Int64, ModalFinInfHorPolicy}()
modal_mdps = Dict{Int64, GridsContinuumMDPType}()
for m in CONTINUUM_MODES
    # Mutating the dictionaries does not rebind them, so no `global` is needed
    modal_policies[m] = ModalFinInfHorPolicy(modal_horizon_policy, nothing)
    modal_mdps[m] = modal_horizon_policy.in_horizon_policy.mdp
end

# Define the params struct and the DMSSP problem object
params = modal_horizon_policy.in_horizon_policy.mdp.params
cmssp = create_continuum_cmssp(params, rng)

# Other parameters required by the solver - the replanning period (in terms of timesteps) and the goal modes
deltaT = 5
goal_modes = [CONTINUUM_GOAL_MODE];

In [6]:
# POMDPs must be imported directly in order to call solve
using POMDPs

# Run the solver on TRIALS independent episodes
for trial = 1:TRIALS

    @info "Generating start state and context"
    start_state = generate_start_state(cmssp, rng)
    set_start_context_set!(cmssp, rng)

    @debug start_state

    @info "Creating solver"
    continuum_solver = GridsContinuumSolverType{typeof(rng)}(params.num_bridge_samples,
                                                modal_policies,
                                                deltaT,
                                                goal_modes,
                                                start_state,
                                                zero(ContinuumDummyBookkeeping),
                                                rng)

    solve(continuum_solver, cmssp)
end

┌ Info: Generating start state and context
└ @ Main In[6]:7
┌ Info: Creating solver
└ @ Main In[6]:13
┌ Info: ("Successful mode switch to grid ", 3)
└ @ CMSSPs.CMSSPDomains /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/domains/grids_continuum/grids_continuum.jl:441
┌ Info: ("Successful mode switch to grid ", 4)
└ @ CMSSPs.CMSSPDomains /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/domains/grids_continuum/grids_continuum.jl:441
┌ Info: Generating start state and context
└ @ Main In[6]:7
┌ Info: Creating solver
└ @ Main In[6]:13
┌ Info: ("Successful mode switch to grid ", 2)
└ @ CMSSPs.CMSSPDomains /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/domains/grids_continuum/grids_continuum.jl:441
┌ Info: ("Mode switch to grid ", 4, "failed ")
└ @ CMSSPs.CMSSPDomains /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/domains/grids_continuum/grids_continuum.jl:445
┌ Info: ("Mode switch to grid ", 4, "failed ")
└ @ CMSSPs.CMSSPDomains /Users/shushmanchoudhury/.julia/dev/CMSSPs/src/domains/grids_cont