# POMCP

POMCP (Partially Observable Monte Carlo Planning) is one of the most widely used online POMDP methods. It is a variant of Monte Carlo tree search in which each node corresponds to an action-observation history.
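To make the idea of a history-indexed tree concrete, here is a minimal sketch in plain Julia. It assumes a history is stored as a vector of (action, observation) pairs; the `History` alias, the `NodeStats` type, and the `:feed` action with a Boolean "crying" observation are hypothetical names chosen for illustration, and this is not how `BasicPOMCP` lays out its tree internally.

```julia
# Hypothetical illustration only: a history is a sequence of
# (action, observation) pairs, and the tree maps each history to statistics.
const History = Vector{Tuple{Symbol,Bool}}

mutable struct NodeStats
    n::Int        # number of simulations that passed through this history
    v::Float64    # running value estimate
end

tree = Dict{History,NodeStats}()

root = History()                      # the empty history is the root node
tree[root] = NodeStats(0, 0.0)

# Taking :feed and then observing "not crying" leads to a child history.
child = vcat(root, [(:feed, false)])
tree[child] = NodeStats(0, 0.0)

# Any simulation that produces the same actions and observations reaches
# the same node, regardless of the hidden states it passed through.
same_history = Tuple{Symbol,Bool}[(:feed, false)]
reaches_existing_node = haskey(tree, same_history)
```

Because nodes are indexed by what the agent can actually see (its own actions and the observations it receives), simulations that differ only in their hidden states share statistics.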

(To get the required packages for this notebook, make sure to run the `install.jl` script.)

In [1]:
using POMDPModels # for the crying baby problem
using BasicPOMCP
using Interact
rng = MersenneTwister(7);

In [2]:
pomdp = BabyPOMDP()
b = initial_state_distribution(pomdp);

Since POMCP is an online method, it does no offline computation before being deployed into the environment. Thus the `solve` function from `POMDPs.jl` does no computation, but instead creates a planner optimized for the problem.

In [3]:
solver = POMCPSolver(c=10.0); # c is the exploration constant
planner = solve(solver, pomdp);

## Building the tree

To make a decision, POMCP constructs a tree by repeating the following process:
1. run a simulation through the tree choosing actions based on the *UCB criterion*
2. when a leaf node is reached, expand the node one level deeper and continue the simulation with a *rollout policy* (often random)
3. propagate the rewards back up the tree to maintain an estimate of the *value*\* for each action

\*This is denoted by *V* in the `BasicPOMCP` package and the original paper that describes POMCP, but it is the same thing as the state-action value typically denoted by *Q*.
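The selection and backup steps above can be sketched in a few lines of plain Julia. This is a minimal illustration that assumes the standard UCB1 form (value estimate plus `c * sqrt(log(N) / n)`) and a running-mean backup; the `ActionStats` type and the function names are hypothetical and are not the internals of `BasicPOMCP`.

```julia
# Hypothetical per-action statistics kept at one node of the tree.
mutable struct ActionStats
    n::Int        # number of times the action was tried from this node
    v::Float64    # running mean of the returns observed after the action
end

# UCB1 score: value estimate plus an exploration bonus that shrinks as the
# action is tried more often. Untried actions score Inf so they go first.
function ucb_score(stats::ActionStats, total_n::Int, c::Float64)
    stats.n == 0 && return Inf
    return stats.v + c * sqrt(log(total_n) / stats.n)
end

# Return the index of the action with the highest UCB score.
# Assumes `children` is non-empty.
function select_action(children::Vector{ActionStats}, c::Float64)
    total_n = sum(ch.n for ch in children)
    scores = [ucb_score(ch, total_n, c) for ch in children]
    return last(findmax(scores))   # findmax returns (value, index)
end

# Backup: fold one new simulated return into the running mean.
function backup!(stats::ActionStats, ret::Float64)
    stats.n += 1
    stats.v += (ret - stats.v) / stats.n
    return stats
end

# Usage: two actions, the first tried twice, the second not yet tried.
children = [ActionStats(0, 0.0), ActionStats(0, 0.0)]
backup!(children[1], -5.0)
backup!(children[1], -15.0)
next_action = select_action(children, 10.0)   # the untried action is chosen
```

A larger exploration constant `c` makes the bonus term dominate for longer, so rarely tried actions keep being revisited; a smaller `c` commits to the current best estimate sooner.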

In [4]:
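N = 10 # number of simulations to run
tree = BasicPOMCP.POMCPTree(pomdp)
trees = [deepcopy(tree)] # keep a snapshot of the tree after every simulation
for i in 1:N
    s = rand(rng, b) # sample a start state from the initial belief
    # run one simulation from the root node (index 1) with a depth limit of 10 steps
    BasicPOMCP.simulate(planner, s, BasicPOMCP.POMCPObsNode(tree, 1), 10)
    push!(trees, deepcopy(tree))
end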

In [5]:
@manipulate for i in 1:length(trees)
    D3Tree(trees[i], init_expand=100, init_duration=0)
end

## Online operation

When the agent is interacting with the environment, it constructs a new tree at each step and uses it to make a decision. For this demonstration, we'll use the problem from the `LaserTag` package.

In [6]:
using POMDPs
using LaserTag
rng = MersenneTwister(14)
laser = gen_lasertag(rng=rng)
s = initial_state(laser, rng)
sp,o,r = generate_sor(laser, s, 1, rng)
HTML("""<div style="width: 400px; margin: 0 auto">
    $(stringmime("image/svg+xml", LaserTagVis(laser, s=sp, o=o)))
        </div>
    """)



In [7]:
using DiscreteValueIteration # value iteration is used instead of rollouts
using POMDPToolbox
using BasicPOMCP
solver = POMCPSolver(tree_queries=1000,
                      c=20.0,
                      max_depth=100,
                      estimate_value=FOValue(ValueIterationSolver()),
                      rng=MersenneTwister(13)
                     )
planner = solve(solver, laser);



In [8]:
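# Wrap the planner so that the search tree built at each step is saved in `payload`
tree_recorder = PolicyWrapper(planner, payload=[]) do planner, payload, b
    a = action(planner, b) # plan from the current belief
    push!(payload, get(planner._tree)) # store the tree that was just built
    return a
end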

sim = HistoryRecorder(max_steps=100, show_progress=true, rng=MersenneTwister(30))
history = simulate(sim, laser, tree_recorder);

Simulating...100%|██████████████████████████████████████| Time: 0:00:01


In [9]:
using Interact
using D3Trees
value = Interact.value
ltrees = tree_recorder.payload
steps = eachstep(history, "a,r,sp,o,bp")
BasicPOMCP.node_tag(a::Int) = LaserTag.ACTION_NAMES[a] 
@manipulate for i in 1:length(ltrees)
    ltree = stringmime("text/html", D3Tree(ltrees[i],
                                          init_duration=0,
                                          init_expand=1,
                                          svg_height=300))
    pic = stringmime("image/svg+xml", LaserTagVis(laser, steps[i]...))
    HTML("""<div style="width: 300px; margin: 0 auto">$pic</div><div>$ltree</div>""")
end

