# Tutorial: Data-Driven Proposals in Gen

This tutorial introduces you to an important inference programming feature in Gen --- using custom "data-driven" proposals to accelerate Monte Carlo inference. Data-driven proposals use information in the observed data set to choose the proposal distibution for latent variables in a generative model. Data-driven proposals can have trainable parameters that are trained data simulated from a generative model to improve their efficacy. This training process is sometimes called 'amortized inference' or 'inference compilation'.

We focus on using data-dtiven proposals with importance sampling, which is one of the simpler classes of Monte Carlo inference algorithms. Data-driven proposals can also be used with Markov Chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC), but these are not covered in this tutorial.

This tutorial builds on a probabilistic model for the motion of an autonomous agent that was introduced in the [introduction to modeling tutorial](https://github.com/probcomp/gen-examples/tree/master/tutorial-modeling-intro). We show that we can improve the efficiency of inference in this model using two types of custom proposals for the destination of the agent: First, we write a hand-coded data-driven proposal with a single parameter that we tune using amortized inference. Second, we write a data-driven proposal based on a deep neural network, which we also train using amortized inference. We show an implementation of the neural-network based proposal using the built-in modeling language, and then an implementation using the TensorFlow modeling DSL.

## Outline

**Section 1.** [Recap on the generative model of an autonomous agent](#model-recap)

**Section 2.** [Writing a data-driven proposal as a generative function](#custom-proposal)

**Section 3.** [Using a data-driven proposal within importance sampling](#custom-proposal)

**Section 3.** [Training the parameters of a data-driven proposal](#custom-proposal)

**Section 3.** [Writing and training a deep-learning based data-driven proposal](#deep)

**Section 3.** [Writing a data-driven proposal that uses TensorFlow](#deep)

In [1]:
using Gen
using GenViz
using PyPlot
using JLD

┌ Info: Recompiling stale cache file /home/marcoct/.julia/compiled/v1.0/Gen/OEZG1.ji for Gen [ea4f424c-a589-11e8-07c0-fd5c91b9da4a]
└ @ Base loading.jl:1190


In [2]:
viz_server = VizServer(8002)
sleep(1)

## 1: Recap on the generative model of an autonomous agent   <a name="model-recap"></a>

We begin by loading the source libraries for the generative model of an autonomous agent that was introduced in a previous tutorial:

In [3]:
include("../inverse-planning/geometric_primitives.jl");
include("../inverse-planning/scene.jl");
include("../inverse-planning/planning.jl");

We redefine the generative model:

In [4]:
@gen function agent_model(scene::Scene, dt::Float64, num_ticks::Int, planner_params::PlannerParams)

    # sample the start point of the agent from the prior
    start_x = @addr(uniform(0, 1), :start_x)
    start_y = @addr(uniform(0, 1), :start_y)
    start = Point(start_x, start_y)

    # sample the destination point of the agent from the prior
    dest_x = @addr(uniform(0, 1), :dest_x)
    dest_y = @addr(uniform(0, 1), :dest_y)
    dest = Point(dest_x, dest_y)

    # plan a path that avoids obstacles in the scene
    maybe_path = plan_path(start, dest, scene, planner_params)
    planning_failed = maybe_path == nothing
    
    # sample the speed from the prior
    speed = @addr(uniform(0.3, 1), :speed)

    if planning_failed
        
        # path planning failed, assume the agent stays as the start location indefinitely
        locations = fill(start, num_ticks)
    else
        
        # path planning succeeded, move along the path at constant speed
        locations = walk_path(maybe_path, speed, dt, num_ticks)
    end

    # generate noisy measurements of the agent's location at each time point
    noise = 0.01
    for (i, point) in enumerate(locations)
        x = @addr(normal(point.x, noise), :meas => (i, :x))
        y = @addr(normal(point.y, noise), :meas => (i, :y))
    end

    return (planning_failed, maybe_path)
end;

And we redefine a function that converts a trace of this model into a value that is easily serializable to JSON, for use with the GenViz visualization framework:

In [5]:
function trace_to_dict(trace)
    args = Gen.get_args(trace)
    (scene, dt, num_ticks, planner_params) = args
    choices = Gen.get_assmt(trace)
    (planning_failed, maybe_path) = Gen.get_retval(trace)

    d = Dict()

    # scene (the obstacles)
    d["scene"] = scene

    # the points along the planned path
    if planning_failed
        d["path"] = []
    else
        d["path"] = maybe_path.points
    end

    # start and destination location
    d["start"] = Point(choices[:start_x], choices[:start_y])
    d["dest"] = Point(choices[:dest_x], choices[:dest_y])

    # the observed location of the agent over time
    local measurements
    measurements = Vector{Point}(undef, num_ticks)
    for i=1:num_ticks
        measurements[i] = Point(choices[:meas => (i, :x)], choices[:meas => (i, :y)])
    end
    d["measurements"] = measurements

    return d
end;

We redefine the generic importance sampling algorithm that we used in the previous notebook:

In [6]:
function do_inference_agent_model(scene::Scene, dt::Float64, num_ticks::Int, planner_params::PlannerParams, start::Point,
                                  measurements::Vector{Point}, amount_of_computation::Int)
    
    # Create an "Assignment" that maps model addresses (:y, i)
    # to observed values ys[i]. We leave :slope and :intercept
    # unconstrained, because we want them to be inferred.
    observations = Gen.DynamicAssignment()
    observations[:start_x] = start.x
    observations[:start_y] = start.y
    for (i, m) in enumerate(measurements)
        observations[:meas => (i, :x)] = m.x
        observations[:meas => (i, :y)] = m.y
    end
    
    # Call importance_resampling to obtain a likely trace consistent
    # with our observations.
    (trace, _) = Gen.importance_resampling(agent_model, (scene, dt, num_ticks, planner_params), observations, amount_of_computation)
    
    return trace
end;

We redefine a scene:

In [7]:
scene = Scene(0, 1, 0, 1)
add_obstacle!(scene, make_square(Point(0.30, 0.20), 0.1))
add_obstacle!(scene, make_square(Point(0.83, 0.80), 0.1))
add_obstacle!(scene, make_square(Point(0.80, 0.40), 0.1))
horizontal = false
vertical = true
wall_thickness = 0.02
add_obstacle!(scene, make_line(horizontal, Point(0.20, 0.40), 0.40, wall_thickness))
add_obstacle!(scene, make_line(vertical, Point(0.60, 0.40), 0.40, wall_thickness))
add_obstacle!(scene, make_line(horizontal, Point(0.60 - 0.15, 0.80), 0.15 + wall_thickness, wall_thickness))
add_obstacle!(scene, make_line(horizontal, Point(0.20, 0.80), 0.15, wall_thickness))
add_obstacle!(scene, make_line(vertical, Point(0.20, 0.40), 0.40, wall_thickness));

We will assume the agent starts in the lower left-hand corner. And we will assume particular parameters for the planning algorithm that the agent uses. We will also assume that there are 10 measurements, separated `0.1` time units.

In [8]:
start = Point(0.1, 0.1)
dt = 0.1
num_ticks = 10
planner_params = PlannerParams(300, 3.0, 2000, 1.);

We will infer the destination of the agent for the given sequence of observed locations:

In [9]:
measurements = [
    Point(0.0980245, 0.104775),
    Point(0.113734, 0.150773),
    Point(0.100412, 0.195499),
    Point(0.114794, 0.237386),
    Point(0.0957668, 0.277711),
    Point(0.140181, 0.31304),
    Point(0.124384, 0.356242),
    Point(0.122272, 0.414463),
    Point(0.124597, 0.462056),
    Point(0.126227, 0.498338)];

Below, we run this algorithm 1000 times, to generate 1000 approximate samples from the posterior distribution on the destination. The inferred destinations should appear as red dots on the map.

In [10]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model(scene, dt, num_ticks, planner_params, start, measurements, 50)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

## 2. Writing a data-driven proposal as a generative function <a name="custom-proposal"></a>

The inference algorithm above used a variant of [`Gen.importance_resampling`](https://probcomp.github.io/Gen/dev/ref/inference/#Gen.importance_resampling) that does not take a custom proposal distribution. It uses the default proposal distribution associated with the generative model. For generative functions defined using the built-in modeling DSL, the default proposal distribution is based on *ancestral sampling*, which involves sampling unconstrained random choices from the distributions specified in the generative mode.

We can sample from the default proposal distribution using `Gen.initialize`. The cell below shows samples of the destination of the agent sampled from this distribution.

In [145]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
for i=1:1000
    (trace, _) = Gen.initialize(agent_model, (scene, dt, num_ticks, planner_params))
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

Intuitively, if we see the data set (blue dot representing the start location and black dots representing the observed location of the agent over time), we might guess that the agent is more likely to be heading into the upper part of the scene., because we don't expect the agent to unecessarily backtrack. A simple heuristic based on just the first measurement and the last measurement might be:

- If the x-coordinate of the last measurement is greater than the x-coordinate of the first measurement, make it more likely to propose values for `:dest_x` that are near, or greater than, the x-coordinate of the last measurement. We think the agent is probably headed generally to the right.

- If the x-coordinate of the last measurment is less than the x-coordinate of the first measurement, make it more likely to propose values for `:dest_x` that are near, or small than, the x-coordinate of the last measurement. We think the agent is probably headed generally to the left.

We can apply the same heuristic separately for the y-coordinate.

To implement this idea, we discretize the x-axis and y-axis of the scene into bins:

In [146]:
num_x_bins = 5
num_y_bins = 5;

5

We will then sample the x-coordinate of the destination from a [piecewise_uniform](https://probcomp.github.io/Gen/dev/ref/distributions/#Gen.piecewise_uniform) distribution, where we set higher probability for certain bins based on the heuristic described above and use a uniform distribution within a bin. The `compute_bin_probs` function below computes the probability for each bin. The bounds of the scene are given by `min` and `max`. The coordinate of the last measured point are given by `first` and `last`. We compute the probability by assigning a "score" to each bin based on the heuristic above --- if the bin should receive lower probability, it gets a score of 1., and if it should receive higher probability, it gets a bin of `score_high`.

In [151]:
function compute_bin_prob(first::Float64, last::Float64, bin::Int, last_bin::Int, score_high)
    last >= first && bin >= last_bin && return score_high
    last < first && bin <= last_bin && return score_high
    return 1.
end

function compute_bin_probs(num_bins::Int, min::Float64, max::Float64, first::Float64, last::Float64, score_high)
    bin_len = (max - min) / num_bins
    last_bin = Int(floor(last / bin_len)) + 1
    probs = [compute_bin_prob(first, last, bin, last_bin, score_high) for bin=1:num_bins]
    total = sum(probs)
    return [p / total for p in probs]
end;

We will see how to automatically tune the value of `score_high` shortly. For now, we will use a value of 5. Below, we see that for the data set of measurements, shown above the probabilities of higher bins are indeed 3x larger than those of lower bins, becuase the agent seems to be headed up. 

In [191]:
compute_bin_probs(num_y_bins, scene.ymin, scene.ymax, measurements[1].y, measurements[end].y, 5.)

5-element Array{Float64,1}:
 0.058823529411764705
 0.058823529411764705
 0.29411764705882354 
 0.29411764705882354 
 0.29411764705882354 

Below, we write a generative function that applies this heuristic for both the x-coordinate and y-coordinate, and samples the destination coordinates at addresses `:dest_x` and `:dest_y`. 

In [185]:
@gen function custom_dest_proposal(measurements::Vector{Point}, scene::Scene)

    score_high = 5.
    
    x_first = measurements[1].x
    x_last = measurements[end].x
    y_first = measurements[1].y
    y_last = measurements[end].y
    
    # sample dest_x
    x_probs = compute_bin_probs(num_x_bins, scene.xmin, scene.xmax, x_first, x_last, score_high)
    x_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_x_bins+1))
    @addr(Gen.piecewise_uniform(x_bounds, x_probs), :dest_x)
    
    # sample dest_y
    y_probs = compute_bin_probs(num_y_bins, scene.ymin, scene.ymax, y_first, y_last, score_high)
    y_bounds = collect(range(scene.ymin, stop=scene.ymax, length=num_y_bins+1))
    @addr(Gen.piecewise_uniform(y_bounds, y_probs), :dest_y)
    
    return nothing
end;

We can propose values of random choices from the proposal function using [`Gen.propose`](https://probcomp.github.io/Gen/dev/ref/gfi/#Gen.propose). This method returns the assignment of choices, as well as some other information. For you, you can think of `Gen.propose` as similar to `Gen.initialize` except that it does not produce a full execution trace, and it does not accept constraints. We can see the random choices for one run of the proposal on our data set:

In [186]:
(proposed_choices, _) = Gen.propose(custom_dest_proposal, (measurements, scene))
println(proposed_choices)

│
├── :dest_y : 0.937521258008105
│
└── :dest_x : 0.08699521319751274



Below, we run the proposal 1000 times. For each run, we generate a trace of the model where the `:dest_x` and `:dest_y` choices are constrained to the proposed values, and visualize the resulting traces:

In [187]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
for i=1:1000
    (proposed_choices, _) = Gen.propose(custom_dest_proposal, (measurements, scene))
    (trace, _) = Gen.initialize(agent_model, (scene, dt, num_ticks, planner_params), proposed_choices)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We see that the proposal distribution reflects our aim --- destinations are more likely to be proposed in the top half of the scene.

## 3. Using a data-driven proposal within importance sampling

We now use our data-driven proposal within an inference algorith. There is a second variant of [Gen.importance_resampling](https://probcomp.github.io/Gen/dev/ref/inference/#Gen.importance_resampling) that accepts a generative function representing a custm proposal. The proposal generative function may make traced random choices at the addresses of a subset of the unobserved random choices made by the generative model. In our case, these addresses with by `:dest_x` and `:dest_y`. Below, we write an inference program that uses this second variant of importance resampling. Because we will experiment with different data-driven proposals, we make the the proposal into an agument of our inference program. We assume that the proposal accepts arguments `(measurements, scene)`.

In [207]:
function do_inference_agent_model_data_driven(dest_proposal::GenerativeFunction,
                                              scene::Scene, dt::Float64,
                                              num_ticks::Int,planner_params::PlannerParams,
                                              start::Point, measurements::Vector{Point}, 
                                              amount_of_computation::Int)
    
    observations = Gen.DynamicAssignment((:start_x, start.x), (:start_y, start.y))
    for (i, m) in enumerate(measurements)
        observations[:meas => (i, :x)] = m.x
        observations[:meas => (i, :y)] = m.y
    end
    
    # invoke the variant of importance_resampling that accepts a custom proposal 
    (trace, _) = Gen.importance_resampling(agent_model, (scene, dt, num_ticks, planner_params), observations, 
        dest_proposal, (measurements, scene), amount_of_computation)
    
    return trace
end;

We run the algorithm below with amount of computation set fo `5` and visualize the results:

In [208]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model_data_driven(custom_dest_proposal, 
        scene, dt, num_ticks, planner_params, start, measurements, 5)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We compare this to the original algorithm that used the default proposal, for the same "amount of computation".

In [190]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model(scene, dt, num_ticks, planner_params, start, measurements, 5)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We see that the results are somewhat more accurate using the data-driven proposal.

## 4. Training the parameters of a data-driven proposal

Our choice of the `score_high` value was somewhat arbitrary. To use more informed value, we can make `score_high` into a [*trainable parameter*](https://probcomp.github.io/Gen/dev/ref/modeling/#Trainable-parameters-1) of the generative function. Below, we write a new version of the proposal function that is trainable. The optimization algorithms we will use for training work best with *unconstrained* parameters, but `score_high` must be positive. Therefore, we introduce an unconstrained trainable parameter `log_score_high`, and use `exp()` to ensure that `score_high` is positive:

In [201]:
@gen function custom_dest_proposal_trainable(measurements::Vector{Point}, scene::Scene)

    @param log_score_high::Float64
    
    x_first = measurements[1].x
    x_last = measurements[end].x
    y_first = measurements[1].y
    y_last = measurements[end].y
    
    # sample dest_x
    x_probs = compute_bin_probs(num_x_bins, scene.xmin, scene.xmax, x_first, x_last, exp(log_score_high))
    x_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_x_bins+1))
    @addr(Gen.piecewise_uniform(x_bounds, x_probs), :dest_x)
    
    # sample dest_y
    y_probs = compute_bin_probs(num_y_bins, scene.ymin, scene.ymax, y_first, y_last, exp(log_score_high))
    y_bounds = collect(range(scene.ymin, stop=scene.ymax, length=num_y_bins+1))
    @addr(Gen.piecewise_uniform(y_bounds, y_probs), :dest_y)
    
    return nothing
end;

We initialize the value of `score_high` to 1.

In [202]:
Gen.init_param!(custom_dest_proposal_trainable, :log_score_high, 0.);

Let's visualize the proposed distribution prior to training. This should be a uniform distribution on the scene.

In [198]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
for i=1:1000
    (proposed_choices, _) = Gen.propose(custom_dest_proposal_trainable, (measurements, scene))
    (trace, _) = Gen.initialize(agent_model, (scene, dt, num_ticks, planner_params), proposed_choices)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

Now, we train the generative function. First, we will require a data-generator that generates the training data. The data-generator is a function of no arguments that returns a tuple of `(inputs, constraints)`. The `inputs` are the arguments to the generative function being trained, and the `constraints` contains the desired values of random choices made by the function for those arguments. For the training distribution, we will use the distribution induced by the generative model (`agent_model`), restricted to cases where planning actually succeeded. When planning failed, the agent just stays at the same location for all time, and we won't worry about tuning our proposal for that case. The training procedure will attempt to maximize the expected conditional log probablity (density) that the proposal function generates the constrained values, when run on the arguments. Note that this is an *average case* objective function --- the resulting proposal distribution may perform better on some data sets than others.

In [199]:
function data_generator()
    
    local assmt
    local measurements
    
    # obtain an execution of the model where planning succeeded
    done = false
    while !done
        (assmt, _, retval) = Gen.propose(agent_model, (scene, dt, num_ticks, planner_params))
        (planning_failed, maybe_path) = retval
        done = !planning_failed
    end

    # construct arguments to the proposal function being trained
    measurements = [Point(assmt[:meas => (i, :x)], assmt[:meas => (i, :y)]) for i=1:num_ticks]
    inputs = (measurements, scene)
    
    # construct constraints for the proposal function being trained
    constraints = Gen.DynamicAssignment()
    constraints[:dest_x] = assmt[:dest_x]
    constraints[:dest_y] = assmt[:dest_y]
    
    return (inputs, constraints)
end;

data_generator (generic function with 1 method)

Finally, we use the [`Gen.train!`](https://probcomp.github.io/Gen/dev/ref/inference/#Gen.train!) method to actually do the training. We will use gradient descent with a fixed step size:

In [204]:
update = Gen.ParamUpdate(Gen.FixedStepGradientDescent(0.001), custom_dest_proposal_trainable)
Gen.train!(custom_dest_proposal, data_generator, update, 1000, 100, 1, 100, verbose=false)

UndefVarError: UndefVarError: FixedStepGradientDescent not defined

We can read out the new value for `score_high`:

In [205]:
println(exp(Gen.get_param(custom_dest_proposal_trainable, :log_score_high)))

1.0


We see that the optimal value of the parameter is indeed larger than our initial guess. This validates that the heuristic is indeed a useful one. We visualize the proposal distribution below:

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
for i=1:1000
    (proposed_choices, _) = Gen.propose(custom_dest_proposal_trainable, (measurements, scene))
    (trace, _) = Gen.initialize(agent_model, (scene, dt, num_ticks, planner_params), proposed_choices)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We can visualize the results of inference, using this new proposal:

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model_data_driven(custom_dest_proposal_trainable,
        scene, dt, num_ticks, planner_params, start, measurements, 5)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

------------
### Exercise

Can you devise a data-driven proposal for the speed of the agent? Do you expect it to work equally well on all data sets? You do not need to implement it.

----------------

## 4. Writing and training a deep learning based data-driven proposal

The heuristic data-driven proposal above gave some improvement in efficiency, but it was extremely simple. One way of constructing complex data-driven proposals is to use deep learning.

In [None]:
sigmoid(x) = 1 ./ (1 .+ exp.(-x))

In [134]:
function make_proposal_tiles(proposal, measurements::Vector{Point}, scene::Scene, num_x_bins::Int, num_y_bins::Int)
    dest_proposal_tiles = []
    xspan = scene.xmax - scene.xmin
    yspan = scene.ymax - scene.ymin
    w = xspan / num_x_bins
    h = yspan / num_y_bins
    for col=1:num_x_bins
        for row=1:num_y_bins
            x = scene.xmin + (col - 1) * w
            y = scene.ymin + (row - 1) * h
            assmt = DynamicAssignment()
            assmt[:dest_x] = x + (w/2)
            assmt[:dest_y] = y + (h/2)
            (weight, _) = assess(proposal, (measurements, scene), assmt)
            tile = Dict("x" => x, "y" => y, "w" => w, "h" => h, "density" => exp(weight))
            push!(dest_proposal_tiles, tile)
        end
    end
    max_density = maximum([tile["density"] for tile in dest_proposal_tiles])
    for tile in dest_proposal_tiles
        tile["density"] /= max_density
    end
    dest_proposal_tiles
end;

In [135]:
tiles = make_proposal_tiles(custom_dest_proposal, measurements, scene, 5, 5);


In [136]:
tiles = make_proposal_tiles(custom_dest_proposal, measurements, scene, 5, 5)
info = Dict("measurements" => measurements, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
displayInNotebook(viz)

In [137]:
measurements2 = [Point(m.y, m.x) for m in measurements]
tiles = make_proposal_tiles(custom_dest_proposal, measurements2, scene, 5, 5)
info = Dict("measurements" => measurements2, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
displayInNotebook(viz)

In [138]:
measurements2 = [measurements[1] for _=1:length(measurements)]
tiles = make_proposal_tiles(custom_dest_proposal, measurements2, scene, 5, 5)
info = Dict("measurements" => measurements2, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
displayInNotebook(viz)

In [139]:
function data_generator()
    
    local assmt
    local measurements
    
    # obtain a trace of the model where planning succeeded
    done = false
    while !done
        (assmt, _, retval) = Gen.propose(agent_model, (scene, dt, num_ticks, planner_params))
        (planning_failed, maybe_path) = retval
        done = !planning_failed
    end

    measurements = [Point(assmt[:meas => (i, :x)], assmt[:meas => (i, :y)]) for i=1:num_ticks]
    inputs = (measurements, scene)
    
    constraints = Gen.DynamicAssignment()
    constraints[:dest_x] = assmt[:dest_x]
    constraints[:dest_y] = assmt[:dest_y]
    
    (inputs, constraints)
end

data_generator (generic function with 1 method)

In [140]:
import ReverseDiff
ReverseDiff.increment_deriv!(::Float64, ::Float64) = error("not implemented")

In [141]:
update = ParamUpdate(GradientDescent(0.001, 1000000), custom_dest_proposal)
train!(custom_dest_proposal, data_generator, update, 20, 10000, 100, 100; verbose=true)

generating data for epoch 1
training for epoch 1...
epoch 1 avg score: 0.7524821076503807
generating data for epoch 2
training for epoch 2...
epoch 2 avg score: 0.7656684435513987
generating data for epoch 3
training for epoch 3...
epoch 3 avg score: 0.7297969234117233
generating data for epoch 4
training for epoch 4...
epoch 4 avg score: 0.7434293979468981
generating data for epoch 5
training for epoch 5...
epoch 5 avg score: 0.7395748195271813
generating data for epoch 6
training for epoch 6...
epoch 6 avg score: 0.745246336642638
generating data for epoch 7
training for epoch 7...
epoch 7 avg score: 0.7407590630112608
generating data for epoch 8
training for epoch 8...
epoch 8 avg score: 0.7378088727951994
generating data for epoch 9
training for epoch 9...
epoch 9 avg score: 0.7129221875639568
generating data for epoch 10
training for epoch 10...
epoch 10 avg score: 0.759224083273339
generating data for epoch 11
training for epoch 11...
epoch 11 avg score: 0.763548033758274
generat

20-element Array{Float64,1}:
 0.7524821076503807
 0.7656684435513987
 0.7297969234117233
 0.7434293979468981
 0.7395748195271813
 0.745246336642638 
 0.7407590630112608
 0.7378088727951994
 0.7129221875639568
 0.759224083273339 
 0.763548033758274 
 0.7498511210306563
 0.7230126187128172
 0.7356012690142822
 0.7431949778916416
 0.7463259009436769
 0.7221471382887077
 0.7292686673690996
 0.7283566568652329
 0.7501743546233584

In [58]:
println(get_param(custom_dest_proposal, :score_high))

2.2155604422716375


We now illustrate this for the `speed` random choice.

Let's first generate some traces from an approximation to the posterior distribution for the data set we defined above, by running our inference algorithm.

In [None]:
posterior_traces = []
for i=1:250
    trace = do_inference_agent_model(scene, dt, num_ticks, planner_params, start, measurements, 50)
    push!(posterior_traces, trace)
end

We also generate some traces from the prior distribution:

In [None]:
num_ticks = 30
prior_traces = []
for i=1:250
    (trace, _) = Gen.initialize(agent_model, (scene, dt, num_ticks, planner_params))
    push!(prior_traces, trace)
end

We then extract the value of the `:speed` random choice from each trace, and compare the two distributions.

In [None]:
figure(figsize=(12, 4))

subplot(2, 1, 1)
speeds = [let assmt = Gen.get_assmt(t); assmt[:speed] end for t in prior_traces];
scatter(speeds, randn(length(prior_traces)))
gca()[:set_xlim](0, 1)
title("Speed samples from the prior distribution")

subplot(2, 1, 2)
speeds = [let assmt = Gen.get_assmt(t); assmt[:speed] end for t in posterior_traces];
scatter(speeds, randn(length(posterior_traces)))
gca()[:set_xlim](0, 1)
title("Speed samples from the approximate posterior distribution")

tight_layout()

It looks like the posterior distribution on the speed is fairly concentrated in the range 0.4-0.5. We could make our importance sampling algorithm more efficient if we could use a proposal that is closer to this approximate posterior by sampling more frequently from this region.

Given the measurements, it seems intuitive that we should be able to come up with a good heuristic estimate of the speed, using the distance between consecutive measurements. We ignore successive measurements that are too close together, because these could be sampled from the part of the trajectory after the agent has already reached its destination and stopped. Below we write a function that computes a guess for the speed by taking the median of distance between consecutive data points. This heuristic is based on the assumption that the majority of the agent's paths will be composed of straight-line segments. However, this heuristic estimate is unlikely to be accurate if the path planner failed or if the true speed is small, because in these cases the estimate will be completely or partially dominated by the contributions from the measurement noise. Therefore, we only use consecutive distance measurements if they are larger than the measurement noise (threshold `0.03`). We return the guess, and the number of distance measurements used to make the guess.

In [None]:
using Statistics: median

function guess_speed(measurements::Vector{Point}, dt::Float64)
    n = length(measurements)
    dists = Vector{Float64}()
    num_used = 0
    for i=1:n-1
        d = dist(measurements[i], measurements[i+1])
        d_start = dist(measurements[1], measurements[i+1])
        d_end = dist(measurements[end], measurements[i+1])
        if d_start > 0.04 && d_end > 0.04
            push!(dists, d)
            num_used += 1
        end
    end
    if num_used == 0
        guess = NaN
    else
        guess = median(dists) / dt
    end
    (guess, num_used)
end;

Below, we define the generative function for the proposal. It takes the measurements, and the time step, and samples a random choice with address `:speed`. If not enough sufficiently large consecutive distance measurements were used to form the guess, we revert to a uniform distribution on the possible values of speed, which is the same distribution used in the default proposal. If there were enough, then we sample from a mixture of a Beta distribution with a mode at guess, and a uniform distribution on the possible values of speed.

In [None]:
@gen function propose_speed(measurements::Vector{Point}, dt::Float64)
    (guess, num_used) = guess_speed(measurements, dt)
    if num_used > 4
        N = 70
        alpha = ((guess - 0.1)/(1.0 - 0.1)) * (N - 2) + 1
        beta = N - alpha
        speed = @addr(beta_uniform(0.9, alpha, beta, 0.1, 1.0), :speed)
    else
        speed = @addr(uniform_continuous(0.1, 1), :speed)
    end
    return speed
end;

Below, we sample from the custom propsal, and find that it is indeed more concentrated around the posterior distibution for this particular data set:

In [None]:
figure(figsize=(12, 2))

speeds = [propose_speed(measurements, dt) for _=1:length(posterior_traces)];
scatter(speeds, randn(length(posterior_traces)))
gca()[:set_xlim](0, 1)
title("Speed samples from the custom proposal")

tight_layout()

We chose to mix the Beta distribution with a uniform distribution because we are cannot be certain that the posterior distribution is really concentrated near the mode of our Beta distribution -- it is still possible that our heuristic estimator will find the wrong mode and that our filter will fail to revert to the uniform distribution. If our proposal is too overconfident, our importance sampling algorithm will still converge to the posterior distribution (provided it still has the same support as the posterior), but it will converge very slowly. It is better to waste some percentage of the proposal runs (e.g. `10%` in this case) than to risk very slow convergence for certain data sets.

If you are interested, see [1] for recent work characterizing how the efficiency of importance sampling depends on the relationship between the proposal and the posterior. One conclusion of this paper, which corroborates experience in importance sampling practice, is that the Kullback-Leibler (KL) divergence from the posterior distribution to the proposal distribution modulates the efficiency of importance sampling. This direction of KL divergence is very large when when the proposal places low probability on a a mode that is present in the posterior distribution. Therefore, we design our proposal to reserve some probability mass that is distributed more evenly.

[1] Chatterjee, Sourav, and Persi Diaconis. ["The sample size required in importance sampling."](https://arxiv.org/pdf/1511.01437.pdf) The Annals of Applied Probability 28.2 (2018): 1099-1135,

Now, let us understand the behavior of the custom proposal on various types of data sets. Let's begin by visualizing somes of the traces sampled from the prior:

In [None]:
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/grid-viz/dist"), [])
for (i, trace) in enumerate(prior_traces[1:12])
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

Below, for each of these traces, we plot the true speed (dashed blue line), heuristic estimate (red line), and the resulting custom proposal distribution (orange line). We use the method [`Gen.assess`](https://probcomp.github.io/Gen/dev/ref/gfi/#Gen.assess) to obtain the log probability density of the custom proposal for different speed values.

In [None]:
figure(figsize=(10, 4))

for (i, trace) in enumerate(prior_traces[1:12])
    
    local measurements
    
    # construct measurements vector by pulling values from the trace
    choices = get_assmt(trace)
    measurements = Vector{Point}(undef, num_ticks)
    for i=1:num_ticks
        measurements[i] = Point(choices[:meas => (i, :x)], choices[:meas => (i, :y)])
    end
    
    # get true speed from the trace
    speed = choices[:speed]
    
    # use the heuristic to guess the speed from the measurments
    (guess, num_used) = guess_speed(measurements, dt)
    
    # obtain the log probability density for each value of s
    logpdfs = [Gen.assess(propose_speed, (measurements, dt),  Gen.DynamicAssignment((:speed, s)))[1]
                for s in 0.1:0.01:1.0]
    
    subplot(3, 4, i)
    plot([0.1, 0.1], [0, 1], "k")
    plot([1, 1], [0, 1], "k")
    plot([guess, guess], [0, 1], "r")
    plot([speed, speed], [0, 1], "--")
    plot(0.1:0.01:1.0, exp.(logpdfs) / maximum(exp.(logpdfs)))
    ax = gca()
    ax[:set_xlim](0, 1.10)
    ax[:set_ylim](0, 1.05)
    title("Trace $i")
end
tight_layout();

We find that the proposal is accurate in many cases and correctly reverts to a generic proposal when the heuristic estimate is unlikely to be accurate.

We now use this custom data-driven proposal in a new inference program:

In [None]:
function do_inference_agent_model_custom(scene::Scene, dt::Float64,
                                         num_ticks::Int,planner_params::PlannerParams,
                                        start::Point, measurements::Vector{Point},  anount_of_computation::Int)
    
    observations = DynamicAssignment()
    observations[:start_x] = start.x
    observations[:start_y] = start.y
    for (i, m) in enumerate(measurements)
        observationsf[:meas => (i, :x)] = m.x
        observations[:meas => (i, :y)] = m.y
    end
    
    (trace, _) = importance_resampling(agent_model, (scene, dt, num_ticks, planner_params), observations, 
        propose_speed, (measurements, dt), anount_of_computation)
    
    return trace
end;

Below, we run the new algorithm.

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model_custom(scene, dt, num_ticks, planner_params, start, measurements, 50)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

Let's compare the results for a smaller number of particles:

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model_custom(scene, dt, num_ticks, planner_params, start, measurements, 5)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(viz_server, joinpath(@__DIR__, "../inverse-planning/overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_agent_model(scene, dt, num_ticks, planner_params, start, measurements, 10)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We can quantify the change in performance of the algorithm when using the default and custom proposals by comparing the output inferred destination distributions against a gold-standard destination distribution obtained using a large amount of computation. The cell below runs the inference algorithm with the default proposal 1000 times, where each run uses a large number of particles (10000). This cell takes two hours to run. We have already precomputed the results, and we have made this cell not runnable.

Now, we write a function that forms a histogram-based density estimate of the distribution on the destination from a collection of traces.

In [None]:
function make_location_histogram(traces, nrows::Int, ncols::Int, scene::Scene)
    locations = [let a = get_assmt(t); Point(a[:dest_x], a[:dest_y]) end for t in traces]
    counts = fill(0.01, nrows * ncols)
    xspan = scene.xmax - scene.xmin
    yspan = scene.ymax - scene.ymin
    for loc in locations
        row = Int(floor((loc.y - scene.ymin) / yspan * nrows))
        col = Int(floor((loc.x - scene.xmin) / xspan * ncols))
        @assert row < nrows
        @assert col < ncols
        counts[row * ncols + col + 1] += 1
    end
    freqs = counts ./ sum(counts)
    freqs
end

We will use 10 rows and 10 columns in the histogram:

In [None]:
nrows = 10
ncols = 10;

We now generate the gold-standard histogram from the gold-standard traces. We have precomputed the result, so we have made this cell not runnable.

We now load the precomputed results from disk:

In [None]:
gold_standard_histogram = load("gold_standard_histogram.jld", "gold_standard_histogram");

Next, we generate histograms for each of the algorithms we want to compare to the gold standard. We test the default proposal and custom proposal algorithms, for various numbers of particles. We also measure the running time per output sample. This cell takes a few minutes to run.

In [None]:
default_histograms = Dict{Int,Vector{Float64}}()
default_times = Dict{Int,Vector{Float64}}()
custom_histograms = Dict{Int,Vector{Float64}}()
custom_times = Dict{Int,Vector{Float64}}()
num_particles_list = [1, 3, 10, 30, 100]

nruns = 1000
for n in num_particles_list
    
    println("evaluating algorithms with num_particles=$n")
    
    # run importance sampling with default proposal
    traces = Vector{Any}(undef, nruns)
    times = Vector{Float64}(undef, nruns)
    for i=1:nruns
        start_time = time_ns()
        traces[i] = do_inference(scene, dt, num_ticks, start, measurements, n)
        times[i] = (time_ns() - start_time)/1e9
    end
    default_histograms[n] = make_location_histogram(traces, nrows, ncols, scene)
    default_times[n] = times
    
    # run importance sampling with custom proposal
    traces = Vector{Any}(undef, nruns)
    times = Vector{Float64}(undef, nruns)
    for i=1:nruns
        start_time = time_ns()
        traces[i] = do_inference_custom(scene, dt, num_ticks, start, measurements, n)
        times[i] = (time_ns() - start_time)/1e9
    end
    custom_histograms[n] = make_location_histogram(traces, nrows, ncols, scene)
    custom_times[n] = times
end

Next, we write a function to compute the KL divergence between two discrete distributions. We will use the KL divergence to measure the difference between the gold-standard output distribution and the output distributions of the algorithms we are evaluating.

In [None]:
function kl_divergence(dist1::Vector{Float64}, dist2::Vector{Float64})
    sum(dist1 .* (log.(dist1) .- log.(dist2)))
end

default_kls = [kl_divergence(gold_standard_histogram, default_histograms[n]) for n in num_particles_list]
default_median_elapsed = [median(default_times[n]) for n in num_particles_list]

custom_kls = [kl_divergence(gold_standard_histogram, custom_histograms[n]) for n in num_particles_list]
custom_median_elapsed = [median(custom_times[n]) for n in num_particles_list];

Finally, we plot the results. On the left, we plot the error as measured against the gold standard (KL divergence) as a function of the number of particles used in the importance smapling algorithm. On the right we plot the error versus the median time per run of the algorithm

In [None]:
PyPlot.figure(figsize=(8, 4))

PyPlot.subplot(1, 2, 1)
PyPlot.plot(num_particles_list, default_kls, label="default")
PyPlot.plot(num_particles_list, custom_kls, label="custom")
PyPlot.legend()
PyPlot.ylabel("Error")
PyPlot.xlabel("number of particles")

PyPlot.subplot(1, 2, 2)
PyPlot.scatter(default_median_elapsed, default_kls, label="default")
PyPlot.scatter(custom_median_elapsed, custom_kls, label="custom")
PyPlot.legend()
PyPlot.ylabel("Error")
PyPlot.xlabel("Median samples per second")

From the plot on the right, we see that there is a regime where using the custom proposal gives less than half the error over using the default proposal, for the same amount of running time.

## Step 7: Learning a custom proposal

It was not too difficult to design a custom proposal for the speed based on a heuristic estimator. However, sometimes it is not straightforward to manually design a custom proposal. In such cases, we can still design a *sketch* for a custom proposal, and *train* the parameters of the proposal automatically, to fill in our missing knowledge, or to generate code that it would be hard to program by hand.

For example, consider the destination. The default proposal distribution for the destination is the uniform distribution on the scene region. Intuitively, it should be possible to for a short and fast program to narrow down the scene region based on the measurements. However, writing a program that is robust, by hand, is challening. This section shows how to learn a custom proposal for the destination from data simulated from the model. The idea of training proposal distribution on data simulated from a generative model has been called *amortized inference* [3] and *inference compilation* [4], and is also the core of the wake-sleep algorithm [5].


- [3] Stuhlmüller, Andreas, Jacob Taylor, and Noah Goodman. "Learning stochastic inverses." Advances in neural information processing systems. 2013.

- [4] Le, Tuan Anh, Atilim Gunes Baydin, and Frank Wood. "Inference compilation and universal probabilistic programming." arXiv preprint arXiv:1610.09900 (2016).

- [5] Hinton, Geoffrey E., et al. "The" wake-sleep" algorithm for unsupervised neural networks." Science 268.5214 (1995): 1158-1161.

In [None]:
sigmoid(x) = 1 ./ (1 .+ exp.(-x))

In [None]:
function dest_x_neural_net(nn_params, x_first::Real, y_first::Real, x_last::Real, y_last::Real)
    (W1, b1, W2, b2, W3, b3) = nn_params
    input_layer = [x_first, y_first, x_last, y_last]
    hidden_layer_1 = sigmoid(W1 * input_layer .+ b1)
    hidden_layer_2 = sigmoid(W2 * hidden_layer_1 .+ b2)
    output_layer = exp.(W3 * hidden_layer_2 .+ b3)
end

function dest_y_neural_net(nn_params, x_first::Real, y_first::Real, x_last::Real, y_last::Real, dest_x::Real)
    (W1, b1, W2, b2, W3, b3) = nn_params
    input_layer = [x_first, y_first, x_last, y_last, dest_x]
    hidden_layer_1 = sigmoid(W1 * input_layer .+ b1)
    hidden_layer_2 = sigmoid(W2 * hidden_layer_1 .+ b2)
    output_layer = exp.(W3 * hidden_layer_2 .+ b3)
end

@gen function custom_dest_proposal(measurements::Vector{Point}, scene::Scene)
        
    @param x_W1::Matrix{Float64}
    @param x_b1::Vector{Float64}
    @param x_W2::Matrix{Float64}
    @param x_b2::Vector{Float64}
    @param x_W3::Matrix{Float64}
    @param x_b3::Vector{Float64}
    
    @param y_W1::Matrix{Float64}
    @param y_b1::Vector{Float64}
    @param y_W2::Matrix{Float64}
    @param y_b2::Vector{Float64}
    @param y_W3::Matrix{Float64}
    @param y_b3::Vector{Float64}
    
    num_x_bins = length(x_b3)
    num_y_bins = length(y_b3)
    
    x_first = measurements[1].x
    x_last = measurements[end].x
    y_first = measurements[1].y
    y_last = measurements[end].y
    
    # sample dest_x
    x_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_x_bins+1))
    x_probs = dest_x_neural_net((x_W1, x_b1, x_W2, x_b2, x_W3, x_b3), x_first, y_first, x_last, y_last)
    dest_x = @addr(Gen.piecewise_uniform(x_bounds, x_probs / sum(x_probs)), :dest_x)
    
    # sample dest_y
    y_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_y_bins+1))
    y_probs = dest_y_neural_net((y_W1, y_b1, y_W2, y_b2, y_W3, y_b3), x_first, y_first, x_last, y_last, dest_x)
    @addr(Gen.piecewise_uniform(y_bounds, y_probs / sum(y_probs)), :dest_y)
    
    nothing
end

num_x_bins = 5
num_y_bins = 5

# architecture of the neural network
num_hidden_1 = 50
num_hidden_2 = 50

import Random
Random.seed!(1)

# set parameters for dest_x_neural_net predictor network
init_param!(custom_dest_proposal, :x_W1, 0.001 * rand(num_hidden_1, 4))
init_param!(custom_dest_proposal, :x_b1, 0.001 * rand(num_hidden_1))
init_param!(custom_dest_proposal, :x_W2, 0.001 * rand(num_hidden_2, num_hidden_1))
init_param!(custom_dest_proposal, :x_b2, 0.001 * rand(num_hidden_2))
init_param!(custom_dest_proposal, :x_W3, 0.001 * rand(num_x_bins, num_hidden_2))
init_param!(custom_dest_proposal, :x_b3, 0.001 * rand(num_x_bins))

# set parameters for dest_y_neural_net predictor network
init_param!(custom_dest_proposal, :y_W1, 0.001 * rand(num_hidden_1, 5))
init_param!(custom_dest_proposal, :y_b1, 0.001 * rand(num_hidden_1))
init_param!(custom_dest_proposal, :y_W2, 0.001 * rand(num_hidden_2, num_hidden_1))
init_param!(custom_dest_proposal, :y_b2, 0.001 * rand(num_hidden_2))
init_param!(custom_dest_proposal, :y_W3, 0.001 * rand(num_y_bins, num_hidden_2))
init_param!(custom_dest_proposal, :y_b3, 0.001 * rand(num_y_bins));

Next, we visualize the proposal density for a given data set. The cell below computes the proposal density at a a grid of points in the scene, and returns a list of `tiles` that can be rendered on top of the scene by the visualization.

In [None]:
function make_proposal_tiles(proposal, measurements::Vector{Point}, scene::Scene, num_x_bins::Int, num_y_bins::Int)
    dest_proposal_tiles = []
    xspan = scene.xmax - scene.xmin
    yspan = scene.ymax - scene.ymin
    w = xspan / num_x_bins
    h = yspan / num_y_bins
    for col=1:num_x_bins
        for row=1:num_y_bins
            x = scene.xmin + (col - 1) * w
            y = scene.ymin + (row - 1) * h
            assmt = DynamicAssignment()
            assmt[:dest_x] = x + (w/2)
            assmt[:dest_y] = y + (h/2)
            (weight, _) = assess(proposal, (measurements, scene), assmt)
            tile = Dict("x" => x, "y" => y, "w" => w, "h" => h, "density" => exp(weight))
            push!(dest_proposal_tiles, tile)
        end
    end
    max_density = maximum([tile["density"] for tile in dest_proposal_tiles])
    for tile in dest_proposal_tiles
        tile["density"] /= max_density
    end
    dest_proposal_tiles
end

First, we show the proposal distribution prior to training:

In [None]:
tiles = make_proposal_tiles(custom_dest_proposal, measurements, scene, 5, 5)
info = Dict("measurements" => measurements, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(server, joinpath(@__DIR__, "overlay-viz/dist"), info)
displayInNotebook(viz)

We see that the entire scene is colored the same color. The untrained proposal is approximately uniform on the entire scene, just like the default proposal.



Next, we train the network using stochastic gradient descent.

In [None]:
function data_generator()
    
    local assmt
    local measurements
    
    # obtain a trace of the model where planning succeeded
    done = false
    while !done
        (assmt, _, retval) = Gen.propose(agent_model, (scene, dt, num_ticks, planner_params))
        (planning_failed, maybe_path) = retval
        done = !planning_failed
    end

    measurements = [Point(assmt[:meas => (i, :x)], assmt[:meas => (i, :y)]) for i=1:num_ticks]
    inputs = (measurements, scene)
    
    constraints = Gen.DynamicAssignment()
    constraints[:dest_x] = assmt[:dest_x]
    constraints[:dest_y] = assmt[:dest_y]
    
    (inputs, constraints)
end

In [None]:
update = ParamUpdate(GradientDescent(0.1, 1000), custom_dest_proposal)
train!(custom_dest_proposal, data_generator, update, 2, 10000, 100, 100; verbose=true)

In [None]:
update = ParamUpdate(GradientDescent(0.1, 1000), custom_dest_proposal)
train!(custom_dest_proposal, data_generator, update, 100, 10000, 100, 100; verbose=true)

We now visualize the trained proposal distribution for the given data set:

In [None]:
tiles = make_proposal_tiles(custom_dest_proposal_tf, measurements, scene, 5, 5)
info = Dict("measurements" => measurements, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(server, joinpath(@__DIR__, "overlay-viz/dist"), info)
displayInNotebook(viz)

For this data set, the proposal distribution seems to make sense.

We now can combine the custom proposal for the destination with the custom proposal for the speed:

In [None]:
@gen function custom_proposal(measurements::Vector{Point}, scene::Scene, guess, num_used)
    @splice(custom_dest_proposal(measurements, scene))
    @splice(propose_speed(guess, num_used))
end

We write a new inference algorithm that uses importance sampling based on this new trained proposal:

In [None]:
function do_inference_trained(scene::Scene, dt::Float64, num_ticks::Int,
                              start::Point, measurements::Vector{Point},
                              num_particles::Int)
    (guess, num_used) = guess_speed(measurements, dt)
    
    observations = DynamicAssignment()
    observations[:start_x] = start.x
    observations[:start_y] = start.y
    for (i, m) in enumerate(measurements)
        observations[:meas => i => :x] = m.x
        observations[:meas => i => :y] = m.y
    end
    
    # use importance sampling with resampling to obtain an inferred trace
    (trace, _) = importance_resampling(model, (scene, dt, num_ticks), observations, 
        custom_proposal, (measurements, scene, guess, num_used), num_particles)
    
    trace
end

Below, we show some samples from this new algorithm.

In [None]:
info = Dict("measurements" => measurements, "scene" => scene, "start" => start)
viz = Viz(server, joinpath(@__DIR__, "overlay-viz/dist"), info)
openInNotebook(viz)
sleep(5)
for i=1:1000
    trace = do_inference_trained(scene, dt, num_ticks, start, measurements, 50)
    putTrace!(viz, i, trace_to_dict(trace))
end
displayInNotebook(viz)

We now generate histograms for the destination distributions for the new algorithm, for various numbers of particles.

In [None]:
trained_histograms = Dict{Int,Vector{Float64}}()
trained_times = Dict{Int,Vector{Float64}}()
num_particles_list = [1, 3, 10, 30, 100]

nruns = 1000
for n in num_particles_list
    
    println("evaluating algorithms with num_particles=$n")
    
    # run importance sampling with default proposal
    traces = Vector{Any}(undef, nruns)
    times = Vector{Float64}(undef, nruns)
    for i=1:nruns
        start_time = time_ns()
        traces[i] = do_inference_trained(scene, dt, num_ticks, start, measurements, n)
        times[i] = (time_ns() - start_time)/1e9
    end
    trained_histograms[n] = make_location_histogram(traces, nrows, ncols, scene)
    trained_times[n] = times
end

We compute KL divergences between the gold standard distribution and these histograms.

In [None]:
trained_kls = [kl_divergence(gold_standard_histogram, trained_histograms[n]) for n in num_particles_list]
trained_median_elapsed = [median(trained_times[n]) for n in num_particles_list];

Below, we plot the results and compare the three importance sampling algorithms developed in this notebook.

In [None]:
PyPlot.figure(figsize=(8, 4))

PyPlot.subplot(1, 2, 1)
PyPlot.plot(num_particles_list, default_kls, label="default")
PyPlot.plot(num_particles_list, custom_kls, label="custom")
PyPlot.plot(num_particles_list, trained_kls, label="trained")
PyPlot.legend()
PyPlot.ylabel("Error")
PyPlot.xlabel("number of particles")

PyPlot.subplot(1, 2, 2)
PyPlot.scatter(default_median_elapsed, default_kls, label="default")
PyPlot.scatter(custom_median_elapsed, custom_kls, label="custom")
PyPlot.scatter(custom_median_elapsed, trained_kls, label="trained")
PyPlot.legend()
PyPlot.ylabel("Error")
PyPlot.xlabel("Median samples per second")

## Using TensorFlow

In [None]:
using PyCall

In [None]:
@pyimport tensorflow as tf

In [None]:
using GenTF

In [None]:
num_x_bins = 5
num_y_bins = 5
num_hidden_1 = 50
num_hidden_2 = 50

import Random
Random.seed!(1)

# dest_x_neural_net predictor network
x_W1 = tf.Variable(0.001 * rand(num_hidden_1, 4))
x_b1 = tf.Variable(0.001 * rand(num_hidden_1))
x_W2 = tf.Variable(0.001 * rand(num_hidden_2, num_hidden_1))
x_b2 = tf.Variable(0.001 * rand(num_hidden_2))
x_W3 = tf.Variable(0.001 * rand(num_x_bins, num_hidden_2))
x_b3 = tf.Variable(0.001 * rand(num_x_bins))
x_nn_input = tf.placeholder(dtype=tf.float64, shape=(4,))
x_nn_hidden_1 = tf.sigmoid(tf.add(tf.squeeze(tf.matmul(x_W1, tf.expand_dims(x_nn_input, axis=1)), axis=1), x_b1))
x_nn_hidden_2 = tf.sigmoid(tf.add(tf.squeeze(tf.matmul(x_W2, tf.expand_dims(x_nn_hidden_1, axis=1)), axis=1), x_b2))
x_nn_output = tf.exp(tf.add(tf.squeeze(tf.matmul(x_W3, tf.expand_dims(x_nn_hidden_2, axis=1)), axis=1), x_b3))
x_nn = TFFunction([x_W1, x_b1, x_W2, x_b2, x_W3, x_b3], [x_nn_input], x_nn_output)

# dest_y_neural_net predictor network
y_W1 = tf.Variable(0.001 * rand(num_hidden_1, 5))
y_b1 = tf.Variable(0.001 * rand(num_hidden_1))
y_W2 = tf.Variable(0.001 * rand(num_hidden_2, num_hidden_1))
y_b2 = tf.Variable(0.001 * rand(num_hidden_2))
y_W3 = tf.Variable(0.001 * rand(num_x_bins, num_hidden_2))
y_b3 = tf.Variable(0.001 * rand(num_x_bins))
y_nn_input = tf.placeholder(dtype=tf.float64, shape=(5,))
y_nn_hidden_1 = tf.sigmoid(tf.add(tf.squeeze(tf.matmul(y_W1, tf.expand_dims(y_nn_input, axis=1)), axis=1), y_b1))
y_nn_hidden_2 = tf.sigmoid(tf.add(tf.squeeze(tf.matmul(y_W2, tf.expand_dims(y_nn_hidden_1, axis=1)), axis=1), y_b2))
y_nn_output = tf.exp(tf.add(tf.squeeze(tf.matmul(y_W3, tf.expand_dims(y_nn_hidden_2, axis=1)), axis=1), y_b3))
y_nn = TFFunction([y_W1, y_b1, y_W2, y_b2, y_W3, y_b3], [y_nn_input], y_nn_output)

@gen function custom_dest_proposal_tf(measurements::Vector{Point}, scene::Scene)
        
    x_first = measurements[1].x
    x_last = measurements[end].x
    y_first = measurements[1].y
    y_last = measurements[end].y
    
    # sample dest_x
    x_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_x_bins+1))
    x_probs = @addr(x_nn([x_first, y_first, x_last, y_last]), :x_net)
    dest_x = @addr(Gen.piecewise_uniform(x_bounds, x_probs / sum(x_probs)), :dest_x)
    
    # sample dest_y
    y_bounds = collect(range(scene.xmin, stop=scene.xmax, length=num_y_bins+1))
    y_probs = @addr(y_nn([x_first, y_first, x_last, y_last, dest_x]), :y_net)
    @addr(Gen.piecewise_uniform(y_bounds, y_probs / sum(y_probs)), :dest_y)
    
    nothing
end

In [None]:
update = ParamUpdate(GradientDescent(0.001, 1000), x_nn => collect(get_params(x_nn)), y_nn => collect(get_params(y_nn)))
train!(custom_dest_proposal_tf, data_generator, update, 1000, 10000, 100, 100; verbose=true)

In [None]:
tiles = make_proposal_tiles(custom_dest_proposal_tf, measurements, scene, 5, 5)
info = Dict("measurements" => measurements, "scene" => scene, "start" => start, "tiles" => tiles)
viz = Viz(server, joinpath(@__DIR__, "overlay-viz/dist"), info)
displayInNotebook(viz)