In [1]:
; If using Jupyter Lab, run
;   jupyter labextension install @jupyterlab/javascript-extension
; before using this notebook.
(ns tutorial
    (:refer-clojure :only [])
    (:require
        [clojure.repl :refer :all]
        [metaprob.state :as state]
        [metaprob.trace :as trace]
        [metaprob.sequence :as sequence]
        [metaprob.builtin-impl :as impl]
        [metaprob.syntax :refer :all]
        [metaprob.builtin :refer :all]
        [metaprob.prelude :refer :all]
        [metaprob.distributions :refer :all]
        [metaprob.interpreters :refer :all]
        [metaprob.inference :refer :all]
        [metaprob.compositional :as comp]
        [metaprob.examples.tutorial.jupyter :refer :all]
        [metaprob.examples.gaussian :refer :all]))

(enable-trace-plots)

# Welcome to Metaprob!

Metaprob is a new, experimental language for probabilistic programming, embedded in Clojure. In this tutorial, we'll take a bottom-up approach to introducing the language. We'll begin by exploring Metaprob's core data types, including the flexible "trace" data structure at the heart of the language's design. From there, we'll look at the probabilistic semantics of Metaprob programs, and tackle some small statistical inference problems. By the end of this tutorial, we'll be querying a more sophisticated model of real US census data!

# 1. Metaprob values and the trace data structure
The Metaprob language is embedded in Clojure, and inherits several of Clojure's atomic datatypes: Clojure's numbers, keywords, booleans, and strings are all Metaprob values, too. Many of Clojure's functions for manipulating these values have Metaprob versions.

In [2]:
(+ 5 7)

12

If you want to use a Clojure function that has no Metaprob version, you can call it by its fully qualified name.

In [3]:
(clojure.core/str "Hello" " " "world")

"Hello world"

(Any Clojure function can be used without modification in Metaprob. For reasons we'll cover a bit later, however, calling impure Clojure functions that have nondeterministic behavior (e.g., because they generate random values during their execution) must be done with care, in order not to break Metaprob's probabilistic semantics.)

At first glance, it appears that Metaprob supports Clojure's vector and list datatypes, too. Here, we apply Metaprob's `length` function to a vector, then to a list.

In [4]:
(length [1 5 2])

3

In [5]:
(length '(1 2 3))

3

But really, lists and vectors are just special cases of Metaprob's "trace" data structure:

In [6]:
(and (trace? [1 2 3]) (trace? '(1 2 3)))

true

In [7]:
(plot-trace [5 2 9] 200 200) ; produce a 200x200 diagram of the given trace

In [8]:
(plot-trace '(5 2 9) 500 100) ; produce a 500x100 diagram of the given trace

A trace is a tree-like data structure. Every trace:
* has a root node (the leftmost gray box in the diagrams above);
* optionally stores a value at the root node (displayed inside that gray box); and
* has zero or more named subtraces (connected to the root by thin edges in the diagrams above). Each subtrace has its own root node and, potentially, its own named subtraces.

Subtraces may be named with strings, numbers, booleans, or the value `nil`. The name of each subtrace appears above its root node in the diagrams above.

So far, you have seen two special kinds of trace:

1. A _vector_ is a trace with no value at its root, and `n` subtraces, labeled by the integers `0` through `n-1`. In a vector, each subtrace stores a value at its root, and has no children.

2. A _list_ is either:
    - The empty trace (a root node with no value or children), or
    - A trace with a value at its root, and a single subtrace, called "rest", which itself is a list.

But most traces are not vectors or lists. Consider the following trace, for example:

In [9]:
(plot-trace (gen [x y] (+ x (* 3 y))))

As you may have guessed, the above value is a Metaprob procedure. (The `gen` macro, which we'll cover in more detail soon, is Metaprob's version of Clojure's `fn`, Scheme's `lambda`, or JavaScript's anonymous `function`.) In Metaprob, all compound data, including procedures, are stored as traces.

# 2. Constructing and manipulating traces

Let's look at some of the tools Metaprob provides for manipulating and creating traces.

We'll start with trace _manipulation_, using the trace depicted above as a running example. Let's give it a name using the Metaprob keyword `define`. Below, we also demonstrate the `pprint` function, which is another way (apart from `plot-trace`) to get a human-readable representation of a trace. You can ignore the word `COMPILED` for now: it just indicates that Metaprob has produced an efficient Clojure-compiled version of the procedure that this trace represents (something the `gen` macro is also responsible for).

In [10]:
(define example-trace (gen [x y] (+ x (* 3 y))))
(pprint example-trace)

COMPILED "prob prog"
  environment: #namespace[tutorial]
  generative-source: "gen"
    body: "application"
      0: "variable"
        name: "+"
      1: "variable"
        name: "x"
      2: "application"
        0: "variable"
          name: "*"
        1: "literal"
          value: 3
        2: "variable"
          name: "y"
    pattern: "tuple"
      0: "variable"
        name: "x"
      1: "variable"
        name: "y"
  name: "example-trace--550373278"


## 2.1 Reading values in traces

One of the most basic things we can do with a trace is get its root value, using `trace-get`:

In [11]:
(trace-get example-trace)

"prob prog"

Not all traces have values at their roots, and `trace-get` will throw an error if there is no value for it to return. So before you call `trace-get`, it is sometimes useful to call `trace-has?`, which checks if a trace has a root value.

In [12]:
; Recall that a vector is represented
; as a trace with no value at its root,
; and subtraces for each of its elements.
(trace-has? [1 2 3])

false

Both `trace-has?` and `trace-get` accept an optional second argument: an _address_. An address is either:
- the name of a subtrace, or
- a list of names, representing a "path" to the desired subtrace.

This version of `trace-has?` returns false if the given address is invalid for the trace, or if it is valid but the specified subtrace has no value at its root.

In [13]:
; Get the value at the subtrace named 2
(trace-get [5 7 9] 2)

9

In [14]:
; There is no subtrace 4
(trace-has? [5 7 9] 4)

false

In [15]:
; There is no subtrace "rest"
(trace-has? '() "rest")

false

In [16]:
; The subtrace "rest" exists, but has no value at its root
(trace-has? '(1) "rest")

false

In [17]:
; The subtrace exists and has a value (2)
(trace-has? '(1 2) "rest")

true

In [18]:
; Returns the _value_ at the subtrace named "rest" -- 2 -- and not 
; the subtrace itself -- which would be '(2).
(trace-get '(1 2) "rest")

2

In [19]:
; Gets the value in the "environment" subtrace.
(trace-get example-trace "environment")

#namespace[tutorial]

In [20]:
; Gets the _value_ in the 0 subtrace of the "body" subtrace of the "generative-source" subtrace
(trace-get example-trace '("generative-source" "body" 0))

"variable"

In [21]:
; There is no child subtrace called 4 of the '("generative-source" "body") subtrace.
(trace-has? example-trace '("generative-source" "body" 4))

false

### Exercise: `trace-get-maybe` for safe access
In this exercise, you will write your first Metaprob function! 

To define a function, use the form
```
(define func-name
  (gen [arg1 arg2 ...]
    body))
```

In the box below, define `(trace-get-maybe tr adr default)`, which should return the value of `tr` at `adr`, or `default` if no value exists at that address.

In [22]:
; Solution
(define trace-get-maybe
  (gen [tr adr default]
    (if (trace-has? tr adr) (trace-get tr adr) default)))

#'tutorial/trace-get-maybe

### Exercise: `vec-ref`
Write a function `(vec-ref v n)` that takes in a vector (a trace with subtraces labeled 0, 1, ...), and an element number `n`, and returns the `n`th element of the vector. If `n` is out-of-bounds for the vector, return the Clojure keyword `:out-of-bounds`.

As in Clojure, the body can contain multiple expressions, in which case they will be evaluated in turn and the last result will be returned.

In [23]:
; Solution
(define vec-ref
  (gen [v n]
    (trace-get-maybe v n :out-of-bounds)))

#'tutorial/vec-ref

**NB**: `trace-has?` does _not_ check whether a particular address _exists_ in the trace, just whether there is a _value_ at that address. It could return false for the address `'("a" "b")` but true for `'("a" "b" "c")`. Similarly, `trace-get` does not return the entire subtrace at an address, just the value stored there: `(trace-get example-trace "generative-source")` returns the string `"gen"`, not the entire subtrace representing the `example-trace` source code.

## 2.2 Reading subtraces

In order to check for or retrieve a _subtrace_ at an address, use `trace-has-subtrace?` and `trace-subtrace`. Unlike `trace-has?` and `trace-get`, these two functions _require_ a second argument, which specifies the address of the subtrace in question.

In [24]:
; Extract the generative-source subtrace of our example.
(plot-trace (trace-subtrace example-trace "generative-source"))

In [25]:
; Extraction at a longer address.
(plot-trace (trace-subtrace example-trace '("generative-source" "body" 2)) 300 200)
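The difference between a _value_ at an address and a _subtrace_ at an address can be checked directly. The sketch below assumes it runs in the `tutorial` namespace set up in the first cell; `one-element-list` is a name introduced only for this illustration. Going by the descriptions in cells 15 and 16, a one-element list has a "rest" subtrace (the empty list) that stores no value, so the first call should return false and the second true, while the empty list should have no "rest" subtrace at all.

```clojure
; Assumes the `tutorial` namespace from the first cell of this notebook.
; `one-element-list` is a hypothetical name used only in this sketch.
(define one-element-list '(1))

; Is there a VALUE at the address "rest"? (false, as in cell 16)
(trace-has? one-element-list "rest")

; Is there a SUBTRACE at the address "rest"?
(trace-has-subtrace? one-element-list "rest")

; The empty list has no "rest" subtrace (see the comment in cell 15).
(trace-has-subtrace? '() "rest")
```

In a notebook, only the last expression of a cell is displayed, so it may be easiest to evaluate each call in its own cell.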

### Exercise: N-D arrays
We can represent an $n$-dimensional array in Metaprob using vectors of vectors. For example, a matrix (2D array) would be a vector of _matrix rows_, each of which is itself a vector of numbers:

In [26]:
(define example-2d-matrix [[1 2 3] 
                           [4 5 6] 
                           [7 8 9]])

#'tutorial/example-2d-matrix

Write a function `(n-d-vec-ref mat indices)` which takes in
- an $n$-dimensional matrix `mat`, represented as a vector of vectors (of vectors, of vectors, ...), and
- a Metaprob list of indices,

and returns either `:out-of-bounds` (if the indices push beyond the array's bounds) or the value at the given address.

(Food for thought: why doesn't `(trace-get mat indices)` suffice to implement this operation? If confused, try using `plot-trace` on an N-D vector.)

In [27]:
; Solution
(define n-d-vec-ref
  (gen [v indices]
    (if (trace-has? indices)
        (if (and (trace? v) (trace-has? v (trace-get indices)))
            (n-d-vec-ref (trace-get v (trace-get indices)) (trace-subtrace indices "rest"))
            :out-of-bounds)
        v)))

#'tutorial/n-d-vec-ref

In [28]:
; Testing: should output true.
(and
    (= 3 (n-d-vec-ref example-2d-matrix '(0 2)))
    (= 7 (n-d-vec-ref example-2d-matrix '(2 0)))
    (= :out-of-bounds (n-d-vec-ref example-2d-matrix '(1 1 2)))
    (= :out-of-bounds (n-d-vec-ref example-2d-matrix '(1 3)))
    (= :out-of-bounds (n-d-vec-ref example-2d-matrix '(3 1))))

true

### Exercise: Refactoring trace access code

In a Metaprob program she is working on, Alice has written the following expression:

```
(trace-subtrace (trace-subtrace (trace-get flip "implementation") "generative-source") "body")
```

This code is correct, but during code review, it strikes Bob as inelegant. He rewrites it as follows, taking advantage of the fact that functions like `trace-get` accept arbitrarily long "paths" of subtrace names as _addresses_:

```
(trace-subtrace flip '("implementation" "generative-source" "body"))
```

He is so pleased with the elegance of his code that he commits without testing. Oops!

Why doesn't Bob's rewrite work, when Alice's original code did? Can Alice's code still be made more concise in another way?

(Note: `flip` is a real Metaprob value, so you can run both snippets -- and your own proposed rewrite -- to test them out.)

In [29]:
; Solution
; `flip` has a subtrace called "implementation", but that subtrace has no children. In particular, "generative-source" is not a child of the "implementation" subtrace. 
; So the address '("implementation" "generative-source") is meaningless for `flip`.
; Instead, the value stored at the root of `flip`'s "implementation" subtrace is itself a trace.
; Alice's code can be simplified somewhat, to:
(trace-subtrace (trace-get flip "implementation") '("generative-source" "body"))
; because '("generative-source" "body") is exactly the address that calling `(trace-subtrace (trace-subtrace ... "generative-source") "body")` (as Alice did) probes.

{0 {"definiens" {"else" {"else" {0 {"name" {:value "tuple"}, :value "variable"}, 1 {0 {"name" {:value "apply"}, :value "variable"}, 1 {"name" {:value "sampler"}, :value "variable"}, 2 {"name" {:value "inputs"}, :value "variable"}, :value "application"}, 2 {"value" {:value 0}, :value "literal"}, :value "application"}, "then" {0 {"name" {:value "tuple"}, :value "variable"}, 1 {0 {"name" {:value "trace-get"}, :value "variable"}, 1 {"name" {:value "target"}, :value "variable"}, :value "application"}, 2 {0 {"name" {:value "scorer"}, :value "variable"}, 1 {0 {"name" {:value "trace-get"}, :value "variable"}, 1 {"name" {:value "target"}, :value "variable"}, :value "application"}, 2 {"name" {:value "inputs"}, :value "variable"}, :value "application"}, :value "application"}, "predicate" {0 {"name" {:value "trace-has?"}, :value "variable"}, 1 {"name" {:value "target"}, :value "variable"}, :value "application"}, :value "if"}, "then" {0 {"name" {:value "tuple"}, :value "variable"}, 1 {0 {"name" {:v

In [30]:
; Run this cell to visualize
(plot-trace (trace-get flip "implementation") 2000 2000)

### Exercise: Traversing nested traces

In the exercise above, you saw a situation in which a trace's root value was itself a trace. Write `flattened-trace-has-subtrace?` and `flattened-trace-subtrace`, which work just like `trace-has-subtrace?` and `trace-subtrace`, with the following exception: if you reach a "dead end" while traversing an address, check to see if the value stored at that dead end is itself a trace (using `trace?`), and if so, continue traversing as if it were a subtrace.

This will enable you to write, e.g., `(flattened-trace-subtrace flip '("implementation" "generative-source" "body"))`, or `(flattened-trace-subtrace example-2d-matrix '(1 1))`.

In [31]:
; Solution

(define flattened-trace-has-subtrace?
  (gen [tr adr]
    (or
        ; address is empty: we are already at the subtrace
        (not (trace-has? adr))
        ; the first piece of the address exists as a direct subtrace,
        ; and the recursive call succeeds
        (and
            (trace-has-subtrace? tr (trace-get adr))
            (flattened-trace-has-subtrace? (trace-subtrace tr (trace-get adr)) (trace-subtrace adr "rest")))
        ; our current node stores a trace, for which (flattened-trace-has-subtrace? . adr) succeeds.
        (and
            (trace-has? tr)
            (trace? (trace-get tr))
            (flattened-trace-has-subtrace? (trace-get tr) adr)))))

(define flattened-trace-subtrace
  (gen [tr adr]
    (cond 
        (not (trace-has? adr))
        tr
        
        (trace-has-subtrace? tr (trace-get adr))
        (flattened-trace-subtrace (trace-subtrace tr (trace-get adr)) (trace-subtrace adr "rest"))
         
        (and (trace-has? tr) (trace? (trace-get tr)))
        (flattened-trace-subtrace (trace-get tr) adr))))

#'tutorial/flattened-trace-subtrace

## 2.3 Setting (and clearing) values and subtraces

Metaprob provides two functions, `(trace-set tr [adr] val)` and `(trace-set-subtrace tr adr sub)`, which return modified versions of traces:
- `(trace-set tr val)` changes the value of the trace `tr`'s root node to `val`.
- `(trace-set tr adr val)` changes the _root value_ of the subtrace at the address `adr` to `val`.
- `(trace-set-subtrace tr adr sub)` changes the entire subtrace of `tr` at `adr` to `sub`.

These functions do not change the original trace, but rather create a _copy_ of the trace with some value or subtrace changed. Metaprob does have mutable traces, and operators like `trace-set!` and `trace-set-subtrace!` that operate on them, but we will not have a need for them in this tutorial.
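A short sketch of this non-destructive behavior, assuming the `tutorial` namespace from the first cell (the names `original-vec` and `updated-vec` are made up for the example): after `trace-set` returns, the input trace should still hold its old value.

```clojure
; Assumes the `tutorial` namespace from the first cell.
; `original-vec` and `updated-vec` are hypothetical names for this sketch.
(define original-vec [5 7 9])
(define updated-vec (trace-set original-vec 1 42))

; The copy holds the new value at address 1.
(trace-get updated-vec 1)

; The original is unchanged and still holds 7 at address 1.
(trace-get original-vec 1)
```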

### Exercise: Predict the results.

Predict the results of each of the following expressions:

In [32]:
(clojure.core/refer-clojure :only '[println])

; 1.
(println "Problem 1")
(pprint (trace-set [5 7 9] 0))

; 2.
(println "Problem 2")
(pprint (trace-set [5 7 9] 0 0))

; 3.
(println "Problem 3")
(pprint (trace-set [5 7 9] 5 2))

; 4.
(println "Problem 4")
(pprint (trace-set-subtrace '(1 2 3 4) '("rest" "rest") '(5 6 7)))

; 5.
(println "Problem 5")
(pprint (trace-set-subtrace (gen [] 5) '("generative-source" "pattern")
                            (trace-subtrace (gen [x y z] (+ x y z))
                                            '("generative-source" "pattern"))))

; 6.
(println "Problem 6")
(pprint (trace-set-subtrace [5 7 9] 2 '(5 7 9)))

; 7.
(println "Problem 7")
(pprint (trace-set [5 7 9] 2 '(5 7 9)))

Problem 1
0
  0: 5
  1: 7
  2: 9
Problem 2
[0 7 9]
Problem 3

  0: 5
  1: 7
  2: 9
  5: 2
Problem 4
(1 2 5 6 7)
Problem 5
"prob prog"
  environment: #namespace[tutorial]
  generative-source: "gen"
    body: "literal"
      value: 5
    pattern: "tuple"
      0: "variable"
        name: "x"
      1: "variable"
        name: "y"
      2: "variable"
        name: "z"
  name: "-547994576"
Problem 6

  0: 5
  1: 7
  2: (5 7 9)
Problem 7
[5
 7
 (5 7 9)]


There is also a `trace-delete` function, which clears the value in a trace's root node. Optionally, it can take an address; in that case, it clears the _root value_ of the subtrace at that address (but does _not_ delete the subtrace). The cell below calls it through the `trace` namespace alias declared in the first cell.

In [33]:
(plot-trace (trace/trace-delete [5 7 9] 2) 200)

In the above example, notice that although the `2` subtrace has no value, it still exists in the trace.

## 2.4 Constructing traces

`(trace ...)` is Metaprob's all-purpose constructor for traces. If you pass it no arguments, you get the empty trace:

In [34]:
(plot-trace (trace) 100)

One way to construct a bigger trace is to start out with an empty one and use `trace-set` and `trace-set-subtrace` to add values and subtraces. This can be quite tedious, so `(trace ...)` also supports an argument syntax that is demonstrated in the examples below:

In [35]:
; Trace with a root value
(plot-trace (trace :value 10) 75)

In [36]:
; Trace with subtraces with root values
(plot-trace (trace "Massachusetts" "Boston", "California" "Sacramento", "New York" "New York") 300)
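For comparison, here is a sketch of a similar trace built the tedious way, starting from the empty trace and adding one value at a time with `trace-set`. It assumes the `tutorial` namespace from the first cell, and it assumes that `trace-set` creates a missing subtrace on the empty trace the same way it does for the vector in Problem 3 above. The names `cities-step-by-step` and `cities-at-once` are invented for this example.

```clojure
; Assumes the `tutorial` namespace from the first cell.
; `cities-step-by-step` and `cities-at-once` are hypothetical names.

; Start from the empty trace and add one subtrace value at a time.
(define cities-step-by-step
  (trace-set
    (trace-set (trace) "Massachusetts" "Boston")
    "California" "Sacramento"))

; The same content, written with the argument syntax of `trace`.
(define cities-at-once
  (trace "Massachusetts" "Boston", "California" "Sacramento"))

; Both traces should report the same value at each address.
(trace-get cities-step-by-step "California")
(trace-get cities-at-once "California")
```

The nesting of `trace-set` calls grows with every added value, which is the tedium the argument syntax is meant to avoid.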

In [37]:
; Trace with root value and subtraces with root values
(plot-trace
    (trace :value "John Doe", "first name" "John", "last name" "Doe", "address" "500 Penn Ave.", "age" 94)
    400)

In [38]:
; Trace with arbitrary named subtrace, using (** ...)
(plot-trace
    (trace "a" (** (trace "b" "c"))) 
  300 100)

In [39]:
; Trace with multiple named subtraces and a value
(plot-trace
    (trace :value "phrases", 
           "green" (** (trace :value "first word",
                              "eggs" (** (trace "and" (** (trace "ham" "!")))),
                              "peace" ".",
                              "ideas" (** (trace "sleep" (** (trace "furiously" "?"))))))))

### Exercise: Binary search tree

As a way to get used to these trace operations, in this exercise, we'll implement binary search trees in Metaprob using traces. For our purposes, a binary search tree is either:
- The empty trace (representing an empty tree)
- A trace with a number at its root, and (optionally) "left" and "right" subtraces which are themselves binary trees. All values in the left subtrace should be less than or equal to the root's value, and all values in the right subtrace should be greater than or equal to the root's value.

For this exercise, write:
1. A manually constructed binary search tree, using the `(trace ...)` function, with the numbers 1 through 10. (There are multiple valid binary search trees containing these numbers; any organization that is consistent with the above rules is fine.)
2. A function `(insert tree i)`, which inserts a number `i` into `tree`.
3. A function `(contains? tree i)`, which checks whether a tree contains a given number.

In [40]:
; Solution

(define example-tree
  (trace :value 5
         "left" (** (trace :value  3
                           "left"  (** (trace :value 1 "right" 2))
                           "right" 4))
         "right" (** (trace :value  7
                            "left" 6
                            "right" (** (trace :value 9 "left" 8 "right" 10))))))

(define insert
  (gen [tree i]
    (if (empty-trace? tree)
        (trace-set tree i)
        (block
          (define branch (if (<= i (trace-get tree)) "left" "right"))
          (trace-set-subtrace
            tree
            branch
            (insert (if (trace-has-subtrace? tree branch)
                        (trace-subtrace tree branch)
                        (trace))
                    i))))))

(define contains?
    (gen [tree i]
         (and (not (empty-trace? tree))
              (or
                  (= (trace-get tree) i)
                  (block
                      (define branch (if (<= i (trace-get tree)) "left" "right"))
                      (and (trace-has-subtrace? tree branch)
                           (contains? (trace-subtrace tree branch) i)))))))

#'tutorial/contains?

In [41]:
(clojure.core/every? (gen [n] (contains? example-tree (+ n 1))) (range 10))

true

## 2.5 Traversing traces

You've now written several functions that traverse traces recursively, but you always knew the general structure of the trace you were traversing. For cases when you don't know ahead of time how many children a trace will have -- or what its subtraces will be called -- the `(trace-keys tr)` function comes in handy. It lists the names of all direct subtraces of a given trace.

In [42]:
(trace-keys example-trace)

("environment" "generative-source" "name")

Note that Metaprob has a version of Clojure's `map` function, which often comes in handy when dealing with lists of subtrace names:

In [43]:
; Get values at each subtrace
(map (gen [name] (trace-get example-trace name)) (trace-keys example-trace))

(#namespace[tutorial] "gen" "example-trace--550373278")
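The `map` call above relies on every child having a root value, since `trace-get` throws an error when there is none. A more defensive sketch, assuming the `tutorial` namespace and the `trace-get-maybe` solution from cell 22 (`child-values` and the `:no-value` placeholder are names invented for this example):

```clojure
; Assumes the `tutorial` namespace and `trace-get-maybe` from cell 22.
; `child-values` and `:no-value` are hypothetical names for this sketch.
(define child-values
  (gen [tr]
    (map (gen [name] (trace-get-maybe tr name :no-value))
         (trace-keys tr))))

; Every element of a vector has a value, so this lists 5, 7 and 9.
(child-values [5 7 9])

; The only child of '(1) is the empty list, which has no root value,
; so the placeholder should appear in its place.
(child-values '(1))
```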

Metaprob also has `apply`, which applies a procedure to a list of arguments:

In [44]:
; Concatenate the names of a trace's children
(apply clojure.core/str (trace-keys example-trace))

"environmentgenerative-sourcename"

### Exercise: Automatic sizing for trace visualizations
Currently, the `plot-trace` function draws trace diagrams of whatever size the user specifies. In this exercise, you'll write a function `smart-plot-trace` that automatically decides on a size for the trace, based on its breadth and depth:

1. Begin by defining `(trace-depth tr)`, which recursively computes the maximum depth of a trace.
2. Next, define `(trace-breadth tr i)`, which gives the trace's breadth at level `i`. (Hint: sum the breadths of the subtraces at level `i-1`.)
3. Define `(max-trace-breadth tr)` which gives the maximum breadth of a trace at any of its levels. (Note: the implementation we are suggesting in this exercise, which simply calls `trace-breadth` once for each level of the trace, is $O(n^2)$; there are certainly more efficient algorithms!)
4. Write a function `smart-plot-trace` that calls `plot-trace`, passing in reasonable values for diagram width and height (based on the depth and breadth of the trace).

In [45]:
; Bring in some useful Clojure functions/macros
(clojure.core/refer-clojure :only '[empty? max])

In [46]:
; Solution
(define trace-depth
  (gen [tr]
    (if (empty? (trace-keys tr))
        1
        (apply max 
            (map (gen [k]
                     (+ 1 (trace-depth (trace-subtrace tr k)))) (trace-keys tr))))))

(define trace-breadth
   (gen [tr i]
     (if (= i 0)
         1
         (apply 
             + 
             (map (gen [k]
                     (trace-breadth (trace-subtrace tr k) (- i 1))) (trace-keys tr))))))

(define max-trace-breadth
   (gen [tr]
     (apply max (map (gen [i] (trace-breadth tr i)) (range (trace-depth tr))))))

(define smart-plot-trace
   (gen [tr]
     (plot-trace tr (* (trace-depth tr) 100)
                    (* (max-trace-breadth tr) 80))))

#'tutorial/smart-plot-trace

In [47]:
(smart-plot-trace (gen [x y] (+ x (* 3 y))))

In [48]:
(smart-plot-trace (trace-get flip '("implementation")))

In [49]:
(smart-plot-trace
  (nth (infer
        :procedure comp/infer-apply
        :inputs [(gen [] (flip 0.5)) [] (trace) (trace) true]
        :output-trace? true) 1))

### Challenge Exercise: Delete subtrace

Write a function `(delete-subtrace tr adr)` that completely deletes the addressed subtrace from a trace.

### Challenge Exercise: All addresses with values
Write a function `(all-addresses-with-values tr)` which returns a list of addresses at which `tr` has a value. (The built-in function `addresses-of` can already do this.)

In [50]:
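; Helper: sets several values at once. `pairs` alternates addresses and
; values, e.g. (trace-set-values-at t adr1 val1 adr2 val2).
(define trace-set-values-at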
  (gen [t & pairs]
    (if (empty? pairs)
        t
        (block
          (define addr (first pairs))
          (define value (first (rest pairs)))
          (define others (rest (rest pairs)))
          (apply trace-set-values-at (clojure.core/cons (trace-set t addr value) others))))))

#'tutorial/trace-set-values-at

# 3. Probabilistic modeling, random choices, and trace-based inference

So far, every procedure we've written has been _deterministic_: the outputs are completely determined by the inputs, and are the same every time we run the code. Now, we turn our attention to _non-deterministic_ procedures, with which we can do the sort of probabilistic modeling that Metaprob was made for.

## 3.1 A few simple models

There are many interesting questions that can be phrased using the language of probability:
- How much can I expect my lifetime earnings to improve if I get a college degree?
- What is the probability that this treatment will work for this patient?
- With what confidence can we say that these two documents were produced by the same author?

Before we can even begin to answer these questions, though, we need a _model_: a (sometimes simplified) mathematical description of how we imagine the real-world processes behind these questions (the job market, human biology, authorship) actually work.

One of the key insights in probabilistic programming is that writing a program to simulate a process is a good way to specify our probabilistic models.

A few examples will make this clearer.

### Scenario 1: A tricky coin
Suppose you are interviewing for a position at a financial firm, and your interviewer produces a coin. He flips the coin 100 times, and you observe 100 heads. What is the probability, he asks you, that the next flip, too, will come up heads?

Your answer will depend on your _model_ of the random process you're observing.

Here is one possible model:

In [51]:
(define coin-model-1
  (gen []
    (replicate 100 (gen [] (flip 0.5)))))

#'tutorial/coin-model-1

Here, we have used `replicate`, which takes a number (100, in this case) and a 0-argument procedure, and runs the procedure that many times, returning a list of the results from each run. In this model, we hypothesize that the underlying process at work is as follows: a fair coin is flipped 100 times. We are 100% sure that the coin is fair: hence the constant `0.5`.

Another model might not be so trusting of the interviewer:

In [52]:
(define coin-model-2
  (gen []
    (define p (uniform 0 1))
    (replicate 100 (gen [] (flip p)))))

#'tutorial/coin-model-2

Here, we imagine that the interviewer randomly decided on the coin's weight (a number between 0 and 1) before we entered the room, and procured a biased coin to flip for us.

Perhaps we don't really believe that's possible, though: how does one get a "weighted coin" anyway? Maybe we really consider only three possibilities going in: the coin is fair, a "heads" is painted on both sides, or a "tails" is painted on both sides. Let's try to model that scenario:

In [53]:
(define coin-model-3
  (gen []
    (define p (uniform-sample [0 0.5 1]))
    (replicate 100 (gen [] (flip p)))))

#'tutorial/coin-model-3

### Scenario 2: The Wage Gap

Research shows that men are consistently paid more than women for the same work, and also that no matter their gender, taller people are paid more. We also know that on average, men tend to be taller than women. That research in hand, we may want to answer questions like:
- If all we know about someone is that they are 6' tall, what might we expect their salary to be? Their sex?
- What is the probability that a woman's salary will be within some range?
- and many more...

We can encode the research on the subject into a model (though, beware, this model is not _actually_ based on a comprehensive review of the relevant literature; we use it because it is easy to reason about intuitively):

In [54]:
(define wage-gap-model
  (gen []
    (define sex (uniform-sample [:m :f]))
    (define height-mean (if (= sex :m) 70 64))
    (define height (gaussian height-mean 4))
    (define base-salary-mean (if (= sex :m) 49000 39000))
    (define height-adjusted-salary-mean (+ base-salary-mean (* (- height height-mean) 500)))
    (define salary (gaussian height-adjusted-salary-mean 2000))
    salary))

#'tutorial/wage-gap-model

In [55]:
(wage-gap-model)

39963.03098468274

- TODO: Exercises for defining other models.
- TODO: Real data / citations to back this up.

## 3.2 Monte Carlo estimation for expectations

Many queries can be formed as questions about averages. For example, we might wonder, under our wage gap model, what the average salary is (regardless of gender or height). Or, under our various coin flip models, we might ask what the average number of heads is. More interestingly, we can also ask about averages under some condition: given that the first ten flips were heads, what is the average total number of heads?

In these contexts, the word "average" is not quite accurate. We can take averages of actual lists of numbers, but our models are not lists of numbers: they are stochastic processes that _produce_ numbers (and other data).

What we are really asking about is something called an "expectation." If we have a generative probabilistic model `p`, and a function `f` that takes in samples from `p` and outputs numbers, we say that the _expectation of `f` with respect to the distribution `p`_ is the limit, as the number of samples goes to infinity, of the average of the sampled values `(f (p))`.
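In symbols, writing $x_1, x_2, \dots$ for independent samples from `p` (this notation is introduced here for convenience and is not part of Metaprob), and assuming the expectation exists:

$$
\mathbb{E}_{x \sim p}[f(x)] \;=\; \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} f(x_i)
$$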

For example, here is a function `count-heads`, which will play the role of `f`. It takes a sample from one of our coin flip models, and returns the number of heads observed.

In [56]:
(define count-heads
  (gen [flips]
    (length (filter (gen [x] x) flips))))

#'tutorial/count-heads

In [57]:
(count-heads (coin-model-1))

54

In [58]:
(count-heads (coin-model-2))

41

In [59]:
(count-heads (coin-model-3))

100

Above, we've taken one sample of `(f (p))` for each coin flip model `p`. What we'd like to do is compute the expectation: the limiting average over infinitely many samples. There are ways to calculate this number exactly; indeed, you probably have a good intuition for what it should be in the three cases above. But in more complex models, exact calculation can be much harder, so we will focus here on estimation.
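As a check on that intuition, here is one way to work out the exact values for the three coin models. The symbols $H$ (the number of heads in 100 flips) and $p$ (the coin's weight) are introduced here for convenience, and the calculation assumes that `uniform-sample` picks each element of its argument with equal probability. Conditional on $p$, each flip is heads with probability $p$, so by linearity of expectation:

$$
\mathbb{E}[H] \;=\; \mathbb{E}\big[\,\mathbb{E}[H \mid p]\,\big] \;=\; 100\,\mathbb{E}[p]
$$

$$
\text{model 1: } 100 \cdot 0.5 = 50, \qquad
\text{model 2: } 100 \cdot \tfrac{0 + 1}{2} = 50, \qquad
\text{model 3: } 100 \cdot \tfrac{0 + 0.5 + 1}{3} = 50
$$

All three models share the same expected number of heads, even though the distributions of $H$ they induce are very different.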

The simplest technique for estimating an expectation is taking a lot of samples and averaging them. Let's write that up:

In [60]:
(define avg (gen [l] (clojure.core/float (/ (apply + l) (length l)))))
(define mc-expectation
 (gen [p f n]
   (avg (map f (replicate n p)))))

#'tutorial/mc-expectation

In [61]:
(mc-expectation coin-model-1 count-heads 1000)

50.036

In [62]:
(mc-expectation coin-model-2 count-heads 1000)

49.157

In [63]:
(mc-expectation coin-model-3 count-heads 1000)

49.344

We can also use this simple technique to ask about probabilities of events. For example, to estimate the probability under each model of attaining at least 75 heads, we can estimate the expectation of a function that returns one when the condition holds, and zero otherwise. We call this an _indicator_ function:

In [595]:
(define indicator
 (gen [condition] (gen [x] (if (condition x) 1 0))))

#'tutorial/indicator
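
The reason this works, sketched in our own notation with $A$ the event of interest and $\mathbf{1}_A$ its indicator function:

$$\mathbb{E}[\mathbf{1}_A(x)] = 1 \cdot P(x \in A) + 0 \cdot P(x \notin A) = P(x \in A)$$

So averaging the indicator over many samples estimates the fraction of samples in which the event occurs.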

In [596]:
(mc-expectation coin-model-1 (indicator (gen [l] (>= (count-heads l) 75))) 2000)

0.0

In [597]:
(mc-expectation coin-model-2 (indicator (gen [l] (>= (count-heads l) 75))) 2000)

0.258

In [598]:
(mc-expectation coin-model-3 (indicator (gen [l] (>= (count-heads l) 75))) 2000)

0.333

#### Exercise: Pitfalls of MC estimation

Under what conditions is `mc-expectation` likely to give a misleading answer? (Hint: under `coin-model-1`, is the probability of at least 75 heads really 0? What is the expected amount won in a lottery?)
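
One way to make the hint quantitative (a sketch in our own notation, where $q$ is the true probability of the event and $n$ is the number of independent samples):

$$P(\text{no sample hits the event}) = (1 - q)^n \approx e^{-nq}, \qquad \frac{\text{standard error}}{q} = \sqrt{\frac{1 - q}{n q}}$$

When $nq$ is much smaller than 1, the estimate is almost certainly exactly 0, and the relative error stays large until $n$ is much bigger than $1/q$.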

Let's try to apply our `mc-expectation` procedure to our wage-gap model, first to answer the question, "Under our model, what is the average person's height?"

We immediately hit a snag: last time, the number of heads was a straightforward transformation of `coin-model-i`'s return value. Here, `wage-gap-model` returns only a salary, from which we cannot recover the height of the person whose salary it is.

We could alter the model code to return more values—say, `[sex height salary]` as a vector—but there's another way.

So far, we've seen only one way of interacting with a Metaprob procedure: running it. But Metaprob provides a lot more flexibility than that, via its powerful `infer` operator.

One of the most basic things we can ask `infer` to do for us is _trace the random choices_ made by a Metaprob procedure:

In [64]:
(infer
  :procedure wage-gap-model
  :output-trace? true)

[38889.26407585003 {0 {"sex" {"uniform-sample" {:value :f}}}, 2 {"height" {"gaussian" {:value 68.21900501506029}}}, 5 {"salary" {"gaussian" {:value 38889.26407585003}}}} 0]

`infer` runs the given procedure and returns three values: the procedure's return value (in this case, a salary), an _execution trace_ that records the random choices made during the procedure's execution, and a "score" (which is zero in this case — we'll ignore it for now).

Let's look more closely at the second return value, the execution trace.

In [65]:
(define [salary tr _]
 (infer
  :procedure wage-gap-model
  :output-trace? true))
(smart-plot-trace tr)

`wage-gap-model` makes three random choices during its execution, and as such, its execution trace records three values. The addresses at which they are stored reflect where in `wage-gap-model`'s source code they were made. For example, the first recorded value is the choice `:f`, at address `'(0 "sex" "uniform-sample")`. This indicates that the choice in question was made on line 0, while defining the variable `sex`, using the procedure `uniform-sample`.

Let's rewrite our `mc-expectation` procedure to run `f` on the _execution trace_ of `p`, rather than its return value:

In [66]:
(define mc-expectation-v2
 (gen [p f n]
  (avg
    (map f (replicate n
      (gen []
       (define [_ tr _] (infer :procedure p :output-trace? true)) tr))))))

#'tutorial/mc-expectation-v2

We can now answer our question about average height:

In [67]:
(define person-height
  (gen [t] (trace-get t '(2 "height" "gaussian"))))

#'tutorial/person-height

In [68]:
(mc-expectation-v2 wage-gap-model person-height 1000)

66.87359

This is quite close to the true answer of 67.

It can be cumbersome to type the addresses manually. Let's give them short names:

In [69]:
(define [sex-adr height-adr salary-adr] (addresses-of tr))

#'tutorial/salary-adr

Estimating an expectation can be useful, but one of the advantages of Bayesian inference is that we can also quantify our uncertainty. By taking a lot of samples and plotting them in a histogram, we can get a sense of what an entire distribution looks like.

In [106]:
(clojure.core/use 'metaprob.examples.tutorial.jupyter :reload)

(histogram "Model 1" 
  (map count-heads (replicate 100 coin-model-1)) [0 100] 1)

In [94]:
(histogram "Model 2" (map count-heads (replicate 100 coin-model-2)) [0 100] 1)

In [111]:
(histogram "Model 3" (map count-heads (replicate 400 coin-model-3)) [0 100] 1)

## 3.3 Expectations under interventions

Although we can sometimes learn interesting things by taking expectations with respect to our models, the most exciting questions usually require _conditioning_ on some piece of data. We might wonder what a person's expected salary is _given that_ she is a woman, or with what probability a person is a woman _given that_ they make $46,000 per year.

Let's look at the simpler of those two questions first. One way to estimate an answer would be to run many simulations in which we force the model to assign to `sex` the value `:f`, then average the resulting salaries. 

We can _intervene on_ our model and force it to make a certain choice by passing an _intervention trace_ to `infer`. `infer` runs the model as usual, but when it encounters a random choice with an address that appears in the intervention trace, instead of generating a new choice, it simply reuses the one in the intervention trace. 

In the code below, notice the modified call to `infer`, and also the manner in which we construct an intervention trace using the address of the choice we wish to control:

In [112]:
; First, create a version of mc-expectation that accepts an intervention trace
(define mc-expectation-v3
 (gen [p intervene f n]
   (avg
    (map f (replicate n
      (gen []
        (define [_ tr _] (infer 
                           :procedure p 
                           :intervention-trace intervene   ; NEW
                           :output-trace? true)) tr))))))

; Create our intervention trace
(define ensure-female
  (trace-set (trace) sex-adr :f))

; Create our accessor (the f to take the expectation of)
(define get-salary
   (gen [t] (trace-get t salary-adr)))

; Run the query
(mc-expectation-v3 wage-gap-model ensure-female get-salary 1000)

39048.8

This is quite a limited technique, however. If we wanted to ask, for instance, about someone's expected height, given that their salary is $46,000, an intervention would not do the trick. Do you see why? Here's what that (wrong) code would look like:

In [113]:
; Create our intervention trace
(define ensure-46k
  (trace-set (trace) salary-adr 46000))

; Create our accessor (the f to take the expectation of)
(define get-height
   (gen [t] (trace-get t height-adr)))

; Run the query
(mc-expectation-v3 wage-gap-model ensure-46k get-height 1000)

67.096085

In [114]:
; Run the query with NO intervention — just calculates average height, regardless of salary
(mc-expectation-v3 wage-gap-model (trace) get-height 1000)

66.930626

Why are these two numbers (almost) the same? Can you characterize the situations when interventions _do_ work as a method of conditioning?

## 3.4 Conditioning with rejection sampling

In order to answer more sophisticated conditional queries, we need to turn to another technique: rejection sampling. In rejection sampling, we run our model over and over until we get a sample that satisfies our condition. To estimate a _conditional expectation_ using rejection sampling, we can do this $n$ times, and average the values `f` takes on the $n$ samples that satisfy our condition.
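
In symbols (our notation: $C$ is the condition, $\mathbf{1}_C$ its indicator, and $t$ a trace sampled from the model), the quantity being estimated is:

$$\mathbb{E}[f(t) \mid C(t)] = \frac{\mathbb{E}[f(t)\,\mathbf{1}_C(t)]}{P(C(t))}$$

Rejection sampling never computes $P(C)$ explicitly: discarding the samples that violate $C$ and averaging `f` over the rest estimates this ratio directly.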

In [115]:
(define rejection-sample
  (gen [p condition f]
    (define [_ t _] (infer :procedure p :output-trace? true))
    (if (condition t) (f t) (rejection-sample p condition f))))

(define rejection-expectation
  (gen [p condition f n]
    ; Define a modified version of p which tries again
    ; if its first execution doesn't satisfy the condition.
    ; Call it n times
    (avg (replicate n (gen [] (rejection-sample p condition f))))))

#'tutorial/rejection-expectation

Now, suppose we want to know the expected height of someone making $\$56,000$ — a very high salary under our model. If we expressed our condition as "a salary of exactly $\$56,000$", it would take a very long time to get even one sample. So instead, we ask about a small interval around $\$56,000$. It is still quite slow:

In [612]:
(rejection-expectation
    wage-gap-model
    (gen [t] (< 55500 (trace-get t salary-adr) 56500))
    (gen [t] (trace-get t height-adr))
    100)

76.84208

If we make the same query, but for an interval around $\$46,000$, it runs much faster:

In [613]:
(rejection-expectation
    wage-gap-model
    (gen [t] (< 45500 (trace-get t salary-adr) 46500))
    (gen [t] (trace-get t height-adr))
    100)

67.55371

This is because our new condition is much more probable. As such, it takes fewer "tries" to produce a sample that satisfies it.
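
To see roughly how the cost scales (a sketch, assuming each run of the model is independent and satisfies the condition with probability $P(C)$): the number of tries needed for one accepted sample follows a geometric distribution, so

$$\mathbb{E}[\text{tries per accepted sample}] = \frac{1}{P(C)}, \qquad \mathbb{E}[\text{tries for } n \text{ accepted samples}] = \frac{n}{P(C)}$$

Narrowing the salary interval, or moving it into the tail of the distribution, shrinks $P(C)$ and lengthens the wait proportionally.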

The two queries below estimate (a) the probability that someone making between $\$45,500$ and $\$46,500$ is female, and (b) the average height of a woman whose salary is within that range. Before you run each one, try to estimate (roughly, qualitatively) how fast or slow it will be.

In [614]:
(rejection-expectation
    wage-gap-model
    (gen [t] (< 45500 (trace-get t salary-adr) 46500))
    (indicator (gen [t] (= (trace-get t sex-adr) :f)))
    100)

0.15

In [615]:
(rejection-expectation
    wage-gap-model
    (gen [t] (and (= (trace-get t sex-adr) :f) (< 45500 (trace-get t salary-adr) 46500)))
    (gen [t] (trace-get t height-adr))
    100)

70.69408

With rejection sampling, we can begin to see the first applications of probabilistic modeling to _data analysis_. The idea is that we _condition_ our model's execution on the actual data we've observed, then ask questions about the _latent variables_ — the pieces we may not have observed directly.

As one example, consider the following model:

In [117]:
(define hybrid-coin-model
  (gen []
    (define which-model (uniform-sample [coin-model-1 coin-model-2 coin-model-3]))
    (which-model)))

#'tutorial/hybrid-coin-model

Given that we saw 100 heads, we can now ask about likely values of `which-model`. This allows us to pose questions like: if I saw 100 heads, which of my three possible explanations was most likely? And what are the chances the coin comes up heads next time?
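
The first of these questions is an instance of Bayes' rule. As a sketch in our own notation, write $m_k$ for the event that `which-model` is `coin-model-k` and $D$ for the observed data (here, 100 heads). Because `uniform-sample` gives each of the three models prior probability $1/3$, the prior cancels:

$$P(m_k \mid D) = \frac{P(D \mid m_k)\,P(m_k)}{\sum_{j=1}^{3} P(D \mid m_j)\,P(m_j)} = \frac{P(D \mid m_k)}{\sum_{j=1}^{3} P(D \mid m_j)}$$

The rejection-sampling queries below estimate this kind of posterior quantity without computing any of the likelihoods $P(D \mid m_j)$ explicitly.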

In [118]:
(define count-heads-in-trace
  (gen [t]
    (apply + (map (gen [adr] (if (and (boolean? (trace-get t adr)) (trace-get t adr)) 1 0))
                  (addresses-of t)))))

#'tutorial/count-heads-in-trace

In [618]:
; Probability of Coin Model 3 if I saw 100 heads:
(rejection-expectation
    hybrid-coin-model
    (gen [t] (= (count-heads-in-trace t) 100))
    (indicator (gen [t] (= coin-model-3 (trace-get t '(0 "which-model" "uniform-sample")))))
    100)

0.96

In [619]:
; Estimated probability of next coin flip being heads if I saw 100 heads
(rejection-expectation
    hybrid-coin-model
    (gen [t] (= (count-heads-in-trace t) 100))
    (gen [t]
      (define actual-model (trace-get t '(0 "which-model" "uniform-sample")))
      (cond
        (= actual-model coin-model-1) 0.5
        (= actual-model coin-model-2) (trace-get t '(1 "which-model" 0 "p" "uniform"))
        (= actual-model coin-model-3) (trace-get t '(1 "which-model" 0 "p" "uniform-sample"))))
    100)

0.9999882

This is (in most runs) _almost_ 1, but indicates that there is still some uncertainty.

### 3.4.1 Aside: The posterior distribution

So far, we have been making "point estimates" to answer our queries, summarizing our answers as a single number. For example, suppose I tell you I have flipped a coin 100 times and it came up heads more than sixty times. I ask you to guess exactly how many times it came up heads. Using the `hybrid-coin-model`, you could compute an expectation, which is one way to generate a single-number answer:

In [620]:
(rejection-expectation
    hybrid-coin-model
    (gen [t] (> (count-heads-in-trace t) 60))
    count-heads-in-trace
    100)

86.19

But one of the main advantages of the sort of probabilistic modeling we're doing is that we have access to much richer information. 

To introduce some terminology, the process of "Bayesian data analysis" consists of the following steps:

*Step 1.* First, express our prior beliefs about a generative process by encoding them into a model. The distribution over traces induced by that model is called the _prior_: it encodes our beliefs _prior_ to seeing any data. In our case, the prior distribution over "number of heads" looks like this:


In [121]:
(histogram "Prior" (replicate 500 (gen [] (count-heads (hybrid-coin-model)))) [0 100] 1)

As we can see by examining this plot, although we believe that any number of heads is theoretically possible, we would be much more surprised to see, say, 75 heads than 0, 50, or 100. We can also plot our prior beliefs about which of the three models is being used, and check that all three are basically equally likely:

In [123]:
(define model-number
 (gen [t] 
      (define which (trace-get t '(0 "which-model" "uniform-sample")))
      (cond 
        (= which coin-model-1) 1
        (= which coin-model-2) 2
        (= which coin-model-3) 3)))

(histogram "Prior on model choice" (replicate 500 (gen [] (model-number (nth (infer :procedure hybrid-coin-model :output-trace? true) 1)))) [1 3] 1)

*Step 2.* Condition on observed data, retrieving the _posterior distribution_ -- like the prior distribution, but updated to reflect our new beliefs after seeing the data. In this case, we can observe that over 60 coins came up heads:

In [124]:
(define plot-posterior-rejection
  (gen [p condition f n [min-val max-val] step]
    (histogram "Posterior" (replicate n (gen [] (rejection-sample p condition f))) [min-val max-val] step)))

#'tutorial/plot-posterior-rejection

In [125]:
(plot-posterior-rejection hybrid-coin-model (gen [t] (> (count-heads-in-trace t) 60)) count-heads-in-trace 100 [0 100] 1)

As we can see, our point estimate above will in most cases be quite misleading!

We can also plot the posterior on `which-model`:

In [126]:
(plot-posterior-rejection hybrid-coin-model (gen [t] (> (count-heads-in-trace t) 60)) model-number 500 [1 3] 1)

Or use it to get a sense of the salary distribution for people over 70 inches tall:

In [133]:
(plot-posterior-rejection
  wage-gap-model
  (gen [t] (> (trace-get t height-adr) 70))
  (gen [t] (trace-get t salary-adr))
  300
  [30000 60000] 500)

## 3.5 Conditioning with likelihood weighting

In the last two sections, we saw two ways of generating _conditional_ samples:

1. Intervening on our model to force the "random choices" of interest to take on the values we want them to. This is fast, but is only correct in select cases, where the random choices we're conditioning on don't depend on any earlier random choices.

2. Rejection sampling. This gives exact samples from the posterior; the only problem is how slow it is.

We'd like the speed of (1) with the correctness of (2). As a step toward getting there, let's think a bit more about why exactly (1) is usually incorrect.

Suppose we are trying to estimate the expected salary for someone who is 70 inches tall (5' 10"). We can get a (slow) estimate via rejection sampling:

In [627]:
(rejection-expectation
    wage-gap-model
    (gen [t] (< 69.9 (trace-get t height-adr) 70.1))
    (gen [t] (trace-get t salary-adr))
    100)

47606.812

If we do the same using our intervention trace method, we get a fast but inaccurate answer:

In [628]:
(mc-expectation-v3
    wage-gap-model
    (trace-set (trace) height-adr 70)
    (gen [t] (trace-get t salary-adr))
    100)

45272.16

Why is the answer lower in the intervention trace method? Under our model, men are more likely to be 70 inches tall than women, so if we know someone is 70 inches tall, we might expect it's more likely for them to be a man, and—incorporating the wage gap—we might expect their salary to be higher. The rejection sampling algorithm takes this into account. It simply waits for 100 samples where the height was 70 inches, the majority of which will be men; the salary will have been higher for those samples, so the expected salary will be higher.

The intervention method, on the other hand, is impatient, and immediately sets height to 70 in every run, severing the tie between gender and height. That means that our collection of 100 samples will contain just as many 70-inch-tall women as 70-inch-tall men, which is improbable under our model.

What if we made a compromise? We still artificially set the height of each sample to 70, but also measure how likely the model _would have been_ to choose 70 anyway—a likelihood that will be higher for `sex=:m` samples than for `sex=:f` samples. At the end, we take a _weighted average_ of the salaries, where each salary is weighted by the probability that the employee really would have been 70 inches (given the random choices we'd already made about them: their sex).

To implement this, `infer` has one final trick up its sleeve: we can provide a _target trace_, which works just like an intervention trace, but also causes the constrained choices to be _scored_. For each targeted choice, Metaprob computes the probability (or, for continuous choices such as `gaussian`, the probability density) that it _would have_ made the same choice under the prior, and accumulates these numbers (as a sum of _log_ probabilities) into a _score_ that is returned, along with the model's output and execution trace, from `infer`.
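
Written out (a sketch in our own notation: $t_i$ is the trace from the $i$-th run, $A$ is the set of addresses in the target trace, and $p(t_{i,a} \mid \cdot)$ is the probability or density of the targeted value at address $a$ given the choices made earlier in that run), the score, the weight, and the weighted average are:

$$s_i = \sum_{a \in A} \log p(t_{i,a} \mid \text{earlier choices in } t_i), \qquad w_i = e^{s_i}, \qquad \hat{\mu} = \frac{\sum_{i=1}^{n} w_i\, f(t_i)}{\sum_{i=1}^{n} w_i}$$

In the height example, $A$ contains only the height address, so $w_i$ is the density of a height of 70 given the sex sampled in run $i$.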

We can use this feature to implement the likelihood weighting algorithm:

In [134]:
; Note -- we exponentiate the scores, because they are returned as log probabilities
(define weighted-samples
  (gen [proc target f n]
    (define samples (replicate n (gen [] (infer :procedure proc :target-trace target :output-trace? true))))
    (map (gen [[o t s]] [(f t) (exp s)]) samples)))

(define lh-weighting
   (gen [proc target f n]
    (define samples (weighted-samples proc target f n))
    (/ (apply + (map (gen [[v s]] (* v s)) samples))
       (apply + (map (gen [[_ s]] s) samples)))))

#'tutorial/lh-weighting

Let's apply the technique to the question we explored above, to estimate the expected salary for someone who is 70 inches tall:

In [630]:
(lh-weighting wage-gap-model
    (trace-set (trace) height-adr 70)
    (gen [t] (trace-get t salary-adr))
    100)

47230.89141943718

This brings us quite close to the rejection-sampled estimate, and much faster!

### 3.5.1 Importance sampling-resampling
This same idea can be used to produce (and plot) samples from the posterior. The trick is this: to generate a single sample from the (approximate) posterior, we sample some number $n$ of "particles" with likelihood weights. We then choose one of them at random, giving each one probability proportional to its weight (the exponentiated score). This variant of the technique is called "importance sampling-resampling." Let's implement it:

In [135]:
(define importance-sample-resample
 (gen 
   [proc target f n-particles]
    ; Generate particles
    (define particles (weighted-samples proc target f n-particles))
    ; Choose one at random, with prob. proportional to importance weight
    (define which-particle (categorical (map (gen [[o s]] s) particles)))
    ; Return only the sampled value (not the score)
    (define [sampled-value _] (nth particles which-particle))
    sampled-value))

#'tutorial/importance-sample-resample
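
In symbols (our notation, with $w_1, \ldots, w_n$ the exponentiated scores of the $n$ particles), the resampling step picks particle $i$ with probability

$$P(\text{pick particle } i) = \frac{w_i}{\sum_{j=1}^{n} w_j}$$

With a single particle this ratio is $w_1 / w_1 = 1$, so the weight has no effect and the result is just a sample from the intervened model. As $n$ grows, the distribution of the returned value approaches the posterior.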

In [136]:
(define plot-posterior-importance
  (gen [p target f particles-per-sample n [min-val max-val] step]
    (histogram "Posterior (Importance Sampling)" (replicate n (gen [] (importance-sample-resample p target f particles-per-sample))) [min-val max-val] step)))

#'tutorial/plot-posterior-importance

One last time, we consider the problem of estimating someone's salary given that they are 70 inches tall. All our above computations gave an estimate of around $\$47,500$, but plotting the entire posterior distribution makes it clear the situation is a bit more complex.

Let's first plot using only one particle. In this case, we are not getting the posterior at all: we are just sampling from the prior with an intervention (setting height=70).

In [138]:
; 1 particle -- sampling from the (intervened) prior
(plot-posterior-importance 
  wage-gap-model
  (trace-set (trace) height-adr 70)
  (gen [t] (trace-get t salary-adr))
   1 100
  [30000 60000] 500)

Using 20 particles per sample, we get a better approximation to the posterior, which we see is bimodal (has two high-probability regions):

In [139]:
; 20 particles -- better approx to posterior
(plot-posterior-importance 
  wage-gap-model
  (trace-set (trace) height-adr 70)
  (gen [t] (trace-get t salary-adr))
   20 100
  [30000 60000] 500)

To check our work, we can revert to the rejection sampling technique to get exact samples from the posterior:

In [141]:
(plot-posterior-rejection
    wage-gap-model
    (gen [t] (< 69.9 (trace-get t height-adr) 70.1))
    (gen [t] (trace-get t salary-adr))
    100
    [30000 60000] 500)

This looks pretty similar. The upshot is that the importance sampling-resampling method gives us an accurate picture of the posterior in a much more efficient manner.

## 3.6 Putting it all together

In this section, we'll review all the techniques we've learned on a new example model, and look at ways of extending our models with model-specific "custom inference procedures."

The model we're using in this section is a _bivariate Gaussian_. A bivariate Gaussian models pairs of variables that each follow a "bell curve," but are correlated with one another (for example, a person's height and weight).

Ignoring weight, people's heights are distributed normally, with a high probability at the mean and lower probability the further you get away from the mean. Here we plot samples from a _univariate_ Gaussian with mean 70 and _standard deviation_ 3; standard deviation is a measure of the "spread" of a distribution.

In [150]:
(histogram
    "Gaussian (height)" 
    (replicate 5000 (gen [] (gaussian 70 3)))
    [40 100])

And weight may also be distributed normally, with different mean and standard deviation:

In [151]:
(histogram "Gaussian (weight)" (replicate 5000 (gen [] (gaussian 150 20))) [80 220])

So one idea for how to model a person's height and weight would be to draw them individually from these distributions:

In [145]:
(define person-v1
  (gen [] 
    (define height (gaussian 70 3))
    (define weight (gaussian 150 20))
    [height weight]))

#'tutorial/person-v1

But there's something wrong about this model: the height and weight are totally independent. In real life, a tall person is likely to weigh more; in our model, this is not the case.

Enter the _bivariate Gaussian_: in addition to a pair of means, it also takes a _covariance matrix_, a symmetric 2x2 matrix $\Sigma$, where $\sigma_{11}$ and $\sigma_{22}$ are the variances (squared standard deviations) of variables 1 and 2, and $\sigma_{12} = \sigma_{21}$ is the _covariance_ of the variables with one another. A positive covariance indicates direct correlation; a negative covariance indicates inverse correlation. Zero covariance means the variables are completely independent. (Note: in general, zero covariance does _not_ imply independence, but in a bivariate Gaussian model, it does.)
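
Written out, together with the correlation coefficient $\rho$, which rescales the covariance to lie between $-1$ and $1$ (the symbol $\rho$ is our addition):

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \qquad \rho = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\,\sigma_{22}}}$$

For example, with standard deviations of 3 and 20 (variances 9 and 400), a covariance of 40 corresponds to $\rho = 40 / (3 \cdot 20) = 2/3$.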

There are a number of ways to sample from a bivariate Gaussian. Below, we model the distribution as follows: first, we generate the first variable according to its mean and standard deviation (e.g., generate a height). Then, we figure out (based on the covariance and on how far that sample was from the mean) the expectation of the second variable (e.g., given the height we generated, what would we expect the weight to be?). Using the variance and covariance information, we can also calculate how much of a spread we expect to find around that expected value. Now that we have a new mean and variance, we can sample the second variable (generate a weight). Here's the model:

In [146]:
(define biv-gauss
  (gen [[mu1 mu2] [[sigma_11 sigma_12] [sigma_21 sigma_22]]]
       ; draw the first variable normally 
       (define x1 (gaussian mu1 (sqrt sigma_11)))
       ; calculate the new mean and variance of the second variable
       (define x2_mean (+ mu2 (* (- x1 mu1)  (/ sigma_12 sigma_11))))
       (define x2_var (/ (- (* sigma_11 sigma_22) (* sigma_12 sigma_12)) sigma_11))
       ; draw the second variable
       (define x2 (gaussian x2_mean (sqrt x2_var)))
       ; return both
       [x1 x2]))

#'tutorial/biv-gauss
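
The mean and variance computed in `biv-gauss` are the standard formulas for the conditional distribution of one component of a bivariate Gaussian given the other. In the notation of the code:

$$x_2 \mid x_1 \;\sim\; \mathcal{N}\!\left(\mu_2 + \frac{\sigma_{12}}{\sigma_{11}}\,(x_1 - \mu_1),\;\; \sigma_{22} - \frac{\sigma_{12}^2}{\sigma_{11}}\right)$$

The second argument here is a variance; it equals the code's `x2_var`, since $(\sigma_{11}\sigma_{22} - \sigma_{12}^2)/\sigma_{11} = \sigma_{22} - \sigma_{12}^2/\sigma_{11}$. The code takes its square root because `gaussian` expects a standard deviation.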

In [147]:
(define person-v2
  (gen []
    ; 9 = 3^2, 400 = 20^2; we pass variance instead of standard deviation here
    ; setting covariance to 0 (instead of 40) would exactly recover the person-v1 model above.
    (biv-gauss [70 150] [[9 40] [40 400]])))
(define [height-adr weight-adr] (addresses-of (nth (infer :procedure person-v2 :inputs [] :output-trace? true) 1)))

#'tutorial/weight-adr
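
As a worked example, we can plug these parameters into the expressions for `x2_mean` and `x2_var` in `biv-gauss`. Writing $h$ for the sampled height, $m(h)$ for `x2_mean`, and $v$ for `x2_var` (the symbols and arithmetic are ours):

$$m(h) = 150 + \frac{40}{9}\,(h - 70), \qquad v = \frac{9 \cdot 400 - 40^2}{9} = \frac{2000}{9} \approx 222.2$$

So each inch of height above the mean raises the expected weight by about 4.4, and the spread of weight around that expectation has standard deviation $\sqrt{2000/9} \approx 14.9$, down from 20 when height is unknown.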

Let's create a scatter plot to visualize the sorts of samples this generates:

In [162]:
(custom-scatter-plot "Bivariate Gaussian"
  (replicate 100 person-v2)
  "cross" "white" "blue" [[60 80] [90 210]])

We can also write an assessor function that exactly computes the probability density of a specific point. Graphing that function in a contour plot can help us better understand how the idea of a "bell curve" generalizes to two dimensions:

In [164]:
(define biv-gaussian-density
  (gen [[mu1 mu2] [[s11 s12] [s21 s22]]]
    (gen [x y]
      (define x-from-mu (- x mu1))
      (define y-from-mu (- y mu2))
      (/
       (exp
        (/
          (- (+
              (/ (* x-from-mu x-from-mu) s11)
              (/ (* y-from-mu y-from-mu) s22))     
            (/ (* 2 s12 x-from-mu y-from-mu) (* s11 s22)))
          (* -2 (/ (- (* s11 s22) (* s12 s12)) (* s11 s22)))))
        (* 2 3.1415926 (sqrt (- (* s11 s22) (* s12 s12))))))))

(scatter-with-contours
  "Bivariate Gaussian"
  (replicate 100 person-v2)
  (biv-gaussian-density [70 150] [[9 40] [40 400]])
  [[60 80] [90 210]])
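
For reference, the function implemented by `biv-gaussian-density` is the standard bivariate Gaussian density. With $d_1 = x - \mu_1$ and $d_2 = y - \mu_2$ (matching `x-from-mu` and `y-from-mu`), it can be written as:

$$p(x, y) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22} - \sigma_{12}^2}} \exp\!\left( -\,\frac{\dfrac{d_1^2}{\sigma_{11}} + \dfrac{d_2^2}{\sigma_{22}} - \dfrac{2\,\sigma_{12}\,d_1 d_2}{\sigma_{11}\sigma_{22}}}{2\left(1 - \dfrac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}}\right)} \right)$$

Setting the exponent to a constant gives the equation of an ellipse centered at $(\mu_1, \mu_2)$, which is why the contours in the plot are elliptical.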

In the plot, blue regions correspond to areas of low probability, and red regions to areas of high probability. Equiprobability contours are ellipses. The wider or taller the ellipses, the more uncertainty there is. The "off-center" rotation you see tells us that the two variables (height and weight) are not independent: if we know one, it changes our guess about the other.

In [173]:
(scatter-with-contours
  "Bivariate Gaussian"
  [] ; no sample points: draw only the contours
  (biv-gaussian-density [70 150] [[15 0] [0 200]])
  [[60 80] [90 210]])

In [643]:
; Changing the means moves the center of the ellipse.
; Changing the variances expands / contracts it along one dimension.
; Changing the covariance "rotates" the ellipse.

Now, suppose we know someone's weight and want to guess their height. We could use rejection sampling, but it's slow.

We can fix this with likelihood weighting:

In [644]:
; Get the expected height for someone who weighs 100
(lh-weighting
    (gen [] (biv-gauss [70 150] [[9 40] [40 400]]))
    (trace-set (trace) weight-adr 100)
    (gen [t] (trace-get t height-adr))
    100)

65.09907723760845
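
We can sanity-check this estimate against the exact answer. Conditioning a bivariate Gaussian on its second component gives another Gaussian (a standard result; the arithmetic below is ours), with

$$\mathbb{E}[x_1 \mid x_2] = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}\,(x_2 - \mu_2) = 70 + \frac{40}{400}\,(100 - 150) = 65, \qquad \operatorname{Var}[x_1 \mid x_2] = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} = 9 - \frac{1600}{400} = 5$$

The likelihood-weighted estimate above is within about 0.1 of the exact conditional mean of 65.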

Or plot the posterior 

## 3.7 MH for more efficient samples

## 3.8 Custom inference procedures

# Draft code, not part of tutorial

In [141]:
(lh-weighting
    wage-gap-model
    (trace-set-values-at (trace) 
                         '(5 "salary" "gaussian") 40000
                         '(2 "height" "gaussian") 64)
    (gen [t] (if (= :f (trace-get t '(0 "sex" "uniform-sample"))) 1 0))
    2000)

0.9957812810110319

## Custom proposals with the Bivariate Gaussian

In [215]:
(define biv-gaussian-model
  (gen []
    (define x (gaussian 0 1))
    (define y (gaussian 0 1))
    (define z (gaussian (+ x y) 1))
    z))

#'tutorial/biv-gaussian-model

In [216]:
(define biv-gaussian-custom
  (gen [[] intervene target o?]
    (define [x-adr y-adr z-adr] ['(0 "x" "gaussian") '(1 "y" "gaussian") '(2 "z" "gaussian")])
    (if (or (trace-has? intervene z-adr) (not (trace-has? target z-adr)))
        (infer :procedure biv-gaussian-model
               :intervention-trace intervene
               :target-trace target
               :output-trace? true)
        (block
            (define z (trace-get target z-adr))
            (define mu (/ z 3))
            (define [var covar] [2/3 -1/3])
            (define conditional-mean
                (gen [other-draw]
                     (+ mu (* (- other-draw mu) (/ covar var)))))
            (define first-draw
                (cond
                    (trace-has? target x-adr) (trace-get target x-adr)
                    (trace-has? target y-adr) (trace-get target y-adr)
                    true (gaussian mu (sqrt var))))
            (define second-mean (+ mu (* (- first-draw mu) (/ covar var))))
            (define second-var  (/ (- (* var var) (* covar covar)) var))
            (define second-draw (gaussian second-mean (sqrt second-var)))
            (define x
                (cond
                    (trace-has? target x-adr) (trace-get target x-adr)
                    (trace-has? intervene x-adr) (trace-get intervene x-adr)
                    (trace-has? target y-adr) second-draw
                    true first-draw))
            (define y
                (cond
                    (trace-has? target y-adr) (trace-get target y-adr)
                    (trace-has? intervene y-adr) (trace-get intervene y-adr)
                    true second-draw))
            
            [z (trace-set-values-at (trace) x-adr x y-adr y z-adr z) 1]))))

#'tutorial/biv-gaussian-custom
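
The constants `(/ z 3)`, `2/3`, and `-1/3` in `biv-gaussian-custom` match the exact posterior of $(x, y)$ given $z$ under `biv-gaussian-model`. A sketch of the derivation (ours, not taken from the Metaprob sources): up to an additive constant, the log joint density is

$$-\tfrac{1}{2}\left[x^2 + y^2 + (z - x - y)^2\right] = -\tfrac{1}{2}\left[2x^2 + 2y^2 + 2xy - 2z(x + y) + z^2\right]$$

which is quadratic in $(x, y)$ with precision matrix $\Lambda$, so the posterior is Gaussian with

$$\Lambda = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \Lambda^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} \mathbb{E}[x \mid z] \\ \mathbb{E}[y \mid z] \end{pmatrix} = \Lambda^{-1} \begin{pmatrix} z \\ z \end{pmatrix} = \begin{pmatrix} z/3 \\ z/3 \end{pmatrix}$$

The diagonal of $\Lambda^{-1}$ gives the variance $2/3$ and the off-diagonal gives the covariance $-1/3$. For $z = 3$, the posterior mean of $x$ is 1.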

In [217]:
(define biv-gaussian-inf
    (inf "biv-gaussian" biv-gaussian-model biv-gaussian-custom))

#'tutorial/biv-gaussian-inf

In [218]:
(infer :procedure biv-gaussian-inf
       :target-trace  (trace-set-values-at (trace) '(2 "z" "gaussian") 3)
       :output-trace? true)

[3 {0 {"x" {"gaussian" {:value 0.34943171527130845}}}, 1 {"y" {"gaussian" {:value 1.2825158373809544}}}, 2 {"z" {"gaussian" {:value 3}}}} 1]

In [241]:
(apply + (replicate 10
  (gen [] ((gen [x] (* x x)) (- 1 (lh-weighting
    biv-gaussian-inf
    (trace-set-values-at (trace) '(2 "z" "gaussian") 3)
    (gen [t] (trace-get t '(0 "x" "gaussian")))
    500))))))

0.029003554761523

In [64]:
(define causal-biv-gauss
  (gen [[mu1 mu2] [[sigma_11 sigma_12] [sigma_21 sigma_22]]]
        ; gaussian takes a standard deviation, so take square roots of the variances
        (define x1 (gaussian mu1 (sqrt sigma_11)))
        (define x2_mean (+ mu2 (* (- x1 mu1)  (/ sigma_12 sigma_11))))
        (define x2_var (/ (- (* sigma_11 sigma_22) (* sigma_12 sigma_12)) sigma_11))
        (define x2 (gaussian x2_mean (sqrt x2_var)))
       [x1 x2]))

#'tutorial/causal-biv-gauss

In [96]:
(define custom-biv-gauss
  (gen [[[mu1 mu2] [[sigma_11 sigma_12] [sigma_21 sigma_22]]]
          intervene
          target
          out?]
       (if (and (trace-has? target "x2") (not (trace-has? intervene "x1")) (not (trace-has? target "x1")))
           (block
               (define new-target (trace "x1" (trace-get target "x2")))
               (define new-intervene (if (trace-has? intervene "x2") (trace "x1" (trace-get intervene "x2")) (trace)))
               (define [[x2 x1] _ _] 
                   (custom-biv-gauss [[mu1 mu2] [[sigma_11 sigma_12] [sigma_21 sigma_22]]]
                                     new-intervene
                                     new-target
                                     out?))
               [[x1 x2] (trace "x1" x1 "x2" x2) 1])
           (block
               ; gaussian takes a standard deviation, so take square roots of the variances
               (define x1 (cond
                            (trace-has? intervene "x1") (trace-get intervene "x1")
                            (trace-has? target "x1") (trace-get target "x1")
                            true (gaussian mu1 (sqrt sigma_11))))
               (define x2_mean (+ mu2 (* (- x1 mu1)  (/ sigma_12 sigma_11))))
               (define x2_var (/ (- (* sigma_11 sigma_22) (* sigma_12 sigma_12)) sigma_11))
               (define x2 (cond
                            (trace-has? intervene "x2") (trace-get intervene "x2")
                            (trace-has? target "x2") (trace-get target "x2")
                            true (gaussian x2_mean (sqrt x2_var))))
                [[x1 x2] (trace "x1" x1 "x2" x2) 1]))))

#'tutorial/custom-biv-gauss

In [104]:
(custom-biv-gauss [[0 0] [[1 -0.8] [-0.8 1]]] (trace) (trace "x2" 5) true)

[[-3.9548142169206955 5] {"x2" {:value 5}, "x1" {:value -3.9548142169206955}} 1]

In [None]:
(define bivariate-gaussian
  (inf 
    "biv-gaussian"
    causal-biv-gauss
    ;; The implementation returns [value output-trace score].
    ;; custom-biv-gauss already handles interventions and targets on
    ;; "x1" and "x2" (including the case where both are targeted), so
    ;; delegate to it. Its score is still the placeholder value 1.
    (gen [params intervene target out?]
      (custom-biv-gauss params intervene target out?))))

# 4. Finite mixture models and CrossCat

In [None]:
Sex, weight, height -- the value of mixture modeling.
CrossCat

In [269]:
(define f (gen [x y] x))
(infer :procedure
       (gen [] 3 (f ((uniform-sample [f f]) (flip 0.2) (flip 0.4)) (f (flip 0.2) (flip 0.4))))
       :output-trace? true)

[false {1 {1 {0 {"uniform-sample" {:value #function[clojure.lang.AFunction/1]}}, 1 {"flip" {:value false}}, 2 {"flip" {:value true}}}, 2 {1 {"flip" {:value false}}, 2 {"flip" {:value true}}}}} 0]

In [491]:
;; (define my-map
;;  (gen [f l]
;;     (if (empty-trace? l)
;;         (empty-trace)
;;         (trace :value (f (first l)) "rest" (** (my-map f (rest l)))))))
(define my-map
   (gen [f l]
     (lift
       (gen []
         (if (empty-trace? l)
           (empty-trace)
           (clojure.core/cons 
              (f (first l))
              (lift my-map [f (rest l)] (list "lift") (list)))))
        [] 
        (list "else" 1 "f") 
        (trace) 
        (list "else" 2 "lift") 
        (list "rest"))))

(define my-map-2
  (customize-trace
    (gen [f l]
      (if (empty-trace? l) 
          (trace) 
          (clojure.core/cons (f (first l)) (my-map-2 f (rest l)))))
      
    (list "else" 1 "f")            (list)
    (list "else" 2 "my-map-2")     (list "rest")))


(define coin-model-1a
 (customize-trace
    (gen [] (replicate 100 (gen [] (flip 0.5))))
    (list "replicate" "map") (list)))

(define coin-model-1b
  (customize-trace
    (gen [] 
         (define p (uniform 0 1))
         (replicate 100 (gen [] (flip p))))
    (list 1 "replicate" "map") (list)
    (list 0 "p" "uniform") (list "p")))

(define coin-model-1c
  (customize-trace
    (gen []
      (define p (uniform-sample [0 0.5 1]))
      (replicate 100 (gen [] (flip p))))
    (list 1 "replicate" "map") (list)
    (list 0 "p" "uniform-sample") (list "p")))

(define hybrid-coin-model-a
 (gen []
    (define which-model (uniform-sample [coin-model-1a coin-model-1b coin-model-1c]))
    (which-model)))

#'tutorial/hybrid-coin-model-a

In [148]:
(plot-posterior-importance
  hybrid-coin-model-a
  (clojure.core/reduce
    (gen [t n] (trace-set t (list 1 "which-model" n "f" "flip") true))
    (trace)
    (range 10))
  count-heads-in-trace
  ;(gen [t] (define m (trace-get t (list 0 "which-model" "uniform-sample"))) (cond (= m coin-model-1a) 1 (= m coin-model-1b) 2 (= m coin-model-1c) 3))
  10 100 0 100)

CompilerException java.lang.RuntimeException: Unable to resolve symbol: hybrid-coin-model-a in this context, compiling:(/private/var/folders/s4/_p8gnd791cj6nzbd6x5jl34h0000gn/T/form-init8311415647614722019.clj:1:1) 


class clojure.lang.Compiler$CompilerException: 

In [490]:
(define reorganize-trace
  (gen [t pairs]
    (define unaffected-addresses
      (filter (gen [value-adr] 
                   (no-elem? (gen [moved-subtrace-adr] (list-starts-with? value-adr moved-subtrace-adr))
                     (map first pairs)))
              (addresses-of t)))
    
    (define starting-trace 
        (clojure.core/reduce
            (gen [new-t a] (trace-set new-t a (trace-get t a)))
            (trace)
            unaffected-addresses))
    
    (clojure.core/reduce
      (gen [new-t [source-adr target-adr]]
        (if (and (trace-has-subtrace? t source-adr)
                 (not (empty-trace? (trace-subtrace t source-adr))))
            (trace-merge new-t (trace-set-subtrace (trace) target-adr (trace-subtrace t source-adr)))
            new-t))
      starting-trace
      pairs)))


(define customize-trace
  (gen [f & others]
    (define orig-new-pairs (clojure.core/partition 2 others))
    (define new-orig-pairs (map clojure.core/reverse orig-new-pairs))
    (inf (if (trace-has? f "name") (trace-get f "name") "reorganized") 
         f
         (gen [ins interv target out?]
            (define [new-interv new-target] 
              [(reorganize-trace interv new-orig-pairs)
               (reorganize-trace target new-orig-pairs)])
            (define [v o s] (infer :procedure f :inputs ins :intervention-trace new-interv :target-trace new-target :output-trace? true))
            (define new-outer (reorganize-trace o orig-new-pairs))
            [v (if out? new-outer (trace)) s]))))

#'tutorial/customize-trace

In [352]:
(define list-starts-with?
  (gen [l prefix]
    (or (empty? prefix)
        (and (not (empty? l))
             (= (first l) (first prefix))
             (list-starts-with? (rest l) (rest prefix))))))

(define no-elem?
  (gen [f l]
    (or (empty? l)
        (and (not (f (first l)))
             (no-elem? f (rest l))))))

(define lift
  (inf
    "lift"
    (gen [f is & others] (apply f is))
    (gen [[f i & pairs] interv target out?]
      ; construct new intervention and target traces
      (define new-intervene
        (clojure.core/reduce
            (gen [existing-trace [next-inner-addr next-outer-addr]]
              (if (trace-has-subtrace? interv next-outer-addr)
                  (trace-merge existing-trace (trace-set-subtrace (trace) next-inner-addr (trace-subtrace interv next-outer-addr)))
                  existing-trace))
            (if (trace-has-subtrace? interv "inner") (trace-subtrace interv "inner") (trace))
            (clojure.core/partition 2 pairs)))

      (define new-target
        (clojure.core/reduce
            (gen [existing-trace [next-inner-addr next-outer-addr]]
              (if (trace-has-subtrace? target next-outer-addr)
                  (trace-merge existing-trace (trace-set-subtrace (trace) next-inner-addr (trace-subtrace target next-outer-addr)))
                  existing-trace))
            (if (trace-has-subtrace? target "inner") (trace-subtrace target "inner") (trace))
            (clojure.core/partition 2 pairs)))


      ; call f on i, tracing it
      (define [v o s] (infer :procedure f :inputs i :intervention-trace new-intervene :target-trace new-target :output-trace? true))

      ; modify o
      (define pairs-processed (clojure.core/partition 2 pairs))
      (define inners (map first pairs-processed))

      (define unaffected (filter (gen [a] (no-elem? (gen [i] (list-starts-with? a i)) inners)) (addresses-of o)))
      (define new-inner
        (clojure.core/reduce
          (gen [t a] (trace-set t a (trace-get o a)))
          (trace)
          unaffected))

      (define new-outer
        (clojure.core/reduce
          (gen [old-outer [next-inner-adr next-outer-adr]]
            (if (trace-has-subtrace? o next-inner-adr)
              (trace-merge old-outer (trace-set-subtrace (trace) next-outer-adr (trace-subtrace o next-inner-adr)))
              old-outer))
          (trace)
          pairs-processed))

      (define new-output-trace
        (if (empty-trace? new-inner)
          new-outer
          (trace-set-subtrace new-outer "inner" new-inner)))
      [v new-output-trace s])))

#'tutorial/lift

In [339]:
(smart-plot-trace
    ((infer :procedure coin-model-2 :output-trace? true) 1))

In [345]:
(define p (gen [] (lift coin-model-2 [] (list 1 "replicate" "map") (list "flips"))))
(smart-plot-trace
    ((infer :procedure p :output-trace? true) 1))

In [440]:
(define p 
  (customize-trace
    (gen [w] (if (< w 0.5) (flip w) (flip (- 1 w)))) 
    (list "then" "flip") (list) 
    (list "else" "flip") (list)))

;; (smart-plot-trace (
(infer :procedure my-map-2
       :inputs [p (list 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1)]
       :target-trace (list false true true true true true true true true true false)
       :output-trace? true)
;;   1))

[(false true true true true true true true true true false) {:value false, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value true, "rest" {:value false}}}}}}}}}}} -12.757720263816418]

In [368]:
(smart-plot-trace
    ((infer :procedure (gen [] (flip 0.5)) :inputs [] :output-trace? true) 1))