# Probabilistic Language of Thought

This tutorial performs concept inference over the FullBoolean grammar from Piantadosi et al. (2016). The production weights used in the grammar are the MAP values reported in their appendix (Figure E1).

[Piantadosi, S. T., Tenenbaum, J. B., & Goodman, N. D. (2016). The logical primitives of thought: Empirical foundations for compositional cognitive models. Psychological Review, 123(4), 392.](https://doi.org/10.1037/a0039980)

In [1]:
from flippy import infer, condition, keep_deterministic, mem, \
    Categorical
from flippy.distributions import Dirichlet
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from frozendict import frozendict

# The prior over concepts

We start by specifying the prior distribution over concepts. The FullBoolean grammar contains predicates over an object's features (size, color, shape) and Boolean operations that combine them (like logical or, logical and, and logical not).

In [2]:
# The inverse temperature controls the sharpness of the right-hand side
# expansion probabilities. When rules that branch have lower Dirichlet weights
# than rules that do not, a higher inverse temperature leads to a bias towards
# shorter rules.
inv_temp = 1.0

# We set a maximum depth for expressions to avoid generating extremely large programs.
max_depth = 5

@mem
def expansion_dist(rhs_probs: dict) -> Categorical:
    dirichlet_weights = [v**inv_temp for v in list(rhs_probs.values())]
    probs = Dirichlet(dirichlet_weights).sample()
    return Categorical(list(rhs_probs.keys()), probabilities=probs)

def simple_boolean():
    def PREDICATE():
        feature_dist = expansion_dist({
            "SIZE": 2.0,
            "SHAPE": 1.4,
            "COLOR": 1.9,
        })
        attribute = feature_dist.sample()
        if attribute == 'COLOR':
            color_dist = expansion_dist({
                "yellow": .6,
                "green": .5,
                "blue": 1.5
            })
            return f"(lambda obj: obj['color'] == '{color_dist.sample()}')"
        elif attribute == 'SHAPE':
            shape_dist = expansion_dist({
                "triangle": 1.0,
                "rectangle": 0.4,
                "circle": 1.9
            })
            return f"(lambda obj: obj['shape'] == '{shape_dist.sample()}')"
        elif attribute == 'SIZE':
            size_dist = expansion_dist({
                "large": 1.0,
                "medium": 0.3,
                "small": 1,
            })
            return f"(lambda obj: obj['size'] == '{size_dist.sample()}')"

    def BOOL(depth):
        condition(depth < max_depth)
        bool_dist = expansion_dist({
            "or": 1.,
            "and": 0.25,
            "not": 2.0,
            "PREDICATE": 1.6,
        })
        symbol = bool_dist.sample()
        if symbol in ("or", "and"):
            expr = f"({BOOL(depth+1)} {symbol} {BOOL(depth+1)})"
        elif symbol == "not":
            expr = f"not {BOOL(depth+1)}"
        elif symbol == "PREDICATE":
            expr = f"{PREDICATE()}(x)"
        # Avoid some trivial cases
        condition("not not" not in expr)
        return expr

    def START(depth):
        return f"lambda x: {BOOL(depth+1)}"

    expr = START(0)
    return expr
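
To see how `inv_temp` reshapes the expansion probabilities, the sketch below computes the mean of a Dirichlet distribution (each weight divided by the sum of the weights) for the `BOOL` weights at a few inverse temperatures. It uses plain Python rather than flippy, and `expected_probs` is a hypothetical helper introduced only for illustration. The model above samples probabilities from the Dirichlet, so these values describe its average behavior, not any single sample.

```python
def expected_probs(weights, inv_temp):
    """Mean of a Dirichlet whose weights are raised to the power inv_temp."""
    tempered = {rule: w ** inv_temp for rule, w in weights.items()}
    total = sum(tempered.values())
    return {rule: w / total for rule, w in tempered.items()}

# The BOOL expansion weights from the grammar above.
bool_weights = {"or": 1.0, "and": 0.25, "not": 2.0, "PREDICATE": 1.6}

for inv_temp in (0.5, 1.0, 2.0):
    probs = expected_probs(bool_weights, inv_temp)
    summary = ", ".join(f"{rule}={p:.3f}" for rule, p in probs.items())
    print(f"inv_temp={inv_temp}: {summary}")
```

With these weights, raising the inverse temperature from 1 to 2 lowers the expected probability of the branching rules `or` and `and`, which have the lowest weights.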

We can look at some examples from the above prior.

Note: The above prior can generate very large programs if the probabilities of branching non-terminals like `or`/`and` are high enough. The depth limit that guards against this is expressed with `condition`, so we use `LikelihoodWeighting`, which weights each sample by its conditions, instead of `SamplePrior`, which we would ordinarily use.

In [3]:
def sorted_dist(dist):
    support, probabilities = zip(*sorted(
        [(s, dist.prob(s)) for s in dist.support],
        key=lambda pair: pair[-1],
        reverse=True))
    return Categorical(support, probabilities=probabilities)

sorted_dist(infer(method="LikelihoodWeighting", samples=30, seed=42)(simple_boolean)())

| | Element | Probability |
|---|---|---|
| 0 | `lambda x: (lambda obj: obj['color'] == 'blue')(x)` | 0.294 |
| 1 | `lambda x: not (lambda obj: obj['color'] == 'blue')(x)` | 0.176 |
| 2 | `lambda x: not (lambda obj: obj['size'] == 'large')(x)` | 0.176 |
| 3 | `lambda x: (lambda obj: obj['size'] == 'medium')(x)` | 0.059 |
| 4 | `lambda x: not (lambda obj: obj['shape'] == 'circle')(x)` | 0.059 |
| 5 | `lambda x: (lambda obj: obj['shape'] == 'circle')(x)` | 0.059 |
| 6 | `lambda x: ((lambda obj: obj['color'] == 'blue')(x) or ((lambda obj: obj['color'] == 'blue')(x) and ((lambda obj: obj['shape'] == 'triangle')(x) and (lambda obj: obj['size'] == 'small')(x))))` | 0.059 |
| 7 | `lambda x: (lambda obj: obj['size'] == 'large')(x)` | 0.059 |
| 8 | `lambda x: ((lambda obj: obj['size'] == 'small')(x) or (lambda obj: obj['shape'] == 'triangle')(x))` | 0.059 |
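
The idea behind likelihood weighting can be sketched in plain Python: draw samples from the prior, give each sample a weight, and normalize the weights. The toy grammar below is an assumption made for illustration (two terminals `A` and `B`, one branching rule, a branching probability of 0.4, and a depth limit of 3), and `sample_expr` and `likelihood_weighting` are hypothetical helpers, not flippy's implementation. A sample that hits the depth limit gets weight 0, which plays the role of `condition(depth < max_depth)`; the toy sampler also stops expanding at that point.

```python
import random
from collections import defaultdict

MAX_DEPTH = 3  # toy depth limit, smaller than the tutorial's max_depth

def sample_expr(rng, depth=1):
    """Sample a toy expression from the prior; return (expression, weight)."""
    if depth >= MAX_DEPTH:
        # Depth limit violated: this sample contributes nothing.
        return "<too deep>", 0.0
    if rng.random() < 0.4:  # branching rule
        left, w_left = sample_expr(rng, depth + 1)
        right, w_right = sample_expr(rng, depth + 1)
        return f"({left} or {right})", w_left * w_right
    return rng.choice(["A", "B"]), 1.0  # terminal rule

def likelihood_weighting(n_samples, seed):
    """Estimate the conditioned distribution from weighted prior samples."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_samples):
        expr, weight = sample_expr(rng)
        totals[expr] += weight
    normalizer = sum(totals.values())
    return {e: w / normalizer for e, w in totals.items() if w > 0}

dist = likelihood_weighting(n_samples=2000, seed=0)
for expr, prob in sorted(dist.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{expr}: {prob:.3f}")
```

Expressions that exceed the depth limit never appear in the result, because their weight is zero before normalization.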


# Performing inference

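We need to specify a likelihood to perform inference. We use a simple scheme: each example whose label matches the concept's prediction contributes a factor of 0.95 (`p_correct_label`), and each mismatch contributes a factor of 0.05, so concepts that match all examples seen thus far are strongly favored. The Piantadosi et al. (2016) model makes a number of additions to better fit data, like leaky memory.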

In [39]:
# To make the concept executable, we need a deterministic transformation of it
@keep_deterministic
def convert_to_executable(func):
    # Because we are returning a function, it similarly must be marked as deterministic.
    return keep_deterministic(eval(func))

@infer(method="MetropolisHastings", samples=2000, seed=None, burn_in=2000)
def posterior(pcfg, data):
    p_correct_label = 0.95
    concept = pcfg()
    executable_concept = convert_to_executable(concept)
    for example, value in data:
        condition(p_correct_label if executable_concept(example) == value else 1 - p_correct_label)
    return concept
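
The strings produced by the grammar are ordinary Python lambda expressions, so `eval` turns them into callables. Here is a minimal illustration, independent of flippy, using a concept string in the format generated above; the two objects are hypothetical examples.

```python
# A concept string in the format generated by the grammar.
concept_src = "lambda x: not (lambda obj: obj['color'] == 'blue')(x)"

# eval turns the source string into a regular Python function.
concept = eval(concept_src)

# Hypothetical objects described by the three features used in this tutorial.
yellow_circle = {"color": "yellow", "shape": "circle", "size": "small"}
blue_circle = {"color": "blue", "shape": "circle", "size": "small"}

print(concept(yellow_circle))  # True: the object is not blue
print(concept(blue_circle))    # False: the object is blue
```

`eval` executes arbitrary code, so this approach is only appropriate when the strings come from a grammar you control, as they do here.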

The concept with the highest posterior probability for the examples below is `color==yellow`, the intended concept. A number of more complex hypotheses are also consistent with the data: you might see `color==yellow and color==yellow`, or hypotheses with extraneous predicates like `color==yellow or size==large`. Because the likelihood allows for label noise, you'll also see simple hypotheses that are inconsistent with the data, like `size==large`.

In [40]:
examples = (
    (frozendict({'color': 'yellow', 'shape': 'circle', 'size': 'small'}), True),
    (frozendict({'color': 'blue', 'shape': 'rectangle', 'size': 'medium'}), False),
    (frozendict({'color': 'yellow', 'shape': 'triangle', 'size': 'large'}), True),
    (frozendict({'color': 'green', 'shape': 'circle', 'size': 'small'}), False)
)
dist = posterior(simple_boolean, examples)

In [41]:
sorted_dist(dist)

| | Element | Probability |
|---|---|---|
| 0 | `lambda x: (lambda obj: obj['color'] == 'yellow')(x)` | 0.791 |
| 1 | `lambda x: (lambda obj: obj['size'] == 'large')(x)` | 0.097 |
| 2 | `lambda x: ((lambda obj: obj['size'] == 'large')(x) or (lambda obj: obj['color'] == 'yellow')(x))` | 0.058 |
| 3 | `lambda x: (lambda obj: obj['shape'] == 'triangle')(x)` | 0.044 |
| 4 | `lambda x: (lambda obj: obj['color'] == 'blue')(x)` | 0.005 |
| 5 | `lambda x: ((lambda obj: obj['shape'] == 'triangle')(x) or (lambda obj: obj['color'] == 'yellow')(x))` | 0.003 |
| 6 | `lambda x: (lambda obj: obj['shape'] == 'circle')(x)` | 0.001 |
| 7 | `lambda x: (lambda obj: obj['size'] == 'small')(x)` | 0.001 |
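
To see why some inconsistent hypotheses keep posterior mass, it helps to compute the likelihood term by hand. The sketch below reuses the four examples and the 0.95 label accuracy from the model above, but with plain dicts and hand-written predicates instead of flippy; `likelihood` and `candidates` are hypothetical names introduced for illustration.

```python
P_CORRECT_LABEL = 0.95  # same value as p_correct_label in the model above

examples = [
    ({"color": "yellow", "shape": "circle", "size": "small"}, True),
    ({"color": "blue", "shape": "rectangle", "size": "medium"}, False),
    ({"color": "yellow", "shape": "triangle", "size": "large"}, True),
    ({"color": "green", "shape": "circle", "size": "small"}, False),
]

def likelihood(concept, data, p_correct=P_CORRECT_LABEL):
    """Unnormalized likelihood: p_correct per matching label, 1 - p_correct per mismatch."""
    weight = 1.0
    for obj, label in data:
        weight *= p_correct if concept(obj) == label else 1 - p_correct
    return weight

candidates = {
    "color == yellow": lambda obj: obj["color"] == "yellow",  # matches 4 of 4
    "size == large": lambda obj: obj["size"] == "large",      # matches 3 of 4
    "color == blue": lambda obj: obj["color"] == "blue",      # matches 1 of 4
}

for name, concept in candidates.items():
    print(f"{name}: {likelihood(concept, examples):.6f}")
```

Each mismatched example costs a factor of 0.95 / 0.05 = 19 in likelihood, so `size == large` (one mismatch) is penalized but not ruled out, while `color == blue` (three mismatches) is almost eliminated. The posterior also depends on the grammar prior, so the probabilities in the table are not proportional to these likelihoods alone.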
