Let's solve the famous [taxicab problem](https://en.wikipedia.org/wiki/Representativeness_heuristic#The_taxicab_problem).

As originally stated by Tversky and Kahneman, it goes like this:

* A cab was involved in a hit and run accident at night. Two cab companies, the Green and the
  Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue.
* A witness identified the cab as Blue. The court tested the reliability of the witness under the
  same circumstances that existed on the night of the accident and concluded that the witness
  correctly identified each one of the two colours 80% of the time and failed 20% of the time.
* What is the probability that the cab involved in the accident was Blue rather than Green,
  knowing that this witness identified it as Blue?

The true answer is around 41%. Roughly speaking, that's because the base rate (85% of cabs are
Green) is more informative than the witness's 80% accuracy.

More formally:

$P(\text{TrueCompany}=\mathrm{blue} \mid \text{WitnessCompany}=\mathrm{blue}) = \frac{.15 \times .8}{.85 \times .2 + .15 \times .8} \approx .4138$
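
As a quick sanity check on the arithmetic, here is the same formula in plain Python. This is only a sketch: the variable names are made up for illustration, and the numbers are the ones from the problem statement.

```python
# Exact posterior by Bayes' rule, using only the numbers from the problem statement.
# All variable names here are illustrative.
prior_blue = 0.15           # P(TrueCompany = blue)
prior_green = 0.85          # P(TrueCompany = green)
p_say_blue_if_blue = 0.8    # witness correctly identifies a Blue cab
p_say_blue_if_green = 0.2   # witness mistakes a Green cab for Blue

# P(WitnessCompany = blue), summed over both possible true colours
evidence = prior_blue * p_say_blue_if_blue + prior_green * p_say_blue_if_green

posterior_blue = prior_blue * p_say_blue_if_blue / evidence
print(f"{posterior_blue:.4f}")  # 0.4138
```

The denominator is the total probability that the witness says "Blue". Most of it (0.17 out of 0.29) comes from Green cabs being misidentified, which is why the posterior stays below 50%.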

The code for this in Pangolin is barely more complicated than the math.

In [4]:
import pangolin as pg
# 0 means green, 1 means blue
true_company = pg.bernoulli(0.15)
witness_company = pg.bernoulli(0.8 * true_company + 0.2 * (1-true_company))
prob_blue = pg.E(true_company, witness_company, 1)
print(f"{prob_blue=}")

prob_blue=Array(0.4161, dtype=float32)
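
The printed value (0.4161) is close to, but not exactly, the analytic .4138, which is what you would expect from an estimate based on a finite number of samples. To see the same effect without any probabilistic programming library, here is a minimal NumPy sketch that simulates the story directly and keeps only the runs where the witness says "Blue". The seed, sample count, and variable names are arbitrary choices for illustration, and this is not a description of how Pangolin computes its answer.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 200_000                      # number of simulated accidents (arbitrary)

# Draw the true company: True means Blue (15% of cabs), False means Green.
true_blue = rng.random(n) < 0.15

# The witness says "Blue" with probability 0.8 if the cab is Blue, 0.2 if it is Green.
p_say_blue = np.where(true_blue, 0.8, 0.2)
says_blue = rng.random(n) < p_say_blue

# Condition on the observation: keep only simulations where the witness said "Blue".
estimate = true_blue[says_blue].mean()
print(f"{estimate:.3f}")
```

With this many simulations the estimate should land within about a percentage point of 41%, and it moves slightly with the seed, just as the sampled answers in this notebook do.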


For the sake of comparison, what's the easiest way of doing this in NumPyro, TensorFlow
Probability, PyMC, or JAGS?

In [2]:
# NumPyro
import numpyro, jax
import numpy as np

def model():
    true_company = numpyro.sample("true company", numpyro.distributions.Bernoulli(0.15))
    # condition on the witness reporting blue (1)
    witness_company = numpyro.sample(
        "witness company",
        numpyro.distributions.Bernoulli(0.8 * true_company + 0.2 * (1 - true_company)),
        obs=1,
    )

kernel = numpyro.infer.MixedHMC(numpyro.infer.HMC(model, trajectory_length=1.2), num_discrete_updates=20)
mcmc = numpyro.infer.MCMC(kernel, num_warmup=1000, num_samples=1000)
mcmc.run(jax.random.PRNGKey(0))
samps = mcmc.get_samples()['true company']
np.mean(samps)

sample: 100%|██████████| 2000/2000 [00:02<00:00, 695.76it/s, 20 steps of size 3.40e+38. acc. prob=1.00] 


Array(0.411, dtype=float32)

In [None]:
# PyMC
import pymc as pm
with pm.Model() as model:
    true_company = pm.Bernoulli("true company",0.15)
    witness_company = pm.Bernoulli("witness company", 0.8 * true_company + 0.2 * (1-true_company), observed=1)
    trace = pm.sample()

# np is numpy, imported in the NumPyro cell above
np.mean(np.array(trace.posterior["true company"]))

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 36 seconds.


0.415

In [4]:
# TensorFlow Probability... just trying to install this is typical Google hell.
# from tensorflow_probability.substrates import jax as tfp
# tfd = tfp.distributions
#
# Root = tfd.JointDistributionCoroutine.Root
# def model():
#     # Bernoulli's first positional argument is logits, so pass probs by keyword
#     true_company = yield Root(tfd.Bernoulli(probs=0.15))
#     # only nodes without parents are wrapped in Root
#     witness_company = yield tfd.Bernoulli(probs=0.8 * true_company + 0.2 *
#                                           (1-true_company))
#