In [1]:
from scipy.special import beta

**Exercise 10.1 [Purpose: to illustrate the fact that models with more distinctive predictions can be more easily discriminated.]** Consider the scenario of Section 10.2.1, in which there were two coin factories, one of which was tail-biased and the other head-biased. Suppose we flip a coin that we know is from one of the two factories but we do not know which factory, and the prior probabilities of the factories are 50/50. The results show $z=7$ heads in $N=10$ flips.

**(A)** If $w_1=0.25$, $w_2=0.75$ and $\kappa=6$, what are the posterior probabilities of the factories?
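
The cells that follow rely on the closed-form marginal likelihood of a Bernoulli likelihood under a beta prior. As a reminder, and assuming the mode/concentration parameterization that the code uses (with $w_m$ the mode and $\kappa > 2$ the concentration of factory $m$), the quantities being computed are:

$$a_m = w_m(\kappa - 2) + 1, \qquad b_m = (1 - w_m)(\kappa - 2) + 1$$

$$p(D \mid m) = \frac{B(z + a_m,\; N - z + b_m)}{B(a_m,\; b_m)}$$

$$p(m=1 \mid D) = \frac{p(D \mid m=1)\,p(m=1)}{p(D \mid m=1)\,p(m=1) + p(D \mid m=2)\,p(m=2)}$$

Here $B(\cdot,\cdot)$ is the beta function. The subscripted names $a_m$ and $b_m$ are notation introduced for this summary; the code calls them `a` and `b`.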

In [2]:
def analytical_derivation_pdm(w, k, n, z):
    # Convert the prior's mode w and concentration k into beta shape parameters.
    a = w * (k - 2) + 1
    b = (1 - w) * (k - 2) + 1
    # Marginal likelihood of z heads in n flips under a beta(a, b) prior.
    pdm = beta(z + a, n - z + b) / beta(a, b)

    print('w = ' + str(w) + ', k = ' + str(k))
    print('a = ' + str(a) + ', b = ' + str(b))
    print('n = ' + str(n) + ', z = ' + str(z))
    print('p(z, n|m) = ' + str(pdm))

    return pdm

In [3]:
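def analytical_derivation_pmd(pdm1, pdm2, pm1, pm2):
    # Posterior odds of model 1 over model 2: Bayes factor times prior odds.
    ratio = pdm1 * pm1 / (pdm2 * pm2)
    # Convert the odds to probabilities; the two posteriors sum to 1.
    pm1d = ratio / (1 + ratio)
    pm2d = 1 - pm1d
    print('p(m=1|D) = ' + str(pm1d))
    print('p(m=2|D) = ' + str(pm2d))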
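
The ratio of beta functions used above is fine for this exercise, but for much larger $N$ or $\kappa$ the individual beta values can underflow to zero. A minimal sketch of the same calculation in log space follows, using only the standard library. The names `log_beta`, `log_marginal_likelihood` and `posterior_model_probs` are hypothetical helpers, not part of the original notebook; `scipy.special.betaln` could replace `log_beta`.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(w, k, n, z):
    """log p(D|m) for a beta prior with mode w and concentration k (k > 2)."""
    a = w * (k - 2) + 1
    b = (1 - w) * (k - 2) + 1
    return log_beta(z + a, n - z + b) - log_beta(a, b)


def posterior_model_probs(log_pdm1, log_pdm2, pm1=0.5, pm2=0.5):
    """Return (p(m=1|D), p(m=2|D)) from log marginal likelihoods and priors."""
    log_joint1 = log_pdm1 + log(pm1)
    log_joint2 = log_pdm2 + log(pm2)
    # Subtract the larger term before exponentiating to avoid underflow.
    shift = max(log_joint1, log_joint2)
    j1 = exp(log_joint1 - shift)
    j2 = exp(log_joint2 - shift)
    return j1 / (j1 + j2), j2 / (j1 + j2)


# Part (A) inputs: modes 0.25 and 0.75, kappa = 6, z = 7 heads in N = 10 flips.
p1, p2 = posterior_model_probs(
    log_marginal_likelihood(0.25, 6, 10, 7),
    log_marginal_likelihood(0.75, 6, 10, 7),
)
print(p1, p2)
```

With the part (A) inputs this should agree with the direct calculation up to floating-point rounding.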

In [4]:
pdm1 = analytical_derivation_pdm(0.25, 6, 10, 7)

w = 0.25, k = 6
a = 2.0, b = 4.0
n = 10, z = 7
p(z, n|m) = 0.000444000444000444


In [5]:
pdm2 = analytical_derivation_pdm(0.75, 6, 10, 7)

w = 0.75, k = 6
a = 4.0, b = 2.0
n = 10, z = 7
p(z, n|m) = 0.001332001332001332


In [6]:
analytical_derivation_pmd(pdm1, pdm2, 0.5, 0.5)

p(m=1|D) = 0.25
p(m=2|D) = 0.75


**(B)** If $w_1=0.25$, $w_2=0.75$ and $\kappa=202$, what are the posterior probabilities of the factories?

In [7]:
pdm1 = analytical_derivation_pdm(0.25, 202, 10, 7)

w = 0.25, k = 202
a = 51.0, b = 151.0
n = 10, z = 7
p(z, n|m) = 3.3220052511454465e-05


In [8]:
pdm2 = analytical_derivation_pdm(0.75, 202, 10, 7)

w = 0.75, k = 202
a = 151.0, b = 51.0
n = 10, z = 7
p(z, n|m) = 0.002048602283091808


In [9]:
analytical_derivation_pmd(pdm1, pdm2, 0.5, 0.5)

p(m=1|D) = 0.015957198625130557
p(m=2|D) = 0.9840428013748694


**(C)** Why are the posterior probabilities so different in parts A and B, even though the modes of the factories are the same?

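A higher $\kappa$ makes each factory's prior on the coin bias narrower, that is, more tightly concentrated around its mode ($w_1=0.25$ or $w_2=0.75$). With $\kappa=6$ both priors are broad and overlap heavily, so each model assigns appreciable prior probability to biases near the observed proportion $z/N=0.7$. The two models therefore make similar predictions for the data, and the posterior favors the head-biased factory by only 3 to 1. With $\kappa=202$ the priors barely overlap, so the models make distinct predictions: the data are far more probable under the head-biased factory, and the posterior odds rise to about 62 to 1. The modes are the same in both parts, but the same data discriminate more sharply between models whose predictions are more distinctive.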
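
To see how the discrimination changes with the concentration, the sketch below recomputes the Bayes factor in favor of the head-biased factory for several values of $\kappa$, holding the data and the modes fixed. The function name `bayes_factor_m2_over_m1` and the intermediate value $\kappa=20$ are illustrative assumptions, not part of the exercise.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor_m2_over_m1(k, n=10, z=7, w1=0.25, w2=0.75):
    """p(D|m=2) / p(D|m=1) for beta priors with modes w1, w2 and concentration k."""

    def log_pdm(w):
        # Same mode/concentration to shape-parameter conversion as the notebook.
        a = w * (k - 2) + 1
        b = (1 - w) * (k - 2) + 1
        return log_beta(z + a, n - z + b) - log_beta(a, b)

    return exp(log_pdm(w2) - log_pdm(w1))


for k in (6, 20, 202):
    bf = bayes_factor_m2_over_m1(k)
    # With 50/50 prior model probabilities, posterior odds equal the Bayes factor.
    print(f"kappa = {k:>3}: BF(m2/m1) = {bf:.2f}, p(m=2|D) = {bf / (1 + bf):.4f}")
```

The Bayes factor grows with $\kappa$ because the two priors overlap less and less. In the limit of very large $\kappa$ the factories behave like the point hypotheses $\theta=0.25$ and $\theta=0.75$, for which the ratio would be $(0.75/0.25)^{7-3}=81$.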