## Reverend Bayes' legendary coin of wisdom
### A parable

An eminent Professor of History at your university has recently discovered a relic that was once owned by the venerable Reverend Bayes. The item is the legendary "coin of wisdom", said to endow those who possess it with powerful knowledge of statistics. Legend has it that despite appearing to be a fair coin, the coin is biased, with an unequal probability of producing heads or tails. As the Professor of History could barely muster an addition if his life depended on it, he has reached out for collaborative assistance from our group to either confirm or deny the legend.

The coin flipping problem is of course a well-known problem from statistics. Whether or not the probabilities of heads and tails are equal, the number of "successes" in a fixed number of flips (where we define success to be either heads or tails) can be modeled using the binomial distribution, with the "probability of success" as its parameter.

You have been tasked with estimating the probability that a flip of the coin produces tails (let's define tails as "success"). To do so, you collect a dataset of three sets of coin flips:
1. Your professor flips the coin five times, producing 1 tails and 4 heads, before realising he is late to an important meeting and handing the coin to a postdoc in the group.
2. The postdoc in the group flips the coin ten times, producing 5 tails and 5 heads, before realising it's time to collect the child from daycare and giving you the coin.
3. You flip the coin 100 times, producing 66 tails and 34 heads, before noticing how hungry you are and setting out to have dinner.

Because each person may have slightly different biases in how they flip coins, you decide not to pool all of the results, but instead to treat them as independent estimates under the binomial distribution. And for no particular reason, you decide to find the maximum likelihood by evaluating the binomial likelihood at fixed probabilities of tails/success rather than using analytical estimation or some kind of gradient optimisation.

Therefore, you calculate the likelihood of the three observations above when the probability of tails ranges from 0.05 to 0.95 in increments of 0.05 (0.05, 0.1, ..., 0.95).
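
As a point of comparison, the analytical estimate that the grid evaluation stands in for is simply tails divided by flips for each trial. The sketch below computes both per trial; the names `trials`, `grid` and `estimates` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

# (tails, flips) for each experimenter, taken from the three trials above
trials = {"professor": (1, 5), "postdoc": (5, 10), "student": (66, 100)}

# candidate values of p_t: 0.05, 0.10, ..., 0.95
grid = np.arange(1, 20) / 20

estimates = {}
for name, (k, n) in trials.items():
    analytic = k / n                           # closed-form binomial MLE
    likelihood = stats.binom.pmf(k, n, grid)   # likelihood at every grid point
    grid_best = grid[np.argmax(likelihood)]    # grid point with the highest likelihood
    estimates[name] = (analytic, grid_best)
    print(f"{name}: analytic = {analytic:.2f}, grid = {grid_best:.2f}")
```

The grid estimate can only land on a multiple of 0.05, so it matches the analytical value only when tails/flips happens to fall on the grid.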

What is the maximum likelihood value of the probability of tails, p_t?

The likelihood that an arbitrary value p_t produced an observed set of coin flips can be calculated using the probability mass function of the binomial.
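
Written out, with $k$ tails observed in $n$ flips (the symbols $k$ and $n$ are introduced here for illustration), the likelihood is the binomial probability mass function evaluated at $p_t$. As a worked check, the professor's data ($k = 1$, $n = 5$) at $p_t = 0.2$ gives:

$$
L(p_t \mid k, n) = \binom{n}{k}\, p_t^{\,k}\, (1 - p_t)^{\,n-k},
\qquad
L(0.2 \mid 1, 5) = \binom{5}{1}\,(0.2)^{1}\,(0.8)^{4} = 5 \times 0.2 \times 0.4096 = 0.4096
$$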


In [3]:
import scipy.stats as stats
# stats.binom.pmf(k, n, p) is the probability of k tails in n flips when
# the probability of tails is p. Evaluate it at each candidate p_t for
# the professor's coin flips (1 tails in 5 flips)
print("Professor:")
for i in range(1,20):
  print("\tp_t = "+str(i/20)+", likelihood="+str(stats.binom.pmf(1, 5, i/20)))

# now do the same for the postdoc's coin flips
print("Postdoc:")
for i in range(1,20):
  print("\tp_t = "+str(i/20)+", likelihood="+str(stats.binom.pmf(5, 10, i/20)))

# and for the student's coin flips
print("Student:")
for i in range(1,20):
  print("\tp_t = "+str(i/20)+", likelihood="+str(stats.binom.pmf(66, 100, i/20)))


Professor:
	p_t = 0.05, likelihood=0.20362656249999997
	p_t = 0.1, likelihood=0.3280499999999999
	p_t = 0.15, likelihood=0.3915046874999999
	p_t = 0.2, likelihood=0.4095999999999999
	p_t = 0.25, likelihood=0.39550781249999994
	p_t = 0.3, likelihood=0.3601499999999999
	p_t = 0.35, likelihood=0.3123859374999999
	p_t = 0.4, likelihood=0.2591999999999999
	p_t = 0.45, likelihood=0.20588906249999994
	p_t = 0.5, likelihood=0.15624999999999994
	p_t = 0.55, likelihood=0.11276718749999994
	p_t = 0.6, likelihood=0.07680000000000001
	p_t = 0.65, likelihood=0.04877031249999996
	p_t = 0.7, likelihood=0.02835
	p_t = 0.75, likelihood=0.014648437499999993
	p_t = 0.8, likelihood=0.006399999999999998
	p_t = 0.85, likelihood=0.002151562500000002
	p_t = 0.9, likelihood=0.00044999999999999945
	p_t = 0.95, likelihood=2.9687500000000112e-05
Postdoc:
	p_t = 0.05, likelihood=6.0935248828125136e-05
	p_t = 0.1, likelihood=0.0014880348000000042
	p_t = 0.15, likelihood=0.008490855786328131
	p_t = 0.2, likelihood=0.

## Combining information from the separate trials

We have now obtained the likelihoods of different p_t for each of the three independent coin flip trials. How do we combine them? One approach would be to sum the (log) likelihoods across trials and take the maximum. Does it produce the correct result? Which trial contributes the most to the estimate of p_t?
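
A minimal sketch of the summing approach, assuming the three trials are independent and share a single p_t, so that their likelihoods multiply and their log-likelihoods add. It uses `scipy.stats.binom.logpmf` to get log-likelihoods directly; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# (tails, flips) for each experimenter
trials = {"professor": (1, 5), "postdoc": (5, 10), "student": (66, 100)}
grid = np.arange(1, 20) / 20  # 0.05, 0.10, ..., 0.95

# log-likelihood of each trial at every grid point (rows: trials, columns: p_t)
log_lk = np.array([stats.binom.logpmf(k, n, grid) for k, n in trials.values()])

# independent trials: the joint likelihood is a product, so the logs add
total_log_lk = log_lk.sum(axis=0)
best = grid[np.argmax(total_log_lk)]
print(f"combined grid estimate of p_t: {best:.2f}")

# how much each trial's log-likelihood varies across the grid, as a rough
# indication of how strongly that trial constrains p_t
spread = {name: row.max() - row.min() for name, row in zip(trials, log_lk)}
for name, value in spread.items():
    print(f"{name}: log-likelihood range = {value:.1f}")
```

No weighting step is needed here: a trial with more flips has a more sharply peaked log-likelihood, so it influences the sum more without any extra factor.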



## A further correspondence from the historian

The historian has come into contact with a colleague in Europe who is in possession of a letter written by the reverend that describes the manufacture of the coin. In it, the reverend indicates the shape and weight of the coin were designed to produce tails 60% of the time. How does that compare to your estimate? How does it compare to the student's estimate?
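
One hedged way to make the comparison concrete is to look at how much log-likelihood is lost by moving from a grid estimate to the reverend's 0.6. The sketch below compares 0.6 against 0.65, the grid point with the highest likelihood for the student's 66 tails in 100 flips; the helper `summed_log_lk` and the choice of 0.65 as the comparison point are assumptions made for illustration.

```python
from scipy import stats

# (tails, flips) for each experimenter
trials = {"professor": (1, 5), "postdoc": (5, 10), "student": (66, 100)}

def summed_log_lk(p, data):
    """Sum of binomial log-likelihoods at p over (tails, flips) pairs."""
    return sum(stats.binom.logpmf(k, n, p) for k, n in data)

claimed = 0.60    # value stated in the reverend's letter
grid_best = 0.65  # assumed comparison point (best grid value for the student's data)

# log-likelihood at 0.6 minus log-likelihood at 0.65; negative means 0.65 fits better
all_gap = summed_log_lk(claimed, trials.values()) - summed_log_lk(grid_best, trials.values())
student_gap = summed_log_lk(claimed, [trials["student"]]) - summed_log_lk(grid_best, [trials["student"]])

print(f"all three trials: {all_gap:.3f}")
print(f"student only:     {student_gap:.3f}")
```

A gap close to zero means the data barely distinguish the two values, so a grid estimate of 0.65 would not by itself contradict a design value of 0.6.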

In [4]:
import pandas as pd
import numpy as np
import scipy.stats as stats
probabilities = [i/20 for i in range(1,20)]
prof_lk = np.array([stats.binom.pmf(1, 5, i/20) for i in range(1,20)])
postdoc_lk = np.array([stats.binom.pmf(5, 10, i/20) for i in range(1,20)])
student_lk = np.array([stats.binom.pmf(66, 100, i/20) for i in range(1,20)])
# min-max normalise each trial's likelihoods onto [0, 1]
prof_lk_norm = (prof_lk - np.min(prof_lk)) / (np.max(prof_lk) - np.min(prof_lk))
postdoc_lk_norm = (postdoc_lk - np.min(postdoc_lk)) / (np.max(postdoc_lk) - np.min(postdoc_lk))
student_lk_norm = (student_lk - np.min(student_lk)) / (np.max(student_lk) - np.min(student_lk))
# convert to log (natural log). Min-max scaling maps each trial's smallest
# likelihood to exactly 0, so np.log returns -inf there and emits a RuntimeWarning
prof_lk_norm_log = np.log(prof_lk_norm)
postdoc_lk_norm_log = np.log(postdoc_lk_norm)
student_lk_norm_log = np.log(student_lk_norm)
# get series indexed by p_t
prof_vals = pd.Series(prof_lk_norm_log, index=probabilities)
postdoc_vals = pd.Series(postdoc_lk_norm_log, index=probabilities)
student_vals = pd.Series(student_lk_norm_log, index=probabilities)
# display the professor's min-max normalised likelihoods
prof_lk_norm



array([0.49709871, 0.80088889, 0.95581879, 1.        , 0.96559275,
       0.87926371, 0.76264378, 0.63278588, 0.50262279, 0.38142489,
       0.27525799, 0.18744111, 0.11900429, 0.0691464 , 0.03569289,
       0.01555365, 0.00518073, 0.00102623, 0.        ])

In [None]:
collected_df = pd.concat([prof_vals,postdoc_vals,student_vals], axis=1)
collected_df

In [None]:
collected_df_sum = collected_df.sum(axis=1)
collected_df_sum

In [7]:
#
# demonstrating the approach of normalisation followed by weighting
#
probabilities = [i/20 for i in range(1,20)]
prof_lk = np.array([stats.binom.pmf(1, 5, i/20) for i in range(1,20)])
postdoc_lk = np.array([stats.binom.pmf(5, 10, i/20) for i in range(1,20)])
student_lk = np.array([stats.binom.pmf(66, 100, i/20) for i in range(1,20)])
# normalise so likelihood values sum to 1 per trial, then weight by the number of independent draws per trial (5, 10 or 100)
prof_lk_norm = 5 * prof_lk / np.sum(prof_lk)
postdoc_lk_norm = 10 * postdoc_lk / np.sum(postdoc_lk)
student_lk_norm = 100 * student_lk / np.sum(student_lk)
# sum all "weighted scores"
norm_sum = prof_lk_norm + postdoc_lk_norm + student_lk_norm
# print each score next to the p_t it was computed at: norm_sum[i] pairs with
# probabilities[i], i.e. p_t = (i+1)/20, so the label must not be i/20.
# Index 0 (p_t = 0.05) is left out of the printout.
# The largest weighted score falls at p_t = 0.65, the grid point nearest the student's 66/100.
print("Combined:")
for i in range(1, len(probabilities)):
  print("\tp_t = "+str(probabilities[i])+", score="+str(norm_sum[i]))


Combined:
	p_t = 0.1, score=0.5033493340911449
	p_t = 0.15, score=0.637644619203657
	p_t = 0.2, score=0.7635910005674413
	p_t = 0.25, score=0.9181830025090298
	p_t = 0.3, score=1.109674101177159
	p_t = 0.35, score=1.31615914018216
	p_t = 0.4, score=1.4949073944265314
	p_t = 0.45, score=1.6035976797458766
	p_t = 0.5, score=1.8207060752467685
	p_t = 0.55, score=4.928110437014295
	p_t = 0.6, score=20.95642213162671
	p_t = 0.65, score=42.3994500331629
	p_t = 0.7, score=29.84024364498622
	p_t = 0.75, score=5.983057207915847
	p_t = 0.8, score=0.35739517106290675
	p_t = 0.85, score=0.05057251774712528
	p_t = 0.9, score=0.008863461021435037
	p_t = 0.95, score=0.0003799548787536089
