In [3]:
# basic functions
import math
import numpy as np

In [4]:
#visual style
import matplotlib.pyplot as plt

%matplotlib inline

## `Exercises`

*1. A friend finds a coin on the ground, flips it, and gets six heads in a row and then one tails. Give the beta distribution that describes this. Use integration to determine the probability that the true rate of flipping heads is between 0.4 and 0.6, reflecting that the coin is reasonably fair.*

The **Beta Distribution** parameters are **`alpha (observed successes)`** and **`beta (observed failures)`**. Problems using this distribution estimate the rate of success given the data we have.

$P(\text{RateOfSuccess} \mid \text{Successes and Failures}) = \text{Beta}(\alpha, \beta)$

For this problem, the number of successes is the **Number of Heads = 6** and the number of failures is the **Number of Tails = 1**.

With this data we can determine the chances that the coin is fair or "that the true rate of flipping heads is between 0.4 and 0.6."

To solve this using Python, use **`from scipy.stats import beta`**.

Another way of thinking about this is "What is the probability the rate of heads is between 0.4 and 0.6?" 

To do that, look at the chunk of probability that falls between those two bookends using the cumulative distribution function, available as the **`cdf`** method.

In [5]:
from scipy.stats import beta

In [6]:
# parameters
heads_q1 = 6
tails_q1 = 1
fair_rate = [0.4,0.6]

In [7]:
result = beta.cdf(fair_rate,heads_q1,tails_q1)
result

array([0.004096, 0.046656])

The result is the cumulative probability up to the lower bound (0.4) and up to the upper bound (0.6). To get the probability that the true rate of heads falls between the two bounds, subtract the lower bound output from the upper bound output.
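
For this particular coin the same number can be checked by hand, because the Beta(6, 1) density reduces to $6p^5$. A worked version, writing $p$ for the rate of heads (a symbol introduced here, not in the original):

$$
P(0.4 \le p \le 0.6) = \int_{0.4}^{0.6} 6p^{5}\,dp = 0.6^{6} - 0.4^{6} = 0.046656 - 0.004096 = 0.04256
$$

The two terms match the two `cdf` values shown above.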

*Note: If either alpha or beta is 0, the result will be nan.*
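
A minimal sketch of that note, assuming SciPy's usual handling of out-of-range shape parameters (both must be strictly positive). The variable names `valid` and `invalid` are illustrative only.

```python
import math
from scipy.stats import beta

# Both shape parameters strictly positive: a normal probability comes back.
valid = beta.cdf(0.5, 6, 1)

# Zero observed tails passed straight through as beta=0: outside the
# distribution's domain, so the result is nan rather than a probability.
invalid = beta.cdf(0.5, 6, 0)

print(valid, invalid)
print(math.isnan(invalid))
```

If a data set really has zero heads or zero tails, the counts need a prior added before they are handed to `beta.cdf`.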

In [8]:
result[-1] - result[0]

0.04255999999999999
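
Since the exercise asks for integration, the CDF difference can be cross-checked by numerically integrating the density over the same interval. This is a sketch using `scipy.integrate.quad`, which is not used elsewhere in this notebook. The names `area` and `cdf_diff` are illustrative.

```python
from scipy.integrate import quad
from scipy.stats import beta

heads, tails = 6, 1
lower, upper = 0.4, 0.6

# quad calls beta.pdf(x, heads, tails) and integrates over x in [lower, upper].
area, abs_err = quad(beta.pdf, lower, upper, args=(heads, tails))

# The same probability from the difference of two CDF values.
cdf_diff = beta.cdf(upper, heads, tails) - beta.cdf(lower, heads, tails)

print(f"integral: {area:.5f}, cdf difference: {cdf_diff:.5f}")
```

Both routes should agree to within the integrator's error estimate; the CDF route is simply cheaper.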

In [9]:
def beta_prob(a=1,b=1,bounds=(0.0,1.0)):
    """
    Calculates beta probability between a lower and upper bound.
    
    Parameters
    ----------
    a: float 
        alpha, must be greater than 0
        
    b: float
        beta, must be greater than 0
        
    bounds: sequence of two floats
        lower bound,upper bound
    
    Returns
    -------
    out: np.float64
        probability
    """
    # cdf evaluated at both bounds; the difference is the mass between them
    result = beta.cdf(bounds,a,b)
    return result[-1] - result[0]

In [19]:
beta_prob(6,1,[0.4,0.6])

0.04255999999999999

*2. Come up with a prior probability that the coin is fair. Use a beta distribution such that there is at least a 95 percent chance that the true rate of flipping heads is between 0.4 and 0.6.*

Another way to phrase this question would be:

What prior do we need in order to be 95% sure that the true rate of flipping heads is between 0.4 and 0.6?

As Will Kurt points out in the answer, "Any $\alpha_{prior} = \beta_{prior}$ will give a 'fair' prior; and the larger those values are, the stronger that prior is."

In other words, for a fair prior the successes should equal the failures. And the larger the total of successes plus failures, the stronger the prior belief that the coin is fair.
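
To see the "stronger prior" effect in isolation, here is a small sketch that looks only at symmetric Beta(n, n) priors, before any coin flips are added. The helper `fair_prior_mass` and the chosen values of n are illustrative assumptions, not part of the original exercise.

```python
from scipy.stats import beta

def fair_prior_mass(n, lower=0.4, upper=0.6):
    """Probability that a symmetric Beta(n, n) prior places on [lower, upper]."""
    return beta.cdf(upper, n, n) - beta.cdf(lower, n, n)

# Beta(1, 1) is the uniform prior; larger n concentrates belief around 0.5.
masses = {n: fair_prior_mass(n) for n in (1, 5, 20, 55)}
for n, mass in masses.items():
    print(f"Beta({n}, {n}): {mass:.3f}")
```

The cells that follow take a different route: they add the observed 6 heads and 1 tail to each candidate prior and check the combined distribution.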

In [25]:
beta_prob(26,21,[0.4,0.6])

0.7208847847634349

In [31]:
steps = list(range(0,111,10))
steps

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

In [32]:
for s in steps:
    n = s/2
    print(f'{s}: {beta_prob(6+n,1+n,bounds=[0.4,0.6]):.2f}')

0: 0.04
10: 0.31
20: 0.50
30: 0.63
40: 0.72
50: 0.79
60: 0.84
70: 0.87
80: 0.90
90: 0.92
100: 0.94
110: 0.95


In [35]:
steps_2 = list(range(100,111,1))
steps_2

[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110]

In [39]:
for s in steps_2:
    n = s/2
    print(f'{s}: {beta_prob(6+n,1+n,bounds=[0.4,0.6]):.4f}')

100: 0.9399
101: 0.9413
102: 0.9427
103: 0.9441
104: 0.9454
105: 0.9467
106: 0.9480
107: 0.9492
108: 0.9504
109: 0.9516
110: 0.9527
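
Rather than reading the threshold off the printed table, the scan can stop at the first total prior count that reaches 95%. This is a sketch under the same setup as the cells above (6 heads, 1 tail, prior split evenly). The helper `prob_in_bounds` and the search limit of 200 are assumptions made for illustration.

```python
from scipy.stats import beta

def prob_in_bounds(a, b, lower=0.4, upper=0.6):
    """Probability mass of Beta(a, b) between lower and upper."""
    return beta.cdf(upper, a, b) - beta.cdf(lower, a, b)

heads, tails = 6, 1
target = 0.95

smallest_total = None
for total in range(0, 201):
    half = total / 2  # equal share of the prior for alpha and beta
    if prob_in_bounds(heads + half, tails + half) >= target:
        smallest_total = total
        break

print(smallest_total)
```

Odd totals give half-integer parameters, which the beta distribution accepts; restricting the loop to even totals would keep alpha and beta as whole counts.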


*3. Now see how many more heads (with no more tails) it would take to convince you that there is a reasonable chance that the coin is not fair. In this case, let's say that this means that our belief in the rate of the coin being between 0.4 and 0.6 drops below 0.5.*

This is working off the Beta(55,55) prior. Adding the 6 heads and 1 tail from exercise 1 gives Beta(61,56) as the starting point. The point of this exercise is to show that even a strong prior belief can be overcome with more data.

In [51]:
beta_prob(61,56,[0.4,0.6])

0.9527469094270735

In [52]:
beta_prob(84,56,[0.4,0.6])

0.4954058980753927
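
The Beta(84, 56) result above corresponds to 23 extra heads on top of Beta(61, 56). A sketch of how that count could be searched for instead of guessed is below. The helper `prob_in_bounds` and the variable names are illustrative assumptions.

```python
from scipy.stats import beta

def prob_in_bounds(a, b, lower=0.4, upper=0.6):
    """Probability mass of Beta(a, b) between lower and upper."""
    return beta.cdf(upper, a, b) - beta.cdf(lower, a, b)

# Beta(55, 55) prior plus the observed 6 heads and 1 tail.
alpha_post, beta_post = 61, 56

# Add heads one at a time (no more tails) until belief in a fair coin
# drops below 0.5.
extra_heads = 0
while prob_in_bounds(alpha_post + extra_heads, beta_post) >= 0.5:
    extra_heads += 1

final_prob = prob_in_bounds(alpha_post + extra_heads, beta_post)
print(extra_heads, final_prob)
```

The loop stops at the first count where the probability is below 0.5, so it also confirms whether a smaller number of heads would already have been enough.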

In [64]:
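def beta_prior_finder(bounds=(0.0,1.0), a=1,b=1,start=0,finish=10,step=1):
    """
    Work in progress: scans 'fair' priors only (really a fair_beta_prior_finder).
    Each total prior count n is divided equally between alpha and beta.
    
    Returns a list of (prior, prob) pairs, where prob is the probability
    that the rate lies within bounds after n/2 is added to both a and b.
    """
    results = []
    for n in np.arange(start,finish,step):
        half = n/2  # equal share of the prior for alpha and beta
        results.append((n, beta_prob(a+half,b+half,bounds)))
    return results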

So the scan from 100 to 110 says we'd need a total prior count of 108, split evenly as Beta(54,54), to be at least 95% certain (0.9504).

In [18]:
# beta_pdf was never defined; assuming the intent was to plot the
# Beta(6,1) density from exercise 1 over the valid range of rates
x = np.linspace(0,1,200)
y = beta.pdf(x,heads_q1,tails_q1)

fig,ax = plt.subplots()
ax.plot(x,y)
plt.show()

In [None]:
y.shape

## `Notes`

Think of some binary outcomes where this might be useful in your life.