In [1]:
# basic functions
import math
import numpy as np
from scipy.stats import binom
from scipy.stats import beta

In [2]:
# create visuals and visual style
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='white', context='notebook', palette='deep')

%matplotlib inline


### `The Beta Distribution`

Use the **`Beta Distribution`** when you have already observed a number of trials and the number of successful outcomes, and you want to estimate the unknown probability of success behind them.
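As a minimal sketch of that mapping, assuming the convention this notebook uses in its later cells (alpha = number of successes, beta = number of failures), 14 successes in 41 trials become a Beta(14, 27) distribution. The variable names are illustrative only.

```python
from scipy.stats import beta

successes, failures = 14, 27        # 14 successes observed in 41 trials
dist = beta(successes, failures)    # frozen distribution with alpha=14, beta=27

# The mean of Beta(a, b) is a / (a + b), i.e. the observed success rate.
print(dist.mean())

# Density at a candidate probability of 0.5 (a density, not a probability).
print(dist.pdf(0.5))
```

Freezing the distribution once keeps the shape parameters in one place, so later calls such as `dist.cdf(...)` cannot be made with mismatched arguments.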

### `Distinguishing Probability, Statistics, and Inference`

### `Collecting Data`

### `Calculating the Probability of Probabilities`

Add some explanatory text here related to the header in the book.

P(two coins) = 1/2 *vs.* P(two coins) = 14/41

H<sub>1</sub> is P(two coins) = 1/2

H<sub>2</sub> is P(two coins) = 14/41

For this, we will calculate a binomial pmf for each hypothesis. 

P(D|H<sub>1</sub>) = B(14;41,1/2)

P(D|H<sub>2</sub>) = B(14;41,14/41)

In [3]:
hypoth_one = binom.pmf(14,41,1/2)
hypoth_two = binom.pmf(14,41,14/41)

In [4]:
# Given the data (14 cases of getting two coins out of 41 trials), the data is about 8 times
# more likely under H2 than under H1
f"H1: {hypoth_one:.4f} vs. H2: {hypoth_two:.4f}"

'H1: 0.0160 vs. H2: 0.1305'

In [5]:
hypoth_two/hypoth_one

8.141526269320835

H<sub>3</sub> is P(two coins) = 15/42

In [6]:
# integrate graph later

deci_range = np.arange(0,1,0.05)

for n in deci_range:
    print(binom.pmf(15,42,n))

0.0
7.538470176454329e-10
5.737775715321828e-06
0.0005368781055062878
0.007817635130837315
0.03890004062170131
0.09303828886450415
0.12702146220279298
0.10843748676463123
0.060558643173393135
0.022435512531955866
0.005449717969746779
0.0008357652981006862
7.54628773050255e-05
3.572237217133146e-06
7.31972892977795e-08
4.659673649571707e-10
4.897153233027799e-13
2.031577782838519e-17
3.405963657279598e-25


In [7]:
tt = [binom.pmf(15,42,n) for n in deci_range]

In [8]:
sum(tt)

0.46511627797100064
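The sum above is not 1 because each value is the likelihood of the data at one candidate probability, not a probability distribution over the candidates. A rough sketch of the normalization step, assuming the same grid spacing of 0.05 as `deci_range` (the names `step`, `grid`, and `normalized` are illustrative):

```python
import numpy as np
from scipy.stats import binom

step = 0.05
grid = np.arange(0, 1, step)                 # candidate probabilities
likelihoods = binom.pmf(15, 42, grid)        # P(data | p) for each candidate

# Riemann-sum approximation of the area under the likelihood curve.
area = likelihoods.sum() * step
print(area, 1 / 43)                          # exact integral is 1 / (n + 1)

# Dividing by the area gives a curve that integrates to (approximately) 1.
normalized = likelihoods / area
print(normalized.sum() * step)
```

For a binomial likelihood with n trials, the integral over p from 0 to 1 is exactly 1/(n+1), which is why the area lands so close to 1/43 here.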

The beta distribution is listed under [scipy.stats.beta](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html) in SciPy.

It is an instance of the **`rv_continuous`** class.

In [3]:
# x = quantiles (values of the random variable)
# q = lower or upper tail probability
# a, b = shape parameters (alpha and beta)

In [4]:
beta.pdf([0.25,0.5],14,27)

array([2.73249066, 0.59098423])

In [10]:
beta.cdf([0.25,0.5], 14,27)

array([0.10323172, 0.98076135])

### the integration range [0,0.5] is important
### probably a good idea to explain how continuous distributions work
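One way to see how a continuous distribution works: `beta.pdf` returns a density, and a probability only appears once that density is integrated over a range. The sketch below, which assumes `scipy.integrate.quad` as the numerical integrator, integrates the Beta(14, 27) density over [0, 0.5] and compares the result with `beta.cdf`.

```python
from scipy.integrate import quad
from scipy.stats import beta

a, b = 14, 27

# Integrate the density from 0 to 0.5; quad returns (value, error estimate).
area, abs_err = quad(beta.pdf, 0.0, 0.5, args=(a, b))

# The cdf at 0.5 is that same integral, computed directly.
cdf_value = beta.cdf(0.5, a, b)

print(area, cdf_value)
```

This is also why `beta.pdf` can return values above 1, as in the cell above: the height of the curve is not a probability, only the area under it is.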

In [6]:
beta.cdf([0.0,0.5],14,27)[1]

0.9807613458578999

In [7]:
beta.pdf([0.0,0.35],14,27)

array([0.        , 5.25263925])

### `Reverse Engineering the Gacha Game`

In [8]:
beta.sf([0.0,0.005],5,1195)[1] # survival function, i.e. 1 - cdf
# [1] selects the value at x = 0.005; [0] is the value at x = 0.0

0.2850559397962503

In [16]:
# interpretation: there is a 28.5% chance the rate of pulling an Efron card is 0.005 or greater

In [35]:
cdf = beta.cdf([0.0,0.005], 5,1195)[1]
cdf

0.7149440602037497

This says, "there is a 71.5% chance that the rate of pulling a Bradley Efron card is less than 0.005. Or conversely, the chance that it is 0.005 or greater is `1-cdf`, or 28.5%."

Thus, if your friend wants to be at least 70.0% sure that the rate is 0.005 or greater, he shouldn't try his luck.

There is another way to approach this question.

You've pulled 5 Bradley Efron cards. And your friend really wants one. 

Based on your data, you can make an estimate of how much a Bradley Efron card is worth.

Then you can answer the question: How much should my friend pay to get a Bradley Efron card?

Easy Version: Assume he *only* wants a Bradley Efron card.

Hard Version: Assign values to the other cards he collects in his quest to get a Bradley Efron.
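A possible sketch of the easy version, under assumptions that are not in the original notes: each pull is independent, the pull rate follows the Beta(5, 1195) distribution used above, and `price_per_pull` is a hypothetical placeholder. The number of pulls needed for the first card is then geometric with mean 1/p, and averaging 1/p over a Beta(a, b) distribution gives (a + b - 1) / (a - 1) when a > 1.

```python
from scipy.stats import beta

a, b = 5, 1195                 # shape parameters used in the cells above
price_per_pull = 1.0           # hypothetical price, not given in the notebook

# Plug-in estimate: treat the mean rate as if it were the true rate.
mean_rate = beta.mean(a, b)                    # a / (a + b)
pulls_at_mean_rate = 1 / mean_rate

# Average of 1/p over the whole beta distribution (valid for a > 1).
expected_pulls = (a + b - 1) / (a - 1)

print(pulls_at_mean_rate, expected_pulls)
print(expected_pulls * price_per_pull)         # rough expected cost of one card
```

The second figure is larger because it accounts for the chance that the true rate is lower than the mean estimate.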

### `The Beta Distribution`

### `Summary of Python used:`

combinations: comb(N,k,exact=True)

binomial probability mass function: binom.pmf(k,n,p)

binomial survival function: binom.sf(k,n,p)

binomial cumulative distribution function: binom.cdf(k,n,p)

### `Questions`

*`1. You want to use the beta distribution to determine whether or not a coin you have is a fair coin--meaning that the coin gives you heads and tails equally. You flip the coin 10 times  and get 4 heads and 6 tails. Using the beta distribution, what is the probability that the coin will land on heads more than 60 percent of the time?`*

In [17]:
1 - beta.cdf([0.0,0.6],4,6)[1]

0.09935257600000003

In [18]:
# chances of getting heads more than 60%
# between 0.6 and 1.0 interval
beta.sf([0.6,1.0],4,6)

array([0.09935258, 0.        ])

#### from `scipy.stats.rv_continuous` docs:

If possible, you should override _isf, _sf or _logsf. The main reason would be to improve numerical accuracy: for example, the survival function _sf is computed as 1 - _cdf which can result in loss of precision if _cdf(x) is close to one.

In [19]:
beta.sf(0.6,4,6)

0.09935257600000003
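To illustrate the precision point from the docs, the sketch below compares `beta.sf` with `1 - beta.cdf` at a point far in the upper tail, where the cdf is extremely close to one. The point `x = 0.05` is an arbitrary choice for illustration, and whether the two results differ depends on how the installed SciPy version computes `sf` for the beta distribution.

```python
from scipy.stats import beta

a, b = 5, 1195
x = 0.05                               # far above the bulk of Beta(5, 1195)

via_sf = beta.sf(x, a, b)              # survival function called directly
via_cdf = 1 - beta.cdf(x, a, b)        # subtraction from a cdf that is nearly 1

print(via_sf, via_cdf)

# In the bulk of a distribution the two approaches agree closely.
print(beta.sf(0.6, 4, 6), 1 - beta.cdf(0.6, 4, 6))
```

If the first pair prints a tiny positive number next to `0.0`, that gap is the loss of precision the docs describe.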

In [20]:
beta.pdf([0.5],4,6)

array([1.96875])

In [21]:
beta.interval(.9,4,6)

(0.168750495987324, 0.6550586340562967)
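As a small check on what `beta.interval` returns, the sketch below reproduces the 90% interval from the percent point function `beta.ppf`, assuming the interval is equal-tailed (5% of the probability left out on each side).

```python
from scipy.stats import beta

a, b = 4, 6
lower, upper = beta.interval(0.9, a, b)

# Equal-tailed bounds: the 5th and 95th percentiles.
tail_lower = beta.ppf(0.05, a, b)
tail_upper = beta.ppf(0.95, a, b)

# Probability captured between the two bounds.
mass = beta.cdf(upper, a, b) - beta.cdf(lower, a, b)

print((lower, upper), (tail_lower, tail_upper), mass)
```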

*`2. You flip the coin 10 more times and now have 9 heads and 11 tails total. What is the probability that the coin is fair, using our definition of fair, give or take 5 percent?`*

In [22]:
# remember "fair" = 0.5 so "give or take" is 0.5 +/- 0.05
fair = beta.cdf([0.45,0.55],9,11)
fair
# the cumulative probability between 0.45 and 0.55
# or 0.816 - 0.506 = 0.310

array([0.50601904, 0.81589905])

In [23]:
fair[1] - fair[0]

0.3098800156513042

*`3. Data is the best way to become more confident in your assertions. You flip the coin 200 more times and end up with 109 heads and 111 tails. Now what is the probability that the coin is fair, give or take 5 percent?`*

In [24]:
def beta_cdf_prob(interval, a, b):
    """
    interval: array-like containing the lower and upper limits of the range
    a: float, alpha shape parameter
    b: float, beta shape parameter

    returns: probability between the lower limit and the upper limit of a beta distribution
    """
    lower, upper = min(interval), max(interval)
    cdf_range = beta.cdf([lower, upper], a, b)
    return cdf_range[1] - cdf_range[0]

In [25]:
beta_cdf_prob([0.45,0.55], 109,111)

0.8589371426532354
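To see how the extra flips change the answer, the sketch below evaluates the same "fair, give or take 5 percent" range for the counts from all three questions. Applying the fairness range to the 4 heads and 6 tails of question 1 is an addition for comparison only, and `prob_between` is an illustrative helper, not a SciPy function.

```python
from scipy.stats import beta

def prob_between(lower, upper, a, b):
    """Probability that a Beta(a, b) variable falls between lower and upper."""
    return beta.cdf(upper, a, b) - beta.cdf(lower, a, b)

# (heads, tails) counts from questions 1, 2, and 3.
observations = [(4, 6), (9, 11), (109, 111)]

results = {obs: prob_between(0.45, 0.55, *obs) for obs in observations}

for (heads, tails), prob in results.items():
    print(f"{heads} heads, {tails} tails: P(0.45 < p < 0.55) = {prob:.4f}")
```

The probability assigned to the fair range rises as the counts grow, because the beta distribution narrows around the observed rate.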

### `Etcetera`
Likely to get deleted later.

# Critique:

I'd like to see the examples and problems set up in math notation form, then make the bridge for 
the reader into code.

Create a quick notebook teaching all of this through sports?

In [26]:
beta.cdf([0.45,0.49,0.55],109,111)

array([0.08849826, 0.43595318, 0.9474354 ])

In [27]:
interv = [0.49,0.45,0.55]

In [28]:
beta_cdf_prob(interv,109,111)

0.8589371426532354