# Most people take probability for granted, but in reality it is more nuanced and complicated.

A report says that 85% of cancer patients are coffee drinkers.

Say that, in the US, cancer patients make up 0.5% of the population
and about 65% of the population drinks coffee.

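Read carelessly, the report suggests that most coffee drinkers end up with cancer. If that were true, far more than 0.5% of the population would be cancer patients. What we actually need is $P(Cancer|Coffee)$, not $P(Coffee|Cancer)$.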

Let's say:

\begin{align}
P(Coffee|Cancer) &= 0.85 \\
P(Coffee) &= 0.65 \\
P(Cancer) &= 0.005 \\
P(Cancer|Coffee) &= \ ?
\end{align}

Applying Bayes' Theorem we get:

\begin{equation}
P(Cancer|Coffee) = \frac{P(Coffee|Cancer) \times P(Cancer)}{P(Coffee)}
\end{equation}

In [1]:
p_coffee_drinker=0.65
p_cancer=0.005
p_coffee_given_cancer=0.85
p_cancer_given_coffee_drinker=(p_coffee_given_cancer*p_cancer)/p_coffee_drinker
p_cancer_given_coffee_drinker

0.006538461538461539
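
One way to sanity-check the Bayes calculation is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 1,000,000 (a number chosen only for illustration, not taken from the report) and applies the same three rates.

```python
# Hypothetical population size, chosen only to make the counts easy to read.
population = 1_000_000

cancer = population * 5 // 1000          # 0.5% have cancer -> 5,000 people
coffee = population * 65 // 100          # 65% drink coffee -> 650,000 people
coffee_and_cancer = cancer * 85 // 100   # 85% of cancer patients drink coffee -> 4,250 people

# Share of coffee drinkers who have cancer, i.e. P(Cancer|Coffee)
p_cancer_given_coffee = coffee_and_cancer / coffee
print(p_cancer_given_coffee)
```

Under these assumptions, roughly 0.65% of coffee drinkers have cancer, which is only slightly above the 0.5% base rate.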

># **Binomial Distributions**
___

A binomial distribution measures how likely it is to see $k$ successes in $n$ independent trials, given a success probability $p$ on each trial.

\begin{equation}
{n \choose k}p^k (1-p)^{n-k}
\end{equation}
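
As a worked example, assuming illustrative values of $n = 10$ trials, $k = 8$ successes, and $p = 0.9$ (the same values used in the code cell that follows), the formula gives the probability of exactly eight successes:

\begin{equation}
{10 \choose 8}(0.9)^{8}(1-0.9)^{2} = 45 \times 0.43046721 \times 0.01 \approx 0.1937
\end{equation}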



In [4]:
import matplotlib.pyplot as plt
from scipy.stats import binom
import seaborn as sns
import numpy as np

In [5]:
n=10
p=0.9
ks=np.arange(n+1)
total=0
for k in ks:
  probability=binom.pmf(k,n,p)
  #print(f"{k} - {probability}")
  if k<=8:
    total+=probability
  else:
    break
print(total) # total probability of 8 or fewer successes

0.26390107089999976


So, there is a 26.4% chance we would see eight or fewer successes out of ten, even if the underlying success rate is 90%.

>## ***Building Binomial Distribution from scratch.***
___

From the equation:

\begin{equation}
\begin{aligned}
&{n \choose k}p^k (1-p)^{n-k} \\
&\text{Binomial coefficient: } {n \choose k} = \frac{n!}{k! \times (n-k)!}
\end{aligned}
\end{equation}
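```python
def factorial(n: int) -> int:
  """Return n! for a non-negative integer n."""
  f = 1
  for i in range(n):
    f *= (i + 1)
  return f  # return only after the loop has finished

def binomial_coefficient(n: int, k: int) -> int:
  """Number of ways to choose k successes out of n trials."""
  return factorial(n) // (factorial(k) * factorial(n - k))  # exact integer division

def binomial_distribution(k: int, n: int, p: float) -> float:
  """Probability of exactly k successes in n trials (see the equation above)."""
  return binomial_coefficient(n, k) * (p**k) * (1.0 - p)**(n - k)  # exponent is (n - k)
```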
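
A from-scratch implementation like this can be cross-checked against Python's standard library. The sketch below is an alternative written for illustration, not the code above: it uses `math.comb` for the binomial coefficient, and the helper name `binomial_pmf` is hypothetical. It checks that the probabilities over all $k$ sum to 1 and recomputes the chance of eight or fewer successes with $n = 10$ and $p = 0.9$.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials), using math.comb for n choose k."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

n, p = 10, 0.9

# Probabilities over every possible k must add up to 1.
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# Chance of 8 or fewer successes (k = 0..8).
p_eight_or_fewer = sum(binomial_pmf(k, n, p) for k in range(9))

print(total_probability)
print(p_eight_or_fewer)
```

The second value should agree with the 26% figure obtained earlier with `binom.pmf`, up to floating-point rounding.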

># **Beta Distribution**


---

So far, we've been creating a myriad of distributions to answer the question of whether or not we will see 8 successes out of 10 tests in the engine testing model.
* What if other underlying success rates besides 90% also yield 8/10 successes?
* What if a 70%, 30%, or 80% underlying success rate yields an 8/10 result?
* When we fix the result at 8/10 successes, can we explore the probabilities of those probabilities?

A simple approach is a new type of distribution, _the beta distribution_. It allows us to see the likelihood of different underlying probabilities for an event, given $\alpha$ successes and $\beta$ failures.



In [4]:
from scipy.stats import beta

a=8 # successes
b=2 # failures
p=beta.cdf(.90, a, b) # probability that the underlying success rate is 90% or less
print(p)

0.7748409780000001


So, according to our calculation, there is a 77.48% chance that the underlying probability of success is 90% or less.

How do we calculate the probability of the success rate being 90% or more? Pretty simple: just subtract the result from `1.0`.

In [7]:
1.0-p

0.22515902199999993

This means there is only a 22.5% chance that the underlying success rate is 90% or higher, and a 77.5% chance that it is less than 90%. Could we gamble on that 22.5% chance of a 90% or higher underlying success rate? I don't think so. If we run more tests and end up with 30 successes and 6 failures, then:

In [8]:
a = 30
b= 6
p= 1.0 - beta.cdf(.90, a, b)
print(p)

0.13163577484183708


At this point, it might be a good idea to walk away from the tests, unless you want to keep gambling away against the 13.16% chance and hope the peak moves to the right. 
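
To see where the peak actually sits, one option is to compute the mode of each beta distribution. The sketch below assumes that "peak" means the mode, which for $\alpha > 1$ and $\beta > 1$ is $(\alpha - 1)/(\alpha + \beta - 2)$; the helper name `beta_mode` is hypothetical.

```python
def beta_mode(a: float, b: float) -> float:
    """Mode (peak location) of a Beta(a, b) distribution; valid for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("the formula for the mode requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)

mode_first = beta_mode(8, 2)    # a = 8, b = 2
mode_second = beta_mode(30, 6)  # a = 30, b = 6

print(mode_first)
print(mode_second)
```

Under that assumption, the peak sits at 0.875 for the first set of results and at about 0.853 for the second, so the additional tests moved it slightly to the left rather than to the right.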

* A question arises: how can we calculate an area in the middle, say the chance that the success rate is between 80% and 90%?

- That would be to subtract the area to the left of 80% from the area to the left of 90%.

In [9]:
p = beta.cdf(.90, a, b) - beta.cdf(.80, a , b)
print(p)

0.5962725311986745


So, the probability that our underlying success rate is between 80% and 90% is 59.6%.
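
The "area" language can be made concrete by integrating the beta density numerically. The sketch below is a rough illustration rather than how `scipy` computes `beta.cdf`: it assumes the standard beta density $p^{\alpha-1}(1-p)^{\beta-1}/B(\alpha,\beta)$ and a simple midpoint rule, and the helper names `beta_pdf` and `area_between` are hypothetical.

```python
from math import gamma

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, with the normalizing constant from gamma functions."""
    normalizer = gamma(a + b) / (gamma(a) * gamma(b))
    return normalizer * x**(a - 1) * (1.0 - x)**(b - 1)

def area_between(lo: float, hi: float, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under the Beta(a, b) density on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width

# Same parameters as the cell above: a = 30, b = 6.
area = area_between(0.80, 0.90, 30, 6)
print(area)
```

The result should be close to the 59.6% obtained from the difference of the two `beta.cdf` values.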
