# Introduction to Bayesian Models

### Goals for today
1. Core concepts of Bayesian statistics.
2. Interpreting a positive test for a disease.
3. Is it a fair coin?


---
# Interpreting a Positive Test for a Disease

The base rate fallacy occurs when the base rate (prevalence) of a condition is not taken into account when interpreting diagnostic test results. Let's consider an example:

Suppose a disease affects 1% of the population (prevalence). A test for this disease has a:
- 99% sensitivity (true positive rate): the probability that the test is positive given that the disease is present.
- 95% specificity (true negative rate): the probability that the test is negative given that the disease is not present.

Without doing any calculations: given that a person tests positive, what is the probability that they actually have the disease?


**your answer here**



---
## We need Bayesian statistics to answer this properly

Bayesian statistics is an approach to statistical inference that uses Bayes' Theorem to combine prior information with data to form an updated (posterior) belief. In this example, that means accounting for the prior probability of having the disease (its prevalence) when interpreting a positive test.

**Bayes' Theorem**: 

$$ P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)} $$

Where:
- $ P(H|E) $ is the posterior probability: the probability of the hypothesis $ H $ given the evidence $ E $.
- $ P(E|H) $ is the likelihood: the probability of the evidence $ E $ given that hypothesis $ H $ is true.
- $ P(H) $ is the prior probability: the initial probability of the hypothesis before considering the evidence.
- $ P(E) $ is the marginal likelihood: the total probability of the evidence under all possible hypotheses.

**Keep in mind that "data" and "observations" are other names for evidence: all three terms refer to the same thing, the empirical observations we condition on.**


---
### Task 1:

Determine how the sensitivity, specificity, and prevalence of the disease fit into Bayes' theorem. In other words, which is the prior, which is the likelihood, and so on?

**Your answer here**   

Prior: This is the prevalence of the disease in the population. In our example, it's 1% or 0.01.   
Likelihood: This is the sensitivity of the test, which is the probability of a positive test result given that the person has the disease. In our example, it's 99% or 0.99.   
Posterior probability: This is what we're trying to calculate - the probability that a person has the disease given that they tested positive.   
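Marginal Likelihood: This is the overall probability of a positive test result, counting both true positives and false positives: $P(+) = 0.99 \times 0.01 + (1 - 0.95) \times (1 - 0.01)$. This is where the specificity enters, through the false-positive rate $1 - \text{specificity}$.   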


---
### Task 2:   
Now that you know how these probabilities fit into Bayes' theorem, implement this equation in code and determine the probability that someone has the disease given a positive test.

```python
# your answer here
```
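A minimal sketch of Task 2. The function name `posterior_given_positive` and its argument names are hypothetical; the marginal likelihood in the denominator is expanded as true positives (sensitivity × prevalence) plus false positives ((1 − specificity) × (1 − prevalence)).

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem (hypothetical helper)."""
    # Numerator: likelihood times prior, P(+ | D) * P(D)
    true_pos = sensitivity * prevalence
    # False positives: P(+ | not D) * P(not D), with P(+ | not D) = 1 - specificity
    false_pos = (1 - specificity) * (1 - prevalence)
    # Marginal likelihood P(+) sums both ways of testing positive
    evidence = true_pos + false_pos
    return true_pos / evidence


p = posterior_given_positive(prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(f"P(disease | positive) = {p:.3f}")  # P(disease | positive) = 0.167
```

Even with a 99% sensitive test, only about 1 in 6 people who test positive actually have the disease, because false positives from the large healthy group outnumber the true positives.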


---
### Practice problem
- Calculate the probability that a person who tests positive actually has the disease, given the following information:
    - The disease affects 0.1% of the population
    - The test has a 95% sensitivity rate
    - The test has a 92% specificity rate

```python
# your answer here
```
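One way to check the practice answer is with natural frequencies: imagine a concrete population and count people instead of multiplying probabilities. The population size of 100,000 is an arbitrary choice for illustration; the ratio at the end is the same Bayes' theorem calculation.

```python
# Natural-frequency check for the practice problem (hypothetical population of 100,000)
population = 100_000
prevalence, sensitivity, specificity = 0.001, 0.95, 0.92

sick = population * prevalence                 # about 100 people have the disease
healthy = population - sick                    # about 99,900 do not
true_positives = sick * sensitivity            # about 95 sick people test positive
false_positives = healthy * (1 - specificity)  # about 7,992 healthy people test positive

# Of everyone who tests positive, what fraction is actually sick?
p_disease_given_pos = true_positives / (true_positives + false_positives)
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # P(disease | positive) = 0.0117
```

With an even rarer disease and a less specific test, a positive result raises the probability of disease only to about 1.2%.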


---
# Determining if a Coin is Fair Using Bayes' Theorem
Let's say we want to determine if a coin is a fair coin (has an equal probability of heads vs tails) based on some data we collect with the coin.

Suppose you flipped a coin 10 times and got 7 heads and 3 tails. Does this mean the coin isn't fair since we didn't get 5 heads and 5 tails?


### Let's define our hypotheses
- We want to determine if the coin is fair. 
- We can collect some data and combine it with our prior knowledge to arrive at an updated belief about the probability of the coin being fair.

---
### Task 1
- Formulate Bayes' theorem for our problem. In other words, how do we set this up as a Bayesian problem?
- What is the Prior? Likelihood? Posterior?

**Your answer here**   
Prior: This is our belief about the coin's probability of heads $p$ (for example, how plausible $p = 0.5$, a fair coin, is) before we observe any data.  
Likelihood: This is the probability of observing our data (7 heads and 3 tails in 10 flips) given a particular value of $p$; for a fair coin, $p = 0.5$.  
Posterior: Our updated belief about $p$, and hence about whether the coin is fair, given the observed data.  
Marginal Likelihood: This is the total probability of observing our data under all possible hypotheses.
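
One way to write this setup as an equation, treating each candidate value $p_i$ of the coin's heads probability as a separate hypothesis and $D$ as the observed 7 heads in 10 flips (a discrete-grid sketch, matching the normalization approach used in the later tasks):

$$ P(p_i \mid D) = \frac{P(D \mid p_i)\, P(p_i)}{\sum_j P(D \mid p_j)\, P(p_j)} $$

The denominator is the marginal likelihood $P(D)$: the probability of the data summed over every hypothesis, each weighted by its prior.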

---
### The likelihood
- In our case, the likelihood $p(D|H)$ is the probability of the data we have observed (7 heads out of 10 flips) given our hypothesis.
- Without getting into the details, we can model this as a binomial probability of observing k successes in n trials, using the following equation:
$$p(k;n,p) = \frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$
- Where $p$ is the probability of success (heads), $k$ is the observed number of heads, and $n$ is the number of coin flips. The left-hand side (LHS) can be read as "the probability of observing $k$ heads (successes) when we have $n$ trials and the coin's probability of coming up heads is $p$".
- Remember that the likelihood always answers the question "What is the probability of the data, assuming our hypothesis is true?". In this case, each tested value of p represents a hypothesis about the fairness of the coin. 
- Recall that the $!$ represents the factorial. (https://en.wikipedia.org/wiki/Factorial)


---
### Task 2:
- Write a function that implements this equation. It should take in 3 arguments (p, n, k) and return the likelihood. You can use the `factorial` function from the `math` library (`math.factorial(x)`), which the cell below imports directly as `factorial`.

```python
# your answer here
# First, import
from math import factorial

# Write your function below
```
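A minimal sketch of the Task 2 function. The name `binomial_likelihood` is hypothetical; the argument order (p, n, k) follows the task. Integer division keeps the binomial coefficient exact.

```python
from math import factorial


def binomial_likelihood(p, n, k):
    """Probability of k heads in n flips when each flip lands heads with probability p."""
    # n! / (k! (n - k)!) counts the distinct orderings of k heads among n flips
    n_choose_k = factorial(n) // (factorial(k) * factorial(n - k))
    # Each ordering has probability p^k (1 - p)^(n - k)
    return n_choose_k * p**k * (1 - p) ** (n - k)


print(binomial_likelihood(0.5, 10, 7))  # 0.1171875
```

So if the coin were exactly fair, 7 heads in 10 flips would happen about 11.7% of the time.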

---
### Task 3: Building the likelihood function
- Now that we have a way of calculating these probabilities, you can build your likelihood function.
- Keeping the number of coin flips and heads constant, iterate through the different possible values of p and calculate the likelihood for each.
- Plot the likelihood as a function of p.

```python
import numpy as np
import matplotlib.pyplot as plt
```
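A minimal sketch for Task 3, assuming 10 flips with 7 heads and a grid of 101 candidate values of p. Instead of looping over p, it uses `math.comb` (Python 3.8+) for the binomial coefficient and evaluates the formula over the whole NumPy grid at once; a loop calling the Task 2 function should give the same values.

```python
import numpy as np
import matplotlib.pyplot as plt
from math import comb

n, k = 10, 7                      # flips and observed heads, held constant
p_grid = np.linspace(0, 1, 101)   # candidate values of p (each one a hypothesis)

# Binomial likelihood evaluated at every candidate p (vectorized over the grid)
likelihood = comb(n, k) * p_grid**k * (1 - p_grid) ** (n - k)

plt.plot(p_grid, likelihood)
plt.xlabel("p (probability of heads)")
plt.ylabel("Likelihood P(data | p)")
plt.title(f"Likelihood of {k} heads in {n} flips")
plt.show()
```

The curve peaks at p = 0.7, the observed fraction of heads, and falls to zero at p = 0 and p = 1, since those values make the observed mix of heads and tails impossible.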


---
### Task 4:
- Now that we have the likelihood, we can use Bayes' rule to calculate the posterior.
- For this task, we will start with a uniform prior. You can use the function `np.ones_like()`.
- Write code to calculate the posterior distribution, given a uniform prior and the likelihood you have made above.
- Luckily, you don't have to calculate the marginal likelihood explicitly. Because a discrete probability distribution must sum to 1, we can normalize the distribution (divide each value by the total) as a last step.
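A sketch of Task 4 under the same assumptions as before (10 flips, 7 heads, 101-point grid). The likelihood is recomputed so the block runs on its own; dividing by the sum in the last step plays the role of the marginal likelihood.

```python
import numpy as np
from math import comb

n, k = 10, 7
p_grid = np.linspace(0, 1, 101)
likelihood = comb(n, k) * p_grid**k * (1 - p_grid) ** (n - k)

prior = np.ones_like(p_grid)                   # uniform prior: every p equally plausible
unnormalized = likelihood * prior              # numerator of Bayes' rule at each grid point
posterior = unnormalized / unnormalized.sum()  # normalizing replaces dividing by P(D)

print(posterior.sum())                 # approximately 1.0
print(p_grid[np.argmax(posterior)])    # approximately 0.7
```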

## Think!

If it seems confusing that the x-axis and y-axis labels both refer to probabilities, that just means you've been paying attention. While it may seem odd, keep in mind that the $p$ quantified on the x-axis refers to the underlying probability that your coin (in technical terms, a "Bernoulli process") comes up heads (or 1, or "success"). Meanwhile, the probability on the y-axis refers to the probability that $p$ takes on its respective value. In other words, we're computing the probability that a coin's probability of coming up heads is whatever the x-axis value is. If we're dealing with a perfectly fair coin, then the y-axis values will be greatest around 0.5. If the coin is a trick coin that always comes up tails, then most of the probability will cluster around $p=0$. Depending on the amount of data we collect and our priors, the clustering around 0.5 (or 0, in the second example) will be more concentrated or more diffuse (i.e., spread out).  


---
### Task 5:
- Create a function to calculate the posterior. It should take in a prior distribution, p (the array of candidate values for the probability of heads), n, and k.
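One possible shape for the Task 5 function. The name `compute_posterior` is hypothetical, and `p` is assumed to be the array of candidate heads probabilities (the grid), not a single value.

```python
import numpy as np
from math import comb


def compute_posterior(prior, p, n, k):
    """Grid approximation of the posterior over p after k heads in n flips.

    prior: prior weights, one per entry of p (need not be normalized)
    p:     candidate probabilities of heads
    """
    p = np.asarray(p, dtype=float)
    likelihood = comb(n, k) * p**k * (1 - p) ** (n - k)
    unnormalized = likelihood * np.asarray(prior, dtype=float)
    # Normalize so the posterior sums to 1 over the grid
    return unnormalized / unnormalized.sum()


# Usage: uniform prior, 7 heads in 10 flips
p_grid = np.linspace(0, 1, 101)
posterior = compute_posterior(np.ones_like(p_grid), p_grid, n=10, k=7)
```

Taking the prior as an argument means the same function works unchanged when the uniform prior is swapped for a non-uniform one.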

---
### Task 6:
- What happens if we get 10 heads out of 16 coin flips?
- Create a figure that shows the prior (uniform), likelihood, and posterior for this situation.

```python
# your answer here
```
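A sketch of the Task 6 figure with 16 flips and 10 heads. It uses one panel each for the prior, likelihood, and posterior so that their very different vertical scales don't hide one another; the panel layout is just one reasonable choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from math import comb

n, k = 16, 10
p_grid = np.linspace(0, 1, 101)

prior = np.ones_like(p_grid) / len(p_grid)  # uniform prior, normalized over the grid
likelihood = comb(n, k) * p_grid**k * (1 - p_grid) ** (n - k)
posterior = likelihood * prior
posterior /= posterior.sum()

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharex=True)
panels = [(prior, "Prior (uniform)"), (likelihood, "Likelihood"), (posterior, "Posterior")]
for ax, (values, title) in zip(axes, panels):
    ax.plot(p_grid, values)
    ax.set_title(title)
    ax.set_xlabel("p (probability of heads)")
plt.tight_layout()
plt.show()
```

With a uniform prior, the posterior is just the normalized likelihood, so the last two panels have the same shape, peaking near $10/16 = 0.625$.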

---
### Task 7:
- So far, we have only looked at what happens when we use a uniform prior. Now we will see what happens when we use a non-uniform prior.
- Without getting into the mathematics, we can use a Beta distribution as our prior.
- For simplicity, we will import a function that generates this distribution from the SciPy library (see the import below). The Beta distribution takes two parameters, referred to as "shape" parameters.
- First, plot the Beta distribution with both shape parameters set to 10.
- Next, repeat Task 6 with this new prior distribution. How does this change our posterior?

```python
from scipy.stats import beta  # needed for this section
```


```python
# your code here
```
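A sketch of Task 7, assuming the same data as Task 6 (10 heads in 16 flips) and a Beta(10, 10) prior. `beta.pdf` returns density values on the grid, which are normalized into grid weights; the likelihood is rescaled only so all three curves fit on one set of axes.

```python
import numpy as np
import matplotlib.pyplot as plt
from math import comb
from scipy.stats import beta

n, k = 16, 10
p_grid = np.linspace(0, 1, 101)

prior = beta.pdf(p_grid, 10, 10)  # Beta(10, 10): symmetric, peaked at 0.5
prior /= prior.sum()              # density values -> grid weights summing to 1
likelihood = comb(n, k) * p_grid**k * (1 - p_grid) ** (n - k)
posterior = likelihood * prior
posterior /= posterior.sum()

curves = [
    (prior, "Prior Beta(10, 10)"),
    (likelihood / likelihood.sum(), "Likelihood (rescaled)"),
    (posterior, "Posterior"),
]
for values, label in curves:
    plt.plot(p_grid, values, label=label)
plt.xlabel("p (probability of heads)")
plt.ylabel("Probability (per grid point)")
plt.legend()
plt.show()
```

Because the Beta prior is conjugate to the binomial likelihood, this grid posterior should closely track a Beta(20, 16) distribution (shape parameters $10 + k$ and $10 + n - k$), whose mode is $19/34 \approx 0.56$. The prior pulls the peak toward 0.5, compared with the 0.625 peak under the uniform prior.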


---
### Task 8:
- With the Beta distribution, keeping the two shape parameters equal keeps the distribution symmetric and centered at 0.5. Increasing both parameters makes it narrower (a stronger prior); decreasing them makes it wider (a weaker prior).
- Play around with changing the strength of the prior and look at how this changes the posterior.
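A sketch for Task 8 that overlays the posteriors from several symmetric Beta priors on the Task 6 data. The shape values (1, 2, 10, 50) are arbitrary choices for illustration; Beta(1, 1) is the uniform prior, so its curve matches the Task 6 posterior.

```python
import numpy as np
import matplotlib.pyplot as plt
from math import comb
from scipy.stats import beta

n, k = 16, 10
p_grid = np.linspace(0, 1, 101)
likelihood = comb(n, k) * p_grid**k * (1 - p_grid) ** (n - k)

# Equal shape parameters keep the prior centered at 0.5; larger values make it stronger
for a in [1, 2, 10, 50]:
    prior = beta.pdf(p_grid, a, a)
    posterior = likelihood * prior
    posterior /= posterior.sum()
    plt.plot(p_grid, posterior, label=f"Beta({a}, {a}) prior")

plt.axvline(k / n, color="gray", linestyle="--", label="k / n")
plt.xlabel("p (probability of heads)")
plt.ylabel("Posterior probability (per grid point)")
plt.legend()
plt.show()
```

As the prior gets stronger, the posterior peak moves from near $k/n = 0.625$ toward 0.5 and the curve narrows: with only 16 flips, a strong prior can outweigh the data.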