<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Bayes Reasoning
              
</p>
</div>

Data Science Cohort Live NYC Feb 2023
<p>Phase 2: Topic 17</p>
<br>
<br>

<div align = "right">
<img src="images/flatiron-school-logo.png" align = "right" width="200"/>
</div>

# Objectives

- Contrast Bayesian and Frequentist philosophies of probability
- Use Bayes' Theorem to update a prior belief

#### Frequentist Statistics and Probability
- Basically what we have been doing up until now

$$ P(X) = \lim_{N \rightarrow \infty}\frac{n_X}{N} $$

I.e., probability is only defined in terms of the theoretical frequency of an outcome in the event space as the number of trials $N$ gets large.

- We only test the probability of events $X$ (data):
    - as what the relative frequency of $X$ would be.
    - in this case under a null hypothesis.
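
The limiting-frequency definition can be illustrated with a quick simulation. This is only a sketch, assuming a fair six-sided die simulated with Python's `random` module; the helper name `relative_frequency` and the fixed seed are illustrative choices, not part of the lesson.

```python
import random

def relative_frequency(face, n_trials, seed=0):
    """Estimate P(face) for a simulated fair die as n_X / N."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_x = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == face)
    return n_x / n_trials

# The estimate should settle near 1/6 as N grows
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(3, n))
```

With small $N$ the estimate bounces around; with large $N$ it should sit close to $1/6 \approx 0.167$.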

#### Taking data and hypothesis testing with frequentist probability:

- Given a set of data, we can compute the probability of data under $H_0$
    - i.e., what would the probabilities of the measured count frequencies, means, etc. be if the null were true?
    - we use Z-tests, t-tests, and $\chi^2$ tests to figure this out.

Then reject or fail to reject the null hypothesis:
- based on whether the count frequencies/statistics are improbable under the null.


Critical points: 
- Hypotheses are just statements: they can only be rejected or not rejected.
- Probabilities can only be defined as relative frequencies of events.

#### Thomas Bayes

- Thought about probability a little differently.

Unwittingly, he created one of the more profound theorems in statistics.

<div align = "center">
<img src="images/Thomas_Bayes.gif" align = "center" width="200"/>
</div>
<center> Thomas Bayes </center>

#### So what did Bayes say?

- probability encodes **uncertainty** in a proposition
- does not strictly have to be defined in terms of frequencies

- Given acquired data, the probability of an outcome or a proposition is a function of:
    - the data
    -  _prior_ knowledge or belief on the outcome/proposition.

#### Presenting, Bayes' Theorem!

Which results in Bayes' theorem:

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

Bayes proved this relationship and also gave an interpretation for it:

- $P(A)$: *prior*
- $P(A|B)$: *posterior*
- $P(B|A)$: *likelihood*
- $\frac{1}{P(B)}$: *normalization*
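
For reference, the theorem follows in two steps from the definition of conditional probability. This is the standard textbook derivation, assuming $P(A) > 0$ and $P(B) > 0$: write the joint probability of $A$ and $B$ two ways, then divide by $P(B)$.

$$\begin{aligned}
P(A \cap B) &= P(A|B)\,P(B) = P(B|A)\,P(A) \\
\Rightarrow\quad P(A|B) &= \frac{P(B|A)\,P(A)}{P(B)}
\end{aligned}$$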



$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

Suppose I am trying to estimate probability of $A$.
- But my estimate of $A$ may depend on knowledge I gain from other factor(s) or measurements ($B$)


E.g.: I have a set of dice. I am told each one could be fair or three-loaded.

Pick one die. Is this one fair ($A$ = fair) or a three-loaded die ($A$ = loaded)?

$B$ would be the result of rolling this die many times and getting a set of results.

<div align = "center">
<img src="images/RollingDice.jpg" align = "center" width="400"/>
</div>

**The prior**

Before I roll the die:
- I have twenty dice to choose from
- I am told there were 5 loaded dice tossed in the mix.

- $P(A = loaded)$?
- $P(A = fair)$?

In [1]:
prior_fair = 15/20
prior_loaded = 5/20

Before I know $B$, I might have an estimate of the probabilities of $A$, the *prior*:
    
This is $ P(A) $

Note this is totally irrespective of $B$

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$


- $P(A)$ represents the probability before we have any knowledge of $B$

Now I roll the selected die a bunch of times, e.g.:
$$ B = \{3,1,5,2\} $$ 
After the rolls represented by $B$:
- It is natural to update our estimate of $P(A)$ with $P(A|B)$.

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

$P(A|B)$ as the posterior: (probability of $A$ after determining $B$)

e.g. estimate $P(A = fair | B)$ and $P(A = loaded | B)$

Bayes realized that the formula he derived represented an update formula for probabilities:
- Provided you know $P(B|A)$: the likelihood of B under A
    - the likelihood of set of rolls given A = fair
    - the likelihood of set of rolls given A = three-loaded

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

$P(B|A)$ as the likelihood: (probability of generating a set of rolls $B$ under each possibility of the dice $A$)

**The likelihood**
- E.g., we know that each face of a fair die comes up with probability 1/6.
- So we can calculate the probability of getting a set of outcomes:
    - e.g. $B = \{3,1,5,2\}$. Get $P(B|A = fair)$

In [2]:
P_likelihood_fair = (1/6)**4
P_likelihood_fair

0.0007716049382716048

And, for this case, we are told the probabilities of the outcomes for a three-loaded die:

$$P(1)=P(2)=P(3) = \frac{1}{4} $$
$$ P(4)=P(5)=P(6) = \frac{1}{12} $$

Get $P(B|A = loaded)$ for $B = \{3,1,5,2\}$

In [3]:
P_likelihood_loaded = (1/12)*(1/4)**3 

P_likelihood_loaded 

0.0013020833333333333

In [4]:
P_likelihood_fair

0.0007716049382716048
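
The two likelihoods above were typed out by hand. A small helper can compute them for any sequence of rolls. This is a sketch, assuming the rolls are independent; the names `FAIR`, `LOADED`, and `likelihood` are illustrative and not from the lesson's cells.

```python
from math import prod

# Face probabilities for each hypothesis (values taken from the lesson)
FAIR = {face: 1/6 for face in range(1, 7)}
LOADED = {1: 1/4, 2: 1/4, 3: 1/4, 4: 1/12, 5: 1/12, 6: 1/12}

def likelihood(rolls, face_probs):
    """P(B | A): probability of an ordered sequence of independent rolls."""
    return prod(face_probs[r] for r in rolls)

rolls = [3, 1, 5, 2]
print(likelihood(rolls, FAIR))    # same quantity as (1/6)**4
print(likelihood(rolls, LOADED))  # same quantity as (1/12)*(1/4)**3
```

The loaded die makes this particular sequence more likely because three of the four rolls landed on the favored faces 1, 2, and 3.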

**The final piece of the puzzle: the evidence (aka normalization)**

$P(B)$: this is the probability that I drew my result *regardless* of whether $A$ is fair or loaded

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

$P(B)$ as the evidence

$$ P(B) = P(B|A=fair)P(A = fair) + P(B|A=loaded)P(A = loaded) $$

In [5]:
evidence = P_likelihood_fair*prior_fair + P_likelihood_loaded*prior_loaded
evidence

0.0009042245370370369

We have everything we need to calculate posteriors on $A$ given set of rolls $B$:

$$ P(A=fair|B) = \frac{P(B|A = fair)P(A=fair)}{P(B)} $$

$$ P(A=loaded|B) = \frac{P(B|A = loaded)P(A=loaded)}{P(B)} $$

In [6]:
posterior_fair \
= P_likelihood_fair*prior_fair/evidence

print(posterior_fair)
print(prior_fair)

0.6399999999999999
0.75


In [7]:
posterior_loaded \
= P_likelihood_loaded*prior_loaded/evidence

print(posterior_loaded)
print(prior_loaded)

0.36000000000000004
0.25
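
The prior, likelihood, and evidence steps can be wrapped into one function that handles every hypothesis at once. This is a minimal sketch; the function name `posteriors` and the dictionary layout are assumptions made for illustration.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' theorem to every hypothesis at once.

    priors and likelihoods are dicts keyed by hypothesis name.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(B), the normalization
    return {h: joint[h] / evidence for h in joint}

priors = {"fair": 15/20, "loaded": 5/20}
likelihoods = {"fair": (1/6)**4, "loaded": (1/12) * (1/4)**3}

result = posteriors(priors, likelihoods)
print(result)
```

Because the evidence is just the sum of the prior-times-likelihood terms, the posteriors always add up to 1.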


What if we had $B = \{3, 1,5,2, 6, 1, 1, 3, 2, 1, 2, 4, 3, 2, 3\}$?

In [8]:
P_likelihood_fair2 = (1/6)**15
P_likelihood_loaded2 = (1/4)**12 * (1/12)**3

In [9]:
evidence_2 = P_likelihood_fair2*prior_fair \
+ P_likelihood_loaded2*prior_loaded

In [10]:
posterior_fair2 \
= P_likelihood_fair2*prior_fair/evidence_2

print(posterior_fair2)
print(prior_fair)

0.15610127908915505
0.75


In [11]:
posterior_loaded2 \
= P_likelihood_loaded2*prior_loaded/evidence_2

print(posterior_loaded2)
print(prior_loaded)

0.843898720910845
0.25


**As more data accumulates we shift our prior estimates**

- We are more certain that this die is loaded, given the data coming in.
- But we also factored in our previous knowledge about the world (or in this case, the pile)
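
One way to see the shift happen is to update after every single roll, using each posterior as the prior for the next roll. This is a sketch, assuming independent rolls; `update` is an illustrative helper. Updating one roll at a time should land on the same posterior as processing all fifteen rolls in one go.

```python
FAIR = {face: 1/6 for face in range(1, 7)}
LOADED = {1: 1/4, 2: 1/4, 3: 1/4, 4: 1/12, 5: 1/12, 6: 1/12}

def update(p_loaded, roll):
    """One Bayesian update of P(loaded) after observing a single roll."""
    joint_loaded = LOADED[roll] * p_loaded
    joint_fair = FAIR[roll] * (1 - p_loaded)
    return joint_loaded / (joint_loaded + joint_fair)

rolls = [3, 1, 5, 2, 6, 1, 1, 3, 2, 1, 2, 4, 3, 2, 3]
p_loaded = 5/20  # prior from the pile of twenty dice
for roll in rolls:
    p_loaded = update(p_loaded, roll)  # the posterior becomes the next prior
    print(roll, round(p_loaded, 4))
```

Rolls of 1, 2, or 3 push the estimate toward loaded; rolls of 4, 5, or 6 pull it back toward fair.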

**Bayesian reasoning**

- Hypotheses aren't just statements. We can ascertain uncertainties/probabilities on them.
- Data taking/inference is not done in a vacuum:
    - Prior estimates are well-held beliefs/incorporate the structure of the world.

- Can update our priors with data. But priors have weight in estimation.

e.g: **extraordinary claims against a strong prior require extraordinary evidence**

**Frequentist Reasoning**

- A = fair is our *null* hypothesis
- A = loaded is *alternate*

Take data $B$ to ascertain likelihood that data $B$ was generated by *null*.

- That is compute $P(B|A = fair)$

Then either reject or fail to reject the null.

A hypothesis is just a statement.
- Data used to confirm or reject statement
- Probability only defined on the things you measure directly (the rolls).
- Probability of a hypothesis/inference not well defined.

No notion of using prior beliefs on the hypothesis:
- **Let the data speak for itself**
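
For contrast, here is roughly what a frequentist check on the fifteen rolls from earlier might look like. This is a sketch using a Pearson $\chi^2$ goodness-of-fit statistic computed by hand, not a method prescribed by the lesson. The critical value 11.070 is the standard table value for 5 degrees of freedom at $\alpha = 0.05$. With only 15 rolls the expected count per face is 2.5, below the usual rule of thumb of 5, so treat the result as illustrative only.

```python
from collections import Counter

rolls = [3, 1, 5, 2, 6, 1, 1, 3, 2, 1, 2, 4, 3, 2, 3]
observed = Counter(rolls)
expected = len(rolls) / 6  # expected count per face under the null (fair die)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum((observed[face] - expected) ** 2 / expected for face in range(1, 7))

# Critical value for 5 degrees of freedom at alpha = 0.05 (standard table value)
CRITICAL_5DF_05 = 11.070

print(f"chi2 = {chi2:.2f}")
print("reject null" if chi2 > CRITICAL_5DF_05 else "fail to reject null")
```

Note that no prior enters this calculation: the five loaded dice in the pile play no role in the test.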

**Frequentist** interpretation of probability: the limit of frequency after many, many trials.

Basically, the Bayesians assign a probability to a hypothesis but Frequentists test a hypothesis and determine probability with repeated trials.

Some days you're a Frequentist, other days you're a Bayesian.

From a practical standpoint:
- Both frequentist and Bayesian inference can be useful
- Depends on the problem, info you have, questions you ask.

### A fun example

#### Why C-3PO is Wrong & Han Solo is a Badass

"Never tell me the odds!" https://www.countbayesie.com/blog/2015/2/18/hans-solo-and-bayesian-priors

Han is a great pilot, so he will likely get through the asteroid field even if others would fail (crash and blow up).

<div align = "center">
<img src="images/hansolo_asteroid.jpg" align = "center" width="1000"/>
</div>
<center><i> The odds of surviving an asteroid field are...</i></center>

## The Example: Fan of a Movie 🌙

You: big fan of _Twilight_ and a Twilight evangelist.

- Based on experience, you find about 1 in 5 people (or 20%) are also Twilight groupies.



![](https://upload.wikimedia.org/wikipedia/en/9/93/The_Twilight_Saga-_New_Moon_poster.JPG)

Now you meet Rory:
- they told me they just watched _The Twilight Saga: New Moon_ (the second movie) at a friend's house!

There are two possible scenarios here:

1. Rory is a fan and watched the beloved sequel to _Twilight_. 

- I estimate that if you're a fan, the chances of you watching one of the five awe-inspiring movies is about 60%. (It would be more, but we fans get busy.)

2. Rory isn't a fan and watched the sequel because it was just on. 

- I estimate that if you're not a fan, there's only a 5% chance of you watching one of the movies (about 10x less likely).

Let's define two events:

* $A$ = 🌙 fan
* $B$ = watched 🌙

Reviewing Bayes' Theorem, we get:

$P(A) = P(\text{🌙 fan}) = 0.20$

$P(B|A) = P(\text{watched 🌙, given you are a 🌙 fan}) = 0.60$

$P(B|\neg A) = P(\text{watched 🌙, given you are NOT a 🌙 fan}) = 0.05$

$\begin{aligned}
P(B) &= P(\text{watched 🌙})  \\
     &= P(B|A)\cdot P(A) + P(B|\neg A)\cdot P(\neg A) \\
     &= 0.60\cdot 0.20 + 0.05\cdot 0.80 \\
     &= 0.16 
\end{aligned}$

So we get that the chance that someone like Rory is a _Twilight_ fan is:

$$\begin{aligned}
P(A|B) &= P(\text{🌙 fan, given you watched 🌙}) \\ 
       &= \frac{P(B|A)P(A)}{P(B)} \\
       &= \frac{0.60 \cdot 0.20}{0.16} \\
       &= 0.75
\end{aligned}$$

So there's a $75\%$ chance Rory is a fan too!
- Should you start your Twilight preaching?
- Or should you welcome them as a fellow lover of all things good?
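
The same arithmetic can be checked in a few lines of Python. This sketch uses `fractions.Fraction` so the results come out exact; the variable names are illustrative.

```python
from fractions import Fraction

p_fan = Fraction(1, 5)               # P(A): prior, 1 in 5 people are fans
p_watch_given_fan = Fraction(3, 5)   # P(B|A) = 0.60
p_watch_given_not = Fraction(1, 20)  # P(B|not A) = 0.05

# Law of total probability gives the evidence P(B)
p_watch = p_watch_given_fan * p_fan + p_watch_given_not * (1 - p_fan)

# Bayes' theorem gives the posterior P(A|B)
p_fan_given_watch = p_watch_given_fan * p_fan / p_watch

print(p_watch, float(p_watch))
print(p_fan_given_watch, float(p_fan_given_watch))
```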

# Bayes' Theorem Examples 

## Scenario: Diagnostic Testing 🤢

Pretend we test positive for a rare disease. What are the chances that we actually have the disease?

## The Setup

- The disease is rare: only 0.01% of the population has it
- The test will be correct 99% of the time, whether or not you have the disease 

## Defining Events & Probabilities

$A$ → I have the disease

$B$ → I test positive

----------

$P(A) = P(\text{have the disease}) = 0.0001$

$P(B)$ = $P(\text{test positive})$

$P(A|B)$ = $P(\text{having the disease given that test positive})$







## Use Bayes' Theorem

$P(sick | positive) =  \frac{P(positive | sick)\ P(sick)}{P(positive)}$

But we can be more specific by knowing the probability of testing positive for each case

$P(positive) = P(sick)\ P(positive | sick) + P(not\ sick)\ P(positive|not\ sick)$

This leads to our ultimate equation

$P(sick | positive) =  \frac{P(positive | sick)\ P(sick)}{P(sick)\ P(positive | sick) + P(not\ sick)\ P(positive|not\ sick)}$

## Calculate Posterior Probability

In [12]:
# probability of sick & healthy
p_sick = 0.0001
p_not_sick = 1 - p_sick

# probability of test being correct (accuracy)
p_positive_sick = 0.99
p_positive_not_sick = 1 - p_positive_sick

# probability of positive test (whether or not you are sick)
p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

Next, we define the parts for Bayes' Theorem

In [13]:
# belief beforehand
sick_prior = p_sick

# how likely are we to test positive given that we are sick
pos_sick_likelihood = p_positive_sick

# Add normalization
pos_norm = p_positive

For convenience, I've created a function that combines all three parts to calculate the **posterior** probability

In [14]:
# handy dandy function
def find_posterior(prior, likelihood, norm):
    return prior * likelihood / norm

Lastly, we use the function to calculate our result

In [15]:
prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')

You have a 0.98% chance of actually being sick


## Interpretation

**Discussion:** Does this make sense?
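
One way to build intuition is to count people instead of multiplying probabilities. This is a sketch, assuming a hypothetical population of one million (chosen only because it gives round numbers) and the same 0.01% prevalence and 99% accuracy as above.

```python
population = 1_000_000  # hypothetical population size, chosen for round numbers
prevalence = 0.0001     # 0.01% have the disease
accuracy = 0.99         # test is correct 99% of the time in both groups

sick = population * prevalence                 # about 100 people
healthy = population - sick                    # about 999,900 people

true_positives = sick * accuracy               # sick people who test positive
false_positives = healthy * (1 - accuracy)     # healthy people who test positive

share_sick = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true positives vs {false_positives:.0f} false positives")
print(f"{share_sick*100:.2f}% of positive tests belong to sick people")
```

The healthy group is so much larger that its 1% error rate produces far more positives than the sick group does, which is why a positive result is still most likely a false alarm.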

# Exercises

## Common Diseases

How would the calculation in the Diagnostic Testing scenario change if 2% of the population had the disease?

<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

```python
    # probability of sick & healthy
    p_sick = 0.02
    p_not_sick = 1 - p_sick

    # probability of test being correct (accuracy)
    p_positive_sick = 0.99
    p_positive_not_sick = 1 - p_positive_sick

    # probability of positive test (whether or not you are sick)
    p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

    # belief beforehand
    sick_prior = p_sick

    # how likely are we to test positive given that we are sick
    pos_sick_likelihood = p_positive_sick

    # Add normalization
    pos_norm = p_positive

    prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

    print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
```    
</details>

In [19]:
# Your work 
# probability of sick & healthy
p_sick = 0.02
p_not_sick = 1 - p_sick

# probability of test being correct (accuracy)
p_positive_sick = 0.99
p_positive_not_sick = 1 - p_positive_sick

# probability of positive test (whether or not you are sick)
p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

# belief beforehand
sick_prior = p_sick

# how likely are we to test positive given that we are sick
pos_sick_likelihood = p_positive_sick

# Add normalization
pos_norm = p_positive

prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')

You have a 66.89% chance of actually being sick


## Better Tests

How would the calculation in the original Diagnostic Testing scenario change if the test were 99.9% correct?


<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

```python    

# probability of sick & healthy
p_sick = 0.0001
p_not_sick = 1 - p_sick

# probability of test being correct (accuracy)
p_positive_sick = 0.999
p_positive_not_sick = 1 - p_positive_sick

# probability of positive test (whether or not you are sick)
p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

# belief beforehand
sick_prior = p_sick

# how likely are we to test positive given that we are sick
pos_sick_likelihood = p_positive_sick

# Add normalization
pos_norm = p_positive

prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
```    
</details>

In [22]:
# Your work

# probability of sick & healthy
p_sick = 0.0001
p_not_sick = 1 - p_sick

# probability of test being correct (accuracy)
p_positive_sick = 0.999
p_positive_not_sick = 1 - p_positive_sick

# probability of positive test (whether or not you are sick)
p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

# belief beforehand
sick_prior = p_sick

# how likely are we to test positive given that we are sick
pos_sick_likelihood = p_positive_sick

# Add normalization
pos_norm = p_positive

prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')

You have a 9.08% chance of actually being sick
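
Since the exercises only change one or two numbers at a time, the whole calculation can be folded into a single function. This is a sketch; the parameter names `sensitivity` (chance of a positive test when sick) and `specificity` (chance of a negative test when healthy) are standard terms that the lesson does not use. The lesson's single "correct X% of the time" figure corresponds to setting both to the same value.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(sick | positive) from Bayes' theorem.

    sensitivity = P(positive | sick)
    specificity = P(negative | not sick)
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# The three scenarios from this lesson, with sensitivity == specificity
scenarios = [(0.0001, 0.99), (0.02, 0.99), (0.0001, 0.999)]
for prevalence, accuracy in scenarios:
    p = posterior_given_positive(prevalence, accuracy, accuracy)
    print(f"prevalence={prevalence}, accuracy={accuracy}: {p*100:.2f}%")
```

Comparing the rows suggests that, for a rare disease, the prior (prevalence) moves the answer at least as much as the quality of the test.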


# Level Up: Harder Exercise

## Two Tests

In the original Diagnostic Testing scenario above, what are the chances that we have the disease if we take the test twice and it is positive both times? (Assume the tests are independent.)


<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>

```python
# probability of sick & healthy
p_sick = 0.0001
p_not_sick = 1 - p_sick

# probability of 2 tests being correct
p_positive_sick = 0.99
p_2positive_sick = p_positive_sick**2
    
# probability of 2 tests being incorrect
p_positive_not_sick = 1 - p_positive_sick
p_2positive_not_sick = p_positive_not_sick**2

# probability of two positive tests (whether or not you are sick)
p_2positive = p_sick*p_2positive_sick + p_not_sick*p_2positive_not_sick

# belief beforehand
sick_prior = p_sick

# how likely are we to test positive twice given that we are sick
pos_sick_likelihood = p_2positive_sick

# Add normalization
pos_norm = p_2positive

prob_youre_sick = find_posterior(sick_prior, pos_sick_likelihood, pos_norm)

print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')
```    
</details>

In [None]:
# Chance of having the disease if we take the test twice and it is positive both times

In [18]:
# Your work
p_sick = 0.0001
p_not_sick = 1 - p_sick

# probability of test being correct (accuracy)
p_positive_sick = 0.99
p_positive_not_sick = 1 - p_positive_sick
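
One possible way to finish this exercise is to treat the two tests as two successive updates: the posterior after the first positive test becomes the prior for the second. This is a sketch, assuming the tests are independent as the exercise states; `update_on_positive` is a hypothetical helper, not part of the lesson's code. It should agree with squaring the likelihoods as in the answer code.

```python
def update_on_positive(prior, accuracy=0.99):
    """Posterior P(sick) after one positive result from a test with the given accuracy."""
    true_pos = prior * accuracy
    false_pos = (1 - prior) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

p_sick = 0.0001
after_first = update_on_positive(p_sick)        # posterior after test 1
after_second = update_on_positive(after_first)  # test 1's posterior is test 2's prior

print(f'After one positive test:  {after_first*100:.2f}%')
print(f'After two positive tests: {after_second*100:.2f}%')
```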



# Level Up: The Bayes Factor

Another method for using Bayesian reasoning relies on _odds_ rather than _probability_. This can feel more intuitive.

If we rearrange Bayes' Theorem, we can use the odds:

$$P(A|B) =  \frac{P(B|A)P(A)}{P(B)}$$

$$O(A|B) =  O(A)\frac{P(B|A)}{P(B|\neg A)}$$


We call $\frac{P(B|A)}{P(B|\neg A)}$ the **Bayes factor**; it shows how we update the prior odds.

To use this, we say that what we want is the _odds_ that $A$ is true given that $B$ is true (observed).
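
The odds form follows from writing Bayes' theorem once for $A$ and once for $\neg A$, then dividing so that $P(B)$ cancels. Here $O(A) = P(A)/P(\neg A)$ is the usual definition of odds, stated as an assumption because the lesson does not spell it out.

$$\begin{aligned}
O(A|B) = \frac{P(A|B)}{P(\neg A|B)}
       &= \frac{P(B|A)\,P(A)\,/\,P(B)}{P(B|\neg A)\,P(\neg A)\,/\,P(B)} \\
       &= \frac{P(A)}{P(\neg A)} \cdot \frac{P(B|A)}{P(B|\neg A)} \\
       &= O(A)\,\frac{P(B|A)}{P(B|\neg A)}
\end{aligned}$$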

## Using the Twilight Example

* $P(A) = 20\%$ (or $O(A)= \frac{1}{4}$ or {$1:4$}) of all people are _Twilight_ fans 
* $P(B|A) = 60\%$ of all _Twilight_ fans have seen one of the movies in the past month
* $P(B|\neg A) = 5\%$ of all non-_Twilight_ fans have seen one of the movies in the past month

Well, then our odds form turns to:

$$\begin{aligned}
O(A|B) &= O(A)\frac{P(B|A)}{P(B|\neg A)} \\
\\
       &= \frac{1}{4} \cdot \frac{0.60}{0.05} \\
       &= \frac{1}{4} \cdot 12 \\
       &= \frac{12}{4} \\
       &= \frac{3}{1} \\
       & = 3:1 \text{ odds}
\end{aligned}$$

Or simply, $3$ out of every $4$ people who watched a movie from the _Twilight_ saga are fans. That's the same $75\%$ we found before!

Using a pretend population of $50$ random people we still get:

-  $0.2\cdot 50 = 10$ are _Twilight_ fans
    - 4 haven't watched in the past month: 🌙
    - 6 have watched in the past month: 🧛
-  $0.8\cdot 50 = 40$ are not _Twilight_ fans: 
    - 38 haven't watched in the past month: 🙂
    - 2 have watched in the past month: 👁

<pre>

🧛🧛👁👁🙂🙂🙂🙂🙂🙂
🧛🧛🙂🙂🙂🙂🙂🙂🙂🙂
🧛🧛🙂🙂🙂🙂🙂🙂🙂🙂
🌙🌙🙂🙂🙂🙂🙂🙂🙂🙂
🌙🌙🙂🙂🙂🙂🙂🙂🙂🙂

</pre>

## Interpreting further

- There are $1$:$4$ odds you are a fan given no information
- But if you watched a _Twilight_ movie in the past month, we know you're more likely to be a fan by the **Bayes factor** of 12 
- We update our _prior_ belief with that knowledge to find we expect that there are 3 times **more** fans who watched in the past month than non-fans who watched in the past month
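
The odds update is easy to script. This is a sketch with two illustrative helpers, `prob_to_odds` and `odds_to_prob`, which are assumptions and not part of the lesson; it reuses the _Twilight_ numbers.

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor, p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)

prior_odds = prob_to_odds(0.20)   # 1:4
bayes_factor = 0.60 / 0.05        # P(B|A) / P(B|not A)
posterior_odds = prior_odds * bayes_factor

print(f"prior odds      = {prior_odds:.2f}")
print(f"Bayes factor    = {bayes_factor:.0f}")
print(f"posterior odds  = {posterior_odds:.2f}")
print(f"posterior prob. = {odds_to_prob(posterior_odds):.2f}")
```

A single multiplication replaces the evidence calculation, which is part of why the odds form can feel more direct.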