# Bayes Theorem

### Introduction

In this lesson, we'll learn about Bayes Theorem.  As we'll see, the theorem is really just an extension of what we previously learned about probability, so let's review a couple of the greatest hits that we covered so far.  First, the probability of an event $E$ occurring is the long-run fraction of trials in which $E$ occurs:

$P(E) = \displaystyle{\lim_{n \to \infty}} \frac{n(E)}{n}$

And if we are told that another event $F$ has occurred, then we can update our probability of $E$ to be:

$P(E|F) = \frac{P(EF)}{P(F)}$

Finally, another way of writing the above is the chain rule, which states that:

$P(EF) = P(F)P(E|F)$
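
As a quick sanity check of these two rules, the sketch below enumerates every outcome of rolling two fair dice.  The dice and the choice of events ($E$: the dice sum to 8, $F$: the first die is even) are assumptions made purely for illustration, as are the helper names; they are not part of the flu example that follows.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def is_E(o):
    return o[0] + o[1] == 8   # E: the dice sum to 8

def is_F(o):
    return o[0] % 2 == 0      # F: the first die is even

def prob(event):
    """Exact probability of an event, given as a predicate on (first, second)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

p_F = prob(is_F)
p_EF = prob(lambda o: is_E(o) and is_F(o))

# P(E|F) computed directly: only look at the outcomes where F occurred
f_outcomes = [o for o in outcomes if is_F(o)]
p_E_given_F = Fraction(sum(1 for o in f_outcomes if is_E(o)), len(f_outcomes))

# Conditional probability formula and chain rule should both agree with the count
formula_holds = (p_E_given_F == p_EF / p_F)
chain_rule_holds = (p_EF == p_F * p_E_given_F)

print(p_F, p_EF, p_E_given_F, formula_holds, chain_rule_holds)
```

Because the probabilities are exact fractions, the two checks compare equal without any rounding tolerance.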

As we'll see, Bayes theorem is just an extension of these rules.  In this lesson, we'll learn about Bayes theorem by calculating the probability that someone has the flu.

### Flu Test Example

Let's say that we are given the following problem.  We are administering a test to determine if individuals have the flu, and we are told the following:

1. The probability of someone from the population having the flu is $.005$.  
2. The test is $.98$ effective at detecting the flu.  This means that if someone has the flu, it detects the flu 98% of the time.  
3. The test has a false positive rate of $.01$.  This means that even when someone *does not* have the flu, 1% of the time the test incorrectly reports that they have the flu.

Now, we just administered a test that came back positive.  What's the probability that, given the positive test, this patient has the flu?

> It might be surprising to learn that it's only about $.33$.  

We'll get into how the above is calculated in a little bit, but the main point to take from the above is this:
> New evidence does not determine beliefs in a vacuum, but rather should update prior beliefs.

And remember, in this example, only $.005$ of the population has the flu, or 5 individuals out of 1000.

In [34]:
1000*.005

5.0

So while a positive test does increase the probability of someone actually having the flu, it still only takes that probability to about $.33$.  In the sections that follow, we'll apply our previous rules of probability to understand how we got to this probability.
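
Before deriving the number, one way to build intuition is to simulate the testing process and simply count.  The sketch below is a minimal Monte Carlo simulation using only the three probabilities given in the problem; the function name `simulate_flu_tests`, the sample size, and the fixed seed are assumptions chosen for illustration.

```python
import random

def simulate_flu_tests(n_people, prior=0.005, sensitivity=0.98,
                       false_positive_rate=0.01, seed=0):
    """Simulate testing n_people and return (true_positives, false_positives)."""
    rng = random.Random(seed)
    true_positives = 0
    false_positives = 0
    for _ in range(n_people):
        has_flu = rng.random() < prior
        # A person with the flu tests positive with probability `sensitivity`;
        # a person without it tests positive with probability `false_positive_rate`.
        p_positive = sensitivity if has_flu else false_positive_rate
        if rng.random() < p_positive:
            if has_flu:
                true_positives += 1
            else:
                false_positives += 1
    return true_positives, false_positives

tp, fp = simulate_flu_tests(500_000)

# Among everyone who tested positive, what fraction actually has the flu?
estimate = tp / (tp + fp)
print(tp, fp, round(estimate, 3))
```

With a sample this large, the estimate should land close to one third, though the exact value depends on the seed.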

### Seeing the Formula

Let's begin by defining our events.  We'll let $H$ represent the event of someone having the flu, and $E$ represent the  event of a positive test.  

> Here, $H$ stands for *hypothesis* and $E$ stands for *evidence*.

So our ultimate task is to find $P(H|E)$, that is the probability of someone having the flu given a positive test.  And that is the following:

$P(H|E) = \frac{P(EH)}{P(E)} = \frac{P(E|H)*P(H)}{P(EH) + P(EH^c)} $

> So we see that we can unpack $P(EH)$ to equal $P(E|H)*P(H)$, using the chain rule.  But more on that later.

To start, we'll just focus on $P(H)$, called our prior.  As we'll see, this prior is embedded throughout our equation above.

### Unpacking the prior

Given our problem above, we saw that $P(H) = .005$.  This is because the probability of the flu in the general population is $.005$.  
> $P(H)$ is called the prior.

Understanding the impact of the prior on calculating the probability is the key component to understanding Bayes theorem.

Now, we can see the prior visually by imagining a sample of 1000 people.  With a sample of this size, we would expect 5 people for whom our hypothesis $H$ is true (they have the flu) and 995 people for whom it is false, $H^c$.

We display this below.

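In [1]:
import pandas as pd
import numpy as np

# Allow up to 35 columns to be displayed so the 25-column grid is not truncated
pd.options.display.max_columns = 35

# 40 x 25 = 1000 people, all without the flu (H^c) to start
df = pd.DataFrame(np.full((40, 25), 'H^c'))
# Mark the first 5 people in the top row as having the flu (H)
df.iloc[0, :] = np.concatenate([np.full((1, 5), 'H'), np.full((1, 20), 'H^c')], axis = 1)
df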

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,H,H,H,H,H,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
1,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
2,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
3,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
4,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
5,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
6,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
7,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
8,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c
9,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,...,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c,H^c


Take a second to appreciate just how large this disparity is.  

The reason this matters is that when we calculate $P(H|E)$, we are essentially comparing our true positives to all of our positives, both true and false.  And even though our false positive rate is small, there are so many people without the flu that we still see a lot of false positives relative to true positives.
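
To put rough numbers on this comparison, the sketch below computes the expected number of people in each of the four possible test outcomes for a sample of 1000.  The function name `expected_counts` and the outcome labels are illustrative assumptions; the probabilities are the ones given in the problem.

```python
def expected_counts(n_people, prior=0.005, sensitivity=0.98,
                    false_positive_rate=0.01):
    """Expected number of people in each of the four test outcomes."""
    with_flu = n_people * prior
    without_flu = n_people * (1 - prior)
    return {
        "true_positive": with_flu * sensitivity,
        "false_negative": with_flu * (1 - sensitivity),
        "false_positive": without_flu * false_positive_rate,
        "true_negative": without_flu * (1 - false_positive_rate),
    }

counts = expected_counts(1000)
for outcome, count in counts.items():
    print(f"{outcome:>15}: {count:.2f}")
```

Under these assumptions we expect roughly 4.9 true positives against 9.95 false positives, even though only 1% of people without the flu test positive.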


Let's see this mathematically.

### Calculating the probability

So again, our task is to find $P(H|E) = \frac{P(EH)}{P(E)}$.  And this denominator, the probability of seeing the evidence at all, is the sum of two components: $P(EH)$ and $P(EH^c)$.

So we can break down our formula further as:

$P(H|E) = \frac{P(EH)}{P(E)} = \frac{P(EH)}{P(EH) + P(EH^c)}$

Now let's start by calculating the term in the bottom right, $P(EH^c)$, which is our probability of a false positive.

By the chain rule, this is:

$P(EH^c) = P(H^c)*P(E|H^c) = .995*.01 = 0.00995$

And the probability of a true positive is:

$P(EH) = P(H)*P(E|H) = .005*.98 = 0.0049$

> So we can see that a false positive is about twice as common as a true positive.

From here we can find the probability of someone having the flu given a positive test, which equals:

In [12]:
.0049/(.0049 + .00995)

0.32996632996632996

> So this is the number we quoted at the start, roughly $.33$.
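
As a cross-check that avoids floating-point rounding, here is a minimal sketch that redoes the same calculation with Python's `fractions.Fraction`.  The variable names are chosen for illustration only.

```python
from fractions import Fraction

p_H = Fraction(5, 1000)           # P(H): prior probability of the flu
p_E_given_H = Fraction(98, 100)   # P(E|H): positive test given the flu
p_E_given_Hc = Fraction(1, 100)   # P(E|H^c): positive test given no flu

p_EH = p_H * p_E_given_H              # P(EH): true positive
p_EHc = (1 - p_H) * p_E_given_Hc      # P(EH^c): false positive

p_H_given_E = p_EH / (p_EH + p_EHc)
false_to_true_ratio = p_EHc / p_EH

print(p_EH, p_EHc, p_H_given_E, false_to_true_ratio)
```

The posterior reduces to exactly $\frac{98}{297}$, just under one third, and the ratio of false positives to true positives is $\frac{199}{98}$, a little over two.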

We'll break this calculation down again, but for now let's visually compare false positives and true positives.  We multiply the probabilities of a false positive and a true positive by 1000 to get the expected counts in our sample:

In [4]:
.995*.01*1000, .005*.98*1000

(9.950000000000001, 4.8999999999999995)

> We show our grid again, rounding to 5 true positives and 10 false positives, but this time we won't show all the true negative examples.

So to answer the question of what $P(H|E)$ is, we restrict our consideration to just the top row, where we have $E$.

In [5]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.full((10, 15), '$E^cH^c$'))
df.iloc[0, :] = np.concatenate([np.full((1, 5), '$EH$'), np.full((1, 10), '$EH^c$')], axis = 1)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,$EH$,$EH$,$EH$,$EH$,$EH$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$,$EH^c$
1,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
2,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
3,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
4,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
5,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
6,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
7,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
8,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$
9,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$,$E^cH^c$


We can see that we have more false positives than true positives.
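
The same restriction can be done by counting.  The sketch below rebuilds only the top row of the grid as a plain list (the list and variable names are assumptions for illustration) and computes the share of positive tests that are true positives.  Because the grid rounds 4.9 and 9.95 to whole people, the result is one third rather than the exact value calculated earlier.

```python
from collections import Counter

# Top row of the grid: 5 true positives (EH) and 10 false positives (EH^c),
# i.e. the expected counts 4.9 and 9.95 rounded to whole people.
positive_tests = ["EH"] * 5 + ["EH^c"] * 10

cell_counts = Counter(positive_tests)

# P(H|E) is approximated by true positives divided by all positive tests
share_with_flu = cell_counts["EH"] / len(positive_tests)
print(cell_counts["EH"], cell_counts["EH^c"], round(share_with_flu, 3))
```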

### The Problem via Statistics

Let's begin to wrap up by solving the problem again, but without the visual aid.  We define $E$ to be testing positive and $H$ to be actually having the flu.  Then we were given the following:

* $P(H) = .005$
* $P(E|H^c) = .01$ 
* $P(E|H) = .98$ 

> And this means that the probability of the flu is .005, and that we see a false positive with probability of .01, and that we correctly predict cases where there is the flu with probability .98. 

Now our task is to find the probability of actually having the flu, given a positive test, $P(H|E)$.  And this is just:
    
> $P(H|E) =  \frac{P(EH)}{P(E)}$

Next we use the chain rule to replace $P(EH)$, so we get:

> $P(H|E) = \frac{P(H)*P(E|H)}{P(E)}$

> Notice that now we have the numerator broken down, so we can calculate the probability of a true positive: $.005*.98 = .0049$

Finally, we break down our denominator, the evidence, into two cases: we see a positive test and our hypothesis is true (a true positive), or we see a positive test and our hypothesis is false (a false positive).  So we can replace the denominator with:

> $P(H|E) = \frac{P(H)*P(E|H)}{P(EH) + P(EH^c)}$

And we can expand the two terms in our denominator via the chain rule so that we get:

$ P(H|E) = \frac{P(H)*P(E|H)}{P(H)*P(E|H) + P(H^c)*P(E|H^c)}$

Now let's plug our numbers into our formula to find the probability of the flu given a positive test.

$P(H|E) = \frac{.005*.98}{.005*.98 + .995*.01}$

In [64]:
(.005*.98)/(.005*.98 + .995*.01)

0.32996632996632996

So we see that the probability of someone having the flu given a positive test is about $.33$.
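
The same formula can be wrapped in a small reusable function.  This is a minimal sketch rather than a definitive implementation: the name `posterior` and its parameter names are assumptions introduced here, and the input checks are an added convenience.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(H|E) given P(H), P(E|H) and P(E|H^c), using Bayes theorem."""
    for name, value in [("prior", prior),
                        ("sensitivity", sensitivity),
                        ("false_positive_rate", false_positive_rate)]:
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    true_positive = prior * sensitivity                  # P(H) * P(E|H)
    false_positive = (1 - prior) * false_positive_rate   # P(H^c) * P(E|H^c)
    evidence = true_positive + false_positive            # P(E)
    if evidence == 0:
        raise ValueError("P(E) is zero, so P(H|E) is undefined")
    return true_positive / evidence

flu_posterior = posterior(prior=0.005, sensitivity=0.98, false_positive_rate=0.01)
print(round(flu_posterior, 4))
```

Keeping the numerator and the two denominator terms as separate named variables mirrors the expanded formula above, which makes it easier to see which input drives the result.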

### Summary

In this lesson we learned about Bayes theorem.  We saw that the main takeaway from Bayes theorem is that new evidence does not determine beliefs in a vacuum, but rather should *update* our prior beliefs.  Because certain events are quite rare, a small false positive rate can still lead to $P(H|E)$ being low. 

Statistically, the formula for Bayes theorem derives from our formula for conditional probability:

$P(H|E) = \frac{P(EH)}{P(E)}$

Which we then expand using the chain rule:

> $P(H|E) = \frac{P(H)*P(E|H)}{P(EH) + P(EH^c)} = \frac{P(H)*P(E|H)}{P(H)*P(E|H) + P(H^c)*P(E|H^c)}$
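
To see how strongly the prior drives the result, the sketch below applies the expanded formula to several priors while holding the test's accuracy fixed at the values from the flu example.  The list of priors other than $.005$ is hypothetical and chosen only to show the trend, and the function name is an assumption for illustration.

```python
def posterior(prior, sensitivity=0.98, false_positive_rate=0.01):
    """P(H|E) via Bayes theorem, with the flu test's accuracy as defaults."""
    true_positive = prior * sensitivity
    false_positive = (1 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)

# Hypothetical priors, from a very rare condition to a coin flip
priors = [0.0005, 0.005, 0.05, 0.5]
posteriors = {p: posterior(p) for p in priors}

for p, post in posteriors.items():
    print(f"P(H) = {p:<6}  ->  P(H|E) = {post:.3f}")
```

With the same test, the rarer the condition, the lower the probability that a positive result reflects a true case.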

### Resources

[ML Mastery Naive Bayes Classifier](https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/)

[Naive Bayes - Python Data Science](https://jakevdp.github.io/PythonDataScienceHandbook/05.05-naive-bayes.html)

[Sklearn Naive Bayes](https://scikit-learn.org/stable/modules/naive_bayes.html)