# Bayes' Theorem
__Aim__: SWBAT explain Bayesian versus frequentist thinking and use Bayes' Theorem to calculate probabilities.

Breaking down today's lecture:
1. Motivation: Why Bayes? (5 min)
2. Explanation: So What Do These Bayesians Say? (15 min)
3. Application: Bayes' Theorem - Understanding It and Applying It (20 min)

*Notebook based on Flatiron DS Online instructor Victor Geislinger's Bayes Theorem lecture.*

# Why Bayes? 

Suppose you're an amateur meteorologist and you have some instruments that measure things like barometric pressure, dew point, and temperature. How would you decide if it's going to rain today?

## What did Bayes think about this?

We know in general how likely it is to rain on any given day (in NYC, about 121 days/year - [source](https://weather-and-climate.com/average-monthly-Rainy-days,New-York,United-States-of-America)), but we also have some additional info (pressure, dew point, etc.), and we can and should use it to our advantage! Given what we know about the past, we can bring in a hypothesis we have about today!

Side note: He thought this way of thinking was so obvious that it wasn't even worth writing down! It wasn't until after he died that Richard Price, another mathematician, read through his notes and uncovered Bayes' way of thinking about probability and published it for the world to see. 

# So What Do These Bayesians Say?

## Basically C-3PO is wrong & Han Solo is a badass

https://www.countbayesie.com/blog/2015/2/18/hans-solo-and-bayesian-priors

**Bayesian interpretation of probability**: how strongly we expect an outcome, based on some _prior_ knowledge or belief and updated as new evidence comes in

## Other non-Star Wars examples: 

> Every time I flipped this coin it landed heads, so next time I flip it it'll likely land on heads again.

> 7-up: Based on what I know about my classmates, I'm pretty sure Lucy picked me.

> Han is a great pilot, so he will likely get through the asteroid field even if others would fail (crash and blow up)

## The other faction - Frequentists

**Frequentist interpretation of probability**: If we repeat an experiment under the proposed conditions many, many times, what probability do the aggregated results tend to?

Basically, the Bayesians assign a probability to a hypothesis but Frequentists test a hypothesis and determine probability with repeated trials.
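To make the frequentist idea concrete, here is a minimal sketch that simulates flipping a coin many times and watches the fraction of heads settle down. The function name `long_run_frequency`, the fair-coin default, and the seed are all assumptions made for illustration.

```python
import random

def long_run_frequency(n_flips, p_heads=0.5, seed=0):
    """Simulate n_flips coin flips and return the observed fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

# The more flips we aggregate, the closer the fraction tends to get to p_heads
for n in (10, 1_000, 100_000):
    print(n, long_run_frequency(n))
```

With only a handful of flips the fraction can be far from 0.5; with many flips it should land very close to it.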

Some days you're a Frequentist, other days you're a Bayesian.

<img src='https://imgs.xkcd.com/comics/frequentists_vs_bayesians_2x.png' width=400>

Basically, because it's unlikely that the detector is lying, the frequentist decides it's telling the truth. However, the Bayesian knows that it's _super_ unlikely that the sun exploded, so they decide that, given their prior knowledge, the detector is lying. Here's a [lengthy interpretation](https://www.explainxkcd.com/wiki/index.php/1132:_Frequentists_vs._Bayesians) of this comic, if you're interested!


## Pondering Time 🤔

#### When we conduct hypothesis tests to make inferences about a population's mean, are we acting as frequentists or Bayesians?

Generally thought of as frequentist! We treat our sample as one of many possible samples that could be obtained over infinite samplings. We take on no prior knowledge about the situation, just the data given. Then, we assume a hypothesized value for the population mean (the null hypothesis) and get the probability of observing data at least as extreme as our sample, given this assumption. A frequentist views this as one trial of what could be many, and interprets the probability within that context.

Bayesians, on the other hand, would represent the uncertainty probabilistically, defining a probability distribution over the possible values of the mean and using sample data to update this distribution. The interpretation would not be in the context of infinite trials, but rather in the context of the knowledge you had before sampling and how you updated your beliefs after sampling. Note: Bayesian inference is a whole other field you can study! (If interested, the [German Tank Problem](https://en.wikipedia.org/wiki/German_tank_problem) is a problem whose answer differs depending on whether you use frequentist or Bayesian inference methods.)
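Here is a rough sketch of what "define a distribution and update it with data" can look like. It uses a coin's probability of heads (the mean of a single flip) rather than a general population mean, starts from a flat prior, and approximates the distribution on a grid. The function name `grid_posterior`, the grid size, and the 8 heads / 2 tails data are all made up for illustration.

```python
def grid_posterior(heads, tails, n_grid=101):
    """Posterior over a coin's P(heads) on an evenly spaced grid, from a flat prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1 / n_grid] * n_grid  # flat prior: every candidate value equally plausible
    # How likely the observed flips are under each candidate value of P(heads)
    likelihood = [p**heads * (1 - p)**tails for p in grid]
    unnormalized = [l * pr for l, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

grid, posterior = grid_posterior(heads=8, tails=2)
best = grid[posterior.index(max(posterior))]
print(f"Most plausible P(heads) after 8 heads and 2 tails: {best:.2f}")
```

The output of one update could be fed back in as the prior for the next batch of data, which is the "update your beliefs" loop described above.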

[Source 1](https://www.probabilisticworld.com/frequentist-bayesian-approaches-inferential-statistics/) and [Source 2](https://cxl.com/blog/bayesian-frequentist-ab-testing/)

## Application

See [this video on discovering lost treasure](https://abcn.ws/2tsQs6l) for an application of Bayesian inference IRL!

# Bayes' Theorem: Understanding It and Applying It

## Math Time! 🤓

### Recall: Conditional Probability

#### The "and" and "or" rules

The words "and" and "or" dictate what operation we use when calculating overall probability. We'll talk about this in terms of a classic probability problem - marbles. Suppose we have a bag of 7 blue marbles, 3 white marbles, and 4 pink marbles.  
- "and" denotes multiplication (when the events are independent)
    - The probability of drawing a pink marble and then a blue marble if we draw twice with replacement is 4/14 * 7/14, or about 14%
- "or" denotes addition (when the events can't both happen)
    - The probability of drawing a pink marble or a blue marble on our first draw is 4/14 + 7/14, or about 79%
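A quick sketch to check the marble arithmetic with exact fractions. The dictionary `marbles` and the helper `p` are names invented for this example.

```python
from fractions import Fraction

marbles = {"blue": 7, "white": 3, "pink": 4}
total = sum(marbles.values())  # 14 marbles in the bag

def p(color):
    """Probability of drawing the given color on a single draw."""
    return Fraction(marbles[color], total)

# "and": pink on the first draw AND blue on the second, with replacement
p_pink_then_blue = p("pink") * p("blue")

# "or": pink OR blue on a single draw (one marble can't be both colors)
p_pink_or_blue = p("pink") + p("blue")

print(p_pink_then_blue, float(p_pink_then_blue))
print(p_pink_or_blue, float(p_pink_or_blue))
```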

#### Law of Total Probability

This law says that if we can achieve an event in several distinct ways, we can break down that problem into those sub-events and add those probabilities. Using our example before, suppose we want to calculate the probability of not drawing a pink marble. This can be broken down into two distinct events -- either you draw a blue marble, or you draw a white marble. 

P(not pink) = P(blue) + P(white) = 7/14 + 3/14 = 10/14, or about 71%
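As a sanity check, the same bag of 14 marbles can be summed up in a few lines and compared against the complement rule, 1 - P(pink). This is just a sketch; the variable names are made up.

```python
from fractions import Fraction

marbles = {"blue": 7, "white": 3, "pink": 4}
total = sum(marbles.values())

# Add up the distinct ways of "not drawing pink"
p_not_pink = sum(
    Fraction(count, total) for color, count in marbles.items() if color != "pink"
)

# Cross-check with the complement rule: P(not pink) = 1 - P(pink)
p_not_pink_complement = 1 - Fraction(marbles["pink"], total)

print(p_not_pink, p_not_pink_complement, float(p_not_pink))
```

Both routes should agree, which is a handy way to catch a wrong denominator.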

#### Conditional Probability

Sometimes we know stuff has already happened, as discussed before. So, we can limit our sample space to just the events in which we've already witnessed the first set of outcomes. 
- Notation: P(A | B) is read as _the probability of A given that B has happened_. 
- Two events are independent if P(A | B) = P(A)
- One way to calculate P(A | B) is computing P(A and B)/P(B)
    - But we don't always have this info! Cue Bayes!
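To see the P(A and B)/P(B) recipe in action, here is a small sketch that lists every equally likely way to draw two marbles *without* replacement from the same bag, then counts. Here A is "second marble is blue" and B is "first marble is pink"; the helper `prob` and the variable names are assumptions for this example.

```python
from fractions import Fraction
from itertools import permutations

bag = ["blue"] * 7 + ["white"] * 3 + ["pink"] * 4

# Every equally likely ordered pair of two draws WITHOUT replacement
draws = list(permutations(range(len(bag)), 2))

def prob(event):
    """Fraction of equally likely draws for which event(first, second) is True."""
    hits = sum(event(bag[i], bag[j]) for i, j in draws)
    return Fraction(hits, len(draws))

p_b = prob(lambda first, second: first == "pink")  # P(B)
p_a_and_b = prob(lambda first, second: first == "pink" and second == "blue")  # P(A and B)
p_a_given_b = p_a_and_b / p_b  # P(A | B)
p_a = prob(lambda first, second: second == "blue")  # P(A)

print("P(A | B) =", p_a_given_b)
print("P(A)     =", p_a)
```

Since P(A | B) and P(A) come out different, the two draws are not independent when we don't put the first marble back.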

### Drumroll please 🥁… Presenting, Bayes' Theorem!

$$P(A|B) = \frac{P(B|A)}{P(B)}\ P(A) $$

$$P(A|B) =  \frac{P(B|A)P(A)}{P(B)}$$
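Where does this come from? A quick sketch, assuming $P(A) > 0$ and $P(B) > 0$: write $P(A\ and\ B)$ two ways using the conditional probability rule from above, then divide both sides by $P(B)$.

$$P(A\ and\ B) = P(A|B)\ P(B) = P(B|A)\ P(A)$$

$$\Rightarrow\ P(A|B) = \frac{P(B|A)\ P(A)}{P(B)}$$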

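Each part (note, depending on how you approach this, you might group different parts together):

- $P(A)$: *prior*
- $P(A|B)$: *posterior*
- $\frac{P(B|A)}{P(B)}$: *likelihood* (strictly speaking, $P(B|A)$ alone is the likelihood and $P(B)$ is the evidence; in this notebook we group them into one factor and call the ratio the likelihood)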

> That wasn't so bad at all!!

## Working through an example - you tested positive for a disease 🤢

### The setup

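Let's assume some numbers so we can quantify things:

- Disease is rare; only 0.01% of the population has it
- Test has a sensitivity of .99 
    + Correctly identifies you as sick 99% of the time if you are sick
- We'll also assume the test is right 99% of the time if you are not sick
    + So it comes back positive 1% of the time for healthy people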

### Q: How do we rewrite Bayes' Theorem in the context of our problem?

#### ANSWER:

##### Use Bayes' Theorem to fill it out:

$$P(sick | positive) =  \frac{P(positive | sick)\ P(sick)}{P(positive)}$$

##### But we can expand P(positive) to be easier to calculate using what we know! (hint: use the law of total probability!)

$$P(positive) = P(sick)\ P(positive | sick) + P(not\ sick)\ P(positive|not\ sick)$$

##### Finally, our ultimate equation

$$P(sick | positive) =  \frac{P(positive | sick)\ P(sick)}{P(sick)\ P(positive | sick) + P(not\ sick)\ P(positive|not\ sick)}$$

### Work it out

#### Defining the probabilities

In [1]:
# probability of sick & healthy
p_sick = 0.0001
p_not_sick = 1 - p_sick

In [2]:
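# probability of a positive test if you are sick (sensitivity) and if you are not sick
# (false positive rate; assumes the test is right 99% of the time either way)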
p_positive_sick = 0.99
p_positive_not_sick = 1 - p_positive_sick

In [3]:
# probability of positive test (whether or not you are sick)
p_positive = p_sick*p_positive_sick + p_not_sick*p_positive_not_sick

#### Defining the parts for Bayes' Theorem

In [4]:
# belief beforehand
prior = p_sick

# how much more likely a positive test is when you're sick than it is overall:
# P(positive | sick) / P(positive)
likelihood = p_positive_sick / p_positive

In [5]:
# handy dandy function
def find_posterior(prior, likelihood):
    return likelihood * prior

#### Final result

In [6]:
prob_youre_sick = find_posterior(prior, likelihood)

# multiplying by 100 just to give us a percent (rather than a decimal probability)
print(f'You have a {prob_youre_sick*100:.2f}% chance of actually being sick')

You have a 0.98% chance of actually being sick


#### Q: Interpretation? Does this make sense?
Let's work out a branching diagram.
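One way to sketch the branches is to count people instead of multiplying probabilities. The population of 1,000,000 below is a made-up number, chosen only so the counts come out whole; the rates are the ones from the setup (with the false positive rate taken as 1 - sensitivity, as in the code above).

```python
population = 1_000_000  # hypothetical population size, for illustration only

prevalence = 0.0001         # 0.01% of people are sick
sensitivity = 0.99          # P(positive | sick)
false_positive_rate = 0.01  # P(positive | not sick), assumed to be 1 - sensitivity

# First split: sick vs. healthy
sick = round(population * prevalence)
healthy = population - sick

# Second split: who tests positive on each branch
true_positives = round(sick * sensitivity)
false_positives = round(healthy * false_positive_rate)

# Of everyone holding a positive result, what share is actually sick?
share_sick = true_positives / (true_positives + false_positives)

print(f"sick: {sick}, healthy: {healthy}")
print(f"true positives: {true_positives}, false positives: {false_positives}")
print(f"share of positives who are sick: {share_sick*100:.2f}%")
```

The healthy branch is so much bigger than the sick branch that its small slice of false positives swamps the true positives.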

## Knowledge Check! 🧠 

> Pretend we test positive for a disease twice (independent tests!)

### Back to the code!

In [7]:
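# new prior: what we believe after the first positive test
prior = prob_youre_sick

# What changes for the second test?
# Answer: the test is the same, so P(positive | sick) stays put, but
# P(positive) depends on our prior, so we recompute it with our updated KNOWLEDGE 🧠
p_positive = prior*p_positive_sick + (1 - prior)*p_positive_not_sick
likelihood = p_positive_sick / p_positive

In [8]:
prob_sick_after_2_tests = find_posterior(prior, likelihood)
print(f'''
    You have a {prob_sick_after_2_tests*100:.2f}% chance of actually being 
    sick after two positive tests
''')


    You have a 49.50% chance of actually being 
    sick after two positive tests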
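If we expect to repeat the test more than once, one option is a small helper that rebuilds P(positive) from whatever the current prior is, so it can't go stale between updates. This is only a sketch: the name `update_on_positive` and its default arguments are assumptions, with the rates taken from the setup above.

```python
def update_on_positive(prior, sensitivity=0.99, false_positive_rate=0.01):
    """Return P(sick | positive) for a given prior P(sick)."""
    # Law of total probability: P(positive) depends on the current prior
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    # Bayes' Theorem
    return sensitivity * prior / p_positive

belief = 0.0001  # start from the prevalence in the setup
for test_number in (1, 2):
    belief = update_on_positive(belief)
    print(f"After positive test {test_number}: {belief*100:.2f}% chance of being sick")
```

Each pass through the loop treats the previous posterior as the new prior, which is the whole idea behind updating on independent tests.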

