```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
```

# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Introduction to Bayesian Statistics
Week 10 | Lesson 1.1

### LEARNING OBJECTIVES

After this lesson, you will be able to:

- Understand the building blocks of probability theory
- Calculate joint probability
- Derive and explain Bayes' theorem

## Let's talk a little about basic probability concepts.



## There are three axioms of probability

**Nonnegativity**

For any event $A$, the probability of the event must be greater than or equal to zero.

### $$ 0 \le P(A) $$


**Unit measure**

The probability of the entire sample space is 1.

### $$ P(S) = 1 $$

**Additivity**

For mutually exclusive (in other words, "disjoint") events $E_1, E_2, \ldots$, the probability that any one of the events occurs is equal to the sum of their individual probabilities.

### $$ P\left(\cup_{i=1}^{\infty}\; E_i \right) = \sum_{i=1}^{\infty} P(E_i) $$

> Check: what's an example of this?

We can see this geometrically...
<img src="./assets/images/disjoint.png" width=800px>
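One possible answer to the check, sketched in Python: on a single roll of a fair six-sided die, the events "roll a 1", "roll a 2" and "roll a 3 or 4" cannot happen together, so the probability of their union equals the sum of their probabilities. The `prob` helper below is our own illustration and assumes all outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die; each face has probability 1/6.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) for a subset of the sample space, assuming equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Three mutually exclusive (pairwise disjoint) events.
E1, E2, E3 = {1}, {2}, {3, 4}

p_union = prob(E1 | E2 | E3)             # P(E1 or E2 or E3)
p_sum = prob(E1) + prob(E2) + prob(E3)   # P(E1) + P(E2) + P(E3)
print(p_union, p_sum)  # 2/3 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two sides compare equal without floating-point rounding.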


## There are a few more key concepts

**The probability of no event**

The probability of the empty set, denoted $\emptyset$, is zero.


### $$ P\left(\emptyset \right) = 0 $$

**The probability of $A$ or $B$ occurring (union)**

The probability of event $A$ or event $B$ occurring is equal to the sum of their individual probabilities minus the probability of their intersection (the probability that they both occur). Subtracting the intersection keeps it from being counted twice.

### $$ P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

We can see this geometrically...

<img src = "./assets/images/union.png" width = 800px>
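A small check of the union rule on one fair die, with $A$ = "the roll is even" and $B$ = "the roll is at most 3" (an example of our choosing). The two events overlap on the outcome 2, so simply adding $P(A)$ and $P(B)$ counts that outcome twice.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) assuming all six faces are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # roll is even
B = {1, 2, 3}   # roll is at most 3

lhs = prob(A | B)                      # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # inclusion-exclusion formula
naive = prob(A) + prob(B)              # forgets to subtract P(A and B)
print(lhs, rhs, naive)  # 5/6 5/6 1
```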

**Conditional probability**

The probability of an event conditional on another event is written using a vertical bar between the two events. The probability of event $A$ occurring _given_ that event $B$ occurs is calculated as:

### $$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$

In words: divide the probability that both $A$ and $B$ occur by the probability that $B$ occurs at all. This is only defined when $P(B) > 0$.
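A sketch of the formula by counting outcomes of two fair dice, with $A$ = "the dice sum to 8" and $B$ = "the first die shows 3" (an illustrative choice). Knowing the first die is 3 narrows the sample space to 6 outcomes, and only one of them, (3, 5), sums to 8.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) by counting equally likely outcomes; `event` is a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda o: event_a(o) and event_b(o)) / p_b

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_3(o):
    return o[0] == 3

print(prob(sum_is_8))                   # 5/36
print(cond_prob(sum_is_8, first_is_3))  # 1/6
```

Conditioning on $B$ raises the probability of $A$ from 5/36 to 1/6.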

**Joint probability**

The joint probability of two events $A$ and $B$ is a reformulation of the above equation.

### $$ P(A \cap B) = P(A|B) \; P(B) $$

Verbally, if we want to know the probability that both $A$ and $B$ happen, we can multiply the probability that $B$ happens by the probability that $A$ happens given $B$ happens.
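A worked example of this multiplication rule, using a standard 52-card deck (our example, not from the lesson): draw two cards without replacement, with $B$ = "the first card is an ace" and $A$ = "the second card is an ace". The brute-force count afterwards is only a sanity check.

```python
from fractions import Fraction
from itertools import permutations

# B = "first card is an ace", A = "second card is an ace" (drawing without replacement).
p_B = Fraction(4, 52)          # 4 aces among 52 cards
p_A_given_B = Fraction(3, 51)  # one ace is gone: 3 aces left among 51 cards
p_A_and_B = p_A_given_B * p_B  # P(A and B) = P(A | B) * P(B)
print(p_A_and_B)  # 1/221

# Brute-force check: label cards 0-51, let cards 0-3 be the aces,
# and enumerate every ordered pair of distinct cards.
pairs = list(permutations(range(52), 2))
both_aces = sum(1 for first, second in pairs if first < 4 and second < 4)
print(Fraction(both_aces, len(pairs)))  # 1/221
```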


**Independent events**

What if we assume that events $A$ and $B$ are independent? What does that mean? And what impact does it have on the formula for the joint probability?



If $A$ and $B$ are independent, then knowing that $B$ occurred tells us nothing about $A$, so $P(A|B) = P(A)$ and:

### $$ P(A \cap B) = P(A) \; P(B) $$
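Two separate dice are a standard example of independence. A sketch that enumerates the 36 outcomes and checks both forms of the statement, with $A$ = "first die shows 6" and $B$ = "second die shows 6":

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def prob(event):
    """P(event) by counting equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_6(o):
    return o[0] == 6

def second_is_6(o):
    return o[1] == 6

def both(o):
    return first_is_6(o) and second_is_6(o)

p_joint = prob(both)
# Independence: P(A and B) equals P(A) * P(B).
print(p_joint, prob(first_is_6) * prob(second_is_6))  # 1/36 1/36

# Equivalently, conditioning on B leaves the probability of A unchanged.
print(p_joint / prob(second_is_6), prob(first_is_6))  # 1/6 1/6
```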


## OK we're ready for Bayes now!

But first... let's talk about random variables.

What is a random variable?

## Here's what the internet says...

- A random variable is a set of possible values from a random experiment.

- A quantity having a numerical value for each member of a group, especially one whose values occur according to a frequency distribution

- A variable quantity whose value depends on possible outcomes

- A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes.

- A numerical characteristic that takes on different values due to chance

## Types of random variables

- discrete 
- continuous 

> What is an example of each?

## Discrete random variables

We tend to find it easier to think about probability for discrete random variables.

For example, what is the probability of rolling a 2 with a fair six-sided die?

Answer: 1/6
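A quick simulation can make the 1/6 concrete: the long-run fraction of 2s across many simulated rolls should settle near 1/6. This sketch uses Python's standard `random` module; the seed and the number of rolls are arbitrary choices.

```python
import random

rng = random.Random(42)   # seeded so the run is reproducible
n_rolls = 100_000

# randint(1, 6) returns an integer from 1 to 6 inclusive.
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
estimate = rolls.count(2) / n_rolls

print(f"estimated P(roll == 2) = {estimate:.4f}  (exact: {1/6:.4f})")
```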


## Continuous random variables

For continuous random variables, it's a little trickier.

For example, what is the probability of a person being exactly 6'2"?

Answer: 0

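Since height can take any value in a continuous range (informally, anywhere between $0$ and $\infty$), there are infinitely many possible values, and any single exact value gets probability zero. Informally, you can think of it as
$$ 1 / \infty = 0 $$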

## Er... so now what?

The probability of a continuous random variable being exactly a given number is 0.  But the probability of being in a given interval is non-zero.

For example, the probability of a person being between 6'0" and 6'2" is non-zero.

So how do we calculate that?

## MATH

...cue scary music...

## Probability density function

"A function of a continuous random variable, whose integral across an interval gives the probability that the value of the variable lies within the same interval." [wiki](https://en.wikipedia.org/wiki/Probability_density_function)


<img src="./assets/images/norm.png" width =600 align="center">
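A sketch of "probability = area under the density" for the height question. The model is a loud assumption made only for illustration: height in inches follows a normal distribution with mean 69 and standard deviation 3 (these are not measured values). We integrate the density numerically over 72 to 74 inches (6'0" to 6'2") and compare with the exact answer from the normal CDF, written with `math.erf`.

```python
import math

# ASSUMPTION (illustration only): height in inches ~ Normal(mean=69, sd=3).
MU, SIGMA = 69.0, 3.0

def normal_pdf(x, mu=MU, sigma=SIGMA):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """P(X <= x) for the normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# P(72 <= X <= 74): the area under the density between 6'0" and 6'2".
p_interval = integrate(normal_pdf, 72, 74)
p_interval_cdf = normal_cdf(74) - normal_cdf(72)
print(round(p_interval, 4), round(p_interval_cdf, 4))

# An interval of zero width has zero area, so P(X == 74) = 0.
print(integrate(normal_pdf, 74, 74))  # 0.0
```

The numerical integral and the CDF difference agree; the CDF is what statistics libraries typically use for interval probabilities.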


## OK, time for Bayes

### Bayes' Rule

Bayes' rule relates the probability of $A$ given $B$ to the probability of $B$ given $A$. This rule is critical for performing statistical inference, as we shall see shortly. It is formulated as (for $P(B) > 0$):

### $$ P(A|B) = \frac{P(B|A)\;P(A)}{P(B)} $$
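One way to derive it: the joint probability formula from earlier can be written with the roles of $A$ and $B$ swapped, since $A \cap B = B \cap A$. Setting the two forms equal and dividing by $P(B)$ (assumed positive) gives Bayes' rule:

$$
\begin{aligned}
P(A \cap B) &= P(A|B)\;P(B) \\
P(A \cap B) &= P(B|A)\;P(A) \\
\Rightarrow\; P(A|B)\;P(B) &= P(B|A)\;P(A) \\
\Rightarrow\; P(A|B) &= \frac{P(B|A)\;P(A)}{P(B)}
\end{aligned}
$$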



## An example from the first few weeks

We were sitting in this room, and the fire alarm went off. We all stopped momentarily, but didn't get up because we didn't think there was really a fire.

Why were we so sure?

## BAYES!  

(we do it instinctively!)

## Let's apply this to the fire alarm case

### $$ P(fire|alarm) = \frac{P(alarm|fire)\;P(fire)}{P(alarm)} $$

Let's put some numbers to this
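Here is one way to put numbers to it. Every number below is a hypothetical assumption chosen for illustration, not measured data: fires are rare, the alarm almost always sounds during a real fire, and false alarms (drills, burnt toast, faults) happen now and then. The evidence $P(alarm)$ comes from the law of total probability, and `bayes` is a small helper of our own.

```python
def bayes(likelihood, prior, evidence):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# HYPOTHETICAL numbers, for illustration only:
p_fire = 0.001                # prior: P(fire) on a given day
p_alarm_given_fire = 0.99     # P(alarm | fire)
p_alarm_given_no_fire = 0.05  # P(alarm | no fire): drills, burnt toast, faults

# Evidence: P(alarm) = P(alarm|fire) P(fire) + P(alarm|no fire) P(no fire)
p_alarm = p_alarm_given_fire * p_fire + p_alarm_given_no_fire * (1 - p_fire)

p_fire_given_alarm = bayes(p_alarm_given_fire, p_fire, p_alarm)
print(f"P(fire | alarm) = {p_fire_given_alarm:.3f}")  # P(fire | alarm) = 0.019
```

Under these assumptions, an alarm raises the probability of a fire from 0.1% to roughly 2%, which is still small. The low prior does most of the work, matching our instinct to stay seated.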


## So based on our previous experiences, we have a belief about the probability of an event occurring.

## We can apply this same concept to models!

We'll do a basic introduction to this today, and then revisit tomorrow.

## Bayes' theorem in the context of statistical modeling


### $$P\left(\;model\;|\;data\;\right) = \frac{P\left(\;data\;|\;model\;\right)}{P(\;data\;)} P\left(\;model\;\right)$$

Or in plain English:

**What is the probability that our model is true, given the data we have? It depends on the likelihood of the observed data under our model, on the overall (marginal) probability of observing that data, and on our prior belief that the model is true.**


**Terminology**

### $$P\left(\;model\;|\;data\;\right) = \frac{P\left(\;data\;|\;model\;\right)}{P(\;data\;)} P\left(\;model\;\right)$$

where:

$P\left(\;model\;|\;data\;\right)$ is the **posterior probability**

$P\left(\;data\;|\;model\;\right)$ is the **likelihood,** which is the probability of what we observed, assuming the model is true.

${P(\;data\;)}$ is the **marginal probability** of the observed data. This is also known as the **evidence** or the **normalization constant.**

$P\left(\;model\;\right)$ is the **prior probability**. It captures how plausible you believed the model was before observing the data.

---

Let's take a very simple example to illustrate the basics.  

You have two coins in a bag.  One you know is fair, and one you know is biased.

    coin FAIR has a 50% chance of flipping heads.
    coin RIGGED has 99% chance of flipping heads.
    
Your friend chooses one of the two coins at random. He flips the coin and gets heads. 

What is the probability that the coin flipped was **FAIR**?

### $$P\left(\;model\;|\;data\;\right) = \frac{P\left(\;data\;|\;model\;\right)}{P(\;data\;)} P\left(\;model\;\right)$$

**The prior**

Our "models" here are the two coins.  

i.e., one is a model in which there is a 50% chance of getting heads, and one is a model in which there is a 99% chance of getting heads.

There are only two models, and your friend picks one at random, so the prior for the fair coin is

$P(model) = P(FAIR) = 0.5$

**The likelihood**

The model we are evaluating is the fair coin. If the coin is the fair one, the likelihood of getting heads is:

$$P(data|model) = 0.5$$

because with the fair coin, we have a 50% chance of getting heads.

**The evidence**

The evidence is simply the probability of getting heads in one toss, regardless of which model we have. In other words, it is the average of each model's probability of heads, weighted by that model's prior:

$$P(data) = 0.5 \times 0.5 + 0.5 \times 0.99 = 0.745$$

```python
# Prior: our belief that the coin flipped was FAIR before we saw the outcome.
# 0.5 since he chose at random.
prior_fair = 0.5

# Likelihood: probability of heads given that the coin is the fair one.
prob_heads_given_fair = 0.5

# Evidence: total probability of heads across both coins (= 0.745).
prob_heads = 0.5 * 0.5 + 0.5 * 0.99

# Posterior: probability that the coin is FAIR given that we saw heads.
posterior_fair = (prob_heads_given_fair * prior_fair) / prob_heads

print(round(posterior_fair, 4))
```

Output: 0.3356


> Check: what does this mean in plain English?

Based on what we know about probability, what is the probability that the coin was in fact the biased one?

Since we believe we can only have one of two models here, it is simply:

$$ 1 - 0.3356 = 0.6644 $$

Let's do this on the board to prove it.
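A sketch of the same calculation for both coins at once: compute each model's posterior with Bayes' rule and confirm that the two posteriors sum to 1, which is why the RIGGED posterior equals $1 - 0.3356$. The dictionary layout is just one convenient choice.

```python
priors = {"FAIR": 0.5, "RIGGED": 0.5}    # friend picks a coin at random
p_heads = {"FAIR": 0.5, "RIGGED": 0.99}  # likelihood of heads under each model

# Evidence: prior-weighted average of the likelihoods (= 0.745).
evidence = sum(p_heads[m] * priors[m] for m in priors)

# Posterior for each model given one observed head.
posterior = {m: p_heads[m] * priors[m] / evidence for m in priors}

for model, p in posterior.items():
    print(f"P({model} | heads) = {p:.4f}")
print(f"sum = {sum(posterior.values()):.4f}")
```

This prints 0.3356 for FAIR, 0.6644 for RIGGED, and a sum of 1.0000.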

## That was a preview of models with the Bayesian approach! We'll come back to that tomorrow! For now, to reinforce the idea of Bayes...

## The Monty Hall Problem (Let's Make a Deal)

This is a classic brain-teaser, based on a game show, that has a solution that often seems counter-intuitive.  You're going to use Bayes to solve it!

Here are the rules of the game
- There are three doors.  Behind one door is a brand-new car.  Behind each of the other two is a goat
- The host asks you to pick a door. Without revealing what's behind the door you picked, the host, who knows where the car is, opens one of the other two doors and shows you a goat
- Then you have a choice to make.  Stick with your original pick, or switch doors

What should you do?

## CHANGE DOORS EVERY TIME

For the lab, your task is to prove this using Bayes!
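The lab asks for a proof with Bayes; a Monte Carlo simulation is not a proof, but it is a handy sanity check for whatever answer you derive. This sketch assumes the standard rules: the host knows where the car is, always opens a goat door you did not pick, and chooses at random when two such doors are available. The function names and trial count are our own choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the player's pick nor the car.
    host = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

stick = win_rate(switch=False)
switch = win_rate(switch=True)
print(f"stick:  {stick:.3f}")
print(f"switch: {switch:.3f}")
```

The simulated win rates land near 1/3 for sticking and 2/3 for switching.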

References and sources this lesson is modeled on:

http://ipython-books.github.io/featured-07/

http://stats.stackexchange.com/questions/31867/bayesian-vs-frequentist-interpretations-of-probability

http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/

https://simple.wikipedia.org/wiki/Bayes%27_theorem

https://en.wikipedia.org/wiki/Central_limit_theorem

http://www.cogsci.ucsd.edu/classes/SP07/COGS14/NOTES/binomial_ztest.pdf

https://en.wikipedia.org/wiki/Prior_probability#Uninformative_priors

https://arbital.com/p/bayes_rule/?l=1zq

https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

http://www.yudkowsky.net/rational/bayes/

http://people.stern.nyu.edu/wgreene/MathStat/Notes-2-BayesianStatistics.pdf

http://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions

http://pages.uoregon.edu/cfulton/posts/bernoulli_trials_bayesian.html

http://chrisstrelioff.ws/sandbox/2014/12/11/inferring_probabilities_with_a_beta_prior_a_third_example_of_bayesian_calculations.html

https://www.chrisstucchio.com/blog/2013/magic_of_conjugate_priors.html


---