# Bayes's Theorem

$$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$$

As an example, we used data from the General Social Survey and Bayes’s Theorem to compute conditional probabilities. But since we had the complete dataset, we didn’t really need Bayes’s Theorem. It was easy enough to compute the left side of the equation directly, and no easier to compute the right side.

But often we don’t have a complete dataset, and in that case Bayes’s Theorem is more useful. In this chapter, we’ll use it to solve several more challenging problems related to conditional probability.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("gss_bayes.csv")
df.head()

Unnamed: 0,caseid,year,age,sex,polviews,partyid,indus10
0,1,1974,21.0,1,4.0,2.0,4970.0
1,2,1974,41.0,1,5.0,0.0,9160.0
2,5,1974,58.0,2,6.0,1.0,2670.0
3,6,1974,30.0,1,5.0,4.0,6870.0
4,7,1974,48.0,1,5.0,4.0,7860.0


## The Cookie Problem
Suppose there are two bowls of cookies.

- Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies.

- Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.

Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

What we're trying to find is $P(B_1 | V)$, the probability that the cookie came from Bowl 1 given that it is vanilla.

$$P(B_1 | V) = \frac{P(B_1)P(V|B_1)}{P(V)}$$

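- $P(B_1)$, the prior probability of Bowl 1, is 1/2, because we chose one of the two bowls at random.
- $P(V|B_1)$, the probability of a vanilla cookie given Bowl 1, is 3/4, because 30 of the 40 cookies in Bowl 1 are vanilla.
- $P(V)$, the probability of drawing a vanilla cookie from either bowl, is 5/8. The bowls are equally likely and hold the same number of cookies, so every cookie is equally likely to be chosen, and 50 of the 80 cookies are vanilla.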

In [3]:
p_b1 = .5
p_v_given_b1 = .75
p_v = 5/8

In [4]:
p_b1_given_vanilla = p_b1 * p_v_given_b1 / p_v
p_b1_given_vanilla

0.6
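
Counting cookies gives $P(V) = 5/8$ here only because both bowls are equally likely and hold the same number of cookies. A more general route is the law of total probability, $P(V) = P(B_1)P(V|B_1) + P(B_2)P(V|B_2)$. The sketch below checks this with exact fractions; the names `p_b2` and `p_v_given_b2` are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Priors: each bowl is chosen with probability 1/2.
p_b1 = Fraction(1, 2)
p_b2 = Fraction(1, 2)

# Likelihood of drawing vanilla from each bowl.
p_v_given_b1 = Fraction(30, 40)
p_v_given_b2 = Fraction(20, 40)

# Law of total probability: sum over the two bowls.
p_v = p_b1 * p_v_given_b1 + p_b2 * p_v_given_b2

# Bayes's Theorem.
p_b1_given_v = p_b1 * p_v_given_b1 / p_v

print(p_v)           # 5/8
print(p_b1_given_v)  # 3/5
```

Both routes agree: the posterior is 3/5, the same 0.6 computed above.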

In [6]:
def prob(a):
    return a.mean()

In [7]:
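def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    # Select the rows where `given` is true, then take the fraction
    # of those rows where `proposition` is also true.
    return prob(proposition[given])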
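
As a rough illustration of how `prob` and `conditional` could be applied, the sketch below models the cookie problem as a table with one row per cookie. The `cookies` DataFrame and its column names are hypothetical, made up for this example, and the sketch assumes `conditional` returns its result.

```python
import pandas as pd

def prob(a):
    """Fraction of True values in a Boolean Series."""
    return a.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given."""
    return prob(proposition[given])

# Hypothetical table: one row per cookie, 40 cookies in each bowl.
cookies = pd.DataFrame({
    "bowl": [1] * 40 + [2] * 40,
    "flavor": (["vanilla"] * 30 + ["chocolate"] * 10
               + ["vanilla"] * 20 + ["chocolate"] * 20),
})

from_bowl1 = cookies["bowl"] == 1
is_vanilla = cookies["flavor"] == "vanilla"

# P(Bowl 1 | vanilla): 30 of the 50 vanilla cookies are in Bowl 1.
p = conditional(from_bowl1, given=is_vanilla)
print(p)
```

This works only because every cookie is equally likely to be drawn, which holds here since the bowls are the same size.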

## Diachronic Bayes
- "Dia" "chronos" = through time
- Another way of thinking about Bayes's Theorem: it gives us a way to update the probability of a hypothesis, $H$, given some body of data, $D$


$$P(H|D) = \frac{P(H)P(D|H)}{P(D)}$$

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{total probability of the data}}$$

- $P(H)$ is the *prior*, the probability of the hypothesis before we see the data
- $P(H|D)$ is the *posterior*, the probability of the hypothesis after we see the data
- $P(D|H)$ is the *likelihood*, the probability of the data under the hypothesis. "The probability of the data given the hypothesis"
- $P(D)$ is the *total probability of the data* under any hypothesis
- Remember that $P(D) = \sum_i P(H_i) P(D|H_i)$
- This formula holds when the hypotheses are mutually exclusive and collectively exhaustive

In some cases the prior is subjective; that is, reasonable people might disagree, either because they use different background information or because they interpret the same information differently.

The likelihood is usually the easiest part to compute.

Computing the total probability of the data can be tricky. It is supposed to be the probability of seeing the data under any hypothesis at all, but it can be hard to nail down what that means.

Most often we simplify things by specifying a set of hypotheses that are:

- Mutually exclusive, which means that only one of them can be true, and

- Collectively exhaustive, which means one of them must be true.

In [8]:
table = pd.DataFrame(index=["Bowl1", "Bowl2"])
table["prior"] = 1/2, 1/2
table["likelihood"] = 3/4, 1/2
table

Unnamed: 0,prior,likelihood
Bowl1,0.5,0.75
Bowl2,0.5,0.5


The priors must add up to 1, because the hypotheses are mutually exclusive and collectively exhaustive.

The likelihoods don't need to.

You might notice that the likelihoods don’t add up to 1. That’s OK; each of them is a probability conditioned on a different hypothesis. There’s no reason they should add up to 1 and no problem if they don’t.

In [9]:
# next step is to multiply priors by the likelihoods
# "unnormalized posteriors"
table["unnorm"] = table.prior * table.likelihood
table

Unnamed: 0,prior,likelihood,unnorm
Bowl1,0.5,0.75,0.375
Bowl2,0.5,0.5,0.25


What we did was calculate the numerator of Bayes's Theorem for each bowl:
$$P(B_i)P(D|B_i)$$

Now if we sum them up, we get the denominator, $P(D)$:
$$P(B_1)P(D|B_1) + P(B_2)P(D|B_2)$$

In [10]:
# P(D) is 5/8, which is the probability of seeing the data from either bowl
prob_data = table['unnorm'].sum()
prob_data

0.625

In [11]:
# Get the posterior 
# P(H|D)
# The probability of the hypothesis given the data
table["posterior"] = table["unnorm"] / prob_data

In [12]:
# The posteriors P(H|D) sum to 1, because the hypotheses are
# mutually exclusive and collectively exhaustive
table

Unnamed: 0,prior,likelihood,unnorm,posterior
Bowl1,0.5,0.75,0.375,0.6
Bowl2,0.5,0.5,0.25,0.4


When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1. This process is called “normalization”, which is why the total probability of the data is also called the “normalizing constant”.

## The Dice Problem
- A Bayes table can solve problems with more than two hypotheses:

> Suppose I have a box with a 6-sided die (d6), an 8-sided die (d8), and a 12-sided die (d12). If I choose one die at random, roll it, and report that the outcome is a 1, what is the probability that I chose the 6-sided die?

The likelihood of rolling a 1 depends on the die:

- P(1) on the d6 is 1/6
- P(1) on the d8 is 1/8
- P(1) on the d12 is 1/12

In [13]:
from fractions import Fraction

In [14]:
# The index is the set of hypotheses
# We'll use 6, 8, and 12 to represent the sides of each die.
dice_table = pd.DataFrame(index=[6, 8, 12])

In [15]:
# P(H) is our prior, the probability of each hypothesis before seeing the data
dice_table["prior"] = Fraction(1, 3)

dice_table["likelihood"] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)
dice_table

Unnamed: 0,prior,likelihood
6,1/3,1/6
8,1/3,1/8
12,1/3,1/12


In [16]:
def update(table):
    """Compute the posterior probabilities in place.

    Returns the total probability of the data.
    """
    table["unnorm"] = table.prior * table.likelihood
    prob_data = table.unnorm.sum()
    table["posterior"] = table.unnorm / prob_data
    return prob_data

In [17]:
prob_data = update(dice_table)

In [18]:
dice_table

Unnamed: 0,prior,likelihood,unnorm,posterior
6,1/3,1/6,1/18,4/9
8,1/3,1/8,1/24,1/3
12,1/3,1/12,1/36,2/9


- P(we chose the d6 | we rolled a 1) = 4/9
- P(we chose the d8 | we rolled a 1) = 1/3
- P(we chose the d12 | we rolled a 1) = 2/9
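
As a sanity check on the table above, the posteriors can also be estimated by simulation: pick a die at random, roll it many times over, and look only at the trials where the outcome was 1. This is a minimal sketch using the standard library; the seed and the number of trials are arbitrary choices, not values from the original notebook.

```python
import random

random.seed(17)  # arbitrary seed, so the run is repeatable

sides = [6, 8, 12]
counts = {s: 0 for s in sides}  # how often each die produced a 1

for _ in range(200_000):
    die = random.choice(sides)        # choose one die at random
    outcome = random.randint(1, die)  # roll it
    if outcome == 1:
        counts[die] += 1

# Among the trials that produced a 1, what fraction came from each die?
total_ones = sum(counts.values())
estimates = {s: counts[s] / total_ones for s in sides}
print(estimates)
```

The estimates should land close to 4/9, 1/3, and 2/9, with some sampling noise.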

## Monty Hall Problem

#### Introduction
- The host, Monty Hall, shows us three closed doors, numbered 1, 2, and 3. There's a prize behind each door.
- One prize is valuable, like a car, and the other two are less valuable, like goats.
- The object of the game is to guess which door hides the most valuable prize. If you guess correctly, you keep it.
- Suppose you pick Door 1. Before opening it, Monty opens Door 3, reveals a goat, and offers you the option to stick with Door 1 or switch to Door 2. Should you stick or switch?


#### Game Setup
To answer this question, we have to make some assumptions about the behavior of the host:

- Monty always opens a door and offers you the option to switch.

- He never opens the door you picked or the door with the car.

- If you choose the door with the car, he chooses one of the other doors at random.

In [20]:
monte = pd.DataFrame(index=["Door 1", "Door 2", "Door 3"])

monte["prior"] = Fraction(1, 3)
monte

Unnamed: 0,prior
Door 1,1/3
Door 2,1/3
Door 3,1/3


The data is that Monty opened Door 3 and revealed a goat. So let’s consider the probability of the data under each hypothesis:

If the car is behind Door 1, Monty chooses Door 2 or 3 at random, so the probability he opens Door 3 is 1/2.

If the car is behind Door 2, Monty has to open Door 3, so the probability of the data under this hypothesis is 1.

If the car is behind Door 3, Monty does not open it, so the probability of the data under this hypothesis is 0.



In [21]:
monte["likelihood"] = Fraction(1, 2), 1, 0
monte

Unnamed: 0,prior,likelihood
Door 1,1/3,1/2
Door 2,1/3,1
Door 3,1/3,0


In [22]:
update(monte)

Fraction(1, 2)

In [23]:
monte

Unnamed: 0,prior,likelihood,unnorm,posterior
Door 1,1/3,1/2,1/6,1/3
Door 2,1/3,1,1/3,2/3
Door 3,1/3,0,0,0
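
The same posteriors can be reached without a Bayes table by enumerating every (car location, opened door) pair under the assumptions stated above. The sketch below assumes you picked Door 1; the dictionary names `joint` and `posterior` are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# You picked Door 1. Build the joint probability of
# (door hiding the car, door Monty opens).
joint = {}
for car in (1, 2, 3):
    # Monty may open Door 2 or 3, but never the door with the car.
    options = [door for door in (2, 3) if door != car]
    for opened in options:
        # The car is behind each door with probability 1/3, and Monty
        # picks uniformly among the doors he is allowed to open.
        joint[(car, opened)] = Fraction(1, 3) * Fraction(1, len(options))

# Condition on the data: Monty opened Door 3.
p_data = sum(p for (car, opened), p in joint.items() if opened == 3)
posterior = {car: joint.get((car, 3), Fraction(0)) / p_data
             for car in (1, 2, 3)}

print(p_data)        # 1/2
print(posterior[1])  # 1/3
print(posterior[2])  # 2/3
```

The total probability of the data, 1/2, matches the value returned by `update`, and switching to Door 2 wins with probability 2/3.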


As this example shows, our intuition for probability is not always reliable. Bayes’s Theorem can help by providing a divide-and-conquer strategy:

1. First, write down the hypotheses and the data.

2. Next, figure out the prior probabilities.

3. Finally, compute the likelihood of the data under each hypothesis.

_The Bayes table does the rest._

## Exercise 1

> Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides. You choose a coin at random and see that one of the sides is heads. What is the probability that you chose the trick coin?




In [24]:
coinbox = pd.DataFrame(index=["Fair Coin", "Trick Coin"])
coinbox["prior"] = Fraction(1, 2), Fraction(1, 2)
coinbox["likelihood"] = Fraction(1, 2), 1
update(coinbox)

Fraction(3, 4)

In [25]:
coinbox

Unnamed: 0,prior,likelihood,unnorm,posterior
Fair Coin,1/2,1/2,1/4,1/3
Trick Coin,1/2,1,1/2,2/3
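
One way to see where the likelihoods 1/2 and 1 come from is to list every (coin, visible side) pair. The sketch below assumes the side you see is equally likely to be either side of the chosen coin, which is the same assumption the table uses; the names are illustrative.

```python
from fractions import Fraction

# Each coin is described by its two sides; the trick coin has two heads.
coins = {"Fair Coin": ["H", "T"], "Trick Coin": ["H", "H"]}

# Two coins times two sides gives four equally likely outcomes.
outcomes = [(name, side) for name, sides in coins.items() for side in sides]

# Keep only the outcomes consistent with the data: the visible side is heads.
heads = [name for name, side in outcomes if side == "H"]

p_trick_given_heads = Fraction(heads.count("Trick Coin"), len(heads))
print(p_trick_given_heads)  # 2/3
```

Three of the four outcomes show heads, and two of those three belong to the trick coin.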


## Exercise 2

> Suppose you meet someone and learn that they have two children. You ask if either child is a girl and they say yes. What is the probability that both children are girls?

Hint: Start with four equally likely hypotheses.

In [29]:
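# Four equally likely hypotheses for the two children.
# The data is that at least one child is a girl, so the likelihood
# is 1 for every hypothesis except "Boy, Boy".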
offspring = pd.DataFrame(index=["Girl, Girl", "Girl, Boy", "Boy, Girl", "Boy, Boy"])
offspring["prior"] = Fraction(1, 4)
offspring["likelihood"] = 1, 1, 1, 0
update(offspring)

Fraction(3, 4)

In [30]:
offspring

Unnamed: 0,prior,likelihood,unnorm,posterior
"Girl, Girl",1/4,1,1/4,1/3
"Girl, Boy",1/4,1,1/4,1/3
"Boy, Girl",1/4,1,1/4,1/3
"Boy, Boy",1/4,0,0,0

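The result can be checked by brute-force enumeration of the four equally likely families, as in this minimal sketch (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# All ordered pairs of children: GG, GB, BG, BB.
families = list(product("GB", repeat=2))

# Data: at least one child is a girl.
at_least_one_girl = [f for f in families if "G" in f]

# Hypothesis of interest: both children are girls.
both_girls = [f for f in at_least_one_girl if f == ("G", "G")]

p_both_girls = Fraction(len(both_girls), len(at_least_one_girl))
print(p_both_girls)  # 1/3
```

Of the three families with at least one girl, only one has two girls, which matches the posterior in the table.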

## Exercise 3
> There are many variations of the Monty Hall problem.
For example, suppose Monty always chooses Door 2 if he can, and only chooses Door 3 if he has to (because the car is behind Door 2).

> If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?

> If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?





In [32]:
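# Data: you chose Door 1 and Monty opened Door 2.
# If the car is behind Door 1, Monty opens Door 2 (his preferred door): likelihood 1
# If the car is behind Door 2, Monty has to open Door 3: likelihood 0
# If the car is behind Door 3, Monty opens Door 2: likelihood 1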
monte2 = pd.DataFrame(index=["Door 1", "Door 2", "Door 3"])
monte2["prior"] = Fraction(1, 3)
monte2["likelihood"] = 1, 0, 1
update(monte2)

Fraction(2, 3)

In [33]:
monte2

Unnamed: 0,prior,likelihood,unnorm,posterior
Door 1,1/3,1,1/3,1/2
Door 2,1/3,0,0,0
Door 3,1/3,1,1/3,1/2
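
The table above covers the first question, where Monty opens Door 2. For the second question, where Monty opens Door 3, the same steps apply with different likelihoods. The sketch below uses plain dictionaries rather than the DataFrame-based `update`, purely to keep it self-contained; the names are illustrative.

```python
from fractions import Fraction

doors = ["Door 1", "Door 2", "Door 3"]
prior = {door: Fraction(1, 3) for door in doors}

# Data: you chose Door 1 and Monty opened Door 3.
# Monty prefers Door 2, so he opens Door 3 only when the car is behind Door 2.
likelihood = {"Door 1": 0, "Door 2": 1, "Door 3": 0}

# Multiply priors by likelihoods, then normalize.
unnorm = {door: prior[door] * likelihood[door] for door in doors}
prob_data = sum(unnorm.values())
posterior = {door: unnorm[door] / prob_data for door in doors}

print(posterior["Door 2"])  # 1
```

Under this version of the rules, Monty opening Door 3 tells you for certain that the car is behind Door 2.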


## Exercise 4
> M&M’s are small candy-coated chocolates that come in a variety of colors.
Mars, Inc., which makes M&M’s, changes the mixture of colors from time to time. In 1995, they introduced blue M&M’s.

- In 1994, the color mix in a bag of plain M&M’s was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.

- In 1996, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

> Suppose a friend of mine has two bags of M&M’s, and he tells me that one is from 1994 and one from 1996. He won’t tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

_Hint: The trick to this question is to define the hypotheses and the data carefully._



In [39]:
# A: yellow from 94, green from 96
# B: yellow from 96, green from 94

candy = pd.DataFrame(index=["Yellow94, Green96", "Yellow96, Green94"])
candy["prior"] = Fraction(1, 2)

# 0.2 * 0.2 is p(yellow | 1994) * p(green | 1996)
# 0.14 * 0.1 is p(yellow | 1996) * p(green | 1994)
candy["likelihood"] = 0.2 * 0.2, 0.14*0.1
update(candy)

0.027000000000000003

In [40]:
candy

Unnamed: 0,prior,likelihood,unnorm,posterior
"Yellow94, Green96",1/2,0.04,0.02,0.740741
"Yellow96, Green94",1/2,0.014,0.007,0.259259

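The trailing digits in `0.027000000000000003` are floating-point rounding, not part of the answer. Repeating the calculation with exact fractions, as sketched below, avoids them. The dictionaries `mix94` and `mix96` are illustrative names that hold only the two colors the exercise needs.

```python
from fractions import Fraction

# Color mixes from the exercise, restricted to yellow and green.
mix94 = {"yellow": Fraction(20, 100), "green": Fraction(10, 100)}
mix96 = {"yellow": Fraction(14, 100), "green": Fraction(20, 100)}

prior = Fraction(1, 2)  # the two hypotheses are equally likely

# Hypothesis A: yellow from 1994, green from 1996.
unnorm_a = prior * mix94["yellow"] * mix96["green"]
# Hypothesis B: yellow from 1996, green from 1994.
unnorm_b = prior * mix96["yellow"] * mix94["green"]

prob_data = unnorm_a + unnorm_b
posterior_a = unnorm_a / prob_data

print(prob_data)    # 27/1000
print(posterior_a)  # 20/27
```

The exact posterior 20/27 is approximately 0.740741, in agreement with the table.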