# Lecture 2.2: Bayes' Theorem

## Outline

* Conditional probability
* Bayes' Theorem
* Chain rule for probability

## Problem

You have a database of 100 emails.

* 60 of those 100 emails are spam
    * 48 of those that are spam have the word "buy"
    * 12 of those that are spam don't have the word "buy"
* 40 of those 100 emails aren't spam
    * 4 of those that aren't spam have the word "buy"
    * 36 of those that aren't don't have the word "buy"

What is the probability that an email is spam if it has the word "buy"? 

### Solution

There are 48 emails that are spam and have the word "buy".
And there are 52 emails that have the word "buy": 48 that are spam plus 4 that aren't spam.
So the probability that an email is spam if it has the word "buy" is 48/52 ≈ 0.92.

If you understood this answer, then you have understood Bayes' theorem in a nutshell.
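The counting argument above can be sketched in a few lines of Python. The variable names (including "ham" for non-spam) are chosen here for illustration; the counts come straight from the problem statement.

```python
# Counts taken directly from the problem statement.
spam_with_buy = 48
ham_with_buy = 4  # "ham" = not spam (a label chosen for this sketch)

# Restrict attention to emails containing "buy", then count the spam among them.
emails_with_buy = spam_with_buy + ham_with_buy
p_spam_given_buy = spam_with_buy / emails_with_buy

print(f"P(spam | buy) = {spam_with_buy}/{emails_with_buy} = {p_spam_given_buy:.2f}")
```

Note that the 12 + 36 emails without the word "buy" never enter the calculation: conditioning on "buy" discards them.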

## Some Definitions:

### Joint Probability

**Q.** Consider two events $A$ & $B$. How can we characterize the **intersection** of these events?

**A.** With the **joint probability** of $A$ and $B$, written $P(A \cap B)$.

e.g.  
$P(A) =$ probability of being female  
$P(B) =$ probability of having long hair

$P(A \cap B)$ = the probability of being female AND having long hair 

### Conditional Probability

**Q.** Suppose event $B$ has occurred. What quantity represents the probability of $A$ given this information about $B$?

**A.** The probability of the intersection of $A$ and $B$, divided by the probability of $B$.

This is called the **conditional probability** of $A$ given $B$: the probability that event $A$ occurs, given that event $B$ has occurred.

It is written $P(A|B)$ and, provided $P(B) > 0$, is given by

$$ P(A|B) = \frac{P(A\cap B)}{P(B)} $$

Read $P(A|B)$ as "probability of $A$, given $B$".

Notice, with this we can also write $P(A \cap B) = P(A|B) \times P(B)$.

### More on Conditional Probability

* Conditioning on event $B$ means changing the sample space to $B$


* Think of $P(A|B)$ as the chance of getting an $A$, from the set of $B$'s only


* The symbol $P(\ \ |B)$ should behave exactly like the symbol $P$

    For example,  
    
$$P(C \cup D) = P(C) + P(D) - P(C \cap D)$$   
    
$$P(C \cup D|B) = P(C|B) + P(D|B) - P(C \cap D|B)$$
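As a quick sanity check, the conditional version of the addition rule can be verified by brute-force enumeration. The fair six-sided die and the particular events $B$, $C$, $D$ below are assumptions made for this sketch; they are not part of the lecture.

```python
from fractions import Fraction

# ASSUMPTION: a fair six-sided die, so every outcome is equally likely.
sample_space = set(range(1, 7))


def prob(event, given=None):
    """P(event | given) by counting; conditioning shrinks the sample space."""
    space = sample_space if given is None else sample_space & given
    return Fraction(len(event & space), len(space))


B = {2, 4, 6}  # roll is even
C = {1, 2, 3}  # roll is at most 3
D = {2, 3, 5}  # roll is prime

lhs = prob(C.union(D), given=B)
rhs = prob(C, given=B) + prob(D, given=B) - prob(C & D, given=B)

print(lhs, rhs)
```

The `prob` helper treats `given` exactly as the first bullet describes: it replaces the sample space with $B$ and then counts as usual.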

### Independent Events

Events $A$ and $B$ are **_independent_** if and only if:

$$P(A \cap B) = P(A)P(B)$$

Equivalently, when $P(A) > 0$ and $P(B) > 0$,

$$ P(A|B) = P(A) $$
$$ P(B|A) = P(B) $$
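To see the definition in action, here is a small sketch that tests whether "is spam" and "contains buy" are independent in the 100-email database from the opening problem. The variable names are illustrative; the counts are the ones given there.

```python
from fractions import Fraction

total = 100
p_spam = Fraction(60, total)         # P(spam)
p_buy = Fraction(48 + 4, total)      # P(buy): spam-with-buy plus non-spam-with-buy
p_spam_and_buy = Fraction(48, total) # P(spam ∩ buy)

# Independence would require P(spam ∩ buy) == P(spam) * P(buy).
product = p_spam * p_buy
independent = p_spam_and_buy == product

# Equivalent check: does knowing "buy" change the probability of spam?
p_spam_given_buy = p_spam_and_buy / p_buy

print(f"P(spam ∩ buy) = {p_spam_and_buy}, P(spam)P(buy) = {product}")
print(f"P(spam | buy) = {p_spam_given_buy}, P(spam) = {p_spam}")
print("independent:", independent)
```

The two events are not independent: seeing the word "buy" raises the probability of spam from 3/5 to 12/13.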


**Q.** Victor is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chance of getting black six times in a row is very small (about 1/64) and puts his paycheck on red. What is wrong with his reasoning?

## Probability with the help of small animals

### Sample Space

<img src="images/probability1.jpg" width="700">

**Q**: If we randomly pick one small animal, what is the probability that we get a puppy?

<img src="images/probability2.jpg" width="700">

#### $P(puppy) = 4/9$

**Q**: If we randomly pick one small animal, what is the probability that it is dark-colored?

<img src="images/probability3.jpg" width="700">

#### $P(dark) = 5/9$

**Q**: Given that we picked a dark-colored animal, what is the probability that it is a puppy?

<img src="images/probability4.jpg" width="700">



<img src="images/probability5.jpg" width="700">

#### $P(puppy|dark) = \frac{P(\text{dark } \cap \text{ puppy})}{P(\text{dark})} = \frac{^2/_9}{^5/_9} = \frac{.222}{.556} = .4$
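The same calculation can be done with exact fractions, which avoids the rounding in $.222$ and $.556$. This is a minimal sketch; the variable names are chosen here, and the counts (9 animals, 5 dark, 2 dark puppies) are the ones used in the formula above.

```python
from fractions import Fraction

n_animals = 9
n_dark = 5
n_dark_puppies = 2

p_dark = Fraction(n_dark, n_animals)
p_dark_and_puppy = Fraction(n_dark_puppies, n_animals)

# Definition of conditional probability: P(puppy | dark) = P(dark ∩ puppy) / P(dark)
p_puppy_given_dark = p_dark_and_puppy / p_dark

print(p_puppy_given_dark, float(p_puppy_given_dark))
```

The factor of 1/9 cancels, so the answer is just "dark puppies out of dark animals", i.e. 2/5.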

### New statement of law of total probability

If $B_1, \dots, B_k$ partition the sample space $S$, then for any event $A$,

$$ P(A) = \sum_{i = 1}^k P(A \cap B_i) = \sum_{i = 1}^k P(A | B_i) P(B_i) $$

### Example:

Tom gets the bus to campus every day. The bus is on time with probability 0.6, and late with probability 0.4. The buses are sometimes crowded and sometimes noisy, both of which are problems for Tom as he likes to use the bus journeys to do his Stats assignments. When the bus is on time, it is crowded with probability 0.5. When it is late, it is crowded with probability 0.7. The bus is noisy with probability 0.8 when it is crowded, and with probability 0.4 when it is not crowded.

1. Formulate events C and N corresponding to the bus being crowded and noisy. Do the events C and N form a partition of the sample space? Explain why or why not.

2. Write down probability statements corresponding to the information given above. Your answer should involve two statements linking C with T and L, and two statements linking N with C. (T = "on time", L = "late")

3. Find the probability that the bus is crowded.

4. Find the probability that the bus is noisy.

* Let $C$ = "crowded" and $N$ = "noisy". $C$ and $N$ do NOT form a partition of the sample space: a partition requires mutually exclusive events, but the bus can be both crowded and noisy (indeed $P(N | C) = 0.8$), so $C$ and $N$ overlap.

* $P(T) = 0.6$; $P(L) = 0.4$; $P(C | T) = 0.5$; $P(C | L) = 0.7$; $P(N | C) = 0.8$; $P(N | C^c) = 0.4$.

* $P(C) = P(C | T)P(T) + P(C | L)P(L) = 0.5 \times 0.6 + 0.7 \times 0.4 = 0.58$

* $P(N) = P(N | C)P(C) + P(N | C^c)P(C^c) = 0.8 \times 0.58 + 0.4 \times (1 - 0.58) = 0.632$
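The two applications of the law of total probability in this example can be sketched as follows. The helper name `total_probability` is hypothetical, introduced only for this illustration; the numbers are those given in the problem.

```python
import math


def total_probability(conditionals, priors):
    """Sum of P(A | B_i) * P(B_i) over a partition B_1, ..., B_k."""
    assert math.isclose(sum(priors), 1.0), "the B_i must partition the sample space"
    return sum(c * p for c, p in zip(conditionals, priors))


p_T, p_L = 0.6, 0.4                      # on time / late
p_C_given_T, p_C_given_L = 0.5, 0.7      # crowded given on time / late
p_N_given_C, p_N_given_notC = 0.8, 0.4   # noisy given crowded / not crowded

# Step 3: partition by {T, L}.
p_C = total_probability([p_C_given_T, p_C_given_L], [p_T, p_L])

# Step 4: partition by {C, C^c}, reusing P(C) from step 3.
p_N = total_probability([p_N_given_C, p_N_given_notC], [p_C, 1 - p_C])

print(f"P(C) = {p_C:.2f}, P(N) = {p_N:.3f}")
```

Note that step 4 depends on step 3: the partition $\{C, C^c\}$ can only be used once $P(C)$ is known.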

## Bayes' Theorem

$P(A \cap B) = P(A|B) \times P(B)\qquad$ by the definition of conditional probability

$P(B \cap A) = P(B|A) \times P(A)\qquad$ by swapping the roles of $A$ and $B$

And we know that $P(A \cap B) = P(B \cap A)$

$\Rightarrow P(A|B) \times P(B) = P(B|A) \times P(A)\qquad$ by combining the above

$\Rightarrow P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}\qquad$ by rearranging the last step (assuming $P(B) > 0$)

This result is called **Bayes’ theorem**. Here it is again:
### $$ P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} $$

By the law of total probability, $P(B) = P(B|A) \times P(A) + P(B|A^c) \times P(A^c)$, so
### $$ P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|A^c) \times P(A^c)} $$

### Problem

<!--small>*this example taken from [the Cartoon Guide to Statistics](http://www.harpercollins.com/browseinside/index.aspx?isbn13=9780062731029) by Larry Gonick & Woollcott Smith*</small-->  
Suppose a disease infects one out of every 1000 people in a population...

and suppose that there is a good, but not perfect, test for this disease: if a person has the disease, the test comes back positive 99% of the time. On the other hand, the test also produces some *false positives*. About 2% of uninfected patients also test positive.

You just tested positive. What are your chances of having the disease?

### Solution

We have two events to work with:

**A** : patient has the disease  
**B** : patient tests positive  

The information about the disease's prevalence and the test's effectiveness can be written:

$ P(A) = .001 $

$ P(B|A) = .99 $

$ P(B|A^c) = .02 $


$ P(A|B) = ?$

#### Let's draw a tree..

(tree diagram)

|           | A               | not A             | sum      |
| ---------:| --------------- | ----------------- | -------- |
|     **B** | $P(A \cap B)$   | $P(A^c \cap B)$   | $P(B)$   |
| **not B** | $P(A \cap B^c)$ | $P(A^c \cap B^c)$ | $P(B^c)$ |
|           | $P(A)$          | $P(A^c)$          | $1$      |

$ P(A \cap B) = P(B|A)P(A) = (.99)(.001) = .00099 $  
$ P(A^c \cap B) = P(B|A^c)P(A^c) = (.02)(.999) = .01998 $


Let's fill in some of these numbers:  

|           | A              | not A              | sum  |
| ----------:| ------------ | ------------------ | ---- |
|     **B** | $.00099$        | $.01998$             | $.02097$    |
| **not B** | $P(A \cap B^c)$   | $P(A^c \cap B^c)$   | $P(B^c)$ |
|           | $.001$          | $.999$               | $1$         |

Arithmetic gives us the rest:  

|           | A              | not A              | sum  |
| ---------:| -------------- | ------------------ | ---- |
|     **B** | $.00099$       | $.01998$           | $.02097$ |
| **not B** | $.00001$       | $.97902$           | $.97903$ |
|           | $.001$         | $.999$             | $1.0$    |

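From this we directly derive:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{.00099}{.02097} = .0472 $$

So even after a positive test, your chance of having the disease is only about 4.7%: the disease is rare, so most positive results come from the much larger uninfected group.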
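The whole calculation can be packaged as a small function that applies Bayes' theorem with the denominator expanded by the law of total probability. The function name `posterior` and its parameter names are hypothetical, chosen for this sketch.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """P(A | B) from P(A), P(B | A) and P(B | A^c) via Bayes' theorem."""
    joint_a = p_pos_given_a * prior                # P(A ∩ B)
    joint_not_a = p_pos_given_not_a * (1 - prior)  # P(A^c ∩ B)
    evidence = joint_a + joint_not_a               # P(B), law of total probability
    return joint_a / evidence


# Values from the problem: prevalence 1 in 1000, 99% true positives, 2% false positives.
p_disease_given_positive = posterior(0.001, 0.99, 0.02)
print(f"P(A | B) = {p_disease_given_positive:.4f}")
```

Calling `posterior` with a larger prior (say 0.1 instead of 0.001) is a quick way to see how strongly the answer depends on how common the disease is.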

## Chain Rule for Probability

We can write any joint probability as an incremental product of conditional probabilities:

$ P(A_1 \cap A_2) = P(A_1)P(A_2 | A_1) $


$ P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2 | A_1)P(A_3 | A_2 \cap A_1) $


In general, for $n$ events $A_1, A_2, \dots, A_n$, we have  

$ P (A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)P(A_2 | A_1) \dots P(A_n | A_{n-1} \cap \dots \cap A_1) $
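As an illustration, the chain rule can be applied to the bus example from earlier to get the probability that the bus is on time, crowded, and noisy all at once. This sketch relies on one assumption the original problem does not state: that, once we know the bus is crowded, being on time gives no further information about noise, so $P(N | C \cap T) = P(N | C) = 0.8$. The helper name `chain_rule` is also hypothetical.

```python
from fractions import Fraction


def chain_rule(factors):
    """Multiply P(A_1), P(A_2 | A_1), P(A_3 | A_2 ∩ A_1), ... together."""
    result = Fraction(1)
    for factor in factors:
        result *= factor
    return result


p_T = Fraction(6, 10)          # P(T): bus is on time
p_C_given_T = Fraction(5, 10)  # P(C | T): crowded given on time
# ASSUMPTION (not in the original problem): P(N | C ∩ T) = P(N | C) = 0.8
p_N_given_C_and_T = Fraction(8, 10)

p_T_and_C_and_N = chain_rule([p_T, p_C_given_T, p_N_given_C_and_T])
print(p_T_and_C_and_N, float(p_T_and_C_and_N))
```

Each factor conditions on everything that came before it, which is exactly the pattern in the general formula above.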