In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
from numpy import mean, std

# stats60 specific

from code import dice, marbles, monty_hall, examples
figsize = (8,8)

## Probability

### Frequency definition of chances

* If you flip a fair coin many times, the long-run proportion of heads will be 50%.
* If you roll a fair 6-sided die many times, the long-run proportion of 1’s will be
$$1/6=16 \frac{2}{3}\%.$$

In [None]:
dice.roll_one_die.trial()
dice.roll_one_die

In [None]:
dice.roll_one_die.sample(10)

### Probability as frequency

In [None]:
dice_sample = dice.roll_one_die.sample(5000)
sum([d == 1 for d in dice_sample]) / 5000., 1/6.
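
The `dice` module above is specific to this course. As a self-contained sketch that assumes only NumPy (not the course code), the frequency definition can be checked by simulating more and more rolls and watching the proportion of 1's settle near 1/6. The helper `proportion_of_ones` is a hypothetical name used only for illustration.

```python
import numpy as np

def proportion_of_ones(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times; return the fraction of 1's."""
    rng = np.random.default_rng(seed)
    rolls = rng.integers(1, 7, size=n_rolls)  # values 1..6 (upper bound is exclusive)
    return np.mean(rolls == 1)                # mean of a boolean array = proportion

# The proportion wanders for small n and gets close to 1/6 for large n.
for n in [10, 100, 1000, 100000]:
    print(n, proportion_of_ones(n), 1 / 6)
```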

## Some rules of probability

- *Range of values:* Chances are between 0 % and 100 %.

- *Opposites:* The chance of something equals 100 % minus the chance of the opposite thing. 

### Example: 

The chance of not getting a 1 when rolling a die is $(100 - 16 \frac{2}{3})\% = 83 \frac{1}{3} \%$.


## Drawing marbles from a box

- Box #1: 30 blue, 20 red

- Box #2: 3 blue, 2 red

- If you have to draw a blue marble to win, which box would you choose?

In [None]:
large = marbles.Marbles(['B']*30 + ['R']*20, grid=(5,10))
large

In [None]:
large.trial()

In [None]:
print(large.trial())
large.last_draw()

In [None]:
small = marbles.Marbles(['B']*3 + ['R']*2)
small

In [None]:
small.trial()

In [None]:
print(small.trial())
small.last_draw()

## Both boxes have the same chance of winning: 0.6

In [None]:
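# proportion of blue draws in 10000 draws from each box (both should be close to 0.6)
sum([d == 'B' for d in small.sample(10000)]) / 10000., sum([d == 'B' for d in large.sample(10000)]) / 10000.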

## Drawing marbles

- When drawing one marble, the important number was 
$$\frac{\# \ \text{blue marbles}}{\# \ \text{marbles}} = 60\%$$
- What if you draw 5 marbles without replacement?
- With replacement?

## Sampling with replacement

Let's start with sampling with replacement.

In [None]:
small = marbles.Marbles(['B']*3 + ['R']*2)
small.sample(5)

In [None]:
small.current_state()

## Sampling without replacement

In [None]:
small_noreplace = marbles.Marbles(['B']*3 + ['R']*2)
small_noreplace.replace = False
small_noreplace

In [None]:
small_noreplace.sample(3)

In [None]:
small_noreplace.current_state()

In [None]:
large_noreplace = marbles.Marbles(['B']*30 + ['R']*20)
large_noreplace.replace = False
large_noreplace

In [None]:
large_noreplace.sample(3)

In [None]:
large_noreplace.current_state()

## Drawing five marbles from Box #1 or #2

* If drawing without replacement, Box #2 will be easier to win.
* We’re even guaranteed to win with Box #2. **Why?**
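
One way to see how replacement matters: winning (at least one blue marble in five draws) fails only if all five draws are red. A sketch using exact fractions, assuming each marble is equally likely to be drawn; `math.comb` (Python 3.8+) counts the ways to pick the red marbles, and the two function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_win_with_replacement(blue, red, draws=5):
    # Each draw is red with chance red / (blue + red), independently of the others.
    p_red = Fraction(red, blue + red)
    return 1 - p_red ** draws

def p_win_without_replacement(blue, red, draws=5):
    # All draws red: choose `draws` of the red marbles out of all the marbles.
    # comb(red, draws) is 0 when there are fewer red marbles than draws.
    p_all_red = Fraction(comb(red, draws), comb(blue + red, draws))
    return 1 - p_all_red

for name, blue, red in [("Box #1", 30, 20), ("Box #2", 3, 2)]:
    print(name,
          float(p_win_with_replacement(blue, red)),
          float(p_win_without_replacement(blue, red)))
```

Box #2 has only two red marbles, so `comb(2, 5)` is 0 and five draws without replacement must include a blue marble.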

## Example

Suppose our experiment consists of drawing a ticket out of a hat with 20
tickets in it, numbered 1 to 20. We are going to draw 3 tickets.

- Describe the hat after each draw if we draw *with replacement*. 
What are the possible outcomes of the experiment?

         Before and after each draw, the hat has 20 tickets in it.
         The possible outcomes are triples of numbers from 1 to 20:
         (1,1,4),(2,3,4), etc.

-  Describe the hat after each draw if we draw *without replacement*. 
What are the possible outcomes of the experiment?

         After the first draw, the hat has 19 tickets in it,
         after the second 18, and after the third 17.
         The possible outcomes are all triples of numbers from
         1 to 20 but there can be no ties: (1,1,4) is impossible.
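
The outcomes can be enumerated directly. This sketch assumes the tickets are numbered 1 to 20 and that order matters, so (1,2,4) and (2,1,4) count as different outcomes. `itertools.product` generates draws with replacement and `itertools.permutations` generates draws without replacement.

```python
from itertools import permutations, product

tickets = range(1, 21)

with_replacement = list(product(tickets, repeat=3))   # repeated tickets allowed
without_replacement = list(permutations(tickets, 3))  # no ticket drawn twice

print(len(with_replacement))     # 20 * 20 * 20 = 8000
print(len(without_replacement))  # 20 * 19 * 18 = 6840
print((1, 1, 4) in with_replacement, (1, 1, 4) in without_replacement)  # True False
```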

## Conditional probability

* Observing some information can *change*
   the chances of something.
* We already saw this in the marble example. If drawing without replacement, suppose the first draw was red. What are the chances a blue marble is drawn on the second draw?
* What if we draw with replacement?
* In these examples, we are *given*
   that the first draw was red. These chances are *conditioned*
   on knowing the first draw was red.
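
A brute-force sketch of these conditional chances: label the five marbles of the `small` box, list every equally likely ordered pair of draws, keep only the pairs whose first draw is red, and count how often the second draw is blue. The helper `p_second_blue_given_first_red` is a hypothetical name for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

box = ['B', 'B', 'B', 'R', 'R']   # the small box; list positions make marbles distinguishable

def p_second_blue_given_first_red(replace):
    idx = range(len(box))
    # Ordered pairs of marble positions; the same marble twice only with replacement.
    pairs = product(idx, repeat=2) if replace else permutations(idx, 2)
    # Condition on the first draw being red.
    given = [(i, j) for i, j in pairs if box[i] == 'R']
    favorable = [(i, j) for i, j in given if box[j] == 'B']
    return Fraction(len(favorable), len(given))

print(p_second_blue_given_first_red(replace=False))  # 3/4
print(p_second_blue_given_first_red(replace=True))   # 3/5
```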

## Multiplication rule

The chance that two things will both happen equals the chance that the first will happen, multiplied by the chance that the second will happen *given* the first has happened.

### Example

- *In the box with 3 blue and 2 red marbles, what is the probability the first blue marble drawn is on the second draw when drawing without replacement?*
- If the first blue marble drawn was the second, then we know
     - the first was red;
     - the second was blue.
- By the multiplication rule 
$$\mathrm{chances} = \frac{2}{5} \times \frac{3}{4} = \frac{3}{10}$$
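
The multiplication rule can be checked against direct counting. In this sketch, every ordering of the five marbles is treated as equally likely (drawing all of them without replacement), so the chance that the first blue marble appears on the second draw is the fraction of orderings that start red, then blue.

```python
from fractions import Fraction
from itertools import permutations

box = ['B', 'B', 'B', 'R', 'R']

# All 5! = 120 orderings of the (distinguishable) marbles, each equally likely.
orderings = list(permutations(box))
hits = [o for o in orderings if o[0] == 'R' and o[1] == 'B']
by_counting = Fraction(len(hits), len(orderings))

by_multiplication_rule = Fraction(2, 5) * Fraction(3, 4)
print(by_counting, by_multiplication_rule)  # 3/10 3/10
```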

## Mathematical notation

* "first blue drawn is on the second draw" is called an *event*;
* "first draw is red" and "second draw is blue" are also events;
* We usually write $P$ for "chances". For an event $E$ 
$$ P(E) = \mathrm{chances} \ E \ \mathrm{occurs}. $$
* Conditional probability of an event $A$ given $B$, i.e.  
the chances $A$ occurs given $B$ occurs, is
written as 
$$
P(A \vert B).
$$

- The multiplication rule can be written as 
$$P(A \cap B) = P(A \, \text{and} \, B) = P(A \vert B) \times P(B).$$

## Law of total mass

* The chance that *something* occurs is 100%.

* Example: when we drew marbles, the chance that we draw a marble that is blue or red is 100%.
* In mathematical notation, we often use $S$ for "something", also called the *sample space*: $$P(S) = 100\% \qquad (= 1).$$


In [None]:
small = marbles.Marbles(['B']*3 + ['R']*2)
small.sample_space

In [None]:
small.mass_function

Let's take a trial without replacement.

In [None]:
small = marbles.Marbles(['B']*3 + ['R']*2, replace=False)
small.trial()

In [None]:
small.current_state()

In [None]:
small.sample_space

In [None]:
small.mass_function

## Mass function

- What is `mass_function`? 

- It is a description of the probabilities of the various possible
outcomes.

- Later, the book will call this a *probability histogram*.

## Law of total mass

- When drawing from `small` without replacement, we are certain to draw a blue marble within the first three draws, because the box has only two red marbles. 
$$P(\text{one of the first three marbles is blue}) = 100 \%$$

- Let's verify the law of total mass 
 $$\begin{aligned}
     P(\text{first blue marble is on 1st draw}) &= \frac{3}{5} \\
     P(\text{first blue marble is on 2nd draw}) &= \frac{2}{5} \times \frac{3}{4} = \frac{3}{10}  \\
     P(\text{first blue marble is on 3rd draw}) &= \frac{2}{5} \times \frac{1}{4} \times \frac{3}{3} = \frac{1}{10}
  \end{aligned}$$
- Summing the probabilities, $$\frac{3}{5} + \frac{3}{10} +
 \frac{1}{10} = 1.$$
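
The same three probabilities, and the fact that they sum to 1, can be reproduced by a sketch that applies the multiplication rule one draw at a time: the chance that the first blue marble appears on draw $k$ is the chance that draws $1, \dots, k-1$ were all red, times the chance of blue on draw $k$. The function name `first_blue_distribution` is hypothetical.

```python
from fractions import Fraction

def first_blue_distribution(blue, red):
    """Mass function of the draw on which the first blue marble appears,
    drawing without replacement."""
    dist = {}
    p_all_red_so_far = Fraction(1)
    for k in range(1, red + 2):           # first blue must appear by draw red + 1
        remaining = blue + red - (k - 1)  # marbles left before draw k
        dist[k] = p_all_red_so_far * Fraction(blue, remaining)
        # Update the chance that draws 1..k were all red.
        p_all_red_so_far *= Fraction(red - (k - 1), remaining)
    return dist

dist = first_blue_distribution(3, 2)
print(dist)                # {1: Fraction(3, 5), 2: Fraction(3, 10), 3: Fraction(1, 10)}
print(sum(dist.values()))  # 1
```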

## Addition rule

**When can we add probabilities of different events?**


We can add probabilities of events when the events are *disjoint*
   or *mutually exclusive*, i.e. they cannot happen at the same time.

### Example 

When rolling a die, the events $E_1 = \{\text{roll is 4}\}$ and $E_2 = \{\text{roll is 3}\}$ are mutually exclusive because a single roll cannot be both 4 and 3.

### Mutually exclusive events

In [None]:
%%capture
disjoint = plt.figure(figsize=figsize)
cir = matplotlib.patches.Circle
ax = plt.gca()
E1 = cir((0.5,0.5), 0.4,ec="black", facecolor='yellow',lw=2, alpha=0.4)
E2 = cir((-0.2,-0.2), 0.4,ec="black", facecolor='blue',lw=2, alpha=0.4)
ax.add_patch(E1)
ax.add_patch(E2)
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim([-0.7,1])
ax.set_ylim([-0.7,1])
ax.set_title('Two mutually exclusive events. Addition OK.', fontsize=15, color='green')

In [None]:
disjoint

### Non-mutually exclusive events

In [None]:
%%capture
nondisjoint = plt.figure(figsize=figsize)
cir = matplotlib.patches.Circle
ax = plt.gca()
E1 = cir((0.5,0.5), 0.4,ec="black", facecolor='yellow',lw=2, alpha=0.4)
E2 = cir((0.2,0.2), 0.4,ec="black", facecolor='blue',lw=2, alpha=0.4)
ax.add_patch(E1)
ax.add_patch(E2)
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim([-0.7,1])
ax.set_ylim([-0.7,1])
ax.set_title('Two non-mutually exclusive events. Addition not OK.', 
             fontsize=15, color='red')


In [None]:
nondisjoint

## Addition rule

- If the events $E_1, E_2$ are mutually exclusive, then 
$$ P(E_1 \ \mathrm{or} \ E_2 \  \mathrm{occurs}) = P(E_1) + P(E_2).$$
- This rule works for more than two events: if $E_1, \dots, E_n$ are mutually exclusive (no two of them can occur together), then $$ P(E_1  \ \mathrm{or} \  E_2  \ \mathrm{or} \  \dots  \ \mathrm{or} \  E_n) = \sum_{i=1}^nP(E_i).$$

- The events $E_1, E_2$ are mutually exclusive if $E_1 \cap E_2$ is empty.
- We often write "$E_1$ or $E_2$" as $E_1 \cup E_2$ and 
"$E_1$ and $E_2$" as $E_1 \cap E_2$.

- From the Venn diagram, we can deduce the general form of the addition rule
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2).$$

- There are also rules that involve more than 2 events.
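
A quick check of the general addition rule with one roll of a fair die, using Python sets for events. The events are just an example: $E_1$ = {roll is even} and $E_2$ = {roll is at least 4} overlap in {4, 6}, so simply adding $P(E_1) + P(E_2)$ counts those outcomes twice.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space for one fair die

def P(event):
    # Equally likely outcomes: (# outcomes in event) / (# outcomes in S)
    return Fraction(len(event & S), len(S))

E1 = {2, 4, 6}    # roll is even
E2 = {4, 5, 6}    # roll is at least 4

print(P(E1 | E2))                   # 2/3
print(P(E1) + P(E2))                # 1, too big: {4, 6} is counted twice
print(P(E1) + P(E2) - P(E1 & E2))   # 2/3
```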

In [None]:
%%capture
three_events = plt.figure(figsize=figsize)
ax = three_events.gca()
E1 = cir((0.5,0.5), 0.4,ec="black", facecolor='yellow',lw=2, alpha=0.4)
E2 = cir((0.2,0.2), 0.4,ec="black", facecolor='blue',lw=2, alpha=0.4)
E3 = cir((0.2,0.5), 0.4,ec="black", facecolor='red',lw=2, alpha=0.4)
ax.add_patch(E1)
ax.add_patch(E2)
ax.add_patch(E3)
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim([-0.3,1])
ax.set_ylim([-0.3,1])
ax.set_title('Three intersecting events.', 
             fontsize=15)


In [None]:
three_events

$$
\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\
& - P(A \cap B) - P(A \cap C) - P(B \cap C) \\
& + P(A \cap B \cap C)
\end{aligned}
$$
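
The three-event formula can be verified the same way on a die roll. In this sketch the events $A$, $B$, $C$ are arbitrary examples chosen so that all their intersections are nonempty.

```python
from fractions import Fraction

S = set(range(1, 7))             # one roll of a fair die

def P(event):
    return Fraction(len(event), len(S))

A = {2, 4, 6}      # even
B = {4, 5, 6}      # at least 4
C = {1, 2, 4}      # hypothetical example event: 1, 2 or 4

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs, rhs)  # 5/6 5/6
```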

## A more complicated Venn diagram: four events

<img src="https://pbs.twimg.com/media/BkihFcZIcAA0E5D.jpg"/>

## A simple bound on probabilities

Because chances are never negative, $P(E_1 \cap E_2) \geq 0$, so the general addition rule
     gives the first bound below; applying it repeatedly gives the second.
     $$
    \begin{aligned}
    P(E_1 \cup E_2) &\leq P(E_1) + P(E_2) \\
    P\left(\cup_{i=1}^n E_i \right) & \leq \sum_{i=1}^n P(E_i).
    \end{aligned}
$$
     

## Multiplication rule & independence

* Intuitively, an event $A$ is independent of $B$ if knowing that $B$ occurred does not change the chances of $A$.
* In mathematical notation, we express this notion as 
$$P(A \vert B)=P(A)$$
* If this is true, we say $A$ and $B$ are *independent*.
* Otherwise, $A$ and $B$ are *dependent*.
* The multiplication rule, combined with independence, tells us 
$$P(A \cap B) = P(A \vert B) \times P(B) = P(A) \times P(B).$$

## Example

Let's go back to drawing marbles from a box.

* When drawing marbles *with replacement*
   the events $$\begin{aligned}
 A &= \mathrm{first \ draw \    is \ red} \\
 B &= \mathrm{second \ draw \ is \ blue}
 \end{aligned}$$ are *independent*
  
* We can even conclude that the draws are independent in this case.
* When drawing *without replacement*
   the events $A$ and $B$ are dependent. **Show this.**
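
One way to work the **Show this** is to compare $P(A \cap B)$ with $P(A) \times P(B)$ by enumerating equally likely ordered pairs of draws from the `small` box. This is a sketch assuming each marble is equally likely to be drawn; `check_independence` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import permutations, product

box = ['B', 'B', 'B', 'R', 'R']

def check_independence(replace):
    idx = range(len(box))
    pairs = list(product(idx, repeat=2) if replace else permutations(idx, 2))
    n = len(pairs)
    p_A = Fraction(sum(box[i] == 'R' for i, j in pairs), n)   # first draw red
    p_B = Fraction(sum(box[j] == 'B' for i, j in pairs), n)   # second draw blue
    p_AB = Fraction(sum(box[i] == 'R' and box[j] == 'B' for i, j in pairs), n)
    return p_AB, p_A * p_B

print(check_independence(replace=True))   # (Fraction(6, 25), Fraction(6, 25))
print(check_independence(replace=False))  # (Fraction(3, 10), Fraction(6, 25))
```

With replacement the two numbers agree; without replacement $P(A \cap B) = 3/10 \neq 6/25$, so $A$ and $B$ are dependent.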

## Counting and probability

* When performing an experiment where each outcome is equally likely, we can compute probabilities by counting.
* Example: when rolling two dice, what is the probability of obtaining a sum of 9?
* We call these counting problems *combinatorial*.
* For such experiments $$P(E) = \frac{\# E}{\# S}$$ where $S$ is the set of all possible outcomes (our *sample space*).

## Example

Consider rolling a pair of dice. 

**What is the sample space?**



In [None]:
examples.dice['pair of dice']

### Example

What are the chances the sum will be equal to 9?


In [None]:
%%capture
examples.dice['sum to 9'] = dice.dice_example(event_spec = lambda ij: ij[0]+ij[1]==9)


In [None]:
examples.dice['sum to 9']


There are 4 outcomes whose sum is 9. Therefore, the chances are $\frac{4}{36}=\frac{1}{9}$.
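
The count of 4 can be reproduced without the course's `dice` module by listing the 36 equally likely outcomes with `itertools.product`, treating the two dice as distinguishable (first die, second die).

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 ordered pairs (first die, second die)
E = [ij for ij in S if ij[0] + ij[1] == 9]

print(E)                           # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(Fraction(len(E), len(S)))    # 1/9
```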

In [None]:
examples.dice['sum to 9'].trial()
examples.dice['sum to 9']

In [None]:
examples.dice['sum to 9'].mass_function

In [None]:
examples.dice['sum to 9'].sample(10)

In [None]:
mean(examples.dice['sum to 9'].sample(10000))

## Example

What are the chances the sum will be greater than or equal to 7?

In [None]:
%%capture
examples.dice['sum geq 7'] = dice.dice_example(event_spec = lambda ij: ij[0]+ij[1]>=7)


In [None]:
examples.dice['sum geq 7']

There are 21 outcomes whose sum is greater than or equal to 7. Therefore, the chances are $\frac{21}{36}=\frac{7}{12}$.

In [None]:
examples.dice['sum geq 7'].mass_function

## Example

What are the chances the sum will be less than 7?

- The chances that the sum is greater than or equal to 7 are $\frac{7}{12}$. 
Therefore, by the "opposite" rule, the chances are 
$$1-\frac{7}{12}=\frac{5}{12}.$$
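
A sketch checking the opposite rule by counting: the outcomes with sum less than 7 are exactly the outcomes not in the "sum at least 7" event, so the two counts add to 36 and the probabilities add to 1.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
geq7 = [ij for ij in S if sum(ij) >= 7]
lt7 = [ij for ij in S if sum(ij) < 7]

print(len(geq7), len(lt7))                 # 21 15
print(Fraction(len(lt7), len(S)))          # 5/12, by counting
print(1 - Fraction(len(geq7), len(S)))     # 5/12, by the opposite rule
```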

In [None]:
%%capture
examples.dice['sum lt 7'] = dice.dice_example(event_spec = lambda ij: ij[0]+ij[1]<7)


In [None]:
examples.dice['sum lt 7']

In [None]:
examples.dice['sum lt 7'].mass_function

## Complement of an event

* Formally, the "opposite" rule is the rule of *complements*.
* We write the complement of an event $E$ as $E^c$, so that
$$P(\mathrm{not} \, E) = P(E^c).$$
* The rule of *complements*
   says $$P(E^c) = 1 - P(E).$$

### An event $E$ and its complement $E^c$

In [None]:
%%capture
complement = plt.figure(figsize=figsize)
ax = complement.gca()
ax.set_alpha(0.4)
ax.patch.set_facecolor('#cccccc')
E1 = cir((0.5,0.5), 0.4,
         ec="black", facecolor='yellow', lw=2, alpha=0.4)
ax.add_patch(E1)

ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim([-0.3,1])
ax.set_ylim([-0.3,1])

t = ax.text(0.5, 0.5, '$E$', size=40, color='black', va='center', ha='center')
t = ax.text(-0.05, -0.05, r'$E^c$', size=40, color='black', va='center', ha='center')
complement.text(0.8, 0.04, '$S$', size=40, ha='center', va='center')

In [None]:
complement

## Properties of complements

* For any event $E$, $E$ and $E^c$ are mutually exclusive.
* For any event $E$, $S = E \cup E^c$.
* For any two events $A, B$ 
$$\begin{aligned}
     B &= B \cap S \\
     &=   B \cap (A \cup A^c) \\
     &=   (B \cap A) \cup (B \cap A^c)
     \end{aligned}$$ where $B \cap A$ and $B \cap A^c$ are mutually exclusive.

### Splitting $B$ into $B \cap A$ and $B \cap A^c$

In [None]:
%%capture
partition = plt.figure(figsize=figsize)
ax = partition.gca()
# two overlapping circles: A (yellow, E1) and B (blue, E2)
E1 = cir((0.5,0.5), 0.4,ec="black", facecolor='yellow',lw=2, alpha=0.4)
E2 = cir((0.2,0.2), 0.4,ec="black", facecolor='blue',lw=2, alpha=0.4)
ax.text(0.35,0.35, r'$B \cap A$', va='center', ha='center', fontsize=15)
ax.text(0.10,0.10, r'$B \cap A^c$', va='center', ha='center', fontsize=15)
ax.add_patch(E1)
ax.add_patch(E2)
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim([-0.3,1])
ax.set_ylim([-0.3,1])

In [None]:
partition

### Using complements

* For the `small` box, if we draw with replacement, what are the chances it will take less than 5 draws to draw 1st blue marble?
* If $E = \{\text{takes less than 5 draws to draw 1st blue marble}\}$, then 
$$\begin{aligned}
     E^c &=\{\text{takes 5 or more draws to draw 1st blue marble}\} \\
     &=\{\text{first 4 draws are red}\}
     \end{aligned}$$
* By independence, $$
     P(\text{first 4 draws are red}) = \left(\frac{2}{5}\right)^4
     $$
* Therefore, $$P(\text{takes less than 5 draws to draw 1st blue marble}) = 1 -  \left(\frac{2}{5}\right)^4 = \frac{609}{625} \approx 97\%$$
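
The exact answer plus a simulation check, as a sketch: each draw with replacement is blue with chance 3/5, independently of the others. The simulation uses Python's `random` module with a fixed seed; the helper `draws_until_first_blue` is hypothetical.

```python
import random
from fractions import Fraction

exact = 1 - Fraction(2, 5) ** 4
print(exact, float(exact))   # 609/625 0.9744

box = ['B', 'B', 'B', 'R', 'R']

def draws_until_first_blue(rng):
    n = 1
    while rng.choice(box) != 'B':   # with replacement: the box never changes
        n += 1
    return n

rng = random.Random(0)
trials = 100_000
hits = sum(draws_until_first_blue(rng) < 5 for _ in range(trials))
print(hits / trials)         # should be close to 0.9744
```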

## Bayes’ theorem

- Credited to [Reverend Thomas Bayes](http://en.wikipedia.org/wiki/Thomas_Bayes)
- The foundation of an important sub-branch of statistics: *Bayesian statistics*.
- Given two events $A$ and $B$ with $P(B) > 0$, $$\begin{aligned}
     P(A \vert B) &= \frac{P(B \, \mathrm{and} \,  A)}{P(B)} \\
     &= \frac{P(A \cap B)}{P(B)} \\
     &= \frac{P(B \vert A)\times P(A)}{P(B)}
     \end{aligned}$$
- The formula is a direct consequence of the multiplication rule.
- Even though it is used in *Bayesian statistics*, it is just part of 
the calculus of probability, so even non-Bayesians use it.

### Alternate versions

* By the properties of complements $$\begin{aligned}
     P(B) &= P(B \cap A) + P(B \cap A^c) \\
     &=  P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c)
     \end{aligned}$$
* Another version of Bayes’ theorem $$\begin{aligned}
     P(A \vert B) &= \frac{P(B \vert A) \times P(A)}{P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c)     } \\
     \end{aligned}$$

## Drawing marbles without replacement from `small` box

* Let $$\begin{aligned}
     A&=\{\text{draw a red marble on first draw}\} \\
     B&=\{\text{draw a blue marble on second draw}\} \\
     \end{aligned}$$
     **Compute $P(A \vert B)$.**

* What do we know? $$\begin{aligned}
     P(A) &= \frac{2}{5} \\
     P(B \vert A) &= \frac{3}{4} \\
     \end{aligned}$$

* We need $$P(B) = P(B \vert A) \times P(A) + P(B \vert A^c) \times P(A^c).$$


* Note that $A^c=\{\text{draw a blue marble on first draw}\}$.
* We know $$\begin{aligned}
     P(A^c) &= \frac{3}{5} \\
     P(B \vert A^c) &= \frac{1}{2} \\
     \end{aligned}$$
* Therefore, $$\begin{aligned}
     P(B) &= \frac{3}{4} \times \frac{2}{5}  + \frac{1}{2} \times \frac{3}{5} = \frac{3}{10} + \frac{3}{10} = \frac{3}{5}     \\
     P(A \vert B) &= \frac{ \frac{3}{4} \times \frac{2}{5}}{\frac{3}{4} \times \frac{2}{5}  + \frac{1}{2} \times \frac{3}{5}} = \frac{1}{2}
     \end{aligned}$$
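
A brute-force check of this calculation, as a sketch: enumerate the 20 equally likely ordered pairs of draws without replacement from the `small` box, then compute $P(B)$ and $P(A \vert B)$ by counting.

```python
from fractions import Fraction
from itertools import permutations

box = ['B', 'B', 'B', 'R', 'R']
pairs = list(permutations(range(len(box)), 2))    # 5 * 4 = 20 ordered pairs

B = [(i, j) for i, j in pairs if box[j] == 'B']   # blue on second draw
A_and_B = [(i, j) for i, j in B if box[i] == 'R'] # ... and red on first draw

print(Fraction(len(B), len(pairs)))    # P(B): prints 3/5
print(Fraction(len(A_and_B), len(B)))  # P(A | B): prints 1/2
```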

## Diagnostic testing

* Suppose a patient from some population is tested for a disease based on some diagnostic test.
* The prevalence of the disease is 0.1% in the population.
* If a patient has the disease, the test result is positive with probability 95% (the *true positive* rate).
* If a patient does not have the disease, the test result is positive with probability 1% (the *false positive* rate).
* What is the probability a patient has the disease given a positive test result? What if the false positive rate were 0.1%?


* Let $$\begin{aligned}
     D &= \{\text{patient has disease}\}     \\
     T^+ &= \{\text{test result is positive}\}     \\
     \end{aligned}$$
* We are given $$\begin{aligned}
     P(D) &= 0.001 \\
     P(T^+ \vert D) &= 0.95 \\
     P(T^+ \vert D^c) &= 0.01 \\
     \end{aligned}$$
* We want to compute $P(D \vert T^+)$.

* By Bayes’ theorem $$\begin{aligned}
     P(D \vert T^+) &= \frac{P(T^+ \vert D) \times P(D)}{P(T^+ \vert D) \times P(D) + P(T^+ \vert D^c) \times P(D^c)} \\
     &= \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.01 \times 0.999} \\
     &= 8.7 \%
     \end{aligned}$$
* If the test makers improve their false positive rate to 0.1% (i.e. 0.001), then $$\begin{aligned}
     P(D \vert T^+)
     &= \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.001 \times 0.999} \\
     &= 48.7 \%
     \end{aligned}$$
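
Both calculations can be packaged as a small function. This is a sketch of the second form of Bayes' theorem given above; `prevalence`, `sensitivity` (the true positive rate) and `false_positive_rate` are hypothetical parameter names.

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(D | T+) by Bayes' theorem, splitting P(T+) over D and D^c."""
    true_pos = sensitivity * prevalence                   # P(T+ | D) * P(D)
    false_pos = false_positive_rate * (1 - prevalence)    # P(T+ | D^c) * P(D^c)
    return true_pos / (true_pos + false_pos)

print(round(p_disease_given_positive(0.001, 0.95, 0.01), 3))   # 0.087
print(round(p_disease_given_positive(0.001, 0.95, 0.001), 3))  # 0.487
```

Because the disease is rare, most positive results come from the large healthy group, which is why lowering the false positive rate has such a large effect.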