In [1]:
# This is the lecture notebook for week 2
import random
#import random as r
import itertools as i
import numpy as np
import matplotlib.pyplot as plt
from math import comb

# Conditional Probability

$$ P(A|B) = \frac{P(A \cap B)}{P(B)} $$

In [62]:
c = ['H', 'T']
W = {f"{x}{y}{z}" 
     for x in c 
     for y in c 
     for z in c}

A = {w for w in W if w == "TTT"}
B = {w for w in W if w[0] == 'T'}

# P(A|B)
P_AgivenB = len(set.intersection(A, B)) / len(B)
P_AgivenB

0.25

In [63]:
# Sample space
c = ['H', 'T']
W = {f"{x}{y}{z}" 
     for x in c 
     for y in c 
     for z in c}

A = {w for w in W if w == "THH"}
B = {w for w in W if w[0] == 'T'}

P_A = len(A) / len(W)
P_AgivenB = len(set.intersection(A, B)) / len(B)
f"P(A): {P_A*100}%, P(A|B): {P_AgivenB*100}%"

'P(A): 12.5%, P(A|B): 25.0%'

# Law of Total Probability
## Conditional Probability
$$ P(A|B) = \frac{P(A\cap B)}{P(B)}$$
## Multiplication Rule
$$ P(A \cap B)= P(A|B) \cdot P(B) $$
often written
$$ P(A, B)= P(A|B) \cdot P(B) $$

## Law of Total Probability
If $B_1, \dots, B_n$ are a partition of $\Omega$ (disjoint events whose union is $\Omega$), then
$$ P(A) = P(A \cap B_1) + P(A \cap B_2) + \dots + P(A \cap B_n)$$
$$ P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + \dots + P(A|B_n)P(B_n)$$

We'll see this again in Bayes' Rule, where it gives the denominator.
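A quick sanity check of the law of total probability on the three-coin sample space, as a minimal sketch. The helper `prob`, the event "at least two heads", and the partition by first flip are choices made for this illustration, not part of the lecture code.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of three fair coin flips (equally likely)
W = {"".join(flips) for flips in product("HT", repeat=3)}

def prob(event):
    """Probability of an event under the uniform measure on W (illustrative helper)."""
    return Fraction(len(event), len(W))

# A: at least two heads (example event chosen for this sketch)
A = {w for w in W if w.count("H") >= 2}

# Partition of W by the outcome of the first flip
partition = [{w for w in W if w[0] == side} for side in "HT"]

# Sum of P(A|B_i) * P(B_i) over the partition
total = sum(prob(A & B) / prob(B) * prob(B) for B in partition)

print(prob(A), total)
```

Using `Fraction` keeps the comparison exact, so the two sides can be checked with `==` rather than a floating-point tolerance.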

# The Ancient Geeks and Their Urns

In [21]:
balls = ['r', 'g']
#urn = balls[0]*5
urn = ['r', 'r', 'r', 'r', 'r', 'g', 'g']
chc = random.choices(urn, k=10)   # With replacement
# NOT THE SAME AS 
samp = random.sample(urn, k=3)   # Without replacement
#urn[1:3]
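To see why sampling with and without replacement are not the same, here is a small sketch that enumerates every ordered pair of draws from the same urn and computes the exact chance of two reds. The helper `p_two_reds` is a name introduced for this illustration only.

```python
from fractions import Fraction
from itertools import permutations, product

urn = ['r'] * 5 + ['g'] * 2   # same urn as above: 5 red, 2 green

def p_two_reds(draws):
    """Fraction of equally likely ordered draws that are (red, red)."""
    draws = list(draws)
    hits = sum(1 for d in draws if d == ('r', 'r'))
    return Fraction(hits, len(draws))

# With replacement: the second draw sees the full urn again
with_repl = p_two_reds(product(urn, repeat=2))

# Without replacement: the second draw sees one ball fewer
without_repl = p_two_reds(permutations(urn, 2))

print(with_repl, without_repl)
```

Without replacement the first draw changes the urn, so the answer follows the multiplication rule: $5/7 \cdot 4/6$ rather than $5/7 \cdot 5/7$.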

# Dependent and Independent Events

$$ A \perp\!\!\!\perp B \textrm{ if } P(A \cap B)=P(A)P(B)$$
which makes sense because, by the multiplication rule, this is equivalent to $P(A|B)=P(A)$ (when $P(B) > 0$), i.e. $B$ occurring doesn't change the probability of $A$ occurring.
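Independence can be checked directly on the three-coin sample space by testing whether $P(A \cap B) = P(A)P(B)$. This is a sketch; `prob`, `independent`, and the three example events are names made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: three fair coin flips
W = {"".join(flips) for flips in product("HT", repeat=3)}

def prob(event):
    """Probability under the uniform measure on W (illustrative helper)."""
    return Fraction(len(event), len(W))

def independent(A, B):
    """True if P(A and B) equals P(A) * P(B)."""
    return prob(A & B) == prob(A) * prob(B)

first_T = {w for w in W if w[0] == "T"}    # first flip is tails
second_H = {w for w in W if w[1] == "H"}   # second flip is heads
all_T = {w for w in W if w == "TTT"}       # all three flips are tails

print(independent(first_T, second_H))  # separate flips
print(independent(first_T, all_T))     # all_T is only possible if first_T happens
```

The first pair involves different flips and comes out independent; the second pair is dependent because knowing the first flip is tails changes the chance of "TTT".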

# Bayes Rule
## Derivation from the multiplication rule

$$P(B|A)P(A)= P(A \cap B) = P(A|B)P(B)$$
rearranging
$$P(B|A) = \frac{P(A|B)P(B)}{P(A)} = \frac{P(B)P(A|B)}{P(A)}$$ 
or
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(A)P(B|A)}{P(B)}$$ 

## Why this matters
Jumping to statistics for a second, this is often arranged like this:
$$P(\theta|D) = \frac{P(\theta)P(D|\theta)}{P(D)}$$
where $D$ is the data and $\theta$ is the parameters of the model.

Each term has a special name:  

$P(\theta|D)$ is called the posterior  
$P(\theta)$ is the prior  
$P(D|\theta)$ is the likelihood  
$P(D)$ is the evidence  

## How to think about it
So the way to think about the rule in words is: "The probability of the parameters in light of the data is the probability of the parameters before you saw the data, multiplied by the probability of the data given those parameters, normalized by a magic factor to make it all sum to 1"
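A tiny worked example of prior, likelihood, evidence, and posterior, as a sketch under made-up assumptions: $\theta$ is a coin's probability of heads and can only be 0.5 or 0.9, both equally likely a priori, and the data $D$ is three heads in three flips. None of these numbers come from the lecture.

```python
from fractions import Fraction

# Hypothetical setup: theta is P(heads), restricted to two candidate values
# prior[theta] = P(theta), assumed equal for both candidates
prior = {Fraction(1, 2): Fraction(1, 2), Fraction(9, 10): Fraction(1, 2)}

# Assumed data D: three heads in three independent flips
n_heads = 3

# Likelihood P(D|theta) for each candidate theta
likelihood = {theta: theta ** n_heads for theta in prior}

# Evidence P(D), via the law of total probability over theta
evidence = sum(prior[t] * likelihood[t] for t in prior)

# Posterior P(theta|D) = prior * likelihood / evidence
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}

for t, p in posterior.items():
    print(f"theta = {float(t)}: posterior = {float(p):.3f}")
```

The evidence is just the sum of prior times likelihood over every candidate $\theta$, which is why dividing by it makes the posterior sum to 1.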

# Monty Hall

In [94]:
def monty_hall_simulation(num_trials=1000):
    stick_wins = 0
    switch_wins = 0
    
    for _ in range(num_trials):
        # All doors
        doors = {1, 2, 3}

        # Randomly place a car behind one door
        car_position = random.choice(list(doors))
        
        # The contestant makes a random choice
        contestant_choice = random.choice(list(doors))
        
        # Doors Monty can open to ensure he shows a smelly goat
        monty_can_open = doors - {contestant_choice, car_position}
        monty_opens = random.choice(list(monty_can_open))
        
        # The door that the contestant switches to if they choose to switch
        switch_choice = (doors - {contestant_choice, monty_opens}).pop()
        
        # Check the outcomes
        if contestant_choice == car_position:
            stick_wins += 1
        if switch_choice == car_position:
            switch_wins += 1

    print(f"Probability of winning if you stick: {stick_wins/num_trials}")
    print(f"Probability of winning if you switch: {switch_wins/num_trials}")

monty_hall_simulation()


Probability of winning if you stick: 0.335
Probability of winning if you switch: 0.665


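Here $W$ is winning the car, $S$ is switching, and $C_1$ is the event that the first pick was the car.
$$P(W|S) = P(W|S,C_1)P(C_1) + P(W|S, \bar{C}_1)P(\bar{C}_1)$$
$$P(W|S) = 0 \cdot 1/3 + 1 \cdot 2/3 = 2/3 $$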
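The same answer can be obtained exactly, without simulation noise, by enumerating every equally likely (car, first choice) pair and splitting the probability evenly over the doors Monty may open. This is a sketch; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

doors = {1, 2, 3}
stick = Fraction(0)    # exact P(win | stick)
switch = Fraction(0)   # exact P(win | switch)

# Each (car, choice) pair has probability 1/9
for car, choice in product(doors, repeat=2):
    p_pair = Fraction(1, 9)
    # Monty never opens the contestant's door or the car's door
    monty_options = doors - {car, choice}
    for opened in monty_options:
        # Monty picks uniformly among the doors he is allowed to open
        p = p_pair / len(monty_options)
        switched_to = (doors - {choice, opened}).pop()
        if choice == car:
            stick += p
        if switched_to == car:
            switch += p

print(stick, switch)
```

The two totals add up to 1 because after Monty opens a door exactly one of "stick" and "switch" ends on the car.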