# Bayes' Theorem

### Joint probability

$P(\mathcal A, \mathcal B)$ is the joint probability of events $\mathcal A$ and
$\mathcal B$ both occurring.

Example: Two fair dice with RVs $X_1$ and $X_2$. Since the two rolls do not influence each other, e.g. $P(X_1 = 3, X_2 = 5) = \frac16 \cdot \frac16 = \frac{1}{36}$
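
The value $\frac{1}{36}$ can be double-checked by counting outcomes. The following is a minimal sketch, assuming two fair six-sided dice; the helper name `joint_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def joint_probability(x1, x2):
    """P(X_1 = x1, X_2 = x2), obtained by counting matching outcomes."""
    hits = sum(1 for outcome in outcomes if outcome == (x1, x2))
    return Fraction(hits, len(outcomes))

print(joint_probability(3, 5))  # 1/36
```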

### Conditional probability

$P(\mathcal{A} \mid \mathcal{B})$ is the conditional probability of $\mathcal A$ given $\mathcal B$.

Connection with joint probability: $P(\mathcal A, \mathcal B) = P(\mathcal A \mid \mathcal B) \, P(\mathcal B)$

Statistical independence: Events $\mathcal A$ and $\mathcal B$ are independent iff $P(\mathcal A, \mathcal B) = P(\mathcal A) \, P(\mathcal B)$, or equivalently (for $P(\mathcal B) > 0$) iff $P(\mathcal A \mid \mathcal B) = P(\mathcal A)$.

##### Example

Meaning of the terms in $P(\mathcal A, \mathcal B) = P(\mathcal A \mid \mathcal B) \, P(\mathcal B)$

Consider a standard deck of 52 cards from which a single card is drawn. Let $\mathcal A$ represent drawing an Ace and $\mathcal B$ represent drawing a heart.

- $P(\mathcal B)$: There are 13 hearts in a deck of 52 cards, so $P(\mathcal B) = \frac{13}{52} = \frac{1}{4}$
- $P(\mathcal A \mid \mathcal B)$: There is exactly one Ace among the 13 hearts, so $P(\mathcal A \mid \mathcal B) = \frac{1}{13}$
- $P(\mathcal A, \mathcal B)$: There is exactly one Ace of hearts in the deck, so $P(\mathcal A, \mathcal B) = \frac{1}{52}$

Check against the definition above, $P(\mathcal A, \mathcal B) = P(\mathcal A \mid \mathcal B) \, P(\mathcal B)$: $\;$ $\frac{1}{52} = \frac{1}{13} \times \frac{1}{4}$.
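
The three probabilities can also be obtained by counting cards. This is a small sketch under the assumption that every card is equally likely to be drawn; the helper names (`prob`, `is_ace`, `is_heart`) are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

def prob(event, given=lambda card: True):
    """P(event | given) by counting cards that satisfy the conditions."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

p_b = prob(is_heart)                                  # P(B)
p_a_given_b = prob(is_ace, given=is_heart)            # P(A | B)
p_ab = prob(lambda c: is_ace(c) and is_heart(c))      # P(A, B)

print(p_b, p_a_given_b, p_ab)        # 1/4 1/13 1/52
print(p_ab == p_a_given_b * p_b)     # True
print(p_ab == prob(is_ace) * p_b)    # True
```

The last line shows that, in this particular example, $\mathcal A$ and $\mathcal B$ also happen to be independent, since $P(\mathcal A) = \frac{4}{52} = \frac{1}{13} = P(\mathcal A \mid \mathcal B)$.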



### Bayes' Theorem

Bayes' Theorem provides a way to update our probability estimates based on new evidence. It is expressed as:

$\displaystyle P(\mathcal{A} \mid \mathcal{B}) = \frac{P(\mathcal{B} \mid \mathcal{A}) \, P(\mathcal{A})}{P(\mathcal{B})}$

Where:

- $P(\mathcal{A} \mid \mathcal{B})$ is the posterior probability of $\mathcal{A}$ given $\mathcal{B}$
- $P(\mathcal{B} \mid \mathcal{A})$ is the probability of $\mathcal{B}$ given $\mathcal{A}$ (the likelihood)
- $P(\mathcal{A})$ is the prior probability of $\mathcal{A}$
- $P(\mathcal{B})$ is the overall (marginal) probability of $\mathcal{B}$, which acts as the normalizer
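
A short derivation, using only the connection between joint and conditional probability from the previous section and assuming $P(\mathcal B) > 0$: writing the joint probability in both orders and dividing by $P(\mathcal B)$ gives

$\displaystyle P(\mathcal{A} \mid \mathcal{B}) \, P(\mathcal{B}) = P(\mathcal{A}, \mathcal{B}) = P(\mathcal{B} \mid \mathcal{A}) \, P(\mathcal{A}) \quad \Longrightarrow \quad P(\mathcal{A} \mid \mathcal{B}) = \frac{P(\mathcal{B} \mid \mathcal{A}) \, P(\mathcal{A})}{P(\mathcal{B})}$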

##### Example

Consider a medical test for some disease (e.g. dengue fever).

Let $\mathcal{A}$ represent having the disease, and $\mathcal{B}$ represent testing
positive for it.

Interested in: $\displaystyle P(\mathcal{A} \mid \mathcal{B}) \stackrel{\text{Bayes}}{=} \frac{P(\mathcal{B} \mid
\mathcal{A}) \, P(\mathcal{A})}{P(\mathcal{B})}$

Known from studies:
- Prevalence of the disease in the population: $P(\mathcal{A}) = 0.3\%$
- Probability of a positive test given subject is sick (sensitivity): $P(\mathcal{B}
  \mid \mathcal{A}) = 99.8\%$
- Probability of a negative test given subject is not sick (specificity): $P(\neg
  \mathcal{B} \mid \neg \mathcal{A})= 99.7\%$

What's left: The normalizer $P(\mathcal{B})$, i.e. overall probability of testing
positive.

Compute it via $P(\mathcal{B}) = P(\mathcal{B} \mid \mathcal{A}) \cdot P(\mathcal{A}) +
  P(\mathcal{B} \mid \neg \mathcal{A}) \cdot P(\neg \mathcal{A})$ with
- $P(\mathcal{B} \mid \neg \mathcal{A}) = 1 - P(\neg \mathcal{B} \mid \neg \mathcal{A}) = 1 - 0.997 = 0.003$
- $P(\neg \mathcal{A}) = 1 -  P(\mathcal{A}) = 0.997$

Hence $P(\mathcal{B}) = 0.998 \cdot 0.003 + 0.003 \cdot 0.997 = 0.005985 = 0.5985\%$

So finally $\displaystyle P(\mathcal{A} \mid \mathcal{B}) = \frac{P(\mathcal{B} \mid
\mathcal{A}) \, P(\mathcal{A})}{P(\mathcal{B})} = \frac{0.998 \times 0.003}{0.005985} \approx 50.03\%$
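
The same result can be read off as head counts. The sketch below assumes a hypothetical cohort of one million people (the cohort size is an illustration, not part of the study data) and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

population = 1_000_000            # hypothetical cohort size (assumption)
prevalence = Fraction(3, 1000)    # P(A) = 0.3%
sensitivity = Fraction(998, 1000) # P(B | A) = 99.8%
specificity = Fraction(997, 1000) # P(not B | not A) = 99.7%

sick = population * prevalence
healthy = population - sick

true_positives = sick * sensitivity             # sick and tested positive
false_positives = healthy * (1 - specificity)   # healthy but tested positive

# P(A | B): share of sick people among all positive tests
posterior = true_positives / (true_positives + false_positives)

print(true_positives, false_positives)  # 2994 2991
print(f"{float(posterior):.2%}")        # 50.03%
```

Roughly as many healthy as sick people test positive, which is why a positive result is only about a coin flip here despite the high test quality.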

Let's visualize this:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

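def get_posterior(p_a, p_b_given_a, p_not_b_given_not_a):
    """Return P(A | B) from prevalence, sensitivity and specificity via Bayes' theorem."""
    p_b_given_not_a = 1 - p_not_b_given_not_a  # false positive rate
    p_not_a = 1 - p_a
    # law of total probability: overall probability of a positive test
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    p_a_given_b = p_b_given_a * p_a / p_b
    return p_a_given_b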

# vary p_a (prevalence)
print("Prevalence: The more people are infected, the more reliable a positive test result")
p_as = np.linspace(0, 0.2, 100)
p_a_given_b = [get_posterior(p_a, 0.998, 0.997) for p_a in p_as]
plt.plot(p_as, p_a_given_b)
# raw strings keep the LaTeX backslashes from being read as escape sequences
plt.xlabel(r'$P(\mathcal{A})$')
plt.ylabel(r'$P(\mathcal{A} \mid \mathcal{B})$')
plt.grid(axis='y')
plt.title(r'Vary $P(\mathcal{A})$, $P(\mathcal{B} \mid \mathcal{A}) = 0.998$, $P(\neg \mathcal{B} \mid \neg \mathcal{A}) = 0.997$')
plt.show()

# vary p_b_given_a (sensitivity) for three different prevalences
print("Sensitivity: The lower the prevalence, the better the test has to be for a positive result to be reliable")
p_b_given_as = np.linspace(0, 1, 100)
for p_a in (0.0003, 0.003, 0.03):
    p_a_given_b = [get_posterior(p_a, p_b_given_a, 0.9999) for p_b_given_a in p_b_given_as]
    plt.plot(p_b_given_as, p_a_given_b, label=rf'$P(\mathcal{{A}}) = {p_a}$')
plt.legend()
plt.xlabel(r'$P(\mathcal{B} \mid \mathcal{A})$')
plt.ylabel(r'$P(\mathcal{A} \mid \mathcal{B})$')
plt.grid(axis='y')
plt.title(r'Vary $P(\mathcal{B} \mid \mathcal{A})$ for three values of $P(\mathcal{A})$, $P(\neg \mathcal{B} \mid \neg \mathcal{A}) = 0.9999$')
plt.show()

# vary p_not_b_given_not_a (specificity)
print("Specificity: Has a major impact on the reliability of a positive test result")
p_not_b_given_not_as = np.linspace(0.9, 1, 100)
p_a_given_b = [get_posterior(0.003, 0.9999, p_not_b_given_not_a) for p_not_b_given_not_a in p_not_b_given_not_as]
plt.plot(p_not_b_given_not_as, p_a_given_b)
plt.xlabel(r'$P(\neg \mathcal{B} \mid \neg \mathcal{A})$')
plt.ylabel(r'$P(\mathcal{A} \mid \mathcal{B})$')
plt.grid(axis='y')
plt.title(r'$P(\mathcal{A}) = 0.003$, $P(\mathcal{B} \mid \mathcal{A}) = 0.9999$, Vary $P(\neg \mathcal{B} \mid \neg \mathcal{A})$')
plt.show()


### Example: Monty Hall Problem

**Game demo**: [Monty Hall Problem Simulator](https://montyhall.io/)

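**Setup**: A car is hidden behind one of three doors, goats behind the other two. After the contestant picks a door, the host (who knows where the car is) opens one of the remaining doors to reveal a goat and offers a switch to the other closed door.

**Million dollar question**: Which strategy wins more often: switching, staying, or are both equally good?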

### Simulation

In [None]:
import random

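def play_game(switch):
    """Play one round and return True if the player ends up with the car."""
    # the car is always behind door 1 (without loss of generality!),
    # so the goats are behind doors 2 and 3

    # player picks a random door
    choice = random.randint(1, 3)

    if switch:
        # host reveals a goat behind a door the player did not pick
        # (chosen at random if both goat doors are available)
        revealed_door = random.choice(tuple({2, 3} - {choice}))

        # switch to the one remaining closed door
        choice = ({1, 2, 3} - {revealed_door, choice}).pop()

    return choice == 1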


In [None]:
def play_batch(n_games, switch):
    return sum(play_game(switch) for _ in range(n_games))


In [None]:
n_games = 100000
wins_stay = play_batch(n_games, False)
wins_switch = play_batch(n_games, True)

print(f"Win rate when switching: {100 * wins_switch / n_games:.2f}%")
print(f"Win rate when staying:   {100 * wins_stay / n_games:.2f}%")


### Analytical solution


Proofs via Bayes' theorem are plentiful, see e.g.
https://blogs.cornell.edu/info2040/2022/11/10/the-monty-hall-problem-using-bayes-theorem/
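
As a rough sketch of how such a proof goes, the posterior over the car's location can be computed with exact fractions. The assumptions here are illustrative and not taken from the linked post: the contestant picks door 1, the host then opens door 3, and the host chooses uniformly at random whenever two goat doors are available. Note that this fixes the contestant's pick rather than the car's position, unlike the simulation. The names `picked`, `opened` and `p_host_opens` are hypothetical.

```python
from fractions import Fraction

doors = (1, 2, 3)
picked = 1  # assumed: the contestant's initial pick
opened = 3  # assumed: the door the host opens, revealing a goat

# Prior: the car is equally likely to be behind each door.
prior = {car: Fraction(1, 3) for car in doors}

def p_host_opens(door, car):
    """P(host opens `door` | car is behind `car`).

    The host never opens the picked door or the car door and picks
    uniformly at random among the doors that remain.
    """
    options = [d for d in doors if d != picked and d != car]
    return Fraction(1, len(options)) if door in options else Fraction(0)

# Normalizer: overall probability that the host opens door `opened`.
evidence = sum(p_host_opens(opened, car) * prior[car] for car in doors)

# Bayes' theorem for each possible car location.
posterior = {
    car: p_host_opens(opened, car) * prior[car] / evidence for car in doors
}

print(posterior[1], posterior[2], posterior[3])  # 1/3 2/3 0
```

Under these assumptions staying with door 1 wins with probability $\frac13$ and switching to door 2 wins with probability $\frac23$, which is what the simulation should approximate.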