# Conditional Probability

Open in Google Colab: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/febse/stat2025/blob/main/04-Conditional-Probability-Class.ipynb)

![War planes](https://cdn.prod.website-files.com/5f64a4eb5a48d21969aa774a/5fa6a415ed706185b15a5fa9_Survivorship-bias-1024x536.png)

:::{#def-conditional-probability}

Let $A$ and $B$ be two events with $P(B) > 0$. The conditional probability of $A$ given $B$ is defined as

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

We read $P(A | B)$ as "the probability of $A$ given $B$".
:::

Let's consider a game of dice where two identical (and fair) dice are rolled.

$$
\Omega = \left\{
    \begin{align*}
    (1,1), (1,2), (1, 3), (1, 4), (1, 5), (1, 6) \\
    (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6) \\
    (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6) \\
    (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6) \\
    (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6) \\
    (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) \\
    \end{align*}    
\right\}
$$

You want to bet on the event $B$ that the sum of the two dice is equal to 7. Before placing your bet, a friend tells you that the first die shows a 6.
Is this information useful to you in deciding whether to bet on $B$? Next, consider another tip: your friend tells you that the sum of the two dice is 10 (event $C$). 
What is the probability that the first die shows a 6 _given_ that the sum of the two dice is 10?
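
Both questions can be checked by enumerating the 36 equally likely outcomes and applying the definition of conditional probability directly. The following is a minimal sketch; the helper names `prob` and `cond_prob` are illustrative and not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of two fair dice (36 equally likely outcomes)
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

def first_is_6(w):
    return w[0] == 6

def sum_is_7(w):
    return w[0] + w[1] == 7

def sum_is_10(w):
    return w[0] + w[1] == 10

p_b = prob(sum_is_7)                            # P(B)
p_b_given_6 = cond_prob(sum_is_7, first_is_6)   # P(B | first die is 6)
p_6_given_c = cond_prob(first_is_6, sum_is_10)  # P(first die is 6 | C)

print(p_b, p_b_given_6, p_6_given_c)
```

Comparing `p_b` with `p_b_given_6` shows whether the first tip changes the probability of $B$.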


## Independence of Two Events

:::{#def-independence-AB}
## Independence of Two Events

Two events $A$ and $B$ are said to be independent if

$$
P(A \cap B) = P(A) \cdot P(B)
$$

:::
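
The product condition can be checked mechanically on the two-dice sample space. The sketch below represents events as sets of outcomes; `is_independent` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(omega))

def is_independent(a, b):
    """Check the product condition P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {w for w in omega if w[0] == 6}     # first die shows 6
B = {w for w in omega if sum(w) == 7}   # sum equals 7
C = {w for w in omega if sum(w) == 10}  # sum equals 10

a_b_independent = is_independent(A, B)
a_c_independent = is_independent(A, C)
print(a_b_independent, a_c_independent)
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.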

The definition in @def-independence-AB is equivalent to the following, but has the advantage of being symmetric in $A$ and $B$:

:::{#exr-independence-AB}
## Independence and Conditional Probability

Assume that $P(A) > 0$ and $P(B) > 0$. Show that the definition of independence in @def-independence-AB is equivalent to each of the following:

1. $P(A|B) = P(A)$
2. $P(B|A) = P(B)$
:::



:::{#exr-independence-AB-implication}
## Implications of Independence

Show that if $A$ and $B$ are independent, then the following statements are true:

1. $A$ and $B^c$ are independent
2. $A^c$ and $B$ are independent
3. $A^c$ and $B^c$ are independent

:::


:::{#exr-independence-AB-disjoint}
## Disjoint Events and Independence

Let $A$ and $B$ be two events such that $P(A) > 0$ and $P(B) > 0$. Show that if $A$ and $B$ are disjoint, then they are not independent.

:::


## Mutual Independence

The definition of independence can be extended to more than two events, but some care is needed. Requiring only that $P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$ does not ensure that $A$, $B$, and $C$ are _pairwise_ independent in the sense of @def-independence-AB. Conversely, if the events are pairwise independent, there is no guarantee that the probability of their intersection is the product of their probabilities.



:::{#def-mutual-independence}
## Mutual Independence

Events $A_1, A_2, \ldots, A_n$ are said to be mutually independent if for every subset $I \subseteq \{1, 2, \ldots, n\}$ with at least two elements we have

$$
P\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} P(A_i)
$$

For example, for $n=3$ the events $A_1$, $A_2$, and $A_3$ are mutually independent if _all_ of the following conditions hold:

$$
\begin{align*}
P(A_1 \cap A_2) &= P(A_1) \cdot P(A_2) \\
P(A_1 \cap A_3) &= P(A_1) \cdot P(A_3) \\
P(A_2 \cap A_3) &= P(A_2) \cdot P(A_3) \\
P(A_1 \cap A_2 \cap A_3) &= P(A_1) \cdot P(A_2) \cdot P(A_3) \\
\end{align*}
$$

:::
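
The definition translates into a check over all subsets with at least two events. The sketch below applies it to a standard example that is not taken from the text: two fair coin tosses with the events "first toss is heads", "second toss is heads", and "both tosses agree". The function name `is_mutually_independent` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Sample space of two fair coin tosses (illustrative example, not from the text)
omega = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(omega))

def is_mutually_independent(events):
    """Check the product condition for every subset of size >= 2."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            intersection = set.intersection(*subset)
            if prob(intersection) != prod(prob(e) for e in subset):
                return False
    return True

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[0] == w[1]}  # both tosses agree

pairwise = all(
    prob(x & y) == prob(x) * prob(y) for x, y in combinations([A, B, C], 2)
)
mutual = is_mutually_independent([A, B, C])
print(pairwise, mutual)
```

In this example every pair passes the product condition, but the triple intersection does not, which is exactly the gap between pairwise and mutual independence.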




:::{#exr-mutual-independence-contradicition}
## Mutual Independence and Pairwise Independence

Consider a sample space with 9 equally likely outcomes, each consisting of a triple $(i, j, k)$ where $i, j, k \in \{1, 2, 3\}$.

$$
\Omega = \left\{
    \begin{align*}
    (1, 1, 1), (2, 2, 2), (3, 3, 3) \\
    (1, 2, 3), (1, 3, 2), (3, 2, 1) \\
    (3, 1, 2), (2, 1, 3), (2, 3, 1) \\
    \end{align*}
\right\}
$$

Consider the following events $A_i = \{\text{i-th place of the outcome is 1}\}$. Are the events $A_1$, $A_2$, and $A_3$ pairwise independent? Are they mutually independent?
:::


:::{#exr-total-probability-complements}
## Total Probability and Complements

Let $A$ and $B$ be events such that $0 < P(A) < 1$, so that both $P(B|A)$ and $P(B|A^c)$ are defined. Show that

$$
P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)
$$
:::

Note that the sets $A$ and $A^c$ form a partition of the sample space because $A \cap A^c = \emptyset$ and $A \cup A^c = \Omega$.

:::{#thm-total-probability}
## Total Probability Theorem

Let $A_1, A_2, \ldots, A_n$ be a partition of the sample space $\Omega$ such that $P(A_i) > 0$ for all $i$. For any event $B$, we have

$$
P(B) = \sum_{i=1}^n P(B|A_i) \cdot P(A_i)
$$
:::
:::{.proof}

Every event $B$ is a subset of $\Omega$, so intersecting it with the sample space leaves it unchanged:

$$
B = B \cap \Omega
$$

Because the $A_i$ are a partition of the sample space, their union is the sample space:

$$
\Omega = \bigcup_{i=1}^n A_i
$$

In this way we can represent $B$ as the union of the intersections of $B$ with the $A_i$:

$$
B = B \cap \Omega = B \cap \left( \bigcup_{i=1}^n A_i \right) = \bigcup_{i=1}^n (B \cap A_i)
$$

The last step follows from the distributive property of the intersection over the union. The sets $B \cap A_i$ are pairwise disjoint because the $A_i$ are disjoint, so we can use the additivity of the probability measure and then the definition of conditional probability, $P(B \cap A_i) = P(B|A_i) \cdot P(A_i)$, to write

$$
P(B) = P\left( \bigcup_{i=1}^n (B \cap A_i) \right) = \sum_{i=1}^n P(B \cap A_i) = \sum_{i=1}^n P(B|A_i) \cdot P(A_i)
$$

:::
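
The theorem can be verified numerically on the two-dice sample space, using the partition $A_i = \{\text{first die shows } i\}$ and the event $B = \{\text{sum is 7}\}$. This is a small sketch with illustrative variable names; each $P(B|A_i)$ is computed by counting inside $A_i$ only.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice
omega = list(product(range(1, 7), repeat=2))

def sum_is_7(w):
    return w[0] + w[1] == 7

# Left-hand side: P(B) computed directly
p_b = Fraction(sum(1 for w in omega if sum_is_7(w)), len(omega))

# Right-hand side: sum over the partition by the value of the first die
total = Fraction(0)
for i in range(1, 7):
    in_ai = [w for w in omega if w[0] == i]        # outcomes in A_i
    p_ai = Fraction(len(in_ai), len(omega))        # P(A_i)
    p_b_given_ai = Fraction(sum(1 for w in in_ai if sum_is_7(w)), len(in_ai))
    total += p_b_given_ai * p_ai

print(p_b, total)
```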


## Bayes' Theorem

![Thomas Bayes, 1701 – 1761](https://upload.wikimedia.org/wikipedia/commons/d/d4/Thomas_Bayes.gif)

We have already seen that the conditional probability of $A$ given $B$ is defined as

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

We can also write this as

$$
P(A \cap B) = P(A|B) \cdot P(B)
$$

and similarly

$$
P(B \cap A) = P(B|A) \cdot P(A)
$$

Because $A \cap B = B \cap A$, the left-hand sides of the last two equations are equal, so we have

$$
\begin{align*}
P(A|B) \cdot P(B) = P(B|A) \cdot P(A) \\
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\end{align*}
$$

This last equation is known as **Bayes' Theorem**. It also holds in a more general form for a partition of the sample space.

:::{#thm-bayes-theorem}

## Bayes' Theorem

Let $A$ and $B$ be two events such that $P(A) > 0$ and $P(B) > 0$. Then

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Furthermore, let $A_1, A_2, \ldots, A_n$ be a partition of the sample space $\Omega$. A partition is a collection of disjoint events such that their union is the entire sample space:

$$
A_i \cap A_j = \emptyset \quad \text{for all } i \neq j \quad \text{and} \quad \bigcup_{i=1}^n A_i = \Omega
$$

For any event $B$ such that $P(B) > 0$, we have

$$
P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^n P(B|A_j) \cdot P(A_j)}
$$

The denominator is $P(B)$ written out with the total probability theorem (@thm-total-probability). To see how this works, consider a very simple partition: the events $A$ and $A^c$ form a partition of the sample space. In this case, Bayes' Theorem becomes

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)}
$$

:::
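
The partition form of the theorem amounts to normalising the products $P(B|A_i) \cdot P(A_i)$ by their sum. The sketch below applies this to the dice example from the beginning of the section, with $A_i = \{\text{first die shows } i\}$ and $B = \{\text{sum is 10}\}$. The function name `posterior` is an assumption made for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_i | B) for each i, given P(A_i) and P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_i) * P(A_i)
    p_b = sum(joint)                                      # total probability of B
    return [j / p_b for j in joint]

# Priors: each face of the first die has probability 1/6
priors = [Fraction(1, 6)] * 6

# Likelihoods P(sum is 10 | first die shows i): impossible for i = 1, 2, 3;
# for i = 4, 5, 6 the second die must show 10 - i, which has probability 1/6
likelihoods = [Fraction(0)] * 3 + [Fraction(1, 6)] * 3

post = posterior(priors, likelihoods)
print(post[5])  # P(first die shows 6 | sum is 10)
```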


:::{#exr-narco-tests}
## Roadside Drug Tests

Imagine a city called Virtue City with 10,000 drivers on the road on any given day. Out of all drivers, 100 use drugs while driving.

The police department introduces a new test for narcotics that has a 95 percent probability of correctly identifying a person who has used narcotics (true positive). There is a one percent probability of falsely identifying a person who has not used narcotics as having used them (false positive).

The police stop a driver, his test comes back positive, and the driver's car is confiscated. The police claim that there is a 95 percent probability that the driver is a drug user. The driver objects and says that the probability of him being a drug user is much lower. Who is right?

Approach this problem in two ways:

- Counting
    - How many people will have a positive test result if all 10,000 drivers are tested?
    - What proportion of the positive tests belong to actual drug users?
- Apply Bayes' Theorem
:::
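
Both approaches can be sketched in a few lines, using the 10,000 drivers and 100 drug users from the setup. The variable names are illustrative, and exact fractions are used so that the two results can be compared for equality.

```python
from fractions import Fraction

drivers = 10_000
users = 100
sensitivity = Fraction(95, 100)         # P(positive | user)
false_positive_rate = Fraction(1, 100)  # P(positive | non-user)

# Approach 1: counting expected test results among all drivers
true_positives = users * sensitivity
false_positives = (drivers - users) * false_positive_rate
share_counting = true_positives / (true_positives + false_positives)

# Approach 2: Bayes' Theorem with the partition {user, non-user}
prior = Fraction(users, drivers)
share_bayes = (sensitivity * prior) / (
    sensitivity * prior + false_positive_rate * (1 - prior)
)

print(true_positives + false_positives, share_counting, float(share_bayes))
```

The two approaches are the same calculation: the counting version simply multiplies numerator and denominator of the Bayes formula by the number of drivers.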

:::{#exr-quality-control}
## Quality Control

A factory produces light bulbs. To ensure the quality of the bulbs shipped to customers, the factory has a quality control system that checks a sample of the bulbs before they are shipped. Consider a shipment of 100 bulbs. You take a sample (without replacement) of 5 bulbs and stop the shipment if you find that at least one of the sampled bulbs is defective. Suppose that a shipment contains 20 defective bulbs. What is the probability that the shipment will be stopped?

:::

In [4]:
# Simulate a sample from a shipment of 100 light bulbs
import numpy as np

sample_size = 5
repetitions = 10

# 0 represents a good light bulb, 1 represents a defective light bulb
shipment = np.array([0] * 80 + [1] * 20)

print("The shipment of light bulbs")
print(shipment)
print("\n")

rejections = 0

print("Some of the selected samples")

for i in range(repetitions):
    sample = np.random.choice(shipment, size = sample_size, replace = False)
    
    # Print every 1000th sample (with few repetitions, only the first sample is printed)
    if i % 1000 == 0:
        print(sample)
        
    if np.sum(sample) > 0:
        rejections += 1

print("\n")
print(f"The proportion of shipments rejected is {rejections / repetitions}")

The shipment of light bulbs
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]


Some of the selected samples
[0 1 0 0 0]


The proportion of shipments rejected is 0.7


:::{.callout-note collapse="true"}
## Solution (click to expand)

We need to compute the probability that at least one of the sampled bulbs is defective, given that there are 20 defective bulbs in the shipment of 100 bulbs. We can define the event $A$ as the event that at least one of the sampled bulbs is defective.

$$
A = \text{at least one defective bulb in the sample}
$$

Let $A_1, A_2, \ldots, A_{5}$ be the events that the $i$-th bulb in the sample is defective. Then we have

$$
A = A_1 \cup A_2 \cup \ldots \cup A_{5}
$$

It is easier to work with the complement of $A$, which by De Morgan's law is the intersection of the complements of the $A_i$:

$$
P(A) = 1 - P(A^c) = 1 - P(A_1^c \cap A_2^c \cap \ldots \cap A_{5}^c)
$$

The chain rule gives us a way to decompose the probability of the intersections of the complements

$$
P(A_1^c \cap A_2^c \cap \ldots \cap A_{5}^c) = P(A_1^c) \cdot P(A_2^c | A_1^c) \cdot P(A_3^c | A_1^c \cap A_2^c) \cdot \ldots \cdot P(A_{5}^c | A_1^c \cap A_2^c \cap \ldots \cap A_{4}^c)
$$

The probability of the first bulb not being defective is

$$
P(A_1^c) = \frac{80}{100}
$$

The probability of the second bulb not being defective given that the first bulb is not defective is

$$
P(A_2^c | A_1^c) = \frac{79}{99}
$$

Continuing in this way, we find that

$$
P(A) = 1 - \frac{80}{100} \cdot \frac{79}{99} \cdot \frac{78}{98} \cdot \frac{77}{97} \cdot \frac{76}{96}
$$

:::
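
The chain-rule product can be evaluated exactly and cross-checked against the equivalent counting formula $1 - \binom{80}{5} / \binom{100}{5}$. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction
from math import comb

good, total, sample_size = 80, 100, 5

# Chain rule: probability that each successive draw is a good bulb
p_no_defective = Fraction(1)
for k in range(sample_size):
    p_no_defective *= Fraction(good - k, total - k)

p_stop = 1 - p_no_defective

# Cross-check: choose all 5 sampled bulbs from the 80 good ones
p_stop_check = 1 - Fraction(comb(good, sample_size), comb(total, sample_size))

print(float(p_stop))
```

The result is the value that the simulated proportion of rejected shipments should approach as the number of repetitions grows.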

In [5]:
# Compute the probability of a shipment being rejected



:::{#exr-simulation-dogs}
## A Bifurcation in the Road

A tourist stands at a bifurcation in the road and does not know which is the correct way to go. He asks two passersby. Each passerby points him in the correct direction with probability 0.7, independently of the other.
We want to propose a strategy for the tourist to decide which way to go.

- Strategy 1: If the two passersby point in the same direction, the tourist follows that direction. If they point in different directions, the tourist chooses randomly.
- Strategy 2: The tourist decides to follow the direction of the first passerby only.
- Strategy 3: The tourist decides at random.

- Which strategy would you recommend? Decide this by calculating the probability of the tourist going in the correct direction for each strategy.
- Simulate the three strategies and estimate the probability of the tourist going in the correct direction. For the sake of the simulation, assume that the correct direction is the first one.
:::

In [6]:
import pandas as pd
import numpy as np

# Setup the simulation
repetitions = 10

# Simulate the advice of the two passersby and a fair coin flip (direction 1 is the correct one)

df = pd.DataFrame({
    "passerby1": np.random.choice([1, 2], size = repetitions, p = [0.7, 0.3]),
    "passerby2": np.random.choice([1, 2], size = repetitions, p = [0.7, 0.3]),
    "random_choice": np.random.choice([1, 2], size = repetitions, p = [0.5, 0.5])
})

# Check whether the two passersby agree on the direction. If yes, take the direction of the first passerby
# (both are equal). If not, take the random direction

df["strategy1"] = np.where(
    df["passerby1"] == df["passerby2"],
    df["passerby1"],
    df["random_choice"]
)

df.head()

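|   | passerby1 | passerby2 | random_choice | strategy1 |
|---|-----------|-----------|---------------|-----------|
| 0 | 1 | 2 | 1 | 1 |
| 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 3 | 1 | 1 | 2 | 1 |
| 4 | 2 | 2 | 1 | 2 |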


In [7]:
print("Strategy 1 success rate: ", (df["strategy1"] == 1).sum() / repetitions)
print("Strategy 2 success rate: ", (df["passerby1"] == 1).sum() / repetitions)
print("Strategy 3 success rate: ", (df["random_choice"] == 1).sum() / repetitions)

Strategy 1 success rate:  0.8
Strategy 2 success rate:  0.7
Strategy 3 success rate:  0.5


In [2]:
0.0001 / (0.001 + 0.00000001 * 0.999)

0.09999900100997991

:::{.callout-note collapse="true"}
## Solution (click to expand)

Let us consider the scenarios leading to the correct choice, assuming the passersby are independent and point in the right direction with probability $p$ and in the wrong direction with probability $1-p$.

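- Both passersby point in the correct direction (probability $p^2$) and the tourist follows them.
- Only the first passerby is correct (probability $p(1 - p)$) and the random choice picks his or her direction (probability $1/2$): $p(1 - p)/2$.
- Only the second passerby is correct (probability $(1 - p)p$) and the random choice picks his or her direction (probability $1/2$): $(1 - p)p/2$.

These events are mutually exclusive and cover every way of reaching the correct direction, so the probability of the tourist going in the correct direction under Strategy 1 is

$$
p^2 + \frac{p(1 - p)}{2} + \frac{(1 - p)p}{2} = p^2 + p(1 - p) = p
$$

Strategy 1 therefore does no better than Strategy 2: with $p = 0.7$ both succeed with probability $0.7$, while the random choice of Strategy 3 succeeds with probability $0.5$.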

:::
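
With only ten repetitions, simulated success rates can differ noticeably from the theoretical values. As a cross-check that does not depend on random draws, the sketch below enumerates the eight combinations of "first passerby correct", "second passerby correct", and "coin picks the correct road" and adds up their probabilities. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p = Fraction(7, 10)    # probability that a passerby points the right way
half = Fraction(1, 2)  # probability that the coin picks the right way

success = {"strategy1": Fraction(0), "strategy2": Fraction(0), "strategy3": Fraction(0)}

for first_ok, second_ok, coin_ok in product([True, False], repeat=3):
    # Probability of this combination (all three are independent)
    weight = (p if first_ok else 1 - p) * (p if second_ok else 1 - p) * half

    # Strategy 1: follow the passersby if they agree, otherwise follow the coin
    strategy1_ok = first_ok if first_ok == second_ok else coin_ok

    if strategy1_ok:
        success["strategy1"] += weight
    if first_ok:       # Strategy 2: follow the first passerby
        success["strategy2"] += weight
    if coin_ok:        # Strategy 3: follow the coin
        success["strategy3"] += weight

print(success)
```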

:::{#exr-monty-hall}
## The Monty Hall Game

![The Monty Hall Game](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Monty_open_door.svg/300px-Monty_open_door.svg.png)


Imagine a game with three doors. Behind one of the doors is a car and behind the other two are goats. The game is played in two stages.

- First you choose a door.
- The host, _who knows what is behind each door_, opens one of the other two doors to reveal a goat.
- You are then given the opportunity to switch to the other unopened door or stay with your original choice.

Which strategy has a higher probability of winning the car? Should you switch or should you stay with your original choice? Compute the probabilities of winning the car under each strategy.

:::

In [8]:
# Simulation of the Monty Hall problem

# Setup the simulation
repetitions = 10

# Simulate the choices of the player

df = pd.DataFrame({
    "first_choice": np.random.choice([1, 2, 3], size = repetitions),    
})

df["opened_door"] = np.where(
    # If the player chose door 1 (the door with the car)
    (df["first_choice"] == 1),
    # , then the opened door is randomly chosen from doors 2 and 3
    np.random.choice([2, 3], size = repetitions),
    # Otherwise
    np.where(
        # If the player chose door 2
        (df["first_choice"] == 2),
        # , then the opened door is 3 (the host does not open the door with the car)
        3,
        # Otherwise (the player chose door 3) the opened door is 2
        2
    )
)

df["switch_choice"] = np.where(
    # If the opened door was 2 and the player chose 1
    (df["opened_door"] == 2) & (df["first_choice"] == 1),
    # , then the switched choice is 3
    3,
    # Otherwise
    np.where(
        # If the opened door was 3 and the player chose 1
        (df["opened_door"] == 3) & (df["first_choice"] == 1),
        # , then the switched choice is 2
        2,
        # Otherwise the switched choice is 1
        1
    )
)

df.head(n = 20)

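|   | first_choice | opened_door | switch_choice |
|---|--------------|-------------|---------------|
| 0 | 1 | 3 | 2 |
| 1 | 2 | 3 | 1 |
| 2 | 2 | 3 | 1 |
| 3 | 3 | 2 | 1 |
| 4 | 1 | 2 | 3 |
| 5 | 2 | 3 | 1 |
| 6 | 3 | 2 | 1 |
| 7 | 3 | 2 | 1 |
| 8 | 1 | 2 | 3 |
| 9 | 2 | 3 | 1 |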


In [9]:
print("No switching wins", (df["first_choice"] == 1).mean())
print("Switching wins", (df["switch_choice"] == 1).mean())

No switching wins 0.3
Switching wins 0.7



:::{.callout-note collapse="true"}
## Solution (click to expand)

Under the first strategy of no switching, the probability of winning the car is $1/3$ because the car is equally likely to be behind any of the three doors.

Under the second strategy of switching, the probability of winning the car is $2/3$. To see this, consider the following:

- The probability of the car being behind the door you chose is $1/3$.
- The probability of the car being behind one of the other two doors is $2/3$.
- The host always opens a door with a goat behind it, chosen from the two doors you did not pick. If the car is behind one of those two doors (probability $2/3$), the remaining unopened door must hide the car, so switching wins with probability $2/3$.

:::
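
The two winning probabilities can also be obtained exactly by enumerating every combination of car position, first choice, and opened door. The sketch below assumes that the car is equally likely to be behind each door and that the host picks uniformly at random when two goat doors are available; these assumptions and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

doors = [1, 2, 3]
stay_wins = Fraction(0)
switch_wins = Fraction(0)

for car, first_choice in product(doors, repeat=2):
    # Car position and first choice are independent and uniform: 1/9 each
    weight = Fraction(1, 9)

    # The host may open any door that is neither the player's choice nor the car
    openable = [d for d in doors if d != first_choice and d != car]

    for opened in openable:
        w = weight / len(openable)
        # The door the player ends up with after switching
        switched = next(d for d in doors if d not in (first_choice, opened))
        if first_choice == car:
            stay_wins += w
        if switched == car:
            switch_wins += w

print(stay_wins, switch_wins)
```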

:::{#exr-prisoners}
## The Executive's Dilemma

Consider the following problem: A struggling company needs to reduce its executive workforce from three executives to only one. The company has three executives: Alice, Bob, and Charlie. The company president decides to leave the decision of whom to keep to chance.

The president does not want to tell the executives who is remaining until the decision is announced. Alice insists, however, and asks the president to tell her at least the name of one person who is sacked. The president reminds Alice that he will reveal the name of the person remaining and also tells her that Bob is sacked.

Alice is happy because she thinks that the probability of her remaining given this information is $1/2$ instead of $1/3$ (without the information). Is Alice's reasoning correct?

The example here is a reformulation of the famous [Three Prisoners problem](https://en.m.wikipedia.org/wiki/Three_prisoners_problem).
:::
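
Alice's claim can be examined with Bayes' Theorem over the partition "Alice stays", "Bob stays", "Charlie stays". The sketch below rests on two assumptions that the problem statement does not spell out: the president never names Alice herself, and if Alice is the one who stays he names Bob or Charlie with equal probability. The variable names are illustrative.

```python
from fractions import Fraction

# Prior: each executive is equally likely to be the one who stays
prior = {"Alice": Fraction(1, 3), "Bob": Fraction(1, 3), "Charlie": Fraction(1, 3)}

# P(president names Bob as sacked | who stays), under the stated assumptions
likelihood = {
    "Alice": Fraction(1, 2),  # assumption: fair choice between Bob and Charlie
    "Bob": Fraction(0),       # Bob cannot be named as sacked if he stays
    "Charlie": Fraction(1),   # Bob is the only name the president can give
}

joint = {name: prior[name] * likelihood[name] for name in prior}
p_names_bob = sum(joint.values())  # total probability of hearing "Bob"
post = {name: joint[name] / p_names_bob for name in prior}

print(post)
```

Changing the value assumed for `likelihood["Alice"]` shows how the answer depends on the president's rule for choosing a name.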
