# Conditional Probability

[Open in Colab](https://colab.research.google.com/github/febse/stat2024/blob/main/05-Conditional-Probability.ipynb)

Until now we have seen some basic properties of probability. In this part of the course we will learn how to account
for partial information when calculating probabilities.

Suppose that you are playing a game with two dice and you want to bet that the first die will show a 6. A friend of yours rolls the dice and tells you that the sum of the two dice is 7. What is the probability that the first die shows a 6 _given_ that the sum of the two dice is 7? Would this additional information help you when deciding whether to bet on the first die showing a 6?

This is an example of a _conditional probability_ problem.

:::{#def-conditional-probability}

Let $A$ and $B$ be two events. The conditional probability of $A$ given $B$ is defined as

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

We read $P(A | B)$ as "the probability of $A$ given $B$".
:::

With equally likely outcomes in the sample space, this definition reduces to counting, just like the probability law that we have seen before:
we count the number of outcomes in the event $A$ that are also in the event $B$ and divide by the number of outcomes in $B$.

In the dice example we have $6^2=36$ equally likely outcomes in the sample space. 

$$
\Omega = \left\{
    \begin{align*}
    (1,1), (1,2), (1, 3), (1, 4), (1, 5), \mathbf{(1, 6)} \\
    (2, 1), (2, 2), (2, 3), (2, 4), \mathbf{(2, 5)}, (2, 6) \\
    (3, 1), (3, 2), (3, 3), \mathbf{(3, 4)}, (3, 5), (3, 6) \\
    (4, 1), (4, 2), \mathbf{(4, 3)}, (4, 4), (4, 5), (4, 6) \\
    (5, 1), \mathbf{(5, 2)}, (5, 3), (5, 4), (5, 5), (5, 6) \\
    \mathbf{(6, 1)}, (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) \\
    \end{align*}    
\right\}
$$

The event $B$ (sum equal to 7) consists of the following elements:

$$
B = \left\{
    \begin{align*}
    (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1) \\
    \end{align*}
    \right\}
$$

The event $A$ (first die shows a 6) consists of the following elements:

$$
A = \left\{
    \begin{align*}
    (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) \\
    \end{align*}
    \right\}
$$

Now you know that one of the outcomes in $B$ has occurred (otherwise the sum would not be equal to 7). There are 6 outcomes in $B$ and exactly one of them, $(6, 1)$, is also in $A$. Therefore, the conditional probability of $A$ given $B$ is $1/6$.

What about the _unconditional_ probability of $A$?

$$
P(A) = \frac{6}{36} = \frac{1}{6}
$$

In this case the conditional probability of $A$ given $B$ is the same as the unconditional probability of $A$, meaning that knowing that the sum was 7 does not help you in deciding whether to bet on the first die showing a 6 (it does not change the odds). We say that the two events are **independent** (under $P$).
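
The counting argument above can be reproduced with a short enumeration. The following is a minimal sketch in plain Python; the helper `prob` and the variable names are illustrative choices and not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die)
omega = set(product(range(1, 7), repeat=2))

# Events as subsets of the sample space
A = {w for w in omega if w[0] == 6}    # first die shows a 6
B = {w for w in omega if sum(w) == 7}  # sum of the two dice is 7


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))


p_A = prob(A)
p_A_given_B = prob(A & B) / prob(B)  # definition of conditional probability

print("P(A)     =", p_A)          # 1/6
print("P(A | B) =", p_A_given_B)  # 1/6
```

Using `Fraction` keeps the results exact, so the two probabilities can be compared with `==` without rounding issues.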

Consider, however, that the friend tells you that the sum of the two dice is 10. Let's denote this event as $C$. What is the probability that the first die shows a 6 _given_ that the sum of the two dice is 10?

In this case the event $C$ consists of the following elements:

$$
C = \left\{
    \begin{align*}
    (4, 6), (5, 5), (6, 4) \\
    \end{align*}
    \right\}
$$

The elements of $A$ have not changed, but now we are looking for the elements that are in both $A$ and $C$.
There is only one such element:

$$
A \cap C = \left\{
    \begin{align*}
    (6, 4) \\
    \end{align*}
    \right\}
$$

So the _conditional_ probability of $A$ given $C$ is $1/3$.
$$
P(A|C) = \frac{\text{number of elements in } A \cap C}{\text{number of elements in } C} = \frac{1}{3}
$$

So this time the knowledge that the sum is 10 is informative: it raises the probability of $A$ from $1/6$ to $1/3$.
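
The same conditional probability can also be approximated by simulation: roll two dice many times, keep only the rolls whose sum is 10, and look at the share of those rolls in which the first die shows a 6. This is a minimal sketch, assuming NumPy is available; the seed and the number of rolls are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # arbitrary seed, for reproducibility only
n_rolls = 200_000

# Each row is one roll of two dice; integers() excludes the upper bound 7
rolls = rng.integers(1, 7, size=(n_rolls, 2))

first_is_six = rolls[:, 0] == 6      # event A
sum_is_ten = rolls.sum(axis=1) == 10  # event C

# Relative frequency of A among the rolls in which C occurred
estimate = first_is_six[sum_is_ten].mean()

print("Estimated P(A | C):", round(float(estimate), 3))
print("Exact value:       ", round(1 / 3, 3))
```

Conditioning corresponds to filtering: rolls in which $C$ did not occur are simply discarded before the relative frequency is computed.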


## Independence of Two Events

:::{#def-independence-AB}
## Independence of Two Events

Two events $A$ and $B$ are said to be independent if

$$
P(A \cap B) = P(A) \cdot P(B)
$$

:::

The definition in @def-independence-AB is equivalent to the conditions in the following exercise (whenever the conditional probabilities are defined), but has the advantage of being symmetric in $A$ and $B$:

:::{#exr-independence-AB}
## Independence and Conditional Probability

Let $P(A) > 0$ and $P(B) > 0$. Show that the definition of independence in @def-independence-AB is equivalent to each of the following:

1. $P(A|B) = P(A)$
2. $P(B|A) = P(B)$
:::

:::{#exr-independence-AB-implication}
## Implications of Independence

Show that if $A$ and $B$ are independent, then the following statements are true:

1. $A$ and $B^c$ are independent
2. $A^c$ and $B$ are independent
3. $A^c$ and $B^c$ are independent

:::

:::{#exr-independence-AB-disjoint}
## Disjoint Events and Independence

Let $A$ and $B$ be two events such that $P(A) > 0$ and $P(B) > 0$. Show that if $A$ and $B$ are disjoint, then they are not independent.

:::


## Mutual Independence

The definition of independence can be extended to more than two events. However, an attempt to extend the definition
by saying that $A$, $B$, and $C$ are independent if $P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$ does not ensure that $A$, $B$, and $C$ are _pairwise_ independent in the sense of @def-independence-AB. Pairwise independence alone is not enough either; the following exercise explores the relation between the two conditions.


:::{#exr-mutual-independence-contradicition}
## Mutual Independence and Pairwise Independence

Consider a sample space with 9 equally likely outcomes, each consisting of a triple $(i, j, k)$ where $i, j, k \in \{1, 2, 3\}$.

$$
\Omega = \left\{
    \begin{align*}
    (1, 1, 1), (2, 2, 2), (3, 3, 3) \\
    (1, 2, 3), (1, 3, 2), (3, 2, 1) \\
    (3, 1, 2), (2, 1, 3), (2, 3, 1) \\
    \end{align*}
\right\}
$$

Consider the events $A_i = \{\text{the } i\text{-th place of the outcome is 1}\}$ for $i = 1, 2, 3$. Are the events $A_1$, $A_2$, and $A_3$ pairwise independent? Are they mutually independent?
:::


:::{#def-mutual-independence}
## Mutual Independence

Events $A_1, A_2, \ldots, A_n$ are said to be mutually independent if for every subset $I \subseteq \{1, 2, \ldots, n\}$ with at least two elements we have

$$
P\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} P(A_i)
$$

For example, for $n=3$ the events $A_1$, $A_2$, and $A_3$ are mutually independent if _all_ of the following hold:

$$
\begin{align*}
P(A_1 \cap A_2) &= P(A_1) \cdot P(A_2) \\
P(A_1 \cap A_3) &= P(A_1) \cdot P(A_3) \\
P(A_2 \cap A_3) &= P(A_2) \cdot P(A_3) \\
P(A_1 \cap A_2 \cap A_3) &= P(A_1) \cdot P(A_2) \cdot P(A_3) \\
\end{align*}
$$

:::
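
The definition asks for the product rule on every sub-collection of events, which is tedious to check by hand. The sketch below enumerates all sub-collections with at least two events for the sample space of the exercise above. The helpers `prob` and `product_rule_holds` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# The nine equally likely outcomes from the exercise above
omega = [
    (1, 1, 1), (2, 2, 2), (3, 3, 3),
    (1, 2, 3), (1, 3, 2), (3, 2, 1),
    (3, 1, 2), (2, 1, 3), (2, 3, 1),
]

# A_i: the i-th place of the outcome is 1 (Python indexes from 0)
events = [{w for w in omega if w[i] == 1} for i in range(3)]


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))


def product_rule_holds(subset):
    """Check P(intersection) == product of the probabilities of the given events."""
    intersection = set.intersection(*subset)
    return prob(intersection) == prod(prob(e) for e in subset)


# Go through every sub-collection with at least two events
for size in range(2, len(events) + 1):
    for idx in combinations(range(len(events)), size):
        subset = [events[i] for i in idx]
        labels = ", ".join(f"A_{i + 1}" for i in idx)
        print(f"{labels}: product rule holds = {product_rule_holds(subset)}")
```

The events are mutually independent only if every printed line reports `True`; running the sketch is one way to check your answer to the exercise.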


## Bayes' Theorem

We have already seen that the conditional probability of $A$ given $B$ is defined as

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

We can also write this as

$$
P(A \cap B) = P(A|B) \cdot P(B)
$$

and similarly

$$
P(B \cap A) = P(B|A) \cdot P(A)
$$

Because $A \cap B = B \cap A$, the left-hand sides of the last two equations are equal, and so are the right-hand sides:

$$
\begin{align*}
P(A|B) \cdot P(B) = P(B|A) \cdot P(A) \\
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
\end{align*}
$$

This last equation is known as **Bayes' Theorem**. It also holds in a more general form, in which $P(B)$ is expanded over a partition of the sample space.

:::{#thm-bayes-theorem}

## Bayes' Theorem

Let $A$ and $B$ be two events such that $P(A) > 0$ and $P(B) > 0$. Then

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Furthermore, let $A_1, A_2, \ldots, A_n$ be a partition of the sample space $\Omega$. A partition is a collection of disjoint events such that their union is the entire sample space:

$$
A_i \cap A_j = \emptyset \quad \text{for all } i \neq j \quad \text{and} \quad \bigcup_{i=1}^n A_i = \Omega
$$

For any event $B$ such that $P(B) > 0$, we have

$$
P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^n P(B|A_j) \cdot P(A_j)}
$$

To see how the general formula works, consider a very simple partition. The events $A$ and $A^c$ form a partition of the sample space, and in this case Bayes' Theorem becomes

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)}
$$

:::
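
The partition form of the theorem translates directly into a few lines of code. The following sketch uses a hypothetical helper `posterior` (not a library function) and applies it to the dice example from the beginning of the chapter, with the partition given by the value of the first die and $B$ the event that the sum is 10.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return the list of P(A_i | B) for a partition A_1, ..., A_n.

    priors[i]      = P(A_i)
    likelihoods[i] = P(B | A_i)
    """
    joint = [p * lik for p, lik in zip(priors, likelihoods)]  # P(B | A_i) P(A_i)
    total = sum(joint)  # P(B), the denominator in Bayes' Theorem
    if total == 0:
        raise ValueError("P(B) must be positive")
    return [j / total for j in joint]


# Partition by the value of the first die: A_i = {first die shows i}
priors = [Fraction(1, 6)] * 6

# B = {sum is 10}: given a first die of i, the second die must show 10 - i
likelihoods = [
    Fraction(1, 6) if 1 <= 10 - i <= 6 else Fraction(0)
    for i in range(1, 7)
]

post = posterior(priors, likelihoods)
for i, p in enumerate(post, start=1):
    print(f"P(first die = {i} | sum = 10) = {p}")
```

The last printed value is $1/3$, which agrees with $P(A|C)$ obtained by counting earlier in the chapter.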


:::{#exr-total-probability-complements}
## Total Probability and Complements

Let $A$ and $B$ be events such that $0 < P(A) < 1$, so that both $P(B|A)$ and $P(B|A^c)$ are defined. Show that

$$
P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)
$$
:::

Note that the sets $A$ and $A^c$ form a partition of the sample space because $A \cap A^c = \emptyset$ and $A \cup A^c = \Omega$.

:::{#thm-total-probability}
## Total Probability Theorem

Let $A_1, A_2, \ldots, A_n$ be a partition of the sample space $\Omega$ such that $P(A_i) > 0$ for all $i$. For any event $B$ such that $P(B) > 0$, we have

$$
P(B) = \sum_{i=1}^n P(B|A_i) \cdot P(A_i)
$$
:::
:::{.proof}

Because $B$ is a subset of $\Omega$ (as is any event), intersecting $B$ with the sample space leaves it unchanged:

$$
B = B \cap \Omega
$$

Because the $A_i$ are a partition of the sample space, their union is the sample space:

$$
\Omega = \bigcup_{i=1}^n A_i
$$

In this way we can represent $B$ as the union of the intersections of $B$ with the $A_i$:

$$
B = B \cap \Omega = B \cap \left( \bigcup_{i=1}^n A_i \right) = \bigcup_{i=1}^n (B \cap A_i)
$$

The last step follows from the distributive property of the intersection over the union. The sets $B \cap A_i$ are pairwise disjoint because the $A_i$ are disjoint, so we can use the additivity of the probability measure and then the definition of conditional probability to write

$$
P(B) = P\left( \bigcup_{i=1}^n (B \cap A_i) \right) = \sum_{i=1}^n P(B \cap A_i) = \sum_{i=1}^n P(B|A_i) \cdot P(A_i)
$$

:::
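
The steps of the proof can be checked numerically on the two-dice sample space. In this sketch the partition is given by the value of the first die and $B$ is the event that the sum is 7; both are illustrative choices, and `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
B = {w for w in omega if sum(w) == 7}

# Partition of the sample space by the value of the first die
partition = [{w for w in omega if w[0] == i} for i in range(1, 7)]


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))


# The pieces B ∩ A_i are pairwise disjoint and their union is B
pieces = [B & A_i for A_i in partition]
assert set().union(*pieces) == B
assert sum(len(piece) for piece in pieces) == len(B)

# Sum of P(B | A_i) P(A_i), with P(B | A_i) computed from the definition
total = sum(prob(B & A_i) / prob(A_i) * prob(A_i) for A_i in partition)

print("Sum over the partition:", total)
print("P(B) by direct count:  ", prob(B))
```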


:::{#exr-narco-tests}
## Roadside Drug Tests

Imagine that a city called Virtue City has 100,000 inhabitants. All inhabitants are drivers and 100 of them are drug users.

The police department introduces a new test for narcotics that has a 95 percent probability of correctly identifying a person who has used narcotics (true positive). There is a 2 percent probability of falsely identifying a person who has not used narcotics as having used them (false positive).

The police stop a driver, his test comes back positive, and the driver's car is confiscated. The police claim that there is a 95 percent probability that the driver is a drug user. The driver objects and says that the probability of him being a drug user is much lower. Who is right?

:::
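
One way to approach the exercise is to plug the numbers into Bayes' Theorem, with "drug user" and "not a drug user" as the partition. The sketch below does this under the assumption (not stated in the exercise) that the stopped driver is chosen at random from all inhabitants.

```python
# Numbers taken from the exercise
population = 100_000
users = 100
p_pos_given_user = 0.95     # true positive rate
p_pos_given_nonuser = 0.02  # false positive rate

# Assumption: the stopped driver is chosen at random from all inhabitants
p_user = users / population
p_nonuser = 1 - p_user

# Total probability of a positive test
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * p_nonuser

# Bayes' Theorem: probability of being a drug user given a positive test
p_user_given_pos = p_pos_given_user * p_user / p_pos

print(f"P(positive)        = {p_pos:.5f}")
print(f"P(user | positive) = {p_user_given_pos:.4f}")
```

Comparing the result with the 95 percent claimed by the police shows how strongly the small share of drug users in the population affects the answer.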

:::{#exr-quality-control}
## Quality Control

A factory produces light bulbs. To ensure the quality of the bulbs shipped to customers, the factory has a quality control system that checks a sample of the bulbs before they are shipped. Consider a shipment of 1000 bulbs. You take a sample of 50 bulbs and stop the shipment if you find that more than 5 of the sampled bulbs are defective. Suppose that a shipment contains 20 defective bulbs. What is the probability that the shipment will be stopped?

:::
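
If the sample is drawn at random and without replacement (an assumption, since the exercise does not say how the sample is taken), the number of defective bulbs in the sample follows a hypergeometric distribution. A minimal sketch using only the standard library; `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb

N = 1000   # bulbs in the shipment
K = 20     # defective bulbs in the shipment
n = 50     # bulbs in the sample
limit = 5  # the shipment is stopped if more than this many defectives are found


def hypergeom_pmf(k):
    """P(exactly k defective bulbs in a sample drawn without replacement)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# P(X > limit): sum the upper tail directly, up to the largest possible count
p_stop = sum(hypergeom_pmf(k) for k in range(limit + 1, min(K, n) + 1))

print(f"P(shipment is stopped) = {p_stop:.6f}")
```

Summing the upper tail directly, instead of computing one minus the lower tail, avoids losing precision when the probability is small.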

:::{#exr-simulation-dogs}
## A Bifurcation in the Road

A tourist stands at a bifurcation in the road and does not know which is the correct way to go. He asks two passersby. Each passerby points him in the correct direction with probability 0.7, independently of the other.
We want to propose a strategy for the tourist to decide which way to go.

- Strategy 1: If the two passersby point in the same direction, the tourist follows that direction. If they point in different directions, the tourist chooses randomly.
- Strategy 2: The tourist follows the direction of the first passerby only.
- Strategy 3: The tourist decides at random.

Simulate the three strategies and estimate the probability of the tourist going in the correct direction. For the sake of the simulation, assume that the correct direction is the first one.
:::

```python
import pandas as pd
import numpy as np

# Setup
# Number of simulated tourists; increase it for a more precise estimate
repetitions = 10

# Simulate the directions given by the two passersby (1 is the correct direction)
# and a purely random choice, which is also the third strategy
df = pd.DataFrame({
    "passerby1": np.random.choice([1, 2], size=repetitions, p=[0.7, 0.3]),
    "passerby2": np.random.choice([1, 2], size=repetitions, p=[0.7, 0.3]),
    "random": np.random.choice([1, 2], size=repetitions, p=[0.5, 0.5])
})

# Check whether the two passersby agree on the direction. If yes, take the direction
# of the first passerby (both are equal). If not, take the random direction.
df["strategy1"] = np.where(
    df["passerby1"] == df["passerby2"],
    df["passerby1"],
    df["random"]
)

# Calculate the proportions of times each strategy is successful (the tourist is directed to the correct path).
# Hint: use the np.mean function or the .mean() method of the columns of the DataFrame
```
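
One possible way to complete the simulation is sketched below. It is self-contained and repeats the setup with more repetitions; the seed, the number of repetitions and the names `sim` and `success` are arbitrary choices, and other solutions are equally valid.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2024)  # arbitrary seed, for reproducibility only
repetitions = 100_000

# Direction 1 is the correct one; each passerby points to it with probability 0.7
sim = pd.DataFrame({
    "passerby1": rng.choice([1, 2], size=repetitions, p=[0.7, 0.3]),
    "passerby2": rng.choice([1, 2], size=repetitions, p=[0.7, 0.3]),
    "random": rng.choice([1, 2], size=repetitions),
})

# Strategy 1: follow the passersby if they agree, otherwise choose at random
sim["strategy1"] = np.where(
    sim["passerby1"] == sim["passerby2"], sim["passerby1"], sim["random"]
)
# Strategy 2: follow the first passerby only
sim["strategy2"] = sim["passerby1"]
# Strategy 3: choose at random
sim["strategy3"] = sim["random"]

# Share of repetitions in which each strategy picks the correct direction (1)
success = (sim[["strategy1", "strategy2", "strategy3"]] == 1).mean()
print(success)
```

The estimates can be compared with the exact values. For example, strategy 1 succeeds with probability $0.7^2 + 2 \cdot 0.7 \cdot 0.3 \cdot 0.5 = 0.7$.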



:::{#exr-prisoners}
## The Executives' Dilemma
Consider the following problem: A struggling company needs to reduce its executive workforce from three executives to only one. The company has three executives: Alice, Bob, and Charlie. The company president decides to leave the decision of whom to keep to chance.
The president does not want to tell the executives who is remaining until the decision is announced. Alice insists, however, and asks
the president to tell her at least the name of one person who is sacked. The president reminds Alice that he will not tell her whether she herself is sacked, and then tells her that Bob is sacked. Alice is happy because she thinks that the probability of her remaining is now $1/2$ instead of $1/3$. Is Alice correct?

The example here is a rephrasing of the famous [Three Prisoners problem](https://en.m.wikipedia.org/wiki/Three_prisoners_problem).
:::
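
A simulation can help to check the reasoning. The sketch below assumes that the remaining executive is chosen uniformly at random and that, when Alice is the one who stays, the president names Bob or Charlie with equal probability. The second assumption is not stated in the exercise, and the answer depends on it.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # arbitrary seed, for reproducibility only
n_sims = 200_000

# 0 = Alice, 1 = Bob, 2 = Charlie; the executive who stays is chosen uniformly
stays = rng.integers(0, 3, size=n_sims)

# The president names a sacked executive other than Alice.
# Assumption: if Alice stays, he names Bob or Charlie with equal probability.
coin = rng.integers(1, 3, size=n_sims)  # 1 (Bob) or 2 (Charlie)
named = np.where(
    stays == 0, coin,               # Alice stays: either name is possible
    np.where(stays == 1, 2, 1),     # Bob stays: name Charlie; Charlie stays: name Bob
)

# Condition on the president having said that Bob is sacked
said_bob = named == 1
p_alice_stays = (stays[said_bob] == 0).mean()

print("Estimated P(Alice stays | president names Bob):", round(float(p_alice_stays), 3))
```

Compare the estimate with the two candidate values $1/2$ and $1/3$ to see whether Alice's reasoning holds under these assumptions.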