# Lesson - Statistics and Probability XVII: Bayes' Theorem

In this lesson, we'll discuss Bayes' theorem, which is a central topic in probability.

### Independence versus Exclusivity

Two events A and B are independent if the occurrence of one doesn't change the probability of the other. In mathematical terms, we've seen A and B are independent if any of the conditions below are true:

$$\begin{aligned}
P(A) &= P(A|B) \\
P(B) &= P(B|A) \\
P(A \cap B) &= P(A) \cdot P(B)
\end{aligned}$$

Two events, A and B, are mutually exclusive if they cannot both occur at the same time. If one event happens, the other cannot happen, and vice versa. Examples of mutually exclusive events include:

- Getting a 5 (event A) and getting a 3 (event B) when we roll a regular six-sided die: it's impossible to get both a 5 and a 3 on a single roll.
- A coin landing on heads (event A) and on tails (event B): it's impossible to flip a coin and see it land on both heads and tails.

If two events A and B are mutually exclusive, then it's impossible that they both occur, which means (A ∩ B) is an impossible event (and the probability of impossible events is always 0):
$$\begin{equation}
P(A \cap B) = 0 
\end{equation}$$

Both independence and exclusivity describe the relationship between two or more events, and we see that they have different mathematical meanings:

$$\begin{aligned}
\text{Independence} &\implies P(A \cap B) = P(A) \cdot P(B) \\
\text{Dependence} &\implies P(A \cap B) = P(A) \cdot P(B|A) \\
\text{Exclusivity} &\implies P(A \cap B) = 0 
\end{aligned}$$

Let's take a quick look at a few examples. Say we roll a fair six-sided die twice and consider these four events:

- Event A: We get a 4 on the first roll.
- Event B: We get a 2 on the second roll.
- Event C: We get an even number on the first roll.
- Event D: We get a 5 on the first roll.

If event A happens, then the probability of event B stays the same, since the result of the first roll doesn't influence the result of the second one in any way — this means A and B are independent. Also, we can get a 4 on the first roll (event A) and a 2 on the second roll (event B), which means A and B are not mutually exclusive.

Now let's look at the relationship between events A and C. If C happens, then the probability of A changes, and vice-versa. This means A and C are dependent. Also, if the outcome was 4, then we'd get a 4 (event A) and an even number (event C) at the same time, which means A and C are not mutually exclusive.

However, if we look at events A and D, we see they cannot possibly happen together: we cannot get both a 4 and a 5 on the first roll. This means events A and D are mutually exclusive. Note that mutually exclusive events with nonzero probabilities are never independent: if D happens, the probability of A drops from 1/6 to 0, so A and D are dependent.
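The three comparisons above can be checked by enumerating the sample space. This is a minimal sketch, assuming a fair die and modelling each event as a set of (first roll, second roll) outcomes; the helper names `prob`, `independent`, and `mutually_exclusive` are illustrative and not part of the lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely (first roll, second roll) outcomes.
OMEGA = list(product(range(1, 7), repeat=2))

# Events from the text, modelled as sets of outcomes.
A = {o for o in OMEGA if o[0] == 4}        # 4 on the first roll
B = {o for o in OMEGA if o[1] == 2}        # 2 on the second roll
C = {o for o in OMEGA if o[0] % 2 == 0}    # even number on the first roll
D = {o for o in OMEGA if o[0] == 5}        # 5 on the first roll


def prob(event):
    """Exact probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))


def independent(e1, e2):
    """Product rule: P(e1 and e2) == P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


def mutually_exclusive(e1, e2):
    """True when the two events share no outcome."""
    return not (e1 & e2)


print(independent(A, B), mutually_exclusive(A, B))  # True False
print(independent(A, C), mutually_exclusive(A, C))  # False False
print(independent(A, D), mutually_exclusive(A, D))  # False True
```

For A and D the product check fails because P(A ∩ D) = 0 while P(A) · P(D) = 1/36.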

For the exercises below, consider the following probabilities:

- The probability of being infected with HIV is 0.00014. That is P(HIV)=0.00014.
- The probability of being infected with HIV given a positive result from an HIV test is 0.03. That is P(HIV|T+)=0.03.

Assess with True or False the following statements:

- Events HIV and T+ are independent. If you think this statement is true, then assign the boolean True to `statement_1`, otherwise assign False.
- Events HIV and $HIV^C$ are mutually exclusive. If you think this statement is true, then assign the boolean True to `statement_2`, otherwise assign False.
- Events $HIV^C$ and T+ are dependent. If you think this statement is true, then assign the boolean True to `statement_3`, otherwise assign False.

In [1]:
statement_1 = False
statement_2 = True
statement_3 = True

### Example

The addition rule for probabilities is:
$$\begin{equation}
P(A \cup B) = P(A) + P(B) - P(A \cap B) 
\end{equation}$$

If events A and B are mutually exclusive, then P(A∩B)=0. Therefore, the addition rule for mutually exclusive events reduces to:

$$\begin{aligned}
P(A \cup B) &= P(A) + P(B) - 0 \\
P(A \cup B) &= P(A) + P(B)
\end{aligned}$$

With this in mind, let's consider the probabilities associated with an HIV test:

- The probability of getting a positive test result given that a patient is not infected with HIV is 1.05%. That is $P(T^+|HIV^C)=0.0105$.
- The probability of getting a positive test result given that a patient is infected with HIV is 99.78%. That is $P(T^+|HIV)=0.9978$.
- The probability of being infected with HIV is 0.14%. That is $P(HIV)=0.0014$.
- The probability of not being infected with HIV is 99.86%. That is $P(HIV^C)=0.9986$.

Now what if we just want to find P(T+), the probability that a person selected at random will get a positive result? There are two possible scenarios when someone gets a positive result:

- The person is infected with HIV and gets a positive result.
- The person is not infected with HIV and gets a positive result.

In the first scenario, note that two events happen: HIV and T+. In set notation, we write $(HIV \cap T^+)$ if both HIV and T+ occur. In the second scenario, two events happen: $HIV^C$ and T+. In set notation, we write $(HIV^C \cap T^+)$ if both $HIV^C$ and T+ happen. Since there are only two possible scenarios, we can understand the event T+ as the union of the events $(HIV \cap T^+)$ and $(HIV^C \cap T^+)$:
$$\begin{equation}
T^+ = (HIV \cap T^+) \cup (HIV^C \cap T^+)
\end{equation}$$

We can visualize this set union using a Venn diagram:
![image.png](attachment:image.png)

The events $(HIV \cap T^+)$ and $(HIV^C \cap T^+)$ are mutually exclusive (they cannot both happen at the same time), because a person who tested positive cannot both have and not have HIV. This means that we can calculate the probability of their union using the addition rule $P(A \cup B) = P(A) + P(B)$:

$$\begin{equation}
P(\overbrace{T^+}^{A \cup B}) = P((\overbrace{HIV \cap T^+}^{A}) \cup (\overbrace{HIV^C \cap T^+}^{B}))
\end{equation}$$

$$\begin{equation}
P(T^+) = P(HIV \cap T^+) + P(HIV^C \cap T^+)
\end{equation}$$

Using the multiplication rule on $P(HIV \cap T^+)$ and $P(HIV^C \cap T^+)$, the last equation above becomes:

$$\begin{equation}
P(T^+) = P(HIV) \cdot P(T^+ | HIV) + P(HIV^C) \cdot P(T^+ | HIV^C)
\end{equation}$$


All the probabilities we need were listed earlier, which means we can find P(T+):

$$\begin{aligned}
P(T^+) &= 0.0014 \cdot 0.9978 + 0.9986 \cdot 0.0105 \\
&= 0.0119
\end{aligned}$$

We see P(T+) — the probability of testing positive — is only 1.19%. This is mostly because the probability of having HIV is very low in the first place.
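The same calculation can be reproduced in a few lines of Python. This is a small sketch using the four probabilities listed above; the variable names are illustrative.

```python
# Probabilities from the problem statement.
p_hiv = 0.0014
p_not_hiv = 1 - p_hiv            # complement rule: 0.9986
p_pos_given_hiv = 0.9978
p_pos_given_not_hiv = 0.0105

# Multiplication rule for each of the two mutually exclusive scenarios.
p_hiv_and_pos = p_hiv * p_pos_given_hiv
p_not_hiv_and_pos = p_not_hiv * p_pos_given_not_hiv

# Addition rule for mutually exclusive events.
p_pos = p_hiv_and_pos + p_not_hiv_and_pos

print(round(p_hiv_and_pos, 6), round(p_not_hiv_and_pos, 6))  # 0.001397 0.010485
print(round(p_pos, 4))  # 0.0119
```

Almost all of P(T+) comes from the second term: false positives among the large uninfected group outnumber true positives among the small infected group.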

**Exercise**
We can find the word "secret" in many spam emails. However, some emails are not spam even though they contain the word "secret." Let's say we know the following probabilities:

- The probability of getting a spam email is 23.88%. That is P(Spam)=0.2388.
- The probability of an email containing the word "secret" given that the email is spam is 48.02%. That is P("secret"|Spam)=0.4802.
- The probability of an email containing the word "secret" given that the email is not spam is 12.84%. That is $P(\text{"secret"}|Spam^C)=0.1284$.

Calculate:

- $P(Spam^C)$. Assign the result to `p_non_spam`.
- P(Spam ∩ "secret"). Assign the result to `p_spam_and_secret`.
- $P(Spam^C \cap \text{"secret"})$. Assign the result to `p_non_spam_and_secret`.
- P("secret"). Assign the result to `p_secret`.


In [2]:
p_spam = 0.2388
p_secret_given_spam = 0.4802
p_secret_given_non_spam = 0.1284  # 12.84%, as given in the problem statement

p_non_spam = 1 - p_spam
p_spam_and_secret = p_spam * p_secret_given_spam
p_non_spam_and_secret = p_non_spam * p_secret_given_non_spam
p_secret = p_spam_and_secret + p_non_spam_and_secret 

# Round for display so floating-point noise doesn't clutter the output
print(round(p_non_spam, 4), round(p_spam_and_secret, 4), round(p_non_spam_and_secret, 4), round(p_secret, 4))


0.7612 0.1147 0.0977 0.2124


### General Formula

The events HIV and $HIV^C$ are **mutually exclusive** and **exhaustive**. If two events are exhaustive, it means that together they make up the whole sample space Ω. This is how we could represent HIV and $HIV^C$ on a Venn diagram:

![image.png](attachment:image.png)


We can also add the event T+ on the diagram above, which will show us visually why $T^+ = (HIV \cap T^+) \cup (HIV^C \cap T^+)$
![image.png](attachment:image.png)

Now we need to develop a general formula that reflects the way we calculated P(T+) above:
$$\begin{equation}
P(T^+) = P(HIV \cap T^+) + P(HIV^C \cap T^+)
\end{equation}$$

If instead of T+, HIV, and $HIV^C$ we have the events A, B, and $B^C$, the diagram becomes:

![image.png](attachment:image.png)

With this in mind, we can now develop a general formula for P(A):

$$\begin{equation}
P(A) = P(B \cap A) + P(B^C \cap A)
\end{equation}$$

Using the multiplication rule on $P(B \cap A)$ and $P(B^C \cap A)$, the above formula becomes:

$$\begin{equation}
P(A) = P(B) \cdot P(A|B) + P(B^C) \cdot P(A|B^C)
\end{equation}$$

This formula plays a critical role in Bayes' theorem.

**Exercise**

An airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320.

- The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
- The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

Convert the percentages above to probabilities:

- Assign the probability of flying with a Boeing to `p_boeing` (to better understand what this probability means, imagine a passenger who has bought a ticket with this airline: what's the probability that this passenger will be assigned to fly to her destination with a Boeing?).
- Assign the probability of flying with an Airbus to `p_airbus`.
- Assign the probability of arriving at the destination with a delay given that the passenger flies with a Boeing to `p_delay_given_boeing`.
- Assign the probability of arriving at the destination with a delay given that the passenger flies with an Airbus to `p_delay_given_airbus`.

Calculate:

- The probability that a passenger will arrive at her destination with a delay. Assign the answer to `p_delay`.

In [3]:
p_boeing = 0.73
p_airbus = 0.27

p_delay_given_boeing = 0.03
p_delay_given_airbus = 0.08

p_delay = p_boeing * p_delay_given_boeing + p_airbus * p_delay_given_airbus

print(p_delay)

0.0435


### Formula for Three Events

Above, we used this formula to calculate the probability of having a delay when flying with a particular airline:
$$\begin{equation}
P(A) = P(B) \cdot P(A|B) + P(B^C) \cdot P(A|B^C)
\end{equation}$$

Recall that the airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320. This allowed us to model P(Delay) as:

$$\begin{equation}
\overbrace{P(Delay)}^{P(A)} = \overbrace{P(Boeing) \cdot P(Delay|Boeing)}^{P(B) \cdot P(A|B)} + \overbrace{P(Airbus) \cdot P(Delay|Airbus)}^{P(B^C) \cdot P(A|B^C)}
\end{equation}$$

However, let's consider another airline which has three types of planes: a Boeing 737, an Airbus A320, and an ERJ 145.

- The Boeing operates 58% of the flights. Out of these flights, 4% arrive at the destination with a delay.
- The Airbus operates 31% of the flights. Out of these flights, 7% arrive with a delay.
- The ERJ operates the remaining 11% of the flights. Out of these flights, 2% arrive with a delay.

A passenger buying a ticket with this airline will be assigned to only one of the three types of airplanes. This means that the sample space is made up of three events that are all **mutually exclusive** and **exhaustive**. On a Venn diagram, we have:

![image.png](attachment:image.png)

After adding the event Delay on the above Venn diagram:

![image.png](attachment:image.png)

Judging by the diagram, we can see that P(Delay) is:

$$\begin{equation}
P(Delay) = P(Boeing \cap Delay) + P(Airbus \cap Delay) + P(ERJ \cap Delay)
\end{equation}$$

Using the multiplication rule, the equation above becomes:

$$\begin{aligned}
P(Delay) &= P(Boeing) \cdot P(Delay|Boeing) + P(Airbus) \cdot P(Delay|Airbus) + P(ERJ) \cdot P(Delay|ERJ) \\
&= 0.58 \cdot 0.04 + 0.31 \cdot 0.07 + 0.11 \cdot 0.02 = 0.0471
\end{aligned}$$

To develop a more general formula, imagine that instead of the events Delay, Boeing, Airbus, and ERJ, we have events $A$, $B_1$, $B_2$, and $B_3$:

$$\begin{equation}
\overbrace{P(A)}^{P(Delay)} = \overbrace{P(B_1)}^{P(Boeing)} \cdot P(A|B_1) + \overbrace{P(B_2)}^{P(Airbus)} \cdot P(A|B_2) + \overbrace{P(B_3)}^{P(ERJ)} \cdot P(A|B_3)
\end{equation}$$
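As a quick check of the three-plane example, the sum can be computed directly. This is a sketch only: the `fleet` dictionary and its layout are an illustrative choice, and the numbers are the ones given for the airline with the Boeing, Airbus, and ERJ above.

```python
import math

# plane type -> (share of flights, probability of delay given that plane)
fleet = {
    "Boeing": (0.58, 0.04),
    "Airbus": (0.31, 0.07),
    "ERJ": (0.11, 0.02),
}

# The three plane types are exhaustive, so their shares should sum to 1.
assert math.isclose(sum(share for share, _ in fleet.values()), 1.0)

# P(Delay) = sum of P(plane) * P(Delay | plane) over the three plane types.
p_delay = sum(share * p_late for share, p_late in fleet.values())

print(round(p_delay, 4))  # 0.0471
```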

**Exercise**


An airline transports passengers using three types of planes: a Boeing 737, an Airbus A320, and an ERJ 145.

- The Boeing operates 62% of the flights. Out of these flights, 6% arrive at the destination with a delay.
- The Airbus operates 35% of the flights. Out of these flights, 9% arrive with a delay.
- The ERJ operates the remaining 3% of the flights. Out of these flights, 1% arrive with a delay.

Calculate the probability of delay and assign the result to `p_delay`.

In [4]:
p_boeing = 0.62
p_airbus = 0.35
p_erj = 0.03
p_delay_boeing = 0.06 
p_delay_airbus = 0.09
p_delay_erj = 0.01

p_delay = p_boeing * p_delay_boeing + p_airbus * p_delay_airbus + p_erj * p_delay_erj
print(p_delay)

0.06899999999999999


### The Law of Total Probability

Above we developed a formula to include three events:

$$\begin{equation}
P(A) = P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2) + P(B_3) \cdot P(A|B_3)
\end{equation}$$

To develop a formula with four events, we can use the same reasoning as we used to develop the formula above. Let's say the sample space Ω is made up of four mutually exclusive and exhaustive events:
$$\begin{equation}
\Omega = \{B_1, B_2, B_3, B_4\}
\end{equation}$$

Then we can understand event A as the union of the following events:

$$\begin{equation}
A = (B_1 \cap A) \cup (B_2 \cap A) \cup (B_3 \cap A) \cup (B_4 \cap A)
\end{equation}$$

![image.png](attachment:image.png)

Using the addition rule for mutually exclusive events, we have:

$$\begin{equation}
P(A) = P(B_1 \cap A) + P(B_2 \cap A) + P(B_3 \cap A) + P(B_4 \cap A)
\end{equation}$$

Using the multiplication rule, we arrive at:

$$\begin{equation}
P(A) = P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2) + P(B_3) \cdot P(A|B_3) + P(B_4) \cdot P(A|B_4)
\end{equation}$$

If the sample space Ω is made up of n mutually exclusive and exhaustive events:

$$\begin{equation}
\Omega = \{B_1, B_2, ..., B_n\}
\end{equation}$$

the formula for n events is:

$$\begin{equation}
P(A) = P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2) + \dots + P(B_n) \cdot P(A|B_n)
\end{equation}$$

The above formula is called the **law of total probability**.

The law of total probability is often rewritten using the summation sign Σ:

$$\begin{equation}
P(A) = \sum_{i = 1}^{n} P(B_i) \cdot P(A|B_i)
\end{equation}$$
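The summation form translates almost directly into code. The function below is a minimal sketch, not a library routine: the name `total_probability` and the two-list interface are assumptions made for illustration. It is checked against the two airline exercises from earlier in the lesson.

```python
import math


def total_probability(priors, likelihoods):
    """Return P(A) = sum of P(B_i) * P(A | B_i).

    priors      -- P(B_1), ..., P(B_n) for mutually exclusive, exhaustive events
    likelihoods -- P(A | B_1), ..., P(A | B_n), in the same order
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("priors of exhaustive events must sum to 1")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods))


# Two-plane airline: Boeing 73% (3% delayed), Airbus 27% (8% delayed).
print(round(total_probability([0.73, 0.27], [0.03, 0.08]), 4))  # 0.0435

# Three-plane airline from the exercise: 62%, 35%, 3% of flights.
print(round(total_probability([0.62, 0.35, 0.03], [0.06, 0.09, 0.01]), 4))  # 0.069
```

The check that the priors sum to 1 guards against accidentally leaving out one of the exhaustive events.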

### Bayes' Theorem

Above, we discussed a few examples around plane delays and used the law of total probability to find P(Delay), the probability that a passenger will arrive at her destination with a delay. Once a plane has arrived with a delay, however, we might want to calculate the probability that it's a Boeing. In other words, what's the probability that the plane is a Boeing given that it arrived with a delay?

For example, an airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320.

- The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
- The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

Let's say a plane did arrive with a delay and we want to find the probability that the plane is a Boeing. In other words, we want to find P(Boeing|Delay). Let's begin by expanding P(Boeing|Delay) using the conditional probability formula:

$$\begin{equation}
P(Boeing|Delay) = \frac{P(Boeing \cap Delay)}{P(Delay)} = \frac{P(Boeing) \cdot P(Delay|Boeing)}{P(Delay)}
\end{equation}$$

We already know from the problem statement that P(Boeing)=0.73 and P(Delay|Boeing)=0.03. We don't know the value of P(Delay), but we can find it using the law of total probability:

$$\begin{aligned}
P(Delay) &= P(Boeing) \cdot P(Delay|Boeing) + P(Airbus) \cdot P(Delay|Airbus) \\
&= 0.73 \cdot 0.03 + 0.27 \cdot 0.08 = 0.0435
\end{aligned}$$

$$\begin{equation}
P(Boeing|Delay) = \frac{0.73 \cdot 0.03}{0.0435} = 0.5034
\end{equation}$$

This is an instance where we applied Bayes' theorem to solve a probability problem. Mathematically, if B is one of the mutually exclusive and exhaustive events $B_1, B_2, \dots, B_n$, Bayes' theorem can be stated as:

$$\begin{equation}
P(B|A) = \frac{P(B) \cdot P(A|B)}{\displaystyle \sum_{i = 1}^{n} P(B_i) \cdot P(A|B_i)}
\end{equation}$$

Note that we arrived at Bayes' theorem by substituting the law of total probability into the conditional probability formula and expanding the numerator P(B ∩ A) using the multiplication rule:

$$\begin{aligned}
\text{Conditional Probability} &\implies P(B|A) = \frac{P(B \cap A)}{P(A)} \\
\text{The Law of Total Probability} &\implies P(A) = \sum_{i = 1}^{n} P(B_i) \cdot P(A|B_i) \\
\text{Bayes' Theorem} &\implies P(B|A) = \frac{P(B) \cdot P(A|B)}{\displaystyle \sum_{i = 1}^{n} P(B_i) \cdot P(A|B_i)}
\end{aligned}$$

Above, we defined the formulas for P(B|A), but we can also define them for P(A|B):

$$\begin{aligned}
\text{Conditional Probability} &\implies P(A|B) = \frac{P(A \cap B)}{P(B)} \\
\text{The Law of Total Probability} &\implies P(B) = \sum_{i = 1}^{n} P(A_i) \cdot P(B|A_i) \\
\text{Bayes' Theorem} &\implies P(A|B) = \frac{P(A) \cdot P(B|A)}{\displaystyle \sum_{i = 1}^{n} P(A_i) \cdot P(B|A_i)}
\end{aligned}$$
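To see the three formulas working together, here is a short sketch that computes the posterior for every event in a partition at once. The function name `bayes_posteriors` and its list-based interface are assumptions made for illustration; the numbers are the Boeing and Airbus figures from the worked example above.

```python
import math


def bayes_posteriors(priors, likelihoods):
    """Return [P(B_1 | A), ..., P(B_n | A)] using Bayes' theorem.

    priors      -- P(B_i) for mutually exclusive, exhaustive events B_i
    likelihoods -- P(A | B_i), in the same order
    """
    # Numerators: P(B_i) * P(A | B_i), i.e. P(B_i and A) by the multiplication rule.
    joint = [p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods)]
    # Denominator: P(A) by the law of total probability.
    p_a = sum(joint)
    return [j / p_a for j in joint]


posteriors = bayes_posteriors([0.73, 0.27], [0.03, 0.08])
p_boeing_given_delay = posteriors[0]

print(round(p_boeing_given_delay, 4))  # 0.5034
print(math.isclose(sum(posteriors), 1.0))  # True
```

Because the denominator is the sum of all the numerators, the posteriors over the whole partition always add up to 1.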

**Exercise**

An airline transports passengers using two types of planes: a Boeing 737 and an Airbus A320.

- The Boeing operates 73% of the flights. Out of these flights, 3% arrive at the destination with a delay.
- The Airbus operates the remaining 27% of the flights. Out of these flights, 8% arrive with a delay.

Use Bayes' theorem to find P(Airbus|Delay). Assign the answer to `p_airbus_delay`.

In [5]:
p_b = 0.73
p_bc = 0.27
p_a_given_b = 0.03
p_a_given_bc = 0.08
p_a = (p_b * p_a_given_b) + p_bc * (p_a_given_bc) # Total probability of a

# Using Bayes Theorem
p_bc_given_a = (p_bc * p_a_given_bc) / p_a 
p_airbus_delay = p_bc_given_a
print(p_airbus_delay)

0.4965517241379311


### Prior and Posterior Probability

Earlier, we considered an example around HIV testing and saw the following probabilities:

- The probability of getting a positive test result given that a patient is infected with HIV is 99.78%. That is P(T+|HIV)=0.9978.
- The probability of getting a positive test result given that a patient is not infected with HIV is 1.05%. That is $P(T^+|HIV^C)=0.0105$.
- The probability of being infected with HIV is 0.14%. That is $P(HIV)=0.0014$.
- The probability of not being infected with HIV is 99.86%. That is $P(HIV^C)=0.9986$.

Since P(T+|HIV)=0.9978, it means that 99.78% of the people infected with HIV get a correct diagnosis — they test positive for a virus they actually have.

The value of $P(T^+|HIV^C)=0.0105$ means 1.05% of the people who are not infected with HIV get a wrong diagnosis: they test positive for a virus they don't have. All in all, this suggests the test is quite accurate.

Now let's say a person comes in for a test and we don't know beforehand whether they have HIV or not. The patient tests positive. One important question we may have now is: Given the positive test result, what's the probability of being infected with HIV? In other words, what is P(HIV|T+)?

We can find the answer using Bayes' theorem. Let's begin by expanding P(HIV|T+) using the conditional probability formula:

$$\begin{equation}
P(HIV | T^+) = \frac{P(HIV \cap T^+)}{P(T^+)} = \frac{P(HIV) \cdot P(T^+|HIV)}{P(T^+)}
\end{equation}$$

From the problem statement, we know that $P(HIV)=0.0014$ and $P(T^+|HIV)=0.9978$. We don't know the value of P(T+), but we can find it using the law of total probability:

$$\begin{aligned}
P(T^+) &= P(HIV) \cdot P(T^+|HIV) + P(HIV^C) \cdot P(T^+|HIV^C) \\
&= 0.0014 \cdot 0.9978 + 0.9986 \cdot 0.0105 = 0.0119
\end{aligned}$$

We now have all the values we need to calculate P(HIV|T+):
$$\begin{equation}
P(HIV | T^+) = \frac{0.0014 \cdot 0.9978}{0.0119} = 0.1174
\end{equation}$$

We see that if a person tests positive, the probability of being infected with HIV is still pretty low: 11.74%. This low value may be a bit counter-intuitive given how accurate the test is. However, the probability is low because P(HIV), the probability of having HIV, is very low in the first place: 0.14%.

Notice, however, that if a person tests positive, the probability of being infected with HIV increases a lot. A typical person in the population has a 0.14% chance of being infected with HIV, since P(HIV)=0.0014. But if a person tests positive, the probability of HIV infection increases to 11.74%, which is about 84 times the initial probability:
$$\begin{equation}
\frac{P(HIV|T^+)}{P(HIV)} = \frac{0.1174}{0.0014} = 83.86
\end{equation}$$

In the above example, we've considered the probability of being infected with HIV in two scenarios:

- Before doing any test: P(HIV)
- After testing positive: P(HIV|T+)

The probability of being infected with HIV before doing any test is called the **prior probability** ("prior" means "before"). The probability of being infected with HIV after testing positive is called the **posterior probability** ("posterior" means "after"). So, in this case, the prior probability is 0.14%, and the posterior probability is 11.74%.
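The prior-to-posterior update can be reproduced in code. This is a minimal sketch with illustrative variable names. It carries full floating-point precision through every step, so the posterior comes out as about 0.1176 rather than the 0.1174 quoted above, which was obtained by dividing by the rounded value P(T+) = 0.0119.

```python
# Probabilities from the HIV testing example.
prior = 0.0014                  # P(HIV), before any test
p_pos_given_hiv = 0.9978        # P(T+ | HIV)
p_pos_given_not_hiv = 0.0105    # P(T+ | HIV^C)

# Law of total probability for the denominator P(T+).
p_pos = prior * p_pos_given_hiv + (1 - prior) * p_pos_given_not_hiv

# Bayes' theorem: P(HIV | T+).
posterior = prior * p_pos_given_hiv / p_pos

print(round(posterior, 4))          # 0.1176
print(round(posterior / prior, 1))  # 84.0
```

Either way, the conclusion is the same: a positive result raises the probability of infection by a factor of roughly 84, yet the posterior stays below 12% because the prior is so small.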

**Exercise**

Many spam emails contain the word "secret". However, some emails are not spam even though they contain the word "secret". Let's say we know the following probabilities:

- The probability of getting a spam email is 23.88%. That is $P(Spam)=0.2388$.
- The probability of an email containing the word "secret" given that the email is spam is 48.02%. That is $P(\text{"secret"}|Spam)=0.4802$.
- The probability of an email containing the word "secret" given that the email is not spam is 12.84%. That is $P(\text{"secret"}|Spam^C)=0.1284$.

1. Use Bayes' theorem to find P(Spam|"secret"). Assign the answer to `p_spam_given_secret`.

2. Assign the prior probability of getting a spam email to `prior`.

3. Assign the posterior probability of getting a spam email (after we see the email contains the word "secret") to `posterior`.

4. Calculate the ratio between the posterior and the prior probability — i.e. divide the posterior probability by the prior probability. Assign answer to `ratio`.

In [6]:
p_b = 0.2388
p_bc = 1 - 0.2388

p_a_given_b = 0.4802
p_a_given_bc = 0.1284

# calculate total probability of a
p_a = (p_b * p_a_given_b) + (p_bc * p_a_given_bc)
# Use Bayes Theorem to calculate p_b_given_a
p_b_given_a = (p_b * p_a_given_b) / p_a

p_spam_given_secret = p_b_given_a
prior = p_b
posterior = p_spam_given_secret
ratio = posterior/prior

print(p_spam_given_secret, ratio)

0.539860865202855 2.2607238911342336
