# Lesson - Statistics and Probability XV: Conditional Probability Fundamentals

**Recap**

We have already seen that a random experiment such as rolling a six-sided die has six possible outcomes: 1, 2, 3, 4, 5, and 6. The set of all possible outcomes associated with a random experiment is called a sample space, and we denote it by the Greek letter Ω ("Omega"). We represent the sample space of a die roll as a set:
$$\begin{equation}
\Omega = \{1, 2, 3, 4, 5, 6\}
\end{equation}$$

Under the assumption that all six outcomes are equally likely (the die is fair), we can find P(5) — the probability of the event "getting a 5" — by using the following formula:
$$\begin{equation}
\text{P(5)} = \frac{\text{number of successful outcomes}}{\text{total number of possible outcomes}}
\end{equation}$$

There are six possible outcomes (1, 2, 3, 4, 5, and 6) and one successful outcome (5), so the probability of getting a 5 when rolling a fair six-sided die is:
$$\begin{equation}
\text{P(5)} = \frac{\text{number of successful outcomes}}{\text{total number of possible outcomes}} = \frac{1}{6}
\end{equation}$$

In this lesson, we discuss how a probability changes once we know that a certain condition holds, using the rules of conditional probability.

### Updating Probability
Suppose that in the experiment above we learn that the die showed an odd number (1, 3, or 5) after landing. In light of this new knowledge, the probability of "getting a 5" changes.

When we don't know whether the number is odd, the possible outcomes of the experiment are 1, 2, 3, 4, 5, or 6. But after we find out the number is odd, the possible outcomes are 1, 3, or 5. In other words, the new information we got reduced the sample space from {1, 2, 3, 4, 5, 6} to {1, 3, 5}:
$$\begin{equation}
\Omega = \{1, 2, 3, 4, 5, 6\}\ \ \ \ \xrightarrow[]{becomes} \ \ \ \ \Omega = \{1, 3, 5\}
\end{equation}$$

With Ω = {1, 3, 5}, we now have three total possible outcomes (1, 3, and 5) and only one successful outcome (5), so P(5) is no longer 1/6, but rather 1/3:
![image.png](attachment:image.png)
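
The update above can be sketched in Python by counting outcomes directly. This is a minimal illustration that assumes all outcomes are equally likely; the helper name `probability` is our own and is not part of the lesson.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Probability of `event` when every outcome in `sample_space` is equally likely."""
    favorable = event & sample_space  # only outcomes that are still possible count
    return Fraction(len(favorable), len(sample_space))

omega = {1, 2, 3, 4, 5, 6}                    # full sample space of a die roll
omega_odd = {x for x in omega if x % 2 == 1}  # reduced sample space: {1, 3, 5}

p_5_before = probability({5}, omega)     # before we know the number is odd
p_5_after = probability({5}, omega_odd)  # after we know the number is odd
print(p_5_before, p_5_after)             # prints: 1/6 1/3
```

Using `Fraction` keeps the results exact, which makes them easy to compare with the hand calculation.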

**Exercise**

A fair six-sided die is rolled. All we know is that the number we got is less than 5. Calculate:

- The probability of getting a 3. Assign answer to `p_3`.
- The probability of getting a 6. Assign answer to `p_6`.
- The probability of getting an odd number. Assign answer to `p_odd`.
- The probability of getting an even number. Assign answer to `p_even`.

In [2]:
p_3 = 1/4
p_6 = 0
p_odd = 2/4
p_even = 2/4


### Conditional Probability

Above, we considered a die roll and updated P(5) from 1/6 to 1/3. Taken individually, however, both values are correct answers to different questions:

- What is the probability of getting a 5?
- What is the probability of getting a 5 given the die showed an odd number?

P(5) = 1/6 is a correct answer to the first question, where we only want to find the probability of getting a 5. P(5) = 1/3 is a correct answer to the second question, where we introduce a condition: the die showed an odd number. To account for this condition and for the fact that two different questions are being addressed, we have to introduce some extra notation:

$$\begin{equation}
\text{P}(5) = \frac{1}{6}
\\
\text{P}(5\underbrace{\text{ given the die showed an odd number}}_{\text{extra notation}}) = \frac{1}{3}
\end{equation}$$

For notational simplicity, $\text{P}(5 \text{ given the die showed an odd number})$ becomes $\text{P}(5\ | \text{ odd})$. The vertical bar character (|) should be read as "given." Another way to read $\text{P}(5\ | \text{ odd})$ is "the conditional probability of getting a 5 given the die showed an odd number."
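
A quick simulation can make the difference between $\text{P}(5)$ and $\text{P}(5\ | \text{ odd})$ tangible. The sketch below is only an illustration under the assumption of a fair die; the seed, the sample size, and the variable names are arbitrary choices of ours.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]

# Conditioning on "odd" means we only look at the rolls where the condition holds.
odd_rolls = [r for r in rolls if r % 2 == 1]

p_5_estimate = rolls.count(5) / len(rolls)                    # estimates P(5)
p_5_given_odd_estimate = odd_rolls.count(5) / len(odd_rolls)  # estimates P(5 | odd)

print(round(p_5_estimate, 3), round(p_5_given_odd_estimate, 3))
```

The two estimates should land close to 1/6 and 1/3 respectively, although the exact digits depend on the random draws.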

**Exercise**

A student is randomly selected from a class. All we know is that he was born during winter. Assume the winter months are December, January, and February, and ignore the fact that these three months have different numbers of days. Find:

- The probability that he was born in December. Assign your answer to `p_december`.
- The probability that he was born in a 31-day month. Assign answer to `p_31`.
- The probability that he was born during summer. Assign answer to `p_summer`.
- The probability that he was born in a month which ends in letter "r" — "September," for instance, ends in "r," while "April" doesn't. Assign answer to `p_ends_r`.

In [4]:
p_december = 1/3
p_31 = 2/3
p_summer = 0
p_ends_r = 1/3

**Conditional Probability Formula**

Suppose we roll a fair six-sided die and want to find the probability of getting an odd number, given the die showed a number greater than 1 after landing. Using probability notation, we want to find P(A|B) where:

- A is the event that the number is odd: A = {1, 3, 5}
- B is the event that the number is greater than 1: B = {2, 3, 4, 5, 6}

To find P(A|B), we need to use the following formula:
$$\begin{equation}
P(A|B) = \frac{\text{number of successful outcomes}}{\text{total number of possible outcomes}}
\end{equation}$$

We know for sure event B happened (the number is greater than 1), so the sample space is reduced from {1, 2, 3, 4, 5, 6} to {2, 3, 4, 5, 6}:
$$\begin{equation}
\Omega = \{1, 2, 3, 4, 5, 6\} \xrightarrow[]{becomes} \Omega = \{2, 3, 4, 5, 6\}
\end{equation}$$

This means we're left with only five total possible outcomes if B happens:
$$\begin{equation}
\text{total number of possible outcomes} = 5
\end{equation}$$
The total number of possible outcomes above is given by the number of elements in the reduced sample space $\Omega = \{2, 3, 4, 5, 6\}$ — there are five elements.

The number of elements in a set is called the cardinal of the set. Ω is a set, and the cardinal of Ω = {2, 3, 4, 5, 6} is:
$$\begin{equation}
\text{cardinal}(\Omega) = 5
\end{equation}$$
In set notation, cardinal(Ω) is abbreviated as card(Ω). The reduced sample space is exactly the set B, so we have:
$$\begin{aligned}
\text{total number of possible outcomes} &= \text{card(B)} \\
&= 5
\end{aligned}$$

So far, we've developed half of our formula:
$$\begin{equation}
P(A|B) = \frac{\text{number of successful outcomes}}{\text{card(B)}}
\end{equation}$$

There are three odd numbers on a regular six-sided die (1, 3, and 5), but we know for sure we got a number greater than 1, so the only possible odd numbers we can get are 3 and 5. This means that the number of possible successful outcomes is two.

The number of possible successful outcomes is also given by the cardinal of the set {3, 5}. Note that the set {3, 5} is the result of the intersection between set A and set B:

![image.png](attachment:image.png)

So the number of successful outcomes is given by the cardinal of the intersection between sets A and B:
$$\begin{aligned}
\text{number of successful outcomes} &= \text{card(A} \cap \text{B)} \\
&= \text{card}(\{3, 5\}) \\
&= 2
\end{aligned}$$

We can plug that into our formula and calculate P(A|B):
$$\begin{aligned}
P(A\ |\ B) &= \frac{\text{card(A} \cap \text{B)}}{\text{card(B)}} \\
&= \frac{2}{5}
\end{aligned}$$

We now have a formula for conditional probability, defined purely in terms of A and B, where A and B can be any events (not just events related to a die roll).
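
The formula translates almost directly into Python sets, where `&` is set intersection and `len` gives the cardinal. The function below is a minimal sketch that assumes equally likely outcomes; the name `conditional_probability` is our own.

```python
from fractions import Fraction

def conditional_probability(a, b):
    """P(A|B) = card(A ∩ B) / card(B), assuming equally likely outcomes."""
    if not b:
        raise ValueError("P(A|B) is undefined when B has no outcomes")
    return Fraction(len(a & b), len(b))

a = {1, 3, 5}        # the number is odd
b = {2, 3, 4, 5, 6}  # the number is greater than 1

p_a_given_b = conditional_probability(a, b)
print(p_a_given_b)   # prints: 2/5
```

The guard for an empty `b` reflects that conditioning on an event that cannot happen has no meaning under this formula.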

**Exercise**

Two fair six-sided dice are simultaneously rolled, and the two numbers they show are added together. The diagram below shows all the possible results that we can get from adding the two numbers together.

![image.png](attachment:image.png)

Find P(A|B), where A is the event where the sum is an even number, and B is the event that the sum is less than eight.

- Find card(B). Assign answer to `card_b`.
    - Note that for calculating the cardinal, we'll have to treat identical sums differently if they come from different die numbers. On the diagram above, we see that we have three sums of 4, but they all come from different die outcomes: (3, 1), (2, 2), and (1, 3), where the first number describes the outcome of the first die throw, and the second number the outcome of the second die throw.
- Find card(A ∩ B). Assign answer to `card_a_and_b`.
- Calculate P(A|B). Assign answer to `p_a_given_b`.

In [5]:
card_b = 21
card_a_and_b = 9
p_a_given_b = 9/21
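
One way to double-check counts like these is to enumerate all 36 ordered outcomes instead of counting cells by hand. The sketch below is an optional check, not part of the expected solution; the `_check` variable names are ours.

```python
from itertools import product

# Each outcome is an ordered pair (first die, second die), so (3, 1) and (1, 3) are distinct.
omega = set(product(range(1, 7), repeat=2))

a = {roll for roll in omega if sum(roll) % 2 == 0}  # the sum is even
b = {roll for roll in omega if sum(roll) < 8}       # the sum is less than eight

card_b_check = len(b)
card_a_and_b_check = len(a & b)
print(card_b_check, card_a_and_b_check)  # prints: 21 9
```

Treating outcomes as ordered pairs is what makes identical sums from different dice count separately.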

**Example**

A team of biologists wants to measure the efficiency of a new HIV test they developed (HIV is a virus that causes AIDS, a disease which affects the immune system). They used the new method to test 53 people, and the results are summarized in the table below:

![image.png](attachment:image.png)

By reading the table above, we can see that:

- 23 people are infected with HIV.
- 30 people are not infected with HIV ($HIV^C$ means not infected with HIV; the superscript "C" indicates a set complement).
- 45 people tested positive for HIV.
- 8 people tested negative for HIV.
- Out of the 23 infected people, 21 tested positive (correct diagnosis).
- Out of the 30 not-infected people, 24 tested positive (wrong diagnosis).


The team now intends to use these results to calculate probabilities for new patients and figure out whether the test is reliable enough to use in hospitals. They want to know:

- What is the probability of testing positive, given that a patient is infected with HIV?
- What is the probability of testing negative, given that a patient is not infected with HIV?

P(T+ | HIV) is the probability of testing positive, given that the patient is infected with HIV, and, according to the formula, we have:

$$\begin{equation}
P(T^+|HIV) = \frac{card(T^+ \cap HIV)}{card(HIV)}
\end{equation}$$

There are 23 people infected with HIV, which means card(HIV) = 23. Out of these, 21 tested positive, meaning $card(T^+ \cap HIV) = 21$.

![image.png](attachment:image.png)

This means that P(T+ | HIV) is:
$$\begin{equation}
P(T^+|HIV) = \frac{21}{23} = 0.9130
\end{equation}$$

The probability of testing positive, given that the patient is infected with HIV, is therefore 91.30%. This may suggest that the new test is fairly good at detecting HIV when the virus is actually present. However, at a probability of 91.30%, we can expect that for every 10,000 patients infected with HIV, about 9,130 patients will get a correct diagnosis, while the other 870 will not. The team should probably conclude that the test needs more refinement with respect to detecting the virus.
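
The same calculation can be sketched from the counts listed earlier. The dictionary layout and key names below are assumptions of ours; the two "negative" cells are not stated directly in the list but follow by subtraction (23 - 21 and 30 - 24).

```python
# Counts taken from the bullet list; keys are (test result, infection status).
counts = {
    ("positive", "hiv"): 21,
    ("negative", "hiv"): 2,      # derived: 23 - 21
    ("positive", "no_hiv"): 24,
    ("negative", "no_hiv"): 6,   # derived: 30 - 24
}

# card(HIV): everyone who is infected, regardless of test result.
card_hiv = sum(n for (_, status), n in counts.items() if status == "hiv")
card_positive_and_hiv = counts[("positive", "hiv")]

p_positive_given_hiv = card_positive_and_hiv / card_hiv
expected_correct = round(p_positive_given_hiv * 10_000)  # per 10,000 infected patients

print(f"{p_positive_given_hiv:.4f}", expected_correct)  # prints: 0.9130 9130
```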

**Exercise**

Use the data in the table above to:

- Calculate $P(T^- | HIV^C)$. Assign answer to `p_negative_given_non_hiv`.
- Print `p_negative_given_non_hiv`.
- Interpret the result: does the value of $P(T^- | HIV^C)$ suggest that the test needs more work? Or does it look like the test is reliable enough to use in hospitals?

Ans.
$card(HIV^C) = 30$ and $card(T^- \cap HIV^C) = 6$, so $P(T^- | HIV^C) = \frac{6}{30}$.


In [7]:
p_negative_given_non_hiv = 6/30
print(p_negative_given_non_hiv)
# Only 20% of non-infected patients test negative, so the team needs to do more work

0.2


### Probability Formula (in terms of probabilities)

Above, we used our formula and saw that P(T+ | HIV) — the probability of testing positive, given that the patient is infected with HIV — is:

$$\begin{equation}
P(T^+ | HIV) = \frac{card(T^+ \cap HIV)}{card(HIV)} = \frac{21}{23}
\end{equation}$$

Note, however, that we'd arrive at the same result if we used probabilities instead of cardinals. Using the table above, we see that:

$P(T^+ \cap HIV) = \frac{21}{53}$ and $P(HIV) = \frac{23}{53}$. This means that:

$$\begin{equation}
P(T^+ | HIV) = \frac{P(T^+ \cap HIV)}{P(HIV)} = \frac{\frac{21}{53}}{\frac{23}{53}} = \frac{21}{23}
\end{equation}$$

This allows us to define a formula for conditional probability purely in terms of probabilities instead of set cardinals. Thus, for any two events A and B, P(A|B) is:

$$\begin{equation}
P(A | B) = \frac{\text{P(A} \cap \text{B)}}{\text{P(B)}}
\end{equation}$$

This formula is useful when we only know probabilities. For instance, let's say a **different test** is used to diagnose a patient. The patient tests positive for HIV, and we want to find P(HIV | T+) — the probability that the patient actually has HIV, given that the test was positive.

This time, however, all we know is $P(T^+) = 0.12$ and $P(HIV \cap T^+) = 0.000015$. We can no longer find cardinals, but using the formula above, we have:

$$\begin{equation}
P(HIV | T^+) = \frac{P(HIV \cap T^+)}{P(T^+)} = \frac{0.000015}{0.12} = 0.000125
\end{equation}$$
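
When only probabilities are available, the formula reduces to a single division. The function below is a small sketch; its name and the check on `p_b` are our own additions.

```python
def conditional_from_probabilities(p_a_and_b, p_b):
    """P(A|B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# P(HIV ∩ T+) = 0.000015 and P(T+) = 0.12, as given in the text.
p_hiv_given_positive = conditional_from_probabilities(0.000015, 0.12)
print(f"{p_hiv_given_positive:.6f}")  # prints: 0.000125
```

The result is formatted to six decimals because floating-point division may not return exactly 0.000125.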

To understand the mathematical reason for why the above formula works, let's start by considering the following mathematical statements:

$$\begin{equation}
\tag{1} P(A \cap B) = \frac{\text{number of successful outcomes}}{\text{total number of possible outcomes}} = \frac{card(A \cap B)}{card(\Omega)}
\end{equation}$$

$$\begin{equation}
\tag{2} P(B) = \frac{\text{number of successful outcomes}}{\text{total number of possible outcomes}} = \frac{card(B)}{card(\Omega)}
\end{equation}$$

Dividing statement (1) by statement (2), we have:

$$\begin{equation}
\frac{P(A \cap B)}{P(B)} = \frac{\frac{card(A \cap B)}{card(\Omega)}}{\frac{card(B)}{card(\Omega)}}
\end{equation}$$

Both card(Ω) cancel out, so we're left with:

$$\begin{equation}
P(A|B) = \frac{card(A \cap B)}{card(B)} = \frac{P(A \cap B)}{P(B)}
\end{equation}$$
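
As a sanity check, the two versions of the formula can be compared on the earlier die example (A is "odd", B is "greater than 1"). This is an illustrative sketch using exact fractions; the variable names are ours.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
a = {1, 3, 5}        # the number is odd
b = {2, 3, 4, 5, 6}  # the number is greater than 1

p_a_and_b = Fraction(len(a & b), len(omega))  # statement (1): 2/6
p_b = Fraction(len(b), len(omega))            # statement (2): 5/6

from_cardinals = Fraction(len(a & b), len(b))  # card(A ∩ B) / card(B)
from_probabilities = p_a_and_b / p_b           # P(A ∩ B) / P(B)

print(from_cardinals, from_probabilities)  # prints: 2/5 2/5
```

Both routes give the same value because the common factor card(Ω) divides out.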

**Exercise**

A company offering a browser-based task manager tool intends to do some targeted advertising based on people's browsers. The data they collected about their users is described in the table below:

![image.png](attachment:image.png)

Find:

- P(Premium | Chrome): the probability that a randomly chosen user has a premium subscription, provided their browser is Chrome. Assign answer to `p_premium_given_chrome`.
- P(Basic | Safari): the probability that a randomly chosen user has a basic subscription, provided their browser is Safari. Assign answer to `p_basic_given_safari`.
- P(Free | Firefox): the probability that a randomly chosen user has a free subscription, provided their browser is Firefox. Assign answer to `p_free_given_firefox`.
- Between a Chrome user and a Safari user, who is more likely to have a premium subscription? If a Chrome user is the answer, then assign the string 'Chrome' to a variable named `more_likely_premium`, otherwise assign 'Safari'. To solve this exercise, we'll also need to calculate P(Premium | Safari).

In [9]:
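# Each ratio is card(subscription ∩ browser) / card(browser), read from the table
p_premium_given_chrome = 158 / 2762
p_basic_given_safari = 274 / 1288
p_free_given_firefox = 2103 / 2285
p_premium_given_safari = 120 / 1288

# Compare the two conditional probabilities of a premium subscription
if p_premium_given_chrome > p_premium_given_safari:
    more_likely_premium = 'Chrome'
else:
    more_likely_premium = 'Safari'

print(more_likely_premium)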

Safari
