# Conditional probability

Events can be independent, i.e., the occurrence of one event has no influence on the probability of another. But the world is complex, and often the occurrence of one event does affect the probability of another. That is the idea behind conditional probability: what is the probability of event A happening given that event B has happened?

## Sample space review

We know that in classical probability the sample space is collectively exhaustive, i.e., for a given experiment it contains all the possible outcomes. With conditional probability, where one event depends on another, the sample space is still collectively exhaustive, but it gets revised: once we know the conditioning event has occurred, the set of possible outcomes becomes smaller.

As always let's visualize this.

374 people at a park were asked what their favorite music style was.

| Favorite style | Female | Male | **Total** |
|----------------|--------|------|-----------|
| A: Rock        | 30     | 62   | 92        |
| B: Country     | 45     | 27   | 72        |
| C: Pop         | 64     | 39   | 103       |
| D: Classical   | 65     | 42   | 107       |
| **Total**      | 204    | 170  | 374       |


Let A be the event "the person is a rock fan" and F the event "the person is female":

$$ P(A) = \frac{92}{374} = 0.2460 $$

$$ P(F) = \frac{204}{374} = 0.5455 $$

Here is an interesting thing: if you use the multiplication rule for independent events to compute the probability of being a rock fan and female, you get:

$$ P(A) P(F) = 0.2460 \times 0.5455 = 0.1342  $$

This is not the probability we get from the table, which is $ P(A \cap F) = \frac{30}{374} = 0.0802 $. This means the two events are not independent: knowing that a person is female changes the probability that the person is a rock fan.
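The independence check can be sketched in a few lines of Python. This is only an illustration: the variable names are made up, and the counts are read from the table (30 female rock fans, 92 rock fans, 204 females, 374 people).

```python
import math

# Counts read from the survey table
total = 374
rock = 92             # row total for A: Rock
female = 204          # column total for Female
rock_and_female = 30  # cell Rock / Female

p_a = rock / total
p_f = female / total
p_a_and_f = rock_and_female / total  # joint probability straight from the table

# If A and F were independent, the joint probability would equal this product
product = p_a * p_f

print(f"P(A)P(F)   = {product:.4f}")
print(f"P(A and F) = {p_a_and_f:.4f}")
print("independent?", math.isclose(product, p_a_and_f))
```

The product and the joint probability disagree, which is exactly what non-independence means.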

The language to express conditional probability is:

$$ P(A|F) $$

which means: given that the person is female what is the probability that the person is also a rock fan?

$$ P(A|F) = \frac{30}{204} = 0.1471 $$

As you can see, the sample space is revised. The conditioning event (being female) shrinks the sample space from 374 people to the 204 females, and of those 204 females we know that 30 are rock fans. The conditional probability is then just that relative frequency.

Another way to think about it is:

$$ P(A|F) = \frac{P(A \cap F)}{P(F)} = \frac{\frac{30}{374}}{\frac{204}{374}} = 0.1471 $$

When we know that the conditioning event has occurred, every outcome outside of F is discarded. That's why the sample space is reduced to the conditioning event: it's a revision of the original sample space. The only way the conditioned event can happen is where it intersects the conditioning event, since that is now our sample space.
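As a minimal sketch, the two ways of getting $ P(A|F) $ can be compared in Python: restricting the counts to the reduced sample space, and dividing the joint probability by the probability of the conditioning event. The variable names are illustrative and the counts come from the table.

```python
# Counts read from the survey table
total = 374
female = 204
rock_and_female = 30

# Way 1: shrink the sample space to the 204 females and count rock fans there
p_a_given_f_counts = rock_and_female / female

# Way 2: ratio of probabilities measured on the original sample space
p_a_and_f = rock_and_female / total
p_f = female / total
p_a_given_f_ratio = p_a_and_f / p_f

print(f"from counts: {p_a_given_f_counts:.4f}")
print(f"from ratio:  {p_a_given_f_ratio:.4f}")
```

Both routes agree because the 374 in the numerator and denominator cancels out.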

I like to think of conditional probability as a bowl full of balls. When you have a conditioning event you are no longer interested in the whole bowl, but only in the balls that meet the condition, and from that selection you go even further and pick only the balls that also meet the event you are interested in. It's like filtering until you get what you want.



<p align="center">
  <img src="./imgs/conditional_venn.png" alt="Conditional probability"/>
</p>

It's like: given that B happened, what portion of B intersects with the event I am interested in?

## Probability trees

Probability trees are a good way to visualize this filtering process. Each level of the tree conditions on the branch taken before it. You keep filtering until you reach the last branch, i.e., the conditioned event you are interested in, and then you multiply all the probabilities found along the way, because you want the probability that every event on that path happens.

<p align="center">
  <img src="./imgs/cond_tree.png" alt="Probability tree"/>
</p>
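Multiplying along the branches can be sketched with the survey data from earlier. This is not the tree in the image: it is a hypothetical two-level tree (gender first, then favorite style) with only the Female branch filled in, using the Female column of the table. The helper `path_probability` is an illustrative name.

```python
from math import prod

# Level 1: gender. Level 2: favorite style given Female.
p_female = 204 / 374
style_given_female = {
    "Rock": 30 / 204,
    "Country": 45 / 204,
    "Pop": 64 / 204,
    "Classical": 65 / 204,
}

def path_probability(*branch_probs):
    """Multiply the probabilities met along one path of the tree."""
    return prod(branch_probs)

# Leaf probabilities: P(Female and style) for each style
leaves = {
    style: path_probability(p_female, p)
    for style, p in style_given_female.items()
}

for style, p in leaves.items():
    print(f"P(Female and {style}) = {p:.4f}")

# The leaves under one branch add back up to that branch's probability
print(f"sum of leaves = {sum(leaves.values()):.4f}, P(Female) = {p_female:.4f}")
```

Each leaf is a joint probability, e.g. the Rock leaf equals 30/374, the same value read directly from the table.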


## Bayes' Theorem

> In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately (by conditioning it on their age) than simply assuming that the individual is typical of the population as a whole. [Wikipedia](https://en.wikipedia.org/wiki/Bayes%27_theorem)

The probability of one event happening, given a second event, is equal to the probability of both happening divided by the probability of the conditioning event.


#### The simplest form

## $$ P(A|B) = \frac{P(A \cap B)}{P(B)} $$

## $$ P(B|A) = \frac{P(A \cap B)}{P(A)} $$

Multiply both sides by the probability of the conditioning event:

## $$ P(A|B)P(B) = P(A \cap B) $$

## $$ P(B|A)P(A) = P(A \cap B) $$


## $$  P(A \cap B) = P(B|A)P(A) = P(A|B)P(B)  $$

which, after dividing by $ P(B) $, yields:

## $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$
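A minimal sketch of this formula in Python, applied to the survey table from the beginning: we know $ P(A|F) $ (rock fan given female) and reverse the conditioning to get $ P(F|A) $ (female given rock fan). The function name `bayes` and its parameter names are illustrative choices, not part of any library.

```python
def bayes(p_evidence_given_h, p_h, p_evidence):
    """Return P(H|E) = P(E|H) P(H) / P(E)."""
    if p_evidence <= 0:
        raise ValueError("P(E) must be positive to condition on E")
    return p_evidence_given_h * p_h / p_evidence

# Values read from the survey table
p_a = 92 / 374          # P(A): rock fan
p_f = 204 / 374         # P(F): female
p_a_given_f = 30 / 204  # P(A|F)

# Here H = F and E = A, so we get P(F|A)
p_f_given_a = bayes(p_a_given_f, p_f, p_a)

print(f"P(F|A) = {p_f_given_a:.4f}")
```

The result matches what the table gives directly: 30 female rock fans out of 92 rock fans.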





## Total probability formula

<p align="center">
  <img src="./imgs/total_prob.png" alt="Conditional probability"/>
</p>

The total probability rule (also called the law of total probability) breaks a probability calculation up into distinct parts. It's used to find the probability of an event, A, when you don't know enough about A to calculate it directly. Even though we might not know the probability of A itself, we may be able to calculate its probability under each of several other events, and these pieces, weighted and added together, yield the total probability of A. This is where conditional probability comes in. Without the law of total probability, we may not be able to calculate probabilities about which we weren't given information directly.

## $$ P(A) = P(A\,|\,B)\,P(B) + P(A\,|\,B^c)\,P(B^c) $$

The probability of event A is equal to its conditional probability on a second event times the probability of the second event, plus its probability conditional on the second event not occurring times the probability of that non-occurrence.

This is what you can see in the diagram above. We might not know the probability of A directly, but we can add up the pieces of A that fall in each partition of the sample space (the partitions being mutually exclusive and collectively exhaustive), and together they yield the whole probability of A.

In the diagram above there are 4 partitions, but we can generalize the formula to account for any number of partitions:

#### $$ \begin{aligned} P(A) & = \sum_{i=1}^n P(A\,|\,B_i)\,P(B_i)\\[1ex] & = P(A\,|\,B_1)\,P(B_1) + P(A\,|\,B_2)\,P(B_2) + \cdots + P(A\,|\,B_n)\,P(B_n) \end{aligned} $$

Now we can go back to Bayes' formula and use the total probability formula to expand its denominator:

### $$ P(A\,|\,B) = \frac{P(B\,|\,A)\,P(A)}{P(B\,|\,A)\,P(A) + P(B\,|\,A^c)\,P(A^c)} $$

If we generalize for any number of partitions we have:

## $$ P(A_k\,|\,B) = \frac{P(B\,|\,A_k)\,P(A_k)}{\sum\limits_{i=1}^n P(B\,|\,A_i)\,P(A_i)} $$
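The general formulas can be sketched in Python using the survey table from the beginning as data. In this illustration the four music styles play the role of the partitions $ A_i $ and B is the event "the person is female". The function names `total_probability` and `posterior` are hypothetical helpers.

```python
def total_probability(likelihoods, priors):
    """P(B) = sum over i of P(B|A_i) P(A_i), for a partition A_1..A_n."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the priors of a partition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

def posterior(k, likelihoods, priors):
    """P(A_k|B) via Bayes' theorem with the expanded denominator."""
    return likelihoods[k] * priors[k] / total_probability(likelihoods, priors)

# Partitions: Rock, Country, Pop, Classical (row totals over 374 people)
priors = [92 / 374, 72 / 374, 103 / 374, 107 / 374]
# P(Female | style): Female cell divided by the row total
likelihoods = [30 / 92, 45 / 72, 64 / 103, 65 / 107]

p_female = total_probability(likelihoods, priors)
p_rock_given_female = posterior(0, likelihoods, priors)

print(f"P(Female)      = {p_female:.4f}")
print(f"P(Rock|Female) = {p_rock_given_female:.4f}")
```

The sum recovers 204/374 and the posterior recovers 30/204, the same values read off the table.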


## Examples

1. Given the following probabilities:

$$ P(A) = 0.2 $$
$$ P(B) = 0.25 $$
$$ P(C) = 0.6 $$
$$ P(A|B) = 0 $$
$$ P(B|C) = 0.25 $$
$$ P(C|A) = 0.2 $$

**(a) Show that A and B are mutually exclusive**

Events are mutually exclusive when they can't happen at the same time, i.e., $ P(A \cap B) = 0 $. We are given $ P(A|B) = 0 $, so:

### $$ 0 = P(A\,|\,B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{0.25} $$

Hence $ P(A \cap B) = 0 \times 0.25 = 0 $: if B occurred A can't happen, and vice versa.

**(b) Show that C does not imply A**

By C implies A we mean that when C happens A necessarily will happen, i. e., $ P(A|C) = 1 $.

### $$ P(A\,|\,C) = \frac{P(C\,|\,A)\,P(A)}{P(C)} = \frac{0.2 \times 0.2}{0.6} = 0.067 \not= 1 $$

Therefore C does not imply A.

**(c) Probability $ P(B \cup C) $**

$ P(B \cup C) = P(B) + P(C) - P(B\cap C)  $

$ P(B \cup C) = 0.25 + 0.6 - P(B\cap C)  $

Rearranging the definition of conditional probability:

$ P(B\cap C) = P(B|C)P(C) = 0.25 \times 0.6 = 0.15  $

$ P(B \cup C) = 0.25 + 0.6 - 0.15 = 0.7   $

**(d) Probability $ P(B|A) $**

$ P(B|A) = \frac{P(A \cap B)}{P(A)} $

$ P(A\cap B) = P(A|B)P(B) = 0 \times 0.25 = 0  $

$ P(B|A) = \frac{0}{0.2} = 0 $

**(e) Probability $ P(B^c|C) $**

$P(B^c|C) = 1 - P(B|C) = 1 - 0.25 = 0.75$
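The arithmetic in parts (a) to (e) can be double-checked with a short Python sketch. The variable names are illustrative, and the only inputs are the six given probabilities.

```python
# Given values
p_a, p_b, p_c = 0.2, 0.25, 0.6
p_a_given_b = 0.0
p_b_given_c = 0.25
p_c_given_a = 0.2

# (a) P(A and B) = P(A|B) P(B); zero means mutually exclusive
p_a_and_b = p_a_given_b * p_b

# (b) P(A|C) by Bayes' theorem; C implies A only if this equals 1
p_a_given_c = p_c_given_a * p_a / p_c

# (c) P(B or C) = P(B) + P(C) - P(B and C)
p_b_and_c = p_b_given_c * p_c
p_b_or_c = p_b + p_c - p_b_and_c

# (d) P(B|A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a

# (e) complement rule inside the conditional world of C
p_not_b_given_c = 1 - p_b_given_c

print(f"(a) P(A and B) = {p_a_and_b}")
print(f"(b) P(A|C)     = {p_a_given_c:.3f}")
print(f"(c) P(B or C)  = {p_b_or_c:.2f}")
print(f"(d) P(B|A)     = {p_b_given_a}")
print(f"(e) P(B^c|C)   = {p_not_b_given_c}")
```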

-------------

The company ABC is deciding whether to submit a bid to build a new shopping centre. In the past, ABC's main competitor, DEF, has submitted bids 70% of the time. If DEF does not bid for a job, the probability that ABC will get the job is 50%. If DEF bids on a job, the probability that ABC will get the job drops to 25% due to the competition.

A = ABC gets the job

B = DEF submits the bid

$$ P(B) = 0.7 $$

$$ P(B^c) = 0.3 $$

$$ P(A|B^c) = 0.5 $$

$$ P(A|B) = 0.25 $$


**If ABC gets the job, what is the probability that DEF did not bid?**

$ P(B^c|A) $

The first thing we need is $ P(A) $. We can use the total probability formula to sum over all the scenarios in which ABC gets the job, i.e., when DEF bids and when DEF does not bid.

$ P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = 0.25 \times 0.7 + 0.5 \times 0.3 = 0.175 + 0.15 = 0.325 $

$ P(B^c\,|\,A) = \frac{P(A\,|\,B^c)\,P(B^c)}{P(A\,|\,B)\,P(B) + P(A\,|\,B^c)\,P(B^c)} = \frac{0.5 \times 0.3}{0.25 \times 0.7 + 0.5 \times 0.3} = 0.4615 $
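The same calculation as a minimal Python sketch, with illustrative variable names. It first builds $ P(A) $ with the total probability formula and then applies Bayes' theorem.

```python
# Given values
p_b = 0.7              # DEF submits a bid
p_not_b = 1 - p_b      # DEF does not bid
p_a_given_b = 0.25     # ABC gets the job when DEF bids
p_a_given_not_b = 0.5  # ABC gets the job when DEF does not bid

# Total probability: ABC gets the job under either scenario
p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b

# Bayes' theorem: DEF did not bid, given that ABC got the job
p_not_b_given_a = p_a_given_not_b * p_not_b / p_a

print(f"P(A)     = {p_a:.3f}")
print(f"P(B^c|A) = {p_not_b_given_a:.4f}")
```

Note that winning the job raises the probability that DEF stayed out from the prior 0.3 to about 0.46.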








## More resources

I created a deck summarising what I cover in this Notebook:

https://pitch.com/public/dc3f5807-6f0f-433f-8e7a-79359ff37171

<iframe src="https://pitch.com/embed/dc3f5807-6f0f-433f-8e7a-79359ff37171" allow="fullscreen" allowfullscreen="" width="560" height="368" style="border:0"></iframe>