# Formally Defining Conditional Probability

In {doc}`Conditional Probability: Notation and Intuition<notation-and-intuition>`, we introduced the idea of a *conditional sample space*. Suppose that we are interested in the conditional probability of $A$ given that event $B$ occurred. A Venn diagram for two generic events $A$ and $B$ is shown in {numref}`cond-prob-two-events`.

:::{figure-md} cond-prob-two-events
<img src="venn-diagram-two-events.svg" alt="Venn diagram for two generic events, $A$ and $B$." width="400px">

Venn diagram of two generic events $A$ and $B$, where the event $B$ is shaded to indicate that it is known to have occurred.
:::


If the original sample space is $S$, then the set of outcomes that can have occurred given that $B$ occurred can be thought of as a “conditional sample space”,

$$
S_{|B} = S \cap B = B.
$$

```{note}
I have put “conditional sample space” in quotation marks because, although this concept is useful for understanding where the formula for conditional probability comes from, it is also misleading: we are not restricted to calculating conditional probabilities only for events that lie within $S_{|B}$. In fact, the Venn diagrams are drawn under the assumption that $A$ is not wholly contained in $B$. Further below, we show that conditioning on an event $B$ induces a new *conditional probability measure* on the **original sample space and event class**.
```



Now, given that $B$ occurred, the only outcomes in $A$ that could have occurred are those in $A_{|B} = A \cap B$. The “conditional sample space” $S_{|B}$ and the corresponding conditional event $A_{|B}$ are shown in {numref}`cond-sample-space`.

:::{figure-md} cond-sample-space
<img src="conditional-sample-space.svg" alt="Induced conditional sample space $S_{|B}$ and conditional event $A_{|B}$ from conditioning on event $B$."  width="400px">

Venn diagram of the “conditional sample space” $S_{|B}$ and the conditional event $A_{|B}$ induced by conditioning on event $B$.
:::



Based on {numref}`cond-sample-space`, we make the following observations under the condition that $B$ is known to have occurred:
* If $A$ and $B$ are mutually exclusive, then no outcomes of $A$ are contained in the event $B$. Thus, if $B$ occurs, $A$ cannot have occurred, so $P(A|B)=0$ in this case.
* If $B \subset A$, then the intersection region $A \cap B = B$.  In other words, every outcome in $B$ is an outcome in $A$. If $B$ occurred, then $A$ must have occurred, so $P(A|B)=1$ in this case.
* Under the condition that $B$ occurred, only the outcomes in $A$ that are also in $B$ are possible. Thus, $P(A|B)$ should be proportional to $P(A \cap B)$ (i.e., the probability of the smaller region in {numref}`cond-sample-space`).

These observations lead to the following definition of conditional probability:

````{card}
DEFINITION
^^^
conditional probability
: The conditional probability of an event $A$ given that an event $B$ occurred, where $P(B) \ne 0$, is 

$$
P(A|B) = \frac{P \left( A \cap B \right)}{P\left(B\right)}
$$
````
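
The definition can be sketched directly in code. The example below is a minimal illustration, not part of the original text: it assumes a finite sample space with equally likely outcomes (one roll of a fair six-sided die), and the helper names `prob` and `cond_prob` are hypothetical. It also checks the two limiting cases from the observations above.

```python
from fractions import Fraction


def prob(event, sample_space):
    """P(event), assuming all outcomes in sample_space are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a, b, sample_space):
    """P(A|B) = P(A ∩ B) / P(B); only defined when P(B) != 0."""
    p_b = prob(b, sample_space)
    if p_b == 0:
        raise ValueError("P(B) must be nonzero")
    return prob(a & b, sample_space) / p_b


# Hypothetical example: one roll of a fair six-sided die
S = frozenset(range(1, 7))
B = frozenset({4, 5, 6})            # roll is greater than 3

A = frozenset({2, 4, 6})            # roll is even
print(cond_prob(A, B, S))           # 2/3

# Limiting case 1: A and B mutually exclusive, so P(A|B) = 0
print(cond_prob(frozenset({1, 2}), B, S))        # 0

# Limiting case 2: B is a subset of A, so P(A|B) = 1
print(cond_prob(frozenset({3, 4, 5, 6}), B, S))  # 1
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with hand calculations without rounding error.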

Now suppose we have a probability space $\left(S, \mathcal{F}, P\right)$ and an event $B$ with $P(B) \ne 0$. Then we define a new probability space $\left(S, \mathcal{F}, P\left( ~ \left \vert B \right. \right)\right)$, where $P\left( ~ \left \vert B \right. \right)$ is the conditional probability measure given that $B$ occurred. To be more precise, we define $P\left( ~ \left \vert B \right. \right)$ on the event class $\mathcal{F}$ using the original probability measure $P$ as follows:
* for each $A \in \mathcal{F}$, 
$$
P(A|B) = \frac{P \left( A \cap B \right)}{P\left(B\right)}.
$$


To claim that the triple $\left(S, \mathcal{F}, P\left( ~ \left \vert B \right. \right)\right)$ defined above is a probability space, we need to verify that the conditional probability measure $P\left( ~ \left \vert B \right. \right)$ **satisfies the axioms** of probability on the sample space $S$ and event class $\mathcal{F}$:

**1.** Axiom 1 is that the probabilities are non-negative. Let's check:
\begin{eqnarray*}
      P(A|B)=\frac{P(A \cap B)}{P(B)}.
\end{eqnarray*}
Note that we are given $P(B)>0$, and $P(E) \ge 0$ for every event $E \in \mathcal{F}$ by Axiom 1 for the original measure $P$. Since $\mathcal{F}$ is a $\sigma$-algebra, $A \cap B \in \mathcal{F}$ and so $P(A \cap B) \ge 0$. Thus, $P(A|B)$ is a non-negative quantity divided by a positive quantity, and so $P(A|B) \ge 0$.


**2.** Axiom 2 is that the probability of $S$ (the sample space) is 1. Let's check:

\begin{eqnarray*}
      P(S|B)= \frac{P(S \cap B)}{P(B)}  = \frac{P(B)}{P(B)}
       =1
\end{eqnarray*} 

**3.** Axiom 3 says that if $A$ and $C$ are mutually exclusive events in $\mathcal{F}$, then the probability of $A \cup C$ is the sum of the probability of $A$  and the probability of $C$. Let's check if this still  holds for our conditional probability measure:

  \begin{eqnarray*}
    P(A \cup C |B) &=&  \frac{ P\left[ \left(A \cup C \right) \cap B \right]}{P[B]}  \\
     &=& \frac{ P\left[ \left(A \cap B \right) \cup \left(C \cap B \right) \right]}{P[B]}. 
  \end{eqnarray*}
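  Note that $A \cap C = \emptyset \Rightarrow (A\cap B) \cap (C \cap B) = (A\cap C) \cap  B =\emptyset$, so $A \cap B$ and $C \cap B$ are mutually exclusive. Applying Axiom 3 for the original measure $P$ to the numerator gives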
  \begin{eqnarray*}
    P(A \cup C |B) &=& \frac{ P\left[ A \cap B \right]}{P\left[B\right]} 
    + \frac{P\left[ C \cap B  \right]}{P[B]}  \\
    &=& P(A|B) + P(C|B)
  \end{eqnarray*}
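
The three checks above can also be carried out numerically by brute force on a small example. The sketch below is an illustration under stated assumptions, not a proof: the four-outcome sample space, its outcome probabilities `pmf`, and the helper names `P`, `P_given`, and `all_events` are all hypothetical. The event class is taken to be every subset of $S$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical probability space; outcome probabilities need not be equal
pmf = {"a": Fraction(1, 2), "b": Fraction(1, 4),
       "c": Fraction(1, 8), "d": Fraction(1, 8)}
S = frozenset(pmf)


def P(event):
    """Original probability measure: sum of the outcome probabilities."""
    return sum((pmf[s] for s in event), Fraction(0))


def P_given(event, B):
    """Conditional probability measure P(event | B); requires P(B) != 0."""
    return P(event & B) / P(B)


def all_events(sample_space):
    """Every subset of the sample space (the event class F)."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(e) for e in subsets]


B = frozenset({"b", "c"})
events = all_events(S)

axiom1 = all(P_given(A, B) >= 0 for A in events)
axiom2 = P_given(S, B) == 1
axiom3 = all(
    P_given(A | C, B) == P_given(A, B) + P_given(C, B)
    for A in events for C in events if not (A & C)  # mutually exclusive pairs
)
print(axiom1, axiom2, axiom3)   # True True True

# An event that is not contained in B still has a conditional probability
A = frozenset({"a", "b"})
print(P_given(A, B))            # 2/3
```

Note that the checks range over all events in the original event class, including events such as $\{a, b\}$ that are not subsets of $B$.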





The important thing to notice here is that the new conditional probability measure $P(~|B)$ satisfies the axioms with the original sample space and event class. We are not restricted to applying $P(~|B)$ to those events that lie within the smaller “conditional sample space”, $S_{|B}$.

**Exercise**

Consider again the problem with five computers in a lab, with sample space denoted by

$$
S= \left\{AD, AN, BD_1, BD_2, BN\right\}
$$

and events
* $E_A$ is the event that the user's computer is from manufacturer $A$
* $E_B$ is the event that the user's computer is from manufacturer $B$
* $E_D$ is the event that the user's computer is defective.

Use the formula for conditional probability,

$$
P(A|B) = \frac{ P(A \cap B)}{P(B)}
$$
to calculate the probabilities in the problems below. (*It is easier to solve these using intuition/counting, but I encourage you to practice using the formula in the definition, which we will need to use in more complicated scenarios soon.*) Submit your answers as a fraction or a decimal with at least two digits of precision.
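
One way to set up these calculations in Python is sketched below. It assumes that each of the five computers is equally likely to be selected, which is an assumption made here rather than something stated in this section, and the helper names `prob` and `cond_prob` are hypothetical. Only a sanity check is shown, so the exercise answers are left for you to compute.

```python
from fractions import Fraction

# Sample space from the exercise; outcomes are assumed equally likely
S = frozenset({"AD", "AN", "BD1", "BD2", "BN"})

E_A = frozenset(s for s in S if s.startswith("A"))   # manufacturer A
E_B = frozenset(s for s in S if s.startswith("B"))   # manufacturer B
E_D = frozenset(s for s in S if "D" in s)            # defective


def prob(event):
    """P(event) under the equally-likely assumption."""
    return Fraction(len(event & S), len(S))


def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B); requires P(B) != 0."""
    return prob(a & b) / prob(b)


# Sanity check: E_A and E_B are mutually exclusive, so P(E_A | E_B) = 0
print(cond_prob(E_A, E_B))   # 0
```

Calls such as `cond_prob(E_D, E_A)` can then be used to check answers obtained by hand with the formula.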

## Terminology Review

Use the flashcards below to help you review the terminology introduced in this section.

```python
from jupytercards import display_flashcards

# Base URL of the book's GitHub repository, then the flashcard folder for this chapter
github = 'https://raw.githubusercontent.com/jmshea/Foundations-of-Data-Science-with-Python/main/'
github += '06-conditional-prob/flashcards/'
display_flashcards(github + 'definition.json')
```