# Probability theory is the mathematical modeling of real-world experiments that have unpredictable outcomes

# The mathematical model corresponding to any experiment consists of

### 1. the *outcome space* $\Omega$

### 2. the collection of *events* $F$

### 3. the *probability measure* $P$

# Combining the three ($\Omega, F, P$), we have a *probability space*

_____

# 1. The outcome space

- aka the *sample space*
- consists of all possible outcomes of the experiment

**Example**

- if our experiment is rolling a die, then the outcomes are:

$\Omega = \{1,2,3,4,5,6\}$

- **Note**: a generic outcome is typically denoted $w$

_____

# 2. The collection of events

- an *event* is a subset of the outcome space

**Example**

$\{2,4,6\}$ represents the even rolls of the die


- If $A \subset \Omega$ is an event and the result of our experiment is contained in $A$, we say that *$A$ has occurred*

- $F$ denotes the collection of all events; when $\Omega$ is finite, $F$ is typically the set of all possible subsets of $\Omega$
    - In particular, $F$ must contain both $\Omega$ and $\emptyset$
    
- $A^{C}$ denotes the complement of event $A$
    - This means it denotes all possible outcomes **not contained** in $A$
    - $A^{C}$ must be contained in $F$
    
- If $A$ and $B$ are both events, then $A \cup B$ denotes all outcomes that are contained in **either** $A$ or $B$ (or both)
    - For finite spaces $\Omega$, we require that $A \cup B$ is contained in $F$
        - When $\Omega$ is not finite (i.e. the more general rule), we require the following for any sequence of events $A_{1}, A_{2}, A_{3}, ... \in F$:
        
$$
A = \bigcup_{i=1}^{\infty}A_{i} = A_{1} \cup A_{2} \cup A_{3} \cup... \in F
$$

- All these rules can be summarized by the following axioms for any probability space $(\Omega, F, P)$:

### AXIOMS

#### 1.1: 

$\Omega \in F$

#### 1.2:

$A \in F \implies A^{C} \in F$

#### 1.3:

$A_{1}, A_{2}, A_{3},...\in F \implies \bigcup_{i=1}^{\infty}A_{i} = A_{1} \cup A_{2} \cup A_{3} \cup... \in F$
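
- A minimal sketch of Axioms 1.1 to 1.3 for the die example, assuming $F$ is taken to be the set of all subsets of $\Omega$ (the helper name `power_set` is made up for this illustration). Since $\Omega$ is finite, checking pairwise unions is enough here:

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega as a frozenset (hypothetical helper)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

omega = frozenset({1, 2, 3, 4, 5, 6})
F = power_set(omega)  # assumption: every subset counts as an event

# Axiom 1.1: the whole outcome space is an event
contains_omega = omega in F

# Axiom 1.2: the complement of every event is an event
closed_under_complement = all(omega - A in F for A in F)

# Axiom 1.3 (finite case): the union of any two events is an event
closed_under_union = all(A | B in F for A in F for B in F)

print(len(F), contains_omega, closed_under_complement, closed_under_union)
```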

### Aside:

- DeMorgan's law states:

$$
\left ( A \cup B \right )^{C} = A^{C} \cap B^{C}
$$

- Let's break this down:
    - $A\cup B$ gives us the set of all outcomes in $A$ **or** $B$ (or both)
        - Therefore, $\left ( A \cup B \right )^{C}$ is the set of outcomes that **are in neither $A$ nor $B$**
    - $A^{C}$ gives us the set of outcomes that are not in $A$, and $B^{C}$ gives us the set of outcomes that are not in $B$
        - Therefore, $A^{C}\cap B^{C}$ gives us the set of outcomes that **are in neither $A$ nor $B$**
        
- So, we can see that the two are equal
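
- As a quick sanity check (not a proof), DeMorgan's law can be verified on the die example with Python sets. The events `A` and `B` below are chosen only for illustration:

```python
omega = set(range(1, 7))   # outcomes of one die roll
A = {2, 4, 6}              # even rolls
B = {4, 5, 6}              # rolls greater than 3 (illustrative choice)

def complement(event, omega):
    """All outcomes of omega that are not in the event."""
    return omega - event

lhs = complement(A | B, omega)                     # (A ∪ B)^C
rhs = complement(A, omega) & complement(B, omega)  # A^C ∩ B^C

print(lhs, rhs, lhs == rhs)
```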

____

# 3. The probability measure

- A probability measure $P$ assigns a number $P(A)$ to each event $A$
    - Intuitively, the measure tells us the likelihood of an event taking place
        - This means that it measures the likelihood that the experiment's outcome, denoted $w$, lies inside of $A$

### AXIOMS (cont'd)

#### 1.4: 

$P(\Omega) = 1$

#### 1.5: 

$0 \leq P(A) \leq 1 \text{  } \forall A \in F$

#### 1.6

- If $A_{1}, A_{2}, A_{3}, ...$ are all disjoint events i.e. no outcome is contained in more than one set, then

$P\left (\bigcup_{i=1}^{\infty}A_{i} \right ) =  \sum_{i=1}^{\infty} P(A_{i})$

____

# 4. Consequences of the axioms

#### 1.7

##### i. 

$P(\emptyset)=0$

##### ii. 

$P(A\cup B) = P(A) + P(B) - P(A\cap B)$

##### iii. 

$A \cap B = \emptyset \implies P(A\cup B) = P(A) + P(B)$

##### iv.

$P(A^{C}) = 1 - P(A)$

##### v. 

- We define $A \setminus B = A \cap B^{C}$ i.e. all the elements in $A$ that are **not** elements of $B$. Then:

$B \subset A \implies P(A \setminus B) = P(A) - P(B)$

- We can use statement iii. to show that for disjoint $A_{1},A_{2},...,A_{n}$:

$P(A_{1} \cup A_{2} \cup ... \cup A_{n}) = P(A_{1})+P(A_{2})+...+P(A_{n})$

- This property is called *finite additivity*
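
- The consequences in 1.7 can be checked numerically. The sketch below assumes a fair die, so that $P(A) = |A|/|\Omega|$; the events `A`, `B`, and `C` are illustrative choices, and exact fractions are used to avoid rounding issues:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})   # even rolls
B = frozenset({4, 5, 6})   # rolls greater than 3
C = frozenset({4, 6})      # a subset of A

empty_prob = P(frozenset())                                # 1.7.i
incl_excl_holds = P(A | B) == P(A) + P(B) - P(A & B)       # 1.7.ii
complement_holds = P(omega - A) == 1 - P(A)                # 1.7.iv
difference_holds = P(A - C) == P(A) - P(C)                 # 1.7.v (C ⊂ A)

print(empty_prob, incl_excl_holds, complement_holds, difference_holds)
```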

____

# 5. Partitions

**Partition of $\Omega$**

- If we have events $B_{1}, B_{2},...$ such that:
    1. No outcome is contained in more than one $B_{i}$
    2. Every outcome in $\Omega$ is contained in some $B_{j}$
    
- then we say that the events *partition $\Omega$*

## Theorem 1.8: *Partition Theorem*

- Suppose $B_{1}$, $B_{2}$, $B_{3}$,... partition $\Omega$. Then, for any event $A \in F$:

$$
P(A) = \sum_{i=1}^{\infty}P(A\cap B_{i})
$$

#### Proof:

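$$
A = A \cap \Omega = A \cap \left ( \bigcup_{i=1}^{\infty}B_{i} \right ) = \bigcup_{i=1}^{\infty} \left ( A \cap B_{i} \right )
$$

- Since the $B_{i}$ are disjoint, the sets $A \cap B_{i}$ are also disjoint, so additivity over disjoint events gives:

$$
P(A) = \sum_{i=1}^{\infty}P(A\cap B_{i})
$$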
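
- A small numerical illustration of the Partition Theorem, assuming a fair die and a hand-picked finite partition (both are assumptions of this sketch, not part of the theorem):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))

# Illustrative partition of omega into three blocks
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
A = frozenset({2, 4, 6})  # even rolls

# The blocks cover omega and do not overlap
is_partition = (
    frozenset().union(*partition) == omega
    and sum(len(B_i) for B_i in partition) == len(omega)
)

# Right-hand side of the theorem: sum of P(A ∩ B_i)
total = sum(P(A & B_i) for B_i in partition)

print(is_partition, total, P(A))
```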

____

# 6. Conditional Probability

- Suppose we have events $A$ and $B$, and $B$ has some non-zero probability i.e. $P(B)>0$. Then, the **conditional probability of $A$ given $B$** is:

$$
P(A|B) = \frac{P(A\cap B)}{P(B)}
$$

- **Note**: $A|B$ is not an event. The $|B$ part simply indicates that $B$ has already occurred
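
- **Worked example** (a sketch, assuming a fair die): let $A = \{2,4,6\}$ be the even rolls and $B = \{4,5,6\}$ be the rolls greater than 3. Then $A \cap B = \{4, 6\}$, so:

$$
P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3}
$$

- Knowing that the roll is greater than 3 raises the probability of an even roll from $1/2$ to $2/3$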

- Rearranging this definition gives $P(A\cap B) = P(A|B)\cdot P(B)$. Substituting this into the Partition Theorem (1.8), we arrive at the following theorem:

## Theorem 1.9 - *Partition Theorem with Conditional Probability*

- If $B_{1}, B_{2}, B_{3},...$ is a partition of $\Omega$ with $P(B_{i})>0$ for every $i$, then for any event $A \in F$:

$$
P(A) = \sum_{i=1}^{\infty} \left ( P(A|B_{i})\cdot P(B_{i}) \right )
$$
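
- Theorem 1.9 can be illustrated the same way. The sketch below assumes a fair die and an uneven, hand-picked partition; `cond_P` is a hypothetical helper that implements the definition of conditional probability:

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))

def cond_P(A, B):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if P(B) == 0:
        raise ValueError("conditioning event must have positive probability")
    return P(A & B) / P(B)

# Illustrative partition with blocks of different sizes
partition = [frozenset({1, 2, 3}), frozenset({4, 5}), frozenset({6})]
A = frozenset({2, 4, 6})  # even rolls

# Each term is P(A|B_i) * P(B_i)
terms = [cond_P(A, B_i) * P(B_i) for B_i in partition]
total = sum(terms)

print(terms, total, P(A))
```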

- We say that a sequence of events $A_{1} \subset A_{2} \subset ...$ is ***increasing* towards $A$** if $\bigcup_{i=1}^{\infty}A_{i} = A$

- We say that a sequence of events $B_{1} \supset B_{2} \supset ...$ is ***decreasing* towards $B$** if $\bigcap_{i=1}^{\infty}B_{i} = B$

- We denote these two as $A_{n} \uparrow A$ and $B_{n} \downarrow B$ as $n\rightarrow \infty$

## Theorem 1.10 - *Continuity of Probability Measures*

- If $A_{n} \uparrow A$ as $n\rightarrow \infty$ then $P(A_{n}) \rightarrow P(A)$
    - Similarly, if $B_{n} \downarrow B$ as $n\rightarrow \infty$ then $P(B_{n})\rightarrow P(B)$

#### Proof:

- If each $A_{i}$ is contained in $A_{i+1}$, then we can think of $A_{i+1}$ as the union of the elements in $A_{i}$ and all the extra new elements
    - i.e. $A_{i+1} = A_{i} \cup (A_{i+1} \setminus A_{i} )$

- But we can do the same thing for $A_{i+2}$: $A_{i+2} = A_{i+1} \cup (A_{i+2} \setminus A_{i+1} ) = A_{i} \cup (A_{i+1} \setminus A_{i} )\cup (A_{i+2} \setminus A_{i+1} )$

- We can continue this process back to $A_{0}$, which we define as $A_{0} = \emptyset$
    - This implies $A_{1} = \emptyset \cup (A_{1} \setminus \emptyset) = A_{1}$
    
- From this, we can express $A_{n}$ as:

$$
A_{n} = \bigcup_{i=1}^{n} (A_{i} \setminus A_{i-1} )
$$

- Since with each iteration we add on the elements that were **never previously included in any subset (i.e. event)**, we know that the sets $(A_{i} \setminus A_{i-1} )$ are **pairwise disjoint**

- **Recall**: Property 1.7.v tells us:

$$
B \subset A \implies P(A \setminus B) = P(A) - P(B)
$$

- And Axiom 1.6 tells us:

$$
P\left (\bigcup_{i=1}^{n}A_{i} \right ) =  \sum_{i=1}^{n} P(A_{i})
$$

- Putting it all together, we get:

$$
P \left ( \bigcup_{i=1}^{n} (A_{i} \setminus A_{i-1} ) \right ) = \sum_{i=1}^{n} P(A_{i} \setminus A_{i-1} ) = \sum_{i=1}^{n} \left (P(A_{i})- P(A_{i-1}) \right )
$$

- Then, as $n\rightarrow \infty$:

$$
\begin{aligned}
P(A) &= P \left ( \bigcup_{i=1}^{\infty} (A_{i} \setminus A_{i-1} ) \right ) = \sum_{i=1}^{\infty} \left (P(A_{i})- P(A_{i-1}) \right ) \\
&= \lim_{n\rightarrow \infty} \left [(P(A_{1})- P(A_{0}))+(P(A_{2})- P(A_{1})) + ... + (P(A_{n-1})-P(A_{n-2}) )+ (P(A_{n})-P(A_{n-1}) )\right ] \\
&= \lim_{n\rightarrow \infty} \left [ -P(A_{0}) + P(A_{1}) - P(A_{1}) +P(A_{2}) - P(A_{2}) + ... + P(A_{n-1}) - P(A_{n-1}) + P(A_{n}) \right ] \\
&= \lim_{n\rightarrow \infty} P(A_{n}) - P(A_{0}) = \lim_{n\rightarrow \infty} P(A_{n})
\end{aligned}
$$
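
- To see the increasing case in action, here is a sketch on a countably infinite outcome space. It assumes $\Omega = \{1, 2, 3, ...\}$ with $P(\{k\}) = 2^{-k}$ (an illustrative measure, not from the text above) and $A_{n} = \{1, ..., n\}$, so that $A_{n} \uparrow \Omega$ and $P(A_{n})$ should approach $P(\Omega) = 1$:

```python
from fractions import Fraction

def P_A(n):
    """P(A_n) for A_n = {1, ..., n}, assuming P({k}) = 2**-k."""
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

# Probabilities of the first 20 events in the increasing sequence
probs = [P_A(n) for n in range(1, 21)]

# Distance from the limit P(Omega) = 1; this should shrink to 0
gaps = [1 - p for p in probs]

for n in (1, 5, 10, 20):
    print(n, float(probs[n - 1]), float(gaps[n - 1]))
```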

## Theorem 1.11 - *Sub-Additivity of Probability Measures*

- If $A_{1}, A_{2}, ...$ is **any** sequence of events, then:

$$
P\left ( \bigcup_{i=1}^{\infty}A_{i} \right ) \leq \sum_{i=1}^{\infty}P(A_{i})
$$
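
- A quick illustration of sub-additivity with overlapping events, assuming a fair die and three hand-picked events. Outcomes shared between events are counted more than once in the sum, which is why the sum can exceed the probability of the union (and can even exceed 1):

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def P(event):
    """Uniform measure on a fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))

# Overlapping events (illustrative): 2 and 3 each appear in two of them
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]

union = frozenset().union(*events)
lhs = P(union)                    # probability of the union
rhs = sum(P(E) for E in events)   # sum of the individual probabilities

print(lhs, rhs, lhs <= rhs)
```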

_____

# 7. Independence

- Events $A$ and $B$ are **independent** if $P(A\cap B) = P(A)\cdot P(B)$
    - If $P(A)\cdot P(B) > 0$, then this is equivalent to saying $P(A|B) = P(A)$ 

- This simply means that whether or not $B$ occurs has no impact on whether $A$ occurs
    - E.g. if $A$ is going to the bathroom in the morning on Tuesday, and $B$ is having nice weather on Monday (i.e. the day before), the probability of $A$ isn't at all affected by $B$

- Three events $A$, $B$, and $C$ are independent if the following conditions are satisfied:
    1. $P(A\cap B) = P(A)\cdot P(B)$
    2. $P(A\cap C) = P(A)\cdot P(C)$
    3. $P(B\cap C) = P(B)\cdot P(C)$
        - **Note**: if the three conditions above are satisfied, events $A$, $B$, and $C$ are said to be **pairwise independent**
    4. $P(A\cap B \cap C) = P(A)\cdot P(B) \cdot P(C)$

- Pairwise independence is a weaker condition than the independence of the three events
    - *Why?*
        - Let's consider the following example:
        
#### Example

$$
\begin{gathered}
\Omega = \left \{ 1,2,3,4 \right \} \\
A = \{ 1, 2 \} \\
B = \{1, 3 \} \\
C = \{1, 4 \} \\
\implies A \cap B = \{1\}, \quad A\cap C = \{1\}, \quad B\cap C = \{1\}
\end{gathered}
$$

- Assume each of the four outcomes is equally likely. Then $P(A\cap B)$ measures the probability that the outcome is 1 $\implies P(A\cap B) = 1/4$

- Since $P(A)$ measures the probability that the outcome is 1 or 2, $P(A) = 1/2$
    - Similarly, $P(B) = 1/2$
        - Then, $P(A\cap B) = 1/4 = 1/2 \cdot 1/2 = P(A)\cdot P(B)$

- This is also true when we look at $B$ vs $C$, and $A$ vs $C$
    - So $P(A\cap C) = P(A)\cdot P(C)$ and $P(B \cap C) = P(B)\cdot P(C) \implies$ $A$, $B$, and $C$ are pairwise independent

- But $A\cap B \cap C = \{1\} \implies P(A\cap B\cap C) = 1/4 > 1/8 = P(A)\cdot P(B)\cdot P(C)$
    - Therefore $A$, $B$, and $C$ are not independent events

- This makes sense when we think about it since if we know that $A$ and $B$ have both occurred, the only possibility is that the outcome was 1 therefore $C$ necessarily also occurred
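
- The example can be checked directly. This sketch assumes, as the calculation above does, that the four outcomes are equally likely:

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset({1, 2, 3, 4})

def P(event):
    """Uniform measure on four equally likely outcomes (an assumption)."""
    return Fraction(len(event), len(omega))

A = frozenset({1, 2})
B = frozenset({1, 3})
C = frozenset({1, 4})

# Conditions 1 to 3: every pair satisfies the product rule
pairwise = all(P(X & Y) == P(X) * P(Y) for X, Y in combinations([A, B, C], 2))

# Condition 4: the triple intersection must also satisfy the product rule
triple = P(A & B & C) == P(A) * P(B) * P(C)

print(pairwise, triple, P(A & B & C), P(A) * P(B) * P(C))
```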

- More generally, if we have a sequence of events $A_{1}$, $A_{2}$, ... then the sequence is **independent** if, for any finite subset $S = \{i, j, ..., k\} \subset \{1,2,3,...\}$:

$$
P(A_{i}\cap A_{j}\cap...\cap A_{k}) = P(A_{i})\cdot P(A_{j})\cdot ...\cdot P(A_{k})
$$
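
- For a finite list of events, the general definition can be checked by brute force over every sub-collection of two or more events. The sketch below assumes three fair coin tosses with all eight outcomes equally likely; `is_independent` is a hypothetical helper, not a library function:

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Outcome space: all sequences of three tosses, e.g. ('H', 'T', 'H')
omega = frozenset(product("HT", repeat=3))

def P(event):
    """Uniform measure on the eight outcomes (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))

def is_independent(events):
    """Check the product rule for every sub-collection of size >= 2."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            joint = frozenset.intersection(*subset)
            if P(joint) != prod(P(E) for E in subset):
                return False
    return True

# heads[i] is the event "toss i comes up heads"
heads = [frozenset(w for w in omega if w[i] == "H") for i in range(3)]

coins_independent = is_independent(heads)
# An event is not independent of itself unless its probability is 0 or 1
repeated_independent = is_independent([heads[0], heads[0]])

print(coins_independent, repeated_independent)
```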

____

# 8. Discrete probability spaces

- A set $A$ is **finite** if its elements can be represented in a *terminating* list

$$
A = \{ a_{1}, a_{2}, ..., a_{n} \}
$$

- A set $A$ is **countably infinite** if its elements can be represented in a *non-terminating list*

$$
A = \{ a_{1}, a_{2}, ...\}
$$