## 0. Introduction

This notebook explores probability, the mathematical language for quantifying uncertainty. It follows Chapter 1 of *All of Statistics* (Wasserman, 2004).

## 1. Sample Spaces and Events

The **sample space** $\Omega$ is the set of possible outcomes of an experiment. Points $\omega$ in $\Omega$ are called **sample outcomes**, **realizations**, or **elements**. Subsets of $\Omega$ are called **events**.

For example, if we toss a coin twice then $\Omega = \{HH, HT, TH, TT\}$. The event that the first toss is heads is $A = \{HH, HT\}$. If we toss a coin forever, then the sample space is the infinite set 

$$ \Omega = \left\{ \omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_i \in \{H, T\} \right\}. $$
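The two-toss example can be reproduced by brute-force enumeration. The following is a minimal sketch; the variable names `omega` and `A` are chosen here to mirror the notation and are not part of the source.

```python
from itertools import product

# Sample space for two coin tosses: every ordered pair of H/T.
omega = {"".join(toss) for toss in product("HT", repeat=2)}

# Event A: the first toss is heads.
A = {w for w in omega if w[0] == "H"}

print(sorted(omega))  # ['HH', 'HT', 'TH', 'TT']
print(sorted(A))      # ['HH', 'HT']
```

The infinite sample space for tossing a coin forever cannot be enumerated this way; the sketch only covers the finite case.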

### 1.1 Summary of Terminology

- $\Omega$ sample space.
- $\omega$ outcome (point or element).
- $A$ event (subset of $\Omega$).
- $A^c$ complement of $A$ (not $A$).
- $A \cup B$ union ($A$ or $B$).
- $A \cap B$ or $AB$ intersection ($A$ and $B$).
- $A - B$ set difference ($\omega$ in $A$ but not in $B$).
- $A \subset B$ set inclusion.
- $\emptyset$ null event (always false).
- $\Omega$ true event (always true).
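For a finite sample space, the operations in this list map directly onto Python's built-in `set` type. The sketch below reuses the two-toss sample space; the second event `B` (second toss is heads) is an assumption introduced only for illustration.

```python
omega = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}  # first toss is heads
B = {"HH", "TH"}  # second toss is heads (illustrative assumption)

complement_A = omega - A       # A^c: outcomes not in A
union = A | B                  # A or B
intersection = A & B           # A and B
difference = A - B             # in A but not in B
is_subset = intersection <= A  # set inclusion: (A and B) is contained in A

print(sorted(complement_A))  # ['TH', 'TT']
print(sorted(union))         # ['HH', 'HT', 'TH']
print(sorted(intersection))  # ['HH']
print(sorted(difference))    # ['HT']
print(is_subset)             # True
```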

<center><img src="../figures/venn.png"/></center>

We say that $A_1, A_2, \ldots$ are **disjoint** or are **mutually exclusive** if $A_i \cap A_j = \emptyset$ whenever $i \neq j$. For example, $A_1 = [0, 1)$, $A_2 = [1, 2)$, $A_3 = [2, 3), \ldots$ are disjoint. A **partition** of $\Omega$ is a sequence of disjoint sets $A_1, A_2, \ldots$ such that $\bigcup_{i=1}^\infty A_i = \Omega$. Given an event $A$, define the **indicator function** of $A$ by

$$
I_A(\omega) = I(\omega \in A) = \begin{cases} 
1 & \text{if } \omega \in A \\
0 & \text{if } \omega \notin A. 
\end{cases}
$$
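As a rough illustration for finite sample spaces, the indicator function and the partition condition can be written as small helpers. The names `indicator` and `is_partition` are hypothetical and chosen here for readability.

```python
def indicator(event):
    """Return I_A as a function that maps an outcome to 1 or 0."""
    return lambda outcome: 1 if outcome in event else 0


def is_partition(blocks, omega):
    """Check that the blocks are pairwise disjoint and that their union is omega."""
    blocks = list(blocks)
    pairwise_disjoint = all(
        blocks[i].isdisjoint(blocks[j])
        for i in range(len(blocks))
        for j in range(i + 1, len(blocks))
    )
    return pairwise_disjoint and set().union(*blocks) == omega


omega = {1, 2, 3, 4, 5, 6}   # outcomes of one die roll
I_even = indicator({2, 4, 6})

print(I_even(4), I_even(3))                            # 1 0
print(is_partition([{1, 2}, {3, 4}, {5, 6}], omega))   # True
print(is_partition([{1, 2}, {2, 3, 4}, {5, 6}], omega))  # False: 2 appears twice
```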

## 2. Probability

A function $\mathbb{P}$ that assigns a real number $\mathbb{P}(A)$ to each event $A$ is a probability distribution or a probability measure if it satisfies the following three axioms:

**Axiom 1**: $\mathbb{P}(A) \geq 0$ for every event $A$.

**Axiom 2**: $\mathbb{P}(\Omega) = 1$.

**Axiom 3**: If $A_1, A_2, \ldots$ are disjoint, then

$$ \mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i). $$

The axioms imply that, for any events $A$ and $B$,

$$ \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B). $$
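The identity for $\mathbb{P}(A \cup B)$ can be checked numerically on a small example. The sketch below assumes a single fair six-sided die with equally likely outcomes; the events `A` and `B` and the helper `prob` are illustrative choices, not taken from the source. Exact fractions are used to avoid floating-point error.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one fair die (assumed equally likely outcomes)


def prob(event):
    """Probability of an event when every outcome in omega is equally likely."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is at least 4

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

print(lhs, rhs)  # 2/3 2/3
```

Subtracting $\mathbb{P}(A \cap B)$ corrects for the outcomes $4$ and $6$, which would otherwise be counted twice.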


## 3. Probability on Finite Sample Spaces

Suppose that the sample space $\Omega = \{\omega_1, \ldots, \omega_n\}$ is finite. For example, if we toss a die twice, then $\Omega$ has 36 elements: $\Omega = \{(i, j) : i, j \in \{1, \ldots, 6\}\}$. If each outcome is equally likely, then $\mathbb{P}(A) = \frac{|A|}{36}$ where $|A|$ denotes the number of elements in $A$. The probability that the sum of the dice is $11$ is $2 / 36$, since exactly two outcomes, $(5, 6)$ and $(6, 5)$, correspond to this event.

If $\Omega$ is finite and if each outcome is equally likely, then

$$ \mathbb{P}(A) = \frac{|A|}{|\Omega|}, $$

which is called the **uniform probability distribution**. 
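The two-dice calculation can be confirmed by enumerating the sample space and counting. This is a sketch under the equally-likely assumption stated above; `prob` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (i, j) for two rolls of a die.
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """Uniform probability: |A| / |Omega|."""
    return Fraction(len(event & omega), len(omega))


sum_is_11 = {(i, j) for (i, j) in omega if i + j == 11}

print(len(omega))          # 36
print(sorted(sum_is_11))   # [(5, 6), (6, 5)]
print(prob(sum_is_11))     # 1/18
```

`Fraction` reduces $2/36$ to its lowest terms, $1/18$.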

To compute probabilities, we need to count the number of points in an event $A$. Methods for counting points are called combinatorial methods. Given $n$ objects, the number of ways of ordering these objects is $n! = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$. For convenience, we define $0! = 1$. We also define

$$ C(n, k) = {}^nC_k = {}_nC_k = \binom{n}{k} = \frac{n!}{k!(n - k)!}, $$

read "$n$ choose $k$", which is the number of distinct ways of choosing $k$ objects from $n$. For example, if we have a class of $20$ students and we want to select a committee of $3$ students, then there are

$$ \binom{20}{3} = \frac{20!}{3!(20 - 3)!} = 1140. $$
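Python's standard library provides both counting functions directly. The sketch below checks the committee example and cross-checks it by enumerating every committee; it assumes Python 3.8 or later, where `math.comb` is available.

```python
from itertools import combinations
from math import comb, factorial

n, k = 20, 3

# Closed-form count via the factorial formula.
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# Built-in binomial coefficient.
by_comb = comb(n, k)

# Brute force: list every unordered committee of k students out of n.
by_enumeration = sum(1 for _ in combinations(range(n), k))

print(by_formula, by_comb, by_enumeration)  # 1140 1140 1140
print(factorial(0))                         # 1
```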

## 4. Independent Events

$A$ and $B$ are independent if and only if 

$$ \mathbb{P}(A \cap B) = \mathbb{P}(A) \, \mathbb{P}(B). $$

Independence can arise in two distinct ways. Sometimes, we explicitly assume that two events are independent. For example, in tossing a coin twice, we usually assume the tosses are independent which reflects the fact that the coin has no memory of the first toss. In other instances, we derive independence by verifying that $\mathbb{P}(A \cap B) = \mathbb{P}(A) \, \mathbb{P}(B)$ holds.
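The second route, verifying the product rule, can be sketched for two tosses of a fair coin. The events below and the helper names `prob` and `independent` are illustrative assumptions; the four outcomes are taken to be equally likely.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}  # two tosses of a fair coin


def prob(event):
    """Uniform probability on omega."""
    return Fraction(len(event & omega), len(omega))


def independent(a, b):
    """Check the product rule P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


A = {"HH", "HT"}        # first toss is heads
B = {"HH", "TH"}        # second toss is heads
C = {"HH", "HT", "TH"}  # at least one head

print(independent(A, B))  # True:  1/4 == 1/2 * 1/2
print(independent(A, C))  # False: 1/2 != 1/2 * 3/4
```

The events `A` and `C` fail the check because knowing the first toss is heads guarantees that at least one head occurred.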

## 5. Conditional Probability

If $\mathbb{P}(B) > 0$ then the conditional probability of $A$ given $B$ is

$$ \mathbb{P}(A|B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}. $$

Think of $\mathbb{P}(A|B)$ as the fraction of times $A$ occurs among those in which $B$ occurs. In general, $\mathbb{P}(A|B) \neq \mathbb{P}(B|A)$. 

If $\mathbb{P}(B) > 0$, then $A$ and $B$ are independent if and only if 

$$ \mathbb{P}(A|B) = \mathbb{P}(A). $$
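A small enumeration makes the asymmetry between $\mathbb{P}(A|B)$ and $\mathbb{P}(B|A)$ concrete. The sketch assumes two rolls of a fair die; the events (sum is 8, first roll is 3) and the helper `cond_prob` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two rolls of a fair die


def prob(event):
    """Uniform probability on omega."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)


A = {(i, j) for (i, j) in omega if i + j == 8}  # the sum is 8
B = {(i, j) for (i, j) in omega if i == 3}      # the first roll is 3

print(cond_prob(A, B))  # 1/6
print(cond_prob(B, A))  # 1/5
print(prob(A))          # 5/36, so A and B are not independent
```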

## 6. Bayes' Theorem

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. It follows from the definition of conditional probability and states that, if $\mathbb{P}(A) > 0$ and $\mathbb{P}(B) > 0$, then

$$ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A) \, \mathbb{P}(A)}{\mathbb{P}(B)}. $$
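The theorem can be checked by computing both sides independently on a finite sample space. The sketch below assumes two rolls of a fair die, with $A$ the event that the sum is 8 and $B$ the event that the first roll is 3; these events and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two rolls of a fair die


def prob(event):
    """Uniform probability on omega."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(A | B) computed directly from the definition."""
    return prob(a & b) / prob(b)


A = {(i, j) for (i, j) in omega if i + j == 8}  # the sum is 8
B = {(i, j) for (i, j) in omega if i == 3}      # the first roll is 3

direct = cond_prob(A, B)                            # left-hand side
via_bayes = cond_prob(B, A) * prob(A) / prob(B)     # right-hand side

print(direct, via_bayes)  # 1/6 1/6
```

Here the right-hand side evaluates to $\frac{(1/5)(5/36)}{1/6} = \frac{1}{6}$, matching the direct computation.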