# Lecture 1: Probability Models and Axioms

A probabilistic model is a quantitative description of a situation, a phenomenon, or an experiment.

Specifying a probabilistic model involves two steps.

- Sample Space: We need to describe the possible outcomes of an experiment.

- Probability Laws: We need to specify how to assign probabilities to outcomes or to collections of outcomes.
 - Axioms: Probabilities need to satisfy basic properties to be meaningful.
   - e.g. Probabilities cannot be negative.
 - Properties that follow from the axioms

Two types of probabilistic models:
 - Discrete
 - Continuous

## Sample Space

Whatever the experiment is, for example
 - flipping a coin
 - flipping a coin 5 times
 - rolling a die
 
...there will be a set of possible outcomes of the experiment.

We denote the set of possible outcomes of an experiment, called the **sample space**, as $\Omega$.

The elements of a sample space should be
 - Mutually exclusive
  - At the end of the experiment, exactly one of the outcomes has happened.
 - Collectively exhaustive
  - Together, all the elements of the sample space exhaust all the possibilities.
 - At the "right" granularity
  - The right granularity includes all the relevant information, and only the relevant information, in the model.
  - For example, when flipping a coin, the weather at the location of the experiment is irrelevant information.
    - Hence $\Omega = \{H, T\}$ is a better sample space than $\Omega' = \{H \text{ and rain}, H \text{ and no rain}, T \text{ and rain}, T \text{ and no rain}\}$, although the elements of the latter sample space are also mutually exclusive and collectively exhaustive.

### Sample Space Examples

Sample spaces are sets. A sample space can be finite or infinite, discrete or continuous, and so on.

#### Example: 2 rolls of a tetrahedral die

One possible representation of the sample space is the following

![](assets/sample_space_tetrahedral_die_xy.png)

...and the order of the rolls matters. E.g. $(2, 3)$ is a different outcome from $(3, 2)$.

This is an example of a model in which the probabilistic experiment can be described in phases or stages.

It is useful to give such an experiment a sequential description in terms of a tree.

![](assets/sample_space_tetrahedral_die_tree.png)

In both descriptions, we have $16$ possible outcomes.
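
As a minimal sketch, the two descriptions can be enumerated in Python to confirm that they list the same $16$ outcomes. The names `FACES`, `sample_space`, and `tree` are illustrative choices, not part of the lecture.

```python
from itertools import product

# Faces of a tetrahedral (four-sided) die.
FACES = (1, 2, 3, 4)

# Grid description: ordered pairs (first roll, second roll).
sample_space = list(product(FACES, repeat=2))

# Sequential (tree) description: one branch per first roll,
# one leaf per second roll under that branch.
tree = {first: [(first, second) for second in FACES] for first in FACES}
leaves = [leaf for branch in tree.values() for leaf in branch]

print(len(sample_space))                       # number of outcomes in the grid
print(sorted(leaves) == sorted(sample_space))  # same outcomes in both descriptions
print((2, 3) != (3, 2))                        # order of the rolls matters
```

Representing outcomes as ordered tuples is what makes $(2, 3)$ and $(3, 2)$ distinct elements of the sample space.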

#### Example: Throwing a dart at a target that is a unit square

An outcome is the point $(x, y)$ that the dart hits on the target, where $x$ and $y$ are real numbers with $0 \leq x, y \leq 1$.

![](assets/sample_space_dart.png)

## Probability Axioms

### Definition: Event

Recall the dart example above. What is the probability that the dart hits an exact point $(x, y)$ for any particular $x$ and $y$?

The probability of such an outcome is essentially $0$. In a continuous model it is natural for any individual point to have probability $0$.

In this case, instead of assigning probabilities to individual points, we will assign probabilities to **subsets** of the sample space.

A subset of the sample space is called an **event**.

The probability of an event $A$ is denoted as $P(A)$.

Why is this called an event? Because at the end of the experiment, the outcome of the experiment is either inside the subset $A$ (then we say event $A$ has occurred), or outside $A$ (then we say event $A$ did not occur).

![](assets/continous_event.png)

By convention, probabilities are always given between $0$ and $1$.

Intuitively, probability $0$ means that the event of interest practically cannot happen, and probability $1$ means that it is practically certain to happen.

### Axioms

The rules that all probabilities should satisfy are called the Axioms of Probability.

1. Non-negativity: $P(A) \geq 0$ (a)
2. Normalization: $P(\Omega) = 1$ (b)
3. (Finite) Additivity (to be strengthened later): If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$ (c)

![](assets/additivity_axiom.png)
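
To make the axioms concrete, here is a small sketch that checks them for one particular law. The law itself is an assumption made for illustration: a fair tetrahedral die with $\Omega = \{1, 2, 3, 4\}$, where the probability of an event is the sum of the probabilities of its outcomes. The names `point_prob`, `prob`, and `events` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example law: a fair tetrahedral die.
omega = (1, 2, 3, 4)
point_prob = {s: Fraction(1, 4) for s in omega}

def prob(event):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum((point_prob[s] for s in event), Fraction(0))

# Every subset of omega is an event (16 subsets in total).
events = [
    frozenset(c)
    for c in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
]

# (a) Non-negativity: P(A) >= 0 for every event A.
nonnegative = all(prob(A) >= 0 for A in events)

# (b) Normalization: P(omega) = 1.
normalized = prob(omega) == 1

# (c) Finite additivity: P(A ∪ B) = P(A) + P(B) whenever A and B are disjoint.
additive = all(
    prob(A | B) == prob(A) + prob(B)
    for A in events
    for B in events
    if A.isdisjoint(B)
)

print(nonnegative, normalized, additive)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.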

### Simple Properties of Probabilities

We know that $A \cup A^C = \Omega$ (d), and $A \cap A^C = \emptyset$ (e).

We can derive some properties of probabilities from the Axioms of Probability.

#### 1. $P(A) + P(A^C) = 1$

##### Proof

$
\begin{equation}
\begin{aligned}
1 &= P(\Omega) & \text{(from (b))}\\
&= P(A \cup A^C) & \text{(from (d))} \\
&= P(A) + P(A^C) & \text{(from (c))}
\end{aligned}
\end{equation}
$


#### 2. $P(A) \leq 1$

##### Proof

We know $P(A) + P(A^C) = 1$ from (1), and $P(A^C) \geq 0$ (from (a)). Then we have

$P(A) = 1 - P(A^C) \leq 1$


#### 3. $P(\emptyset) = 0$

##### Proof

We know that $\Omega^C = \emptyset$.

Hence $P(\Omega) + P(\emptyset) = P(\Omega) + P(\Omega^C) = 1$ (from 1).

Hence

$
\begin{equation}
\begin{aligned}
P(\emptyset) &= 1 - P(\Omega) \\
&= 1 - 1 & \text{(from (b))}\\
&= 0
\end{aligned}
\end{equation}
$


#### 4. $P(A \cup B \cup C) = P(A) + P(B) + P(C)$, given that $A$, $B$, and $C$ are disjoint events. And similarly for $k$ disjoint events.

##### Proof

$
\begin{equation}
\begin{aligned}
P(A \cup B \cup C) &= P((A \cup B) \cup C) \\
&= P(A \cup B) + P(C) & \text{from (c)} \\
&= P(A) + P(B) + P(C) & \text{from (c)}
\end{aligned}
\end{equation}
$

From this, we can easily generalize to $k$ disjoint events $A_1, A_2, ..., A_k$: $P(A_1 \cup A_2 \cup ... \cup A_k) = \sum_{i=1}^{k} P(A_i)$

And because $\{s_1, s_2, ..., s_k\} = \{s_1\} \cup \{s_2\} \cup ... \cup \{s_k\}$, we have

#### $P(\{s_1, s_2, ..., s_k\}) = P(\{s_1\}) + P(\{s_2\}) + ... + P(\{s_k\}) = P(s_1) + P(s_2) + ... + P(s_k)$

### More properties of Probabilities

#### 5. If $A \subset B$, then $P(A) \leq P(B)$.

##### Proof

Since $A \subset B$, we can write $B$ as a union of two disjoint sets:

$B = A \cup (B \cap A^C)$.

Hence, from (c),

$P(B) = P(A) + P(B \cap A^C)$.

From (a) we know $P(B \cap A^C) \geq 0$.

Hence $P(B) = P(A) + P(B \cap A^C) \geq P(A)$.

#### 6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Note $A$ and $B$ are not necessarily disjoint here.

##### Proof

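Let's represent $A \cup B$ in terms of disjoint events.

Let $a = P(A \cap B^C)$, $b = P(A \cap B)$, and $c = P(B \cap A^C)$.

The three events $A \cap B^C$, $A \cap B$, and $B \cap A^C$ are disjoint and their union is $A \cup B$. Hence, from (4),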

$P(A \cup B) = a + b + c$

Then we have

$P(A) + P(B) - P(A \cap B) = (a + b) + (b + c) - b = a + b + c = P(A \cup B)$.

##### Union Bound

Since $P(A \cap B) \geq 0$, we have $P(A) + P(B) \geq P(A \cup B)$.
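
Property 6 and the union bound can be checked numerically on a small example. The sketch below assumes the two-roll tetrahedral die sample space with every outcome equally likely (probability $\frac{1}{16}$ each); the events `A` and `B` are chosen here purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a tetrahedral die, all 16 outcomes assumed equally likely.
omega = set(product(range(1, 5), repeat=2))

def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))

# Illustrative, overlapping events: first roll is 1, second roll is 1.
A = {(x, y) for (x, y) in omega if x == 1}
B = {(x, y) for (x, y) in omega if y == 1}

lhs = prob(A | B)                        # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)    # P(A) + P(B) - P(A ∩ B)

print(lhs, rhs)                  # both sides of property 6
print(lhs <= prob(A) + prob(B))  # union bound
```

Here $A \cap B = \{(1, 1)\}$ is not empty, so simply adding $P(A)$ and $P(B)$ would count that outcome twice.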

#### 7. $P(A \cup B \cup C) = P(A) + P(A^C \cap B) + P(A^C \cap B^C \cap C)$. Note $A$, $B$ and $C$ are not necessarily disjoint here.

##### Proof

Let's represent $A \cup B \cup C$ as a union of disjoint sets

$A \cup B \cup C = A \cup (B \cap A^C) \cup (A^C \cap B^C \cap C)$.

Hence $P(A \cup B \cup C) = P(A) + P(A^C \cap B) + P(A^C \cap B^C \cap C)$ (from 4).

### A Discrete Example

- Two rolls of a tetrahedral die (First roll's outcome denoted as $X$, second roll's outcome denoted as $Y$)
- Let every possible outcome have probability $\frac{1}{16}$

![](assets/sample_space_tetrahedral_die_xy.png)

$P(X = 1) = 4 \cdot \frac{1}{16} = \frac{1}{4}$.

Let $Z = \min(X, Y)$.

$P(Z = 4) = P(X = Y = 4) = \frac{1}{16}$, since the event is $\{(4, 4)\}$.

$P(Z = 2) = \frac{5}{16}$, since the event is $\{(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)\}$.

#### Discrete Uniform Law

- Assume $\Omega$ consists of $n$ equally likely elements
- Assume $A$ consists of $k$ elements

![](assets/discrete_uniform_law.png)

Then

$P(A) = k \cdot \frac{1}{n}$.
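
The discrete uniform law reduces probability calculations to counting. As a sketch, the probabilities from the two-roll example above can be recomputed by listing the outcomes in each event and dividing by $n = 16$. The helper name `prob` and the event variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (X, Y) for two rolls of a tetrahedral die.
omega = list(product(range(1, 5), repeat=2))

def prob(event):
    """Discrete uniform law: P(A) = k / n, with k = |A| and n = |omega|."""
    return Fraction(len(event), len(omega))

x_is_1 = [(x, y) for (x, y) in omega if x == 1]
z_is_4 = [(x, y) for (x, y) in omega if min(x, y) == 4]
z_is_2 = [(x, y) for (x, y) in omega if min(x, y) == 2]

print(prob(x_is_1))    # P(X = 1)
print(prob(z_is_4))    # P(Z = 4)
print(sorted(z_is_2))  # the outcomes that make up the event Z = 2
print(prob(z_is_2))    # P(Z = 2)
```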

### A Continuous Example

Revisiting the earlier dart throwing example.

![](assets/sample_space_dart.png)

Uniform probability law: Probability = Area.

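- $P(\{(x, y) \mid x + y \leq \frac{1}{2}\}) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$, the area of a right triangle with both legs of length $\frac{1}{2}$.
![](assets/continuous_uniform_law_ex1.png)

- $P(\{(0.5, 0.3)\}) = 0$, since a single point has zero area.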
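
One way to build intuition for "Probability = Area" is a simulation. The sketch below is an illustration, not part of the lecture: it assumes that Python's pseudo-random generator is an adequate stand-in for a uniformly thrown dart, and it estimates the probability of the triangle $x + y \leq \frac{1}{2}$ as the fraction of simulated throws that land in it. The function name and the number of throws are arbitrary choices.

```python
import random

def estimate_triangle_probability(num_throws, seed=0):
    """Fraction of simulated uniform throws (x, y) in the unit square with x + y <= 1/2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(num_throws):
        x, y = rng.random(), rng.random()
        if x + y <= 0.5:
            hits += 1
    return hits / num_throws

estimate = estimate_triangle_probability(200_000)
print(estimate)  # should be close to the exact area 1/8 = 0.125
```

The estimate only approximates the exact answer; the area calculation above is what the uniform law actually defines.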

### Probability calculation steps

- Specify the sample space
- Specify a probability law
- Identify an event of interest
- Calculate

### Countable Additivity

We carry out an experiment whose outcome is an arbitrary positive integer.

For example, suppose we keep tossing a coin until we observe heads for the first time.

- Sample space: $\{1, 2, ...\}$

- Probability law: $P(n) = \frac{1}{2^n}$, $n = 1, 2, ...$
 - Is that a legitimate probability law? The probabilities of all outcomes should sum to $1$. From the geometric series we know that $\sum_{n=1}^{\infty} \frac{1}{2^n} = \frac{1}{2} \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{2}} = 1$.
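
The normalization check can also be seen through partial sums. As a small sketch (the function name `partial_sum` is illustrative), the sum of the first $N$ probabilities is exactly $1 - \frac{1}{2^N}$, which gets arbitrarily close to $1$ as $N$ grows.

```python
from fractions import Fraction

def partial_sum(num_terms):
    """P(1) + P(2) + ... + P(num_terms) under the law P(n) = 1 / 2**n."""
    return sum((Fraction(1, 2**n) for n in range(1, num_terms + 1)), Fraction(0))

for N in (1, 2, 5, 10, 20):
    total = partial_sum(N)
    # The gap to 1 is exactly 1 / 2**N.
    print(N, total, 1 - total)
```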
 
Let's calculate the probability of another event in the same experiment.

What is the probability that the outcome is even?

$
\begin{equation}
\begin{aligned}
P(\text{even}) &= P(\{2, 4, 6, ...\}) \\
&= P(2) + P(4) + P(6) + ... & \text{(from (4), which is based on (c), finite additivity)} \\
&= \frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + ... & \text{(from the probability law of the experiment)}\\
&= \frac{1}{4}(1 + \frac{1}{4} + \frac{1}{4^2} + ...) \\
&= \frac{1}{4} (\frac{1}{1 - \frac{1}{4}}) & \text{(from geometric series)} \\
&= \frac{1}{4} \cdot \frac{4}{3} \\
&= \frac{1}{3}
\end{aligned}
\end{equation}
$

Is this correct? We used property (4), which is based on finite additivity, while the sum here has infinitely many terms.

The way out of this dilemma is to introduce an additional axiom that will indeed allow this kind of calculation.

### Countable Additivity Axiom

To strengthen the finite additivity axiom, we introduce the following axiom:

If $A_1, A_2, A_3, ...$ is an infinite **sequence** of **disjoint** events, then

$P(A_1 \cup A_2 \cup A_3 \cup ...) = P(A_1) + P(A_2) + P(A_3) + ...$

Note the mathematical subtlety of the term **sequence** here: the events must be arrangeable in a sequence, i.e. there can only be countably many of them.

For example, for the unit square dart throwing experiment, does this axiom imply that

$P(\Omega) = P(\cup \{(x, y)\}) = \sum P(\{(x, y)\}) = \sum 0 = 0$, which appears to be a paradox?

No. The real numbers in $[0, 1]$ are uncountable, hence the unit square is an uncountable set. Its points cannot be arranged in a sequence, so the axiom does not apply to the union of all the single-point events $\{(x, y)\}$.

#### Exercise: Using countable additivity

Let the sample space be the set of positive integers and suppose that $P(n) = \frac{1}{2^n}$, for $n = 1, 2, ...$. Find the probability of the set $\{3, 6, 9, ...\}$, that is, of the set of positive integers that are multiples of $3$.

##### Solution

$
\begin{equation}
\begin{aligned}
P(\text{multiples of } 3) &= P(\{3, 6, 9, ...\}) \\
&= P(3) + P(6) + P(9) + ... & \text{(from Countable Additivity Axiom)}\\
&= \frac{1}{2^3} + \frac{1}{2^6} + \frac{1}{2^9} + ... \\
&= \frac{1}{2^3} (1 + \frac{1}{2^3} + (\frac{1}{2^3})^2 + ...) \\
&= \frac{1}{2^3} (\frac{1}{1 - \frac{1}{2^3}}) & \text{(from geometric series)}\\
&= \frac{1}{2^3} \frac{2^3}{2^3 - 1} \\
&= \frac{1}{7}
\end{aligned}
\end{equation}
$
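
Both infinite sums computed in this section, $\frac{1}{3}$ for the even outcomes and $\frac{1}{7}$ for the multiples of $3$, can be sanity-checked with partial sums. The sketch below uses a hypothetical helper `prob_multiples(m, num_terms)` that adds the first `num_terms` terms $P(m) + P(2m) + ...$ exactly and compares the result with the closed-form limit.

```python
from fractions import Fraction

def prob_multiples(m, num_terms):
    """Partial sum P(m) + P(2m) + ... + P(num_terms * m) under P(n) = 1 / 2**n."""
    return sum((Fraction(1, 2**(m * j)) for j in range(1, num_terms + 1)), Fraction(0))

# Limits derived above: 1/3 for even outcomes (m = 2), 1/7 for multiples of 3 (m = 3).
for m, limit in ((2, Fraction(1, 3)), (3, Fraction(1, 7))):
    approx = prob_multiples(m, 30)
    # The remaining gap is positive and shrinks geometrically with the number of terms.
    print(m, float(approx), float(limit - approx))
```

Every partial sum stays strictly below the limit; it is the countable additivity axiom that justifies assigning the limit itself as the probability of the infinite event.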

## Interpretation and Uses of Probabilities

- A narrow point of view: Probability Theory is just a bunch of math
 - Axioms $\implies$ Theorems
 
- Are probabilities frequencies?
 - $P(coin~toss~head) = \frac{1}{2}$
 - $P(the~president~of~...~will~be~re-elected) = 0.7$
 
- Probabilities are often interpreted as
 - Description of beliefs
 - Betting preferences
 - Which are subjective
 
### The role of probability theory
- A framework for analyzing phenomena with uncertain outcomes
 - Rules of consistent reasoning
 - Used for predictions and decisions
 
![](assets/probability_theory_big_picture.png)