## I.1—PROBABILITY SPACES

A *finite probability space* (often just called a probability space or a probability model) is the most basic data structure used throughout this course for modeling uncertainty.

A *finite probability space* consists of two ingredients:

- a **sample space** $\Omega$ consisting of a finite (i.e., not infinite) number of collectively exhaustive and mutually exclusive possible outcomes.

How we specify a sample space is usually not unique: for instance, we can add extraneous outcomes that all have probability 0, or extraneous information that doesn't matter. Generally speaking, it's best to choose a sample space that is as simple as possible for modeling what we care about. (For example, if we roll a six-sided die and only care about whether the face shows at least 4, then it's sufficient to keep track of just two outcomes, "at least 4" and "less than 4".)

- an **assignment of probabilities** $\mathbb {P}$: for each possible outcome $\omega \in \Omega$, we assign a probability $\mathbb {P}(\text {outcome }\omega )\in[0, 1]$, i.e., a number that is at least 0 and at most 1, and we require that the probabilities across all the possible outcomes in the sample space add up to 1: $\sum _{\omega \in \Omega }\mathbb {P}(\text {outcome }\omega )=1.$

**Notation**: As shorthand we occasionally use the tuple “$(\Omega ,\mathbb {P})$" to refer to a finite probability space to remind ourselves of the two ingredients needed, sample space $\Omega$ and an assignment of probabilities $\mathbb {P}$.

In Python code, a probability space can be written as a dictionary that encodes both the sample space (the keys) and the probability table (the values) in one structure.
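As a minimal sketch, the weather model used in the next section can be stored this way. The helper name `is_valid_prob_space` and its tolerance `tol` are illustrative assumptions, not part of the course code; the helper simply checks the two requirements on an assignment of probabilities stated above.

```python
import math

# Keys are the outcomes in the sample space; values are their probabilities.
prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}

def is_valid_prob_space(prob_space, tol=1e-9):
    """Check the two requirements on an assignment of probabilities.

    Hypothetical helper for illustration: every probability must lie in
    [0, 1], and the probabilities must add up to 1 (up to floating-point
    rounding, hence the tolerance `tol`).
    """
    in_range = all(0 <= p <= 1 for p in prob_space.values())
    sums_to_one = math.isclose(sum(prob_space.values()), 1, abs_tol=tol)
    return in_range and sums_to_one

# The sample space is the set of dictionary keys.
sample_space = set(prob_space)

print(sample_space == {'sunny', 'rainy', 'snowy'})          # True
print(is_valid_prob_space(prob_space))                      # True
print(is_valid_prob_space({'heads': 0.5, 'tails': 0.6}))    # False
```

The tolerance is there because floating-point values such as `1/6` and `1/3` are not stored exactly, so an exact comparison of the sum with 1 could fail for other tables.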

## I.2—EVENTS

An event is a subset of the sample space $\Omega$. In our table representation for a probability space, an event could thus be thought of as a subset of the rows, and the probability of the event is just the sum of the probability values in those rows!

The probability of an event $\mathcal{A}\subseteq \Omega$ is the sum of the probabilities of the possible outcomes in $\mathcal{A}$:

$$\mathbb {P}(\mathcal{A})\triangleq \sum _{\omega \in \mathcal{A}}\mathbb {P}(\text {outcome }\omega ),$$
 
where “$\triangleq$" means “defined as".

We can translate the above equation into Python code. In particular, we can compute the probability of an event encoded as a Python set `event`, where the probability space is encoded as a Python dictionary `prob_space`:

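```python
def prob_of_events(event, prob_space):
    """Return the probability of `event`, a set of outcomes in `prob_space`."""
    total = 0
    for outcome in event:
        # Add up the probabilities of the outcomes that make up the event.
        total += prob_space[outcome]
    return total

prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
rainy_or_snowy_event = {'rainy', 'snowy'}
print(prob_of_events(rainy_or_snowy_event, prob_space))
```

```
0.5
```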
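As a further sketch, the die example from Section I.1 can be checked the same way: the event "at least 4" should have the same probability whether we model all six faces or only the two coarse outcomes. The names `die_space` and `coarse_space` are illustrative, the die is assumed to be fair, and `Fraction` is used only to keep the arithmetic exact. The function is restated compactly so that the block runs on its own.

```python
from fractions import Fraction

def prob_of_events(event, prob_space):
    """Sum the probabilities of the outcomes contained in `event`."""
    return sum(prob_space[outcome] for outcome in event)

# Fine-grained model: one outcome per face of a six-sided die
# (assumed fair, so each face has probability 1/6).
die_space = {face: Fraction(1, 6) for face in range(1, 7)}
at_least_4 = {4, 5, 6}

# Coarse model: only track whether the face is at least 4.
coarse_space = {'at least 4': Fraction(1, 2), 'less than 4': Fraction(1, 2)}

print(prob_of_events(at_least_4, die_space))          # 1/2
print(prob_of_events({'at least 4'}, coarse_space))   # 1/2
```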


## I.3 — RANDOM VARIABLES

**Definition of a “finite random variable" (in this course, we will just call this a “random variable")**: Given a finite probability space $(\Omega ,\mathbb {P})$, a *finite random variable* $X$ is a mapping from the sample space $\Omega$ to a set of values $\mathcal{X}$ that random variable $X$ can take on (we will often call $\mathcal{X}$ the “alphabet" of random variable $X$):
$$ X:\Omega\to\mathcal{X}$$

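For example, in the weather model from the previous section, a random variable $W$ that records the weather takes on values in the alphabet $\{ \text {sunny},\text {rainy},\text {snowy}\}$, and a random variable $I$ that indicates whether it is sunny takes on values in the alphabet $\{ 0,1\}$.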

**Quick summary**: There's an underlying experiment corresponding to probability space $(\Omega ,\mathbb {P})$. Once the experiment is run, let $\omega \in \Omega$ denote the outcome of the experiment. Then the random variable takes on the specific value of $X(\omega )\in \mathcal{X}$.

**Technical note**: Even though the formal definition of a finite random variable doesn't actually make use of the probability assignment $\mathbb {P}$, the probability assignment will become essential as soon as we talk about how probability works with random variables. 

**Explanation using a Python example**:
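A minimal sketch of such an example, assuming the weather probability space from Section I.2. The helper `sample_outcome` is a hypothetical name, written here with the standard-library function `random.choices`; it may differ from whatever sampling routine the course code uses.

```python
import random

prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}

def sample_outcome(prob_space):
    """Draw one outcome according to the probabilities in `prob_space`.

    Hypothetical helper built on `random.choices` from the standard library.
    """
    outcomes = list(prob_space)
    weights = [prob_space[outcome] for outcome in outcomes]
    return random.choices(outcomes, weights=weights, k=1)[0]

# Run the underlying experiment once.
random_outcome = sample_outcome(prob_space)

# Random variable W simply reports the outcome itself.
W = random_outcome

# Random variable I indicates whether the outcome is sunny.
I = 1 if random_outcome == 'sunny' else 0

print(W, I)
```

Because the outcome is random, the printed pair changes from run to run, but `I` is always 1 exactly when `W` is `'sunny'`.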



Here is what happens in this example:

1. First, there is an underlying probability space $(\Omega , \mathbb {P})$, where $\Omega = \{ \text {sunny}, \text {rainy}, \text {snowy}\}$, and 
$$ \begin{eqnarray}
\mathbb{P}(\text{sunny}) &=& 1/2, \\
\mathbb{P}(\text{rainy}) &=& 1/6, \\
\mathbb{P}(\text{snowy}) &=& 1/3.
\end{eqnarray}$$

2. A random outcome $\omega \in \Omega$ is sampled using the probabilities given by the probability space $(\Omega ,\mathbb {P})$. This step corresponds to an underlying experiment happening.

3. Two random variables are generated:
    - $W$ is set to be equal to $\omega$. As an equation: $$\begin{eqnarray}W(\omega) &=&\omega\quad\text{for }\omega\in\{\text{sunny},\text{rainy},\text{snowy}\}.\end{eqnarray}$$ This step perhaps seems entirely unnecessary, as you might wonder “Why not just call the random outcome $W$ instead of $\omega$?" Indeed, this step isn't actually necessary for this particular example, but the formalism for random variables has this step to deal with what happens when we encounter a random variable like $I$.
    - $I$ is set to 1 if $\omega =\text {sunny}$, and 0 otherwise. As an equation: $$\begin{eqnarray} I(\omega) &=&
\begin{cases}
  1 & \text{if }\omega=\text{sunny}, \\
  0 & \text{if }\omega\in\{\text{rainy},\text{snowy}\}.
\end{cases}
\end{eqnarray}$$ Importantly, multiple possible outcomes (rainy or snowy) get mapped to the same value 0 that $I$ can take on. 

We see that random variable $W$ maps the sample space $\Omega =\{ \text {sunny},\text {rainy},\text {snowy}\}$ to the same set $\{ \text {sunny},\text {rainy},\text {snowy}\}$. Meanwhile, random variable $I$ maps the sample space $\Omega =\{ \text {sunny},\text {rainy},\text {snowy}\}$ to the set $\{ 0,1\}$.

We can pictorially see what's going on by looking at the probability tables for: the original probability space, the random variable $W$, and the random variable $I$:

<img src="https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/cdb0d997cac4daf2d612e86780ce72bf/asset-v1:MITx+6.008.1x+3T2016+type@asset+block/images_sec-random-variables-main.png" width="60%">

These tables make it clear that a “random variable" really is just reassigning/relabeling what the values are for the possible outcomes in the underlying probability space (given by the top left table):

- In the top right table, random variable $W$ does not do any sort of relabeling so its probability table looks the same as that of the underlying probability space.

- In the bottom left table, the random variable $I$ relabels/reassigns “sunny" to 1, and both “rainy" and “snowy" to 0. Intuitively, since two of the rows now have the same label 0, it makes sense to just combine these two rows, adding their probabilities ($\frac{1}{6}+\frac{1}{3}=\frac{1}{2}$). This results in the bottom right table.

Specify a Random Variable in Python:
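One possible way to do this, sketched under the assumption that a random variable is stored as a dictionary mapping each outcome to its value. The names `W_mapping`, `I_mapping`, and `pmf_of_random_variable` are illustrative, not taken from the course code. The probability table of the random variable is obtained by adding up the probabilities of all outcomes that share the same label, as in the row-combining step described above.

```python
from collections import defaultdict
from fractions import Fraction

# Exact fractions avoid floating-point rounding in the comparisons below.
prob_space = {
    'sunny': Fraction(1, 2),
    'rainy': Fraction(1, 6),
    'snowy': Fraction(1, 3),
}

# Each random variable is a mapping from outcomes to values.
W_mapping = {'sunny': 'sunny', 'rainy': 'rainy', 'snowy': 'snowy'}
I_mapping = {'sunny': 1, 'rainy': 0, 'snowy': 0}

def pmf_of_random_variable(mapping, prob_space):
    """Build the probability table of a random variable.

    Outcomes that map to the same value have their probabilities added.
    """
    pmf = defaultdict(Fraction)   # missing values start at Fraction(0)
    for outcome, prob in prob_space.items():
        pmf[mapping[outcome]] += prob
    return dict(pmf)

prob_W = pmf_of_random_variable(W_mapping, prob_space)
prob_I = pmf_of_random_variable(I_mapping, prob_space)

print(prob_W == prob_space)   # True: W does no relabeling
print(prob_I[0], prob_I[1])   # 1/2 1/2
```

A simpler alternative is to write the probability tables down directly as dictionaries; the mapping-based version is shown because it makes the relabeling explicit.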

### Random Variables Notation and Terminology

In this course, we denote random variables with capital/uppercase letters, such as $X, W, I$, etc. We use the phrases “probability table", “probability mass function" (abbreviated as PMF), and “probability distribution" (often simply called a distribution) to mean the same thing, and in particular we denote the probability table for $X$ to be $p_ X$ or $p_ X(\cdot )$.

We write $p_ X(x)$ to denote the entry of the probability table that has label $x \in \mathcal{X}$, where $\mathcal{X}$ is the set of values that random variable $X$ takes on. Note that we use lowercase letters like $x$ to denote variables storing nonrandom values. We can also look up entries in a probability table using specific labels, e.g., from earlier, we have $p_ W(\text {rainy}) = 1/6$ and $p_ I(1)=1/2$.
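In code, if a probability table is stored as a dictionary (an assumption carried over from the earlier sections; the names `p_W` and `p_I` are illustrative), evaluating $p_ X(x)$ is a dictionary lookup:

```python
from fractions import Fraction

# Probability tables for W and I from the weather example, as exact fractions.
p_W = {'sunny': Fraction(1, 2), 'rainy': Fraction(1, 6), 'snowy': Fraction(1, 3)}
p_I = {0: Fraction(1, 2), 1: Fraction(1, 2)}

x = 'rainy'        # a lowercase name holds a specific, nonrandom value
print(p_W[x])      # 1/6
print(p_I[1])      # 1/2
```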

Note that we use the same notation as in math, where a function $f$ might also be written as $f(\cdot )$ to explicitly indicate that it is a function of one variable. Both $f$ and $f(\cdot )$ refer to a function, whereas $f(x)$ refers to the value of the function $f$ evaluated at the point $x$.

As an example of how to use all this notation, recall that a probability table consists of nonnegative entries that add up to 1. In fact, each of the entries is at most 1 (otherwise the numbers would add to more than 1). For a random variable $X$ taking on values in $\mathcal{X}$, we can write out these constraints as:
$$0 \le p_ X(x) \le 1\quad \text {for all }x\in \mathcal{X}, \qquad \sum _{x \in \mathcal{X}} p_ X(x) = 1.$$

Often in the course, if we are making statements about all possible values of $X$, we will omit writing out the alphabet $\mathcal{X}$ explicitly. For example, instead of the above, we might write the following equivalent statement:
$$0 \le p_ X(x) \le 1\quad \text {for all }x, \qquad \sum _ x p_ X(x) = 1.$$