# Lecture 2.3: Introduction to Random Variables

## Outline

* What is a random variable?
* What is the probability mass function of a discrete random variable?
* Difference between sample statistics and population parameters
* The mean and variance of a discrete random variable
* The mean and variance of a transformed random variable
* Adding two independent random variables

## Introducing Random Variables

### Review of Experiment

* Suppose that we flip a coin 10 times and count the number of times the coin turns up heads
* This is an example of a **random experiment**: an experiment whose outcome is not known until it is observed
* Generally, an experiment is any procedure that can, at least in theory, be infinitely repeated and has a well-defined set of outcomes
* Before we flip the coin, we know that the number of heads appearing is an integer from 0 to 10, so the outcomes of the experiment are well defined
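
The experiment above can be sketched as a short simulation. This is a minimal illustration, assuming a fair coin; the function name `count_heads` and the seed value are choices made here, not part of the lecture.

```python
import random

def count_heads(n_flips=10, p_heads=0.5, rng=random):
    """Run one experiment: flip a coin n_flips times and count the heads."""
    return sum(rng.random() < p_heads for _ in range(n_flips))

rng = random.Random(0)  # seeded so the run is reproducible
results = [count_heads(rng=rng) for _ in range(5)]

# Five repetitions of the experiment; each result is an integer from 0 to 10,
# but we do not know which one until the experiment is run.
print(results)
```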

**Turning Letters into Numbers**

* We’ve been studying statements like $P(A)$ or $P(B|A)$, that is, the probabilities of different events happening.
    * 'Things that happen' are sets, also called events.
    * We measure chance by measuring sets, using a measure called probability.
* But many things in the real world are easier explained using numerical outcomes.
* The concept of random variables enables us to talk about the probability of numerical outcomes.

#### Example

We might have all these different sample spaces:

$\Omega = \{head, tail\}$; $\Omega = \{same, different\}$; $\Omega = \{boy, girl\}$

We can represent these sample spaces more concisely in terms of numbers:

$\Omega = \{0, 1\}$

There are also many random experiments that genuinely produce random numbers as their outcomes.
    
* _e.g._ the number of heads from 10 tosses of a coin; the number of girls in a three-child family; and so on.

When the outcome of a random experiment is a number, it enables us to quantify many new things of interest:

1. quantify the average value (e.g. the average number of heads we would get if we made 10 coin-tosses again and again)
2. quantify how much the outcomes tend to diverge from the average value
3. quantify relationships between different random quantities (e.g. is the number of girls related to the hormone levels of the fathers?)

### Defining a Random Variable

* A random variable is a function from a sample space $\Omega$ to the real numbers $R$.
* We write $X: \Omega \rightarrow R$.
* Intuitively, a random variable is a rule or function that translates the outcomes of a probability experiment into numbers

* This definition serves two purposes:
    * Describing many different sample spaces in the same terms:
        * e.g. $\Omega = \{0, 1\}$ with $P(1) = p$ and $P(0) = 1 - p$ describes EVERY possible experiment with two outcomes.
    * Giving a name to a large class of random experiments that genuinely produce random numbers, and for which we want to develop general rules for finding averages, variances, relationships, and so on.

#### Example

Toss a coin 3 times. The sample space is  
$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$  

One example of a random variable is $X: \Omega \rightarrow R$ such that, for sample point $s_i$, we have $X(s_i) = \text{number of heads in outcome } s_i$.

So $X(HHH) = 3$; $X(THT) = 1$, etc.

Another example is $Y: \Omega \rightarrow R$ such that $Y(s_i)=
\begin{cases}
    1, & \text{if 2nd toss is a head}\\
    0, & \text{otherwise}
\end{cases}$

Then $Y(HTH) = 0$, $Y(THH) = 1$, $Y(HHH) = 1$, etc.
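
Since a random variable is just a function on the sample space, the two examples above can be written directly as functions. This is a minimal sketch; representing each outcome as a string such as `"HTH"` is an assumption made for illustration.

```python
from itertools import product

# Sample space for three coin tosses: 'HHH', 'HHT', ..., 'TTT'
omega = ["".join(s) for s in product("HT", repeat=3)]

def X(s):
    """Number of heads in outcome s."""
    return s.count("H")

def Y(s):
    """1 if the 2nd toss is a head, 0 otherwise."""
    return 1 if s[1] == "H" else 0

# Each outcome is translated into numbers by X and Y.
for s in omega:
    print(s, X(s), Y(s))
```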

#### Notation

By convention, we use CAPITAL LETTERS for random variables (e.g. $X$), and lower case letters to represent the values that the random variable takes (e.g. $x$).

For a sample space $\Omega$ and random variable $X: \Omega \rightarrow R$, and for a real number $x$, $P(X = x) = P(\text{outcome s is such that }X(s) = x) = P(\{s : X(s) = x\})$.

#### Example

Toss a fair coin 3 times. All outcomes are equally likely: $P(HHH) = P(HHT) = \dots = P(TTT) = 1/8$.

Let $X: \Omega \rightarrow R$, such that $X(s) = \text{number of heads in } s$. Then,

$P(X = 0) = P(\{TTT\}) = 1/8$ 

$P(X = 1) = P(\{HTT, THT, TTH\}) = 3/8$  

$P(X = 2) = P(\{HHT, HTH, THH\}) = 3/8$

$P(X = 3) = P(\{HHH\}) = 1/8$  

Note that $P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1$.  
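
The same probabilities can be obtained by enumerating the sample space and collecting the outcomes $s$ with $X(s) = x$. The sketch below assumes a fair coin, as in the example, and uses exact fractions to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = ["".join(s) for s in product("HT", repeat=3)]
p_outcome = Fraction(1, len(omega))  # fair coin: every outcome has probability 1/8

# P(X = x) is the total probability of the outcomes s with X(s) = x.
counts = Counter(s.count("H") for s in omega)
pmf = {x: counts[x] * p_outcome for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
print("total:", sum(pmf.values()))
```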

### Discrete vs Continuous

* **Discrete random variables** can assume only a finite or countably infinite set of values
    * e.g. the coin tossing example

* **Continuous random variables** can assume any value in an interval of real numbers (an uncountably infinite set)
    * e.g. how long I have to wait before the bus comes

#### More Examples

<img src="images/example1.png">

<img src="images/example2.png">

### Probability Distribution

* A **probability distribution** is a rule according to which probability is apportioned, or distributed, among the different sets in the sample space.

* At its simplest, a probability distribution just lists every element in the sample space and allots it a probability between 0 and 1, such that the total sum of probabilities is 1.

* E.g. for a single coin toss, $P(heads) = 0.5$, $P(tails) = 0.5$

#### Discrete Probability Distributions

Let $\Omega = \{s_1, s_2, \dots\}$ be a discrete sample space.  
A discrete probability distribution on $\Omega$ is a set of real numbers $\{p_1, p_2, \dots\}$ associated with the sample points $\{s_1, s_2, \dots\}$ such that:

$$0 \leq p_i \leq 1 \text{ for all } i$$  

and  

$$\sum_i p_i = 1$$  

$p_i$ is called the probability of the event that the outcome is $s_i$.  

We write: $p_i = P(s_i)$.  

The rule for measuring the probability of any set, or event, $A \subseteq \Omega$, is to sum the probabilities of the elements of $A$:  

$$P(A) = \sum_{i \,:\, s_i \in A} p_i$$  

E.g. if $A = \{s_3, s_5, s_{14}\}$, then $P(A) = p_3 + p_5 + p_{14}$.
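
The two conditions and the summation rule can be sketched in a few lines. The helper names `is_distribution` and `prob`, and the two-toss distribution used to exercise them, are illustrative assumptions.

```python
from fractions import Fraction

def is_distribution(p):
    """Check that 0 <= p_i <= 1 for all i and that the p_i sum to 1."""
    return all(0 <= p_i <= 1 for p_i in p.values()) and sum(p.values()) == 1

def prob(event, p):
    """P(A): sum the probabilities of the sample points in A."""
    return sum(p[s] for s in event)

# Illustrative distribution: two tosses of a fair coin.
p = {s: Fraction(1, 4) for s in ["HH", "HT", "TH", "TT"]}
A = {"HH", "HT", "TH"}  # the event "at least one head"

print(is_distribution(p))
print(prob(A, p))
```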

#### Continuous Probability Distributions

On a continuous sample space $\Omega$, e.g. $\Omega = [0, 1]$, we cannot list all the elements and give each one an individual probability. We will need more sophisticated methods, detailed later in the course.  

However, the same principle applies. A continuous probability distribution is a
rule under which we can calculate a probability between 0 and 1 for any set, or
event, $A \subseteq \Omega$.

#### Coin Toss Example

Sample Space: $\Omega = \{\text{heads, tails}\} = \{1, 0\}$

Random variable $X = 
\begin{cases}
    1, & \text{if it's heads}\\
    0, & \text{otherwise}
\end{cases}$

Assuming we are using a fair coin, the probability distribution of X is:

|x        |  0  |  1  |
|:--------|:---:|:---:|
|P(X = x) | 0.5 | 0.5 |

* **Probability mass function (PMF)**: It tells us the probability that a **discrete** random variable $X$ is exactly equal to a certain value. We define it as 

$$ f_X (x) = P (X = x) $$

* **Cumulative distribution function (CDF)**: It gives us the probability that a random variable $X$ is less than or equal to a certain value. We define it as

$$ F_X (x) = P( X \leq x) $$
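
For the fair coin above, both functions can be sketched directly from the table. The names `f_X` and `F_X` mirror the notation; storing the PMF as a dictionary is an assumption made for illustration.

```python
from fractions import Fraction

# PMF of the fair-coin random variable X, taken from the table above.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f_X(x):
    """PMF: P(X = x). Values that X never takes have probability 0."""
    return pmf.get(x, Fraction(0))

def F_X(x):
    """CDF: P(X <= x), the sum of the PMF over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in (-1, 0, 0.5, 1, 2):
    print(x, f_X(x), F_X(x))
```

Note that the CDF is defined for every real $x$, not only for the values $X$ can take: it stays flat between 0 and 1 and jumps at each possible value.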

## Sample vs Population

Last week, we mostly worked with samples. When we want to find out something about a population, we take a sample from that population, analyze the sample, and make inferences about the population that the sample comes from.

It is important to be very clear about whether we are working with a sample or a population. Here are some distinctions in terms of notation.

| Measure | Sample Statistic (Data) | Population Parameter (Random Variable $X$) |
|---------|---------------|--------------------------------|
|Mean     | $\bar{x}$     | $E(X)$ or $\mu$ (Expectation)  |
|Variance | $s^2$         | $Var(X) \text{ or } \sigma^2$  |
|Standard Deviation | $s$ | $SD(X)$ or $\sigma$                       |

As the sample size $n$ increases, the sample statistics will converge to the values of the population parameters.
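
A small simulation can illustrate this convergence. The sketch assumes a fair coin coded as 1 for heads and 0 for tails, whose population mean is 0.5 and population variance is 0.25 (both are derived later in this lecture); the helper `coin_sample` and the seed are choices made here.

```python
import random
import statistics

def coin_sample(n, rng):
    """Draw n realizations of X, where X = 1 for heads and 0 for tails (fair coin)."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]

rng = random.Random(42)  # fixed seed so the run is reproducible
stats_by_n = {}
for n in (10, 1_000, 100_000):
    sample = coin_sample(n, rng)
    # Sample mean (x-bar) and sample variance (s^2, with the n - 1 divisor).
    stats_by_n[n] = (statistics.mean(sample), statistics.variance(sample))
    print(n, stats_by_n[n])
```

With larger $n$, the printed pairs should sit closer to $(0.5, 0.25)$, although any single run can still deviate.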

### Expectation / Mean

For now we will assume $X$ is a discrete random variable.  
If random variable $X$ takes outcomes $x_1, x_2, \dots, x_k$ with probabilities $P(X = x_1), P(X = x_2), \dots, P(X = x_k)$, the expectation (or mean) of $X$ is,  

$$ E(X) = \mu = \sum_{i = 1}^k x_i P(X = x_i) $$

In contrast, for a sample of values from $X$, $x_1, x_2, \dots, x_n$, the sample mean is,

$$ \bar{x} = \frac{1}{n} \sum_{i = 1}^n x_i $$

Note: in the first equation, $x_1, \dots, x_k$ are all the possible outcomes $X$ can take, while in the second equation $x_1, \dots, x_n$ are actual realizations of $X$ (the actual numbers we draw from the distribution of $X$), so they are very different.

#### Example

Let's use the single coin toss example again.  

$X = 
\begin{cases}
    1, & \text{if it's heads}\\
    0, & \text{otherwise}
\end{cases}$

Assuming we are using a fair coin, what is the mean of $X$ (or $E(X)$)?

$$E(X) = \mu = \sum_{i = 1}^k x_i P(X = x_i) = 0 \times 0.5 + 1 \times 0.5 = 0.5 $$
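
The expectation formula translates almost word for word into code. This is a minimal sketch; the function name `expectation` and the dictionary representation of the PMF are assumptions made for illustration.

```python
from fractions import Fraction

def expectation(pmf):
    """E(X): sum of x_i * P(X = x_i) over all possible values x_i."""
    return sum(x * p for x, p in pmf.items())

# Single fair coin toss: X = 1 for heads, 0 otherwise.
coin = {0: 0.5, 1: 0.5}
print(expectation(coin))  # 0.5

# Number of heads in three fair tosses, using the PMF computed earlier.
three_tosses = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(expectation(three_tosses))
```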

#### A Betting Example

Find the expected value of each of the following bets:  

a) you get \$5 with probability 1.0.  
b) you get \$10 with probability 0.5, or \$0 with probability 0.5.  
c) you get \$5 with probability 0.5, \$10 with probability 0.25 and \$0 with probability 0.25.  
d) you get \$5 with probability 0.5, \$105 with probability 0.25 or lose \$95 with probability 0.25.  

Which bet is the best bet?
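
One hedged way to check a hand calculation of these expected values is to play each bet many times and look at the average payoff. The sketch below does this by simulation; the dictionary layout, the helper `average_payoff`, and the number of plays are assumptions made for illustration.

```python
import random

# Each bet maps a payoff in dollars to its probability (from the list above).
bets = {
    "a": {5: 1.0},
    "b": {10: 0.5, 0: 0.5},
    "c": {5: 0.5, 10: 0.25, 0: 0.25},
    "d": {5: 0.5, 105: 0.25, -95: 0.25},
}

def average_payoff(bet, n_plays, rng):
    """Play the bet n_plays times and return the average payoff."""
    payoffs = rng.choices(list(bet), weights=list(bet.values()), k=n_plays)
    return sum(payoffs) / n_plays

rng = random.Random(1)  # fixed seed so the run is reproducible
averages = {name: average_payoff(bet, 100_000, rng) for name, bet in bets.items()}
for name, avg in averages.items():
    print(name, round(avg, 2))
```

The simulated averages only approximate the expected values; they say nothing yet about how much the payoffs of each bet vary from play to play.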

### Variance

The variance for random variable $X$ is,

$$ \sigma^2 = \sum_{i = 1}^k (x_i - \mu)^2 P(X = x_i) $$

In contrast, the sample variance is,

$$ s^2 = \frac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2 $$

#### Example

Let's continue with the coin toss example and calculate the variance of $X$.

|x        |  0  |  1  |
|:--------|:---:|:---:|
|P(X = x) | 0.5 | 0.5 |

$$ \sigma^2 = \sum_{i = 1}^k (x_i - \mu)^2 P(X = x_i) = (0 - 0.5)^2 \times 0.5 + (1 - 0.5)^2 \times 0.5 = 0.25 $$
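
The same calculation can be written as a short function. This is a minimal sketch; the helper names `mean` and `variance` are assumptions made for illustration.

```python
import math

def mean(pmf):
    """mu = sum of x_i * P(X = x_i)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """sigma^2 = sum of (x_i - mu)^2 * P(X = x_i)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

coin = {0: 0.5, 1: 0.5}
print(variance(coin))             # 0.25
print(math.sqrt(variance(coin)))  # standard deviation: 0.5
```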

### Transforming X

If we let $Y = a X + b$, then for $Y$, we have

* $E(Y) = a E(X) + b$
* $Var(Y) = a^2 Var(X)$
* $SD(Y) = |a| SD(X)$
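
These rules can be checked numerically on a concrete case. The sketch below takes $X$ to be the number of heads in three fair tosses and picks $a = -2$, $b = 5$ arbitrarily; the helper `transform` is hypothetical and assumes $a \neq 0$.

```python
import math
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, a, b):
    """PMF of Y = a*X + b (assumes a != 0, so distinct x give distinct y)."""
    return {a * x + b: p for x, p in pmf.items()}

# X = number of heads in three fair coin tosses.
X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
a, b = -2, 5
Y = transform(X, a, b)

print(mean(Y) == a * mean(X) + b)            # E(Y) = a E(X) + b
print(variance(Y) == a ** 2 * variance(X))   # Var(Y) = a^2 Var(X)
print(math.isclose(math.sqrt(variance(Y)),
                   abs(a) * math.sqrt(variance(X))))  # SD(Y) = |a| SD(X)
```

Note that the shift $b$ moves the mean but leaves the variance unchanged, and that a negative $a$ still gives a non-negative standard deviation.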

### Adding Random Variables

If we let $Z = X + Y$, then for $Z$, we have

* $E(Z) = E(X) + E(Y)$

If $X$ and $Y$ are **independent**,

* $Var(Z) = Var(X) + Var (Y)$

If $Z$ is a linear combination of $X$ and $Y$, i.e. $Z = a X + b Y$, then

* $E(Z) = a E(X) + b E(Y)$

And if $X$ and $Y$ are **independent**,

* $Var(Z) = a^2 Var(X) + b^2 Var(Y)$ 
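
The rules for linear combinations can also be checked by building the distribution of $Z$ explicitly. The sketch below assumes independence, so that $P(X = x, Y = y) = P(X = x) P(Y = y)$; the helper `combine` and the values $a = 2$, $b = -3$ are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def combine(pmf_x, pmf_y, a=1, b=1):
    """PMF of Z = a*X + b*Y, assuming X and Y are independent."""
    pmf_z = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        z = a * x + b * y
        # Independence lets us multiply the marginal probabilities.
        pmf_z[z] = pmf_z.get(z, 0) + px * py
    return pmf_z

# X = number of heads in three fair tosses; Y = 1 if a separate fair toss is heads.
X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
Y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
a, b = 2, -3
Z = combine(X, Y, a, b)

print(mean(Z) == a * mean(X) + b * mean(Y))
print(variance(Z) == a ** 2 * variance(X) + b ** 2 * variance(Y))
```

The variance check relies on the independence assumption built into `combine`; the expectation rule holds even without it.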

#### Example

Let $X$ and $Y$ each be the number of heads (0 or 1) from one of two independent flips of a fair coin, and let $Z = X + Y$. Calculate $E(Z)$ and $Var(Z)$.