Random Variables
-----

![](http://assets.amuniversal.com/321a39e06d6401301d80001dd8b71c47)

By The End Of This Session You Should Be Able To:
----

- Define a random variable
- Create the probability mass function of a discrete random variable
- Calculate the mean and variance of a discrete random variable


Random Variables Introduction
-----

Let's flip a coin 3 times and count the number of times the coin turns up heads.

This is an example of a random experiment: an experiment whose outcome is not known until it is observed.

More generally, an experiment is any procedure that has a well-defined set of possible outcomes and can be repeated indefinitely (at least in theory).
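Because an experiment can be repeated, we can simulate many repetitions. The sketch below is a minimal illustration, assuming a fair coin modeled by Python's `random` module. It runs the "flip 3 times, count heads" experiment 10,000 times. The helper name `flip_three_coins` is hypothetical.

```python
import random

def flip_three_coins(rng: random.Random) -> int:
    """One trial: flip a coin 3 times and count the heads."""
    # rng.random() < 0.5 is treated as "heads" (assumes a fair coin)
    return sum(rng.random() < 0.5 for _ in range(3))

rng = random.Random(42)  # fixed seed so the run is reproducible
trials = [flip_three_coins(rng) for _ in range(10_000)]

# Each trial gives a number of heads; look at the first few repetitions
print(trials[:10])
```

Each individual result is unpredictable. The set of values that can appear is fixed, however.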

Check for understanding
-------

What is the set of outcomes for number of heads after flipping a coin 3 times?

`{0, 1, 2, 3}`

Turning Letters into Numbers
-----

- We’ve been studying statements like P(A) or P(B|A), that is, the probabilities of events.
    - 'Things that happen' are sets, also called events.
    - We measure chance by measuring sets, using a measure called probability.
- But many things in the real world are more easily described by numerical outcomes.
- Random variables let us talk about the probability of numerical outcomes.

#### Example

We might have all these different sample spaces:  

$\Omega = \{head, tail\}$  
$\Omega = \{same, different\}$  
$\Omega = \{boy, girl\}$  

We can represent these sample spaces more concisely in terms of numbers:

$\Omega = \{0, 1\}$

When the outcome of a random experiment is a number, we can use __math__:

1. Can calculate central tendency (e.g., mean)
2. Can calculate spread (e.g., variance)
3. Can quantify the relationships between different random variables (fancy!)

Defining a Random Variable
-----

Intuitively, a random variable is a rule (or function) that translates the outcomes of an experiment into numbers.

Formally, a random variable is a function from a sample space Ω to the real numbers R.

We write X: Ω→R

This definition serves two purposes:

1. Describing many different sample spaces in the same terms:
    - e.g. Ω = {0, 1} with P(1) = p and P(0) = 1 - p describes EVERY possible experiment with two outcomes.
2. Giving a name to a large class of random experiments that genuinely produce random numbers, and for which we want to develop general rules for finding averages, variances, relationships, and so on.

Random Variable Example
-----

Toss a coin 3 times. The sample space is  
Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

One example of a random variable is:  
$X(s_i) = \text{number of heads in outcome } s_i$.

So…  
$X(THT) = 1$   
...   
$X(HHH) = 3$  
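A random variable is just a function on the sample space, so it can be written as an ordinary function. The following sketch enumerates Ω for 3 tosses with `itertools.product` and defines `X` (an illustrative name) as "count the H's in the outcome string":

```python
from itertools import product

# Sample space for 3 tosses: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
omega = ["".join(s) for s in product("HT", repeat=3)]

def X(outcome: str) -> int:
    """Random variable: map an outcome to the number of heads in it."""
    return outcome.count("H")

for s in omega:
    print(s, "->", X(s))
```

For example, `X("THT")` returns 1 and `X("HHH")` returns 3, matching the values above.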

Notation
-----

By convention, we use CAPITAL LETTERS for random variables (e.g. X), and lower case letters to represent the values that the random variable takes (e.g. x).

What is a Probability Distribution?
-----

For a sample space, a probability distribution is a mapping from each outcome to its probability.

Probability Distribution: Sum of 2 dice
-----
<br>
<center><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Dice_Distribution_%28bar%29.svg/512px-Dice_Distribution_%28bar%29.svg.png" height="500"/></center>

At its simplest, a probability distribution just lists every element in the sample space and allots it a probability between 0 and 1, such that the total sum of probabilities is 1.
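The distribution in the figure can be rebuilt by counting. This sketch assumes two fair six-sided dice, so each of the 36 ordered pairs has probability 1/36. It uses `fractions.Fraction` to keep the probabilities exact:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each total
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
dist = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in dist.items():
    print(total, p)

# Every probability is between 0 and 1, and together they sum to 1
assert all(0 <= p <= 1 for p in dist.values())
assert sum(dist.values()) == 1
```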



Check for understanding
-------

What is the probability distribution for a single coin flip?

P(heads) = 0.5  
P(tails) = 0.5

Coin Toss Example
-----

Sample space: Ω = {heads, tails}, which the random variable X maps to {1, 0}.

Assuming we are using a fair coin, the probability distribution of X is:

|x        |  0  |  1  |
|:--------|:---:|:---:|
|P(X = x) | 0.5 | 0.5 |

$X = 
\begin{cases}
    1, & \text{if it's heads}\\
    0, & \text{otherwise}
\end{cases}$

Probability mass function (PMF)
------

The probability mass function (PMF) gives the probability that a __discrete__ random variable X is exactly equal to a given value x. We define it as:

$$ f_X (x) = P (X = x) $$

PMF Example
------

Toss a fair coin 3 times. All outcomes are equally likely: P(HHH) = P(HHT) = ... = P(TTT) = 1/8.

Let X: Ω→R, such that X(s) = number of heads in s. Then,

P(X = 0) = P(\{TTT\}) = 1/8    

P(X = 1) = P(\{HTT, THT, TTH\}) = 3/8  

P(X = 2) = P(\{HHT, HTH, THH\}) = 3/8  

P(X = 3) = P(\{HHH\}) = 1/8  

Note that P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1.  
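The same PMF can be computed mechanically by summing the probabilities of the outcomes that map to each value, $P(X = x) = P(\{s : X(s) = x\})$. This is a sketch assuming a fair coin. The `pmf` helper is an illustrative name.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(s) for s in product("HT", repeat=3)]
p_outcome = Fraction(1, len(omega))  # fair coin: each outcome has probability 1/8

def pmf(x: int) -> Fraction:
    """f_X(x) = P(X = x): add up P(s) over outcomes s with exactly x heads."""
    return sum((p_outcome for s in omega if s.count("H") == x), Fraction(0))

for x in range(4):
    print(f"P(X = {x}) = {pmf(x)}")  # 1/8, 3/8, 3/8, 1/8

print(sum(pmf(x) for x in range(4)))  # 1
```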

Expected value of a discrete random variable 
-----

The expected value (or mean) of X is the sum of its possible values, each weighted by its probability of occurrence:

$$E(X) = x_1 P(X = x_1) + x_2 P(X = x_2) + \dots + x_n P(X = x_n)$$

Expectation Example
-----

Let's use the single coin toss example again.  

$X = 
\begin{cases}
    1, & \text{if it's heads}\\
    0, & \text{otherwise}
\end{cases}$



Assuming we are using a fair coin, what is the mean of X (or E(X))?

$$E(X) = \mu = \sum_{i = 1}^k x_i P(X = x_i) = 0 \times 0.5 + 1 \times 0.5 = 0.5 $$
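The weighted sum translates directly into code. In the sketch below, a PMF is represented as a Python dict `{x: P(X = x)}`. This representation is an assumption made for illustration. The function is applied to the single coin toss and to the 3-toss PMF from the earlier example:

```python
from fractions import Fraction

def expectation(pmf: dict) -> Fraction:
    """E(X) = sum over x of x * P(X = x), for a PMF given as {x: P(X = x)}."""
    return sum((x * p for x, p in pmf.items()), Fraction(0))

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # single fair coin toss
three_tosses = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

print(expectation(coin))          # 1/2
print(expectation(three_tosses))  # 3/2
```

On average, 3 fair tosses give 1.5 heads. This is 3 times the mean of a single toss.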

Variance
-----

The variance for a discrete random variable X is,

$$ \sigma^2 = \sum_{i = 1}^k (x_i - \mu)^2 P(X = x_i) $$



In contrast, the sample variance is,

$$ s^2 = \frac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2 $$
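The sample variance uses observed data, not a PMF, and it divides by $n - 1$. The following sketch uses a hypothetical list of observed tosses to compare a direct implementation of the formula with Python's `statistics.variance`, which also divides by $n - 1$. It also shows `statistics.pvariance`, which divides by $n$.

```python
import statistics

def sample_variance(data: list[float]) -> float:
    """s^2 = (1 / (n - 1)) * sum of (x_i - x_bar)^2 over the n observations."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Hypothetical observed coin tosses (1 = heads, 0 = tails)
data = [1, 0, 1, 1, 0, 1, 0, 1]

print(sample_variance(data))       # divides by n - 1
print(statistics.variance(data))   # standard-library sample variance, also n - 1
print(statistics.pvariance(data))  # divides by n instead
```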

Variance Example
-----

Let's continue with the coin toss example and calculate the variance of $X$.

|x        |  0  |  1  |
|:--------|:---:|:---:|
|P(X = x) | 0.5 | 0.5 |

$$ \sigma^2 = \sum_{i = 1}^k (x_i - \mu)^2 P(X = x_i) = (0 - 0.5)^2 \times 0.5 + (1 - 0.5)^2 \times 0.5 = 0.25 $$
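This sketch implements the same calculation, again assuming a PMF stored as a dict `{x: P(X = x)}`. It reproduces the 0.25 above and also gives the variance of the number of heads in 3 tosses:

```python
from fractions import Fraction

def variance(pmf: dict) -> Fraction:
    """sigma^2 = sum over x of (x - mu)^2 * P(X = x), where mu = E(X)."""
    mu = sum((x * p for x, p in pmf.items()), Fraction(0))
    return sum(((x - mu) ** 2 * p for x, p in pmf.items()), Fraction(0))

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
three_tosses = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

print(variance(coin))          # 1/4
print(variance(three_tosses))  # 3/4
```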

Summary
----

- A random variable is a function that maps each outcome of a random experiment to a number.
- The probability mass function (PMF) of a discrete random variable gives the probability of each value it can take.
- Be familiar with the formulas for the mean and variance of a discrete random variable

<br>
<br> 
<br>

----

----
Bonus Materials
----

Adding Random Variables
-----

If we let Z = X + Y, then for Z, we have (whether or not X and Y are independent):

$E(Z) = E(X) + E(Y)$

If X and Y are __independent__:

$Var(Z) = Var(X) + Var(Y)$
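Both rules can be checked exactly for two independent fair dice. The sketch below builds the PMF of Z = X + Y by enumeration. Under independence, the joint probability of each pair is the product of the marginal probabilities. The example then compares the mean and variance of Z with the sums of the individual values. The helper names `mean` and `var` are illustrative.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum((x * p for x, p in pmf.items()), Fraction(0))

def var(pmf):
    mu = mean(pmf)
    return sum(((x - mu) ** 2 * p for x, p in pmf.items()), Fraction(0))

die = {k: Fraction(1, 6) for k in range(1, 7)}  # one fair die

# PMF of Z = X + Y for independent dice: P(X = x, Y = y) = P(X = x) * P(Y = y)
z_pmf = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    z_pmf[x + y] = z_pmf.get(x + y, Fraction(0)) + px * py

print(mean(z_pmf), mean(die) + mean(die))  # 7 7
print(var(z_pmf), var(die) + var(die))     # 35/6 35/6
```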

#### Discrete Probability Distributions

Let $\Omega = \{s_1, s_2, \dots\}$ be a discrete sample space.  
A discrete probability distribution on $\Omega$ is a set of real numbers $\{p_1, p_2, \dots\}$ associated with the sample points $\{s_1, s_2, \dots\}$ such that:

$$0 \leq p_i \leq 1 \text{ for all } i$$  

and  

$$\sum_i p_i = 1$$  

$p_i$ is called the probability of the event that the outcome is $s_i$.  

We write: $p_i = P(s_i)$.  

The rule for measuring the probability of any set, or event, $A \subseteq \Omega$, is to sum the probabilities of the elements of $A$:  

$$P(A) = \sum_{i:\, s_i \in A} p_i$$  

E.g. if $A = \{s_3, s_5, s_{14}\}$, then $P(A) = p_3 + p_5 + p_{14}$.
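The rule "sum the probabilities of the elements of A" can be sketched directly. The distribution below is hypothetical and used only for illustration: 20 sample points with $p_i$ proportional to $i$, normalized so the probabilities sum to 1.

```python
from fractions import Fraction

# Hypothetical distribution on s_1, ..., s_20: p_i proportional to i
total = sum(range(1, 21))  # 210
p = {i: Fraction(i, total) for i in range(1, 21)}

# Check the two defining conditions of a discrete probability distribution
assert all(0 <= p_i <= 1 for p_i in p.values())
assert sum(p.values()) == 1

def prob(event: set) -> Fraction:
    """P(A) = sum of p_i over the indices i with s_i in A."""
    return sum((p[i] for i in event), Fraction(0))

A = {3, 5, 14}  # indices of the event {s_3, s_5, s_14}
print(prob(A))  # p_3 + p_5 + p_14 = 22/210 = 11/105
```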

#### Continuous Probability Distributions

On a continuous sample space $\Omega$, e.g. $\Omega = [0, 1]$, we cannot list all the elements and give each one an individual probability. We will need more sophisticated methods, detailed later in the course.  

However, the same principle applies. A continuous probability distribution is a
rule under which we can calculate a probability between 0 and 1 for any set, or
event, $A \subseteq \Omega$.

#### More Examples

<img src="images/example1.png">

<img src="images/example2.png">

Notation
-----

For a sample space $\Omega$ and random variable $X: \Omega \rightarrow R$, and for a real number $x$, $P(X = x) = P(\text{outcome s is such that }X(s) = x) = P(\{s : X(s) = x\})$.

### Transforming X

If we let $Y = a X + b$, then for $Y$, we have

- $E(Y) = a E(X) + b$
- $Var(Y) = a^2 Var(X)$
- $SD(Y) = |a| SD(X)$

If Z is a linear combination of X and Y, i.e. Z = a X + b Y, then:  

$E(Z) = a E(X) + b E(Y)$

And if X and Y are __independent__:  

$Var(Z) = a^2 Var(X) + b^2 Var(Y)$ 
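These identities can be checked exactly on small examples. The sketch below assumes X is a fair coin (0/1) and Y is an independent fair die. The constants `a = 3` and `b = -2` are arbitrary choices. The hypothetical helper `pmf_of` builds the PMF of a function of independent discrete random variables by enumerating all value combinations. The transform is called `T` in the code so that the name does not clash with the die `Y`.

```python
from fractions import Fraction
from itertools import product

def mean(pmf):
    return sum((v * p for v, p in pmf.items()), Fraction(0))

def var(pmf):
    mu = mean(pmf)
    return sum(((v - mu) ** 2 * p for v, p in pmf.items()), Fraction(0))

def pmf_of(fn, *pmfs):
    """PMF of fn(X1, X2, ...) for independent discrete random variables."""
    out = {}
    for combo in product(*(pm.items() for pm in pmfs)):
        value = fn(*(v for v, _ in combo))
        prob = Fraction(1)
        for _, p in combo:
            prob *= p  # independence: multiply the marginal probabilities
        out[value] = out.get(value, Fraction(0)) + prob
    return out

X = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # fair coin
Y = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die
a, b = 3, -2  # arbitrary constants for illustration

T = pmf_of(lambda x: a * x + b, X)  # transform: aX + b
assert mean(T) == a * mean(X) + b
assert var(T) == a**2 * var(X)

Z = pmf_of(lambda x, y: a * x + b * y, X, Y)  # linear combination: aX + bY
assert mean(Z) == a * mean(X) + b * mean(Y)
assert var(Z) == a**2 * var(X) + b**2 * var(Y)

print(mean(T), var(T), mean(Z), var(Z))
```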