In [None]:
# Slides for Probability and Statistics module, 2015-2016
# Matt Watkins, University of Lincoln

# Expectation and Variance

In these lectures we will define the expectation and variance of a random variable. 

This is a further step in connecting abstract probability distributions to statistical measures and, where required, to the real world.

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
By the end of the lecture you should know:
<ul>
<li> how to calculate the expectation value of a random variable (this is a measure of where the centre of a distribution is) </li>
<li> how to calculate the variance of a random variable (this is a measure of how widely spread a distribution is).</li>
<li> how to calculate expectation values of a function of a random variable.</li>
</ul>
</div>

Later we'll study particular probability distributions. Understanding their expectation values will allow us to model and predict the results of measurements made on objects that can be modelled as random variables.

# Revision

## Discrete random variables

**Definition**
Consider an experiment, with outcome set $S$, split into $n$ mutually exclusive and exhaustive events $E_1,E_2,E_3,\ldots,E_n$. A variable, $X$ say, which can assume exactly $n$ numerical values each of which corresponds to one and only one of the given events is called a random variable.

Schematically the mutually exclusive and exhaustive events look like

<img src="../Images/Exclusive_and_exhaustive.jpg" alt="Exhaustive" height="200" width="200">

Here our outcome set is split into 4 events that are mutually exclusive (no overlap) and exhaustive (together they cover all of $S$).

To associate a random variable (call it $X$) with this sample space, we assign to each mutually exclusive event a value $X = x$ and a probability $p_X(x)$, for example:

|$X$|event|$p_X(x)$|
|-|-|-|
|1|$E_1$|$p_X(1) = 0.1$|
|2|$E_2$|$p_X(2) = 0.2$|
|3|$E_3$|$p_X(3) = 0.3$|
|4|$E_4$|$p_X(4) = 0.4$|

Checklist
- is the sample space well defined?
- are the events mutually exclusive?
- do the events cover all the sample space?
- are values of the variable clearly assigned one-to-one to the possible events?
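
Part of this checklist can be automated. The following is a minimal sketch, assuming the table above is stored as a plain Python dictionary mapping each value of $X$ to its probability. The names `pmf` and `is_valid_pmf` are illustrative and not part of the lecture.

```python
import math

# Probability mass function from the table above: value of X -> p_X(x)
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def is_valid_pmf(p, tol=1e-12):
    """Return True if all probabilities are non-negative and sum to 1."""
    non_negative = all(prob >= 0 for prob in p.values())
    sums_to_one = math.isclose(sum(p.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_valid_pmf(pmf))
```

Probabilities that sum to 1 are what we expect when the events are mutually exclusive and exhaustive. The check cannot tell us whether the sample space itself is well defined.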

### Expectation of a Random variable

**Definition**

Given a random variable $X$ with probability mass function $p(x)$, we define the expectation of $X$, written $\text{E}[X]$, as

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
$$
\text{E}[X] = \sum_{x\in X} x \cdot p(x)
$$
</div>

The equivalent for a continuous random variable $Z$ is

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
$$
\text{E}[Z] = \int_{-\infty}^{\infty} z\cdot f(z) \mathrm{d}z
$$
</div>

We see that the relationship is very close between discrete and continuous cases - we replace a sum with an integral.

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
We also call $\text{E}[X]$ the mean of $X$ and write it as $\mu_X$
</div>
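
As a quick illustration of the discrete definition, here is a minimal sketch that applies it to the four-value table from the revision section. The dictionary `pmf` and the function name `expectation` are assumptions made for this example.

```python
# Probability mass function from the revision table: value of X -> p_X(x)
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

def expectation(p):
    """E[X] = sum over x of x * p(x) for a discrete random variable."""
    return sum(x * prob for x, prob in p.items())

mu_X = expectation(pmf)
print(mu_X)  # close to 3.0, up to floating-point rounding
```

By hand: $1(0.1) + 2(0.2) + 3(0.3) + 4(0.4) = 3$.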

### Example - discrete random variable

In a game 3 dice are rolled. The player bets £1.  They get back £1 if they roll a single 5, £2 if 2 fives come up, and £3 if 3 fives come up. If no 5s come up they lose their £1 stake.


Let us set up a random variable $V$ for the winnings of the player.

The sample space is the Cartesian product of the outcomes of the three dice:

$S = \{(x_1,x_2,x_3): x_i = 1,2, \ldots, 6; i = 1,2,3\}$

Now, we'll define $V$ as the amount the player wins - this can be $-1,1,2,3$ corresponding to 0, 1, 2, or 3 fives showing:

$$
V = 
  \begin{cases} 
      \hfill -1   \hfill & \text{if '0 dice land with 5 spots on the top face'} \\
      \hfill 1   \hfill & \text{if '1 die lands with 5 spots on the top face'} \\
      \hfill 2   \hfill & \text{if '2 dice land with 5 spots on the top face'} \\
      \hfill 3   \hfill & \text{if '3 dice land with 5 spots on the top face'} 
  \end{cases}
$$ 

and our probability mass function will be

$$
p_V(v) = 
  \begin{cases} 
      \hfill \frac{125}{216}   \hfill & \text{ for } v = -1 \\
      \hfill \frac{75}{216}   \hfill & \text{ for } v = 1 \\
      \hfill \frac{15}{216}   \hfill & \text{ for } v = 2 \\
      \hfill \frac{1}{216}   \hfill & \text{ for } v = 3 \\
  \end{cases}
$$

Note that $p_V(-1) > 0.5$: the player loses their stake more often than not.
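
These probabilities can be checked by brute force. The following is a minimal sketch, assuming fair dice so that all 216 outcomes in $S$ are equally likely. The names `winnings`, `counts` and `pmf_V` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Winnings for 0, 1, 2 or 3 fives, as defined in the game above
winnings = {0: -1, 1: 1, 2: 2, 3: 3}

# Enumerate all 6**3 = 216 equally likely outcomes (x1, x2, x3)
counts = Counter()
for roll in product(range(1, 7), repeat=3):
    number_of_fives = roll.count(5)
    counts[winnings[number_of_fives]] += 1

total = sum(counts.values())
pmf_V = {v: Fraction(k, total) for v, k in sorted(counts.items())}

for v, prob in pmf_V.items():
    print(v, prob)
```

`Fraction` reduces automatically, so $\frac{75}{216}$ is printed as `25/72` and $\frac{15}{216}$ as `5/72`.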

Now, suppose the player plays the game $n \gg 1$ times.

They win $v_1$ pounds the first time, $v_2$ the second time, $\ldots$, $v_n$ pounds the $n^{th}$ time. The average amount won is then the usual arithmetic mean

$$
\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i
$$

Now each of the $v_i$ can only be $-1,1,2,3$. 

Let's reorganise our results and say that $k_{-1}$ times they got $v_i = -1$, $k_1$ times $v_i = 1$, $k_2$ times $v_i = 2$ and $k_3$ times $v_i = 3$. Since we are just placing our $n$ values into these four boxes, $k_{-1} + k_1 + k_2 + k_3 = n$. Then

$$
\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i = (-1) \frac{k_{-1}}{n} + (1) \frac{k_{1}}{n} + (2) \frac{k_{2}}{n} + (3) \frac{k_{3}}{n}  
$$


is our average value. As $n \to \infty$, the ratios $\frac{k_i}{n}$ tend to the probabilities in our original frequentist definition - the number of times something occurs out of the total number of attempts:

$$
\frac{k_{-1}}{n} \to p_V(-1), \quad \frac{k_{1}}{n} \to p_V(1), \quad \frac{k_{2}}{n} \to p_V(2), \quad \frac{k_{3}}{n} \to p_V(3)
$$

and finally, writing $R_V = \{-1, 1, 2, 3\}$ for the set of values $V$ can take, we get

$$
\bar{v} \to \mu_V = \sum_{v \in R_V } v \, p_V(v) 
$$

in this case 

$$
\text{E}[V] = \mu_V = \sum_{v \in R_V } v p_V(v) = (-1) \frac{125}{216} + (1) \frac{75}{216} + (2) \frac{15}{216} + (3) \frac{1}{216} = \frac{-17}{216} \approx -0.08
$$

On average the player loses about 8p every time they play.

Note that the average is _not_, in general, a value that $V$ can take.
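
The long-run argument can be tried out numerically. The following is a minimal simulation sketch, assuming fair dice and using Python's standard `random` module. The function name `play_once`, the seed and the number of games `n` are arbitrary choices made for this example.

```python
import random
from fractions import Fraction

def play_once(rng):
    """Roll three dice and return the winnings for a £1 bet."""
    fives = sum(1 for _ in range(3) if rng.randint(1, 6) == 5)
    return -1 if fives == 0 else fives

rng = random.Random(2015)  # fixed seed (arbitrary) so the run is repeatable
n = 200_000
sample_mean = sum(play_once(rng) for _ in range(n)) / n

exact_mean = Fraction(-17, 216)
print(sample_mean, float(exact_mean))
```

With this many games the sample mean $\bar{v}$ should land close to $\mu_V = -\frac{17}{216}$, although any single run will differ slightly from it.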

### Example - continuous random variable

Suppose you meet that friend of yours who is always late. You, of course, arrive on time ($T=0$); your friend shows up at a random time $T > 0$ with probability density function

$$
f_T(t) = \frac{2}{15} - \frac{2t}{225}, 0 < t < 15
$$

We'll ignore other times, where $f_T(t) = 0$.

We could break this interval up into a large number, $n$, of small pieces with length $\Delta t = 15/n$ and let $t_1, t_2, t_3, \ldots , t_n$ be the midpoints of these small intervals.



The probability of your friend arriving within the small interval around $t_i$ is approximately $P( t_i - \Delta t/2 < T < t_i + \Delta t/2 ) \approx f_T(t_i) \Delta t$.

We'd expect our average waiting time to be about $\sum_{i=1}^n t_i f_T(t_i) \Delta t$ - the sum of the products of the length of wait, $t_i$, and the probability of waiting that long, $f_T(t_i) \Delta t$. This is the same construction as in the discrete random variable case.

If we now take $n$ larger and larger, towards infinity, the sum becomes an integral:

$$
\text{E}[T] = \int_{0}^{15} tf_T(t) \text{d}t = \int_{0}^{15} t\Big(\frac{2}{15} - \frac{2t}{225}\Big) \text{d}t = \Big[\frac{t^2}{15} - \frac{2t^3}{675}\Big]_0^{15} = 15 - 10 = 5 \text{ (minutes)}
$$

This is the length of time you expect to wait for your friend.
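
The limiting process can be watched numerically. The following is a minimal sketch of the midpoint sum $\sum_i t_i f_T(t_i) \Delta t$ for increasing $n$. The function names `f_T` and `riemann_expectation` are assumptions made for this example.

```python
def f_T(t):
    """Density of the friend's arrival time, valid for 0 < t < 15."""
    return 2 / 15 - 2 * t / 225

def riemann_expectation(n, upper=15.0):
    """Midpoint-rule approximation of E[T] using n small intervals."""
    dt = upper / n
    midpoints = ((i + 0.5) * dt for i in range(n))
    return sum(t * f_T(t) * dt for t in midpoints)

for n in (10, 100, 1000):
    print(n, riemann_expectation(n))
```

The printed values should approach 5 as $n$ grows, in line with the integral.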

### Expectation of a function of a random variable

**Definition**

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">

Let $g(X)$ be any function of a discrete random variable $X$. 

Then

$$
\text{E}[g(X)] = \sum_{x \in X}g(x) \cdot p(x)
$$

</div>
<br>

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">

Let $h(Z)$ be any function of the continuous random variable $Z$. Then

$$
\text{E}[h(Z)] = \int_{-\infty}^{\infty} h(z)\cdot f(z) \mathrm{d}z
$$
</div>

These can be understood in the same way as $\text{E}[X]$: each term is the value $g(x)$ weighted by the probability that $X = x$.

The derivation using repetitions of experiments can be repeated, but replacing $v_i$ with $g(v_i)$.
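
For the discrete case, the definition translates almost directly into code. The following is a minimal sketch, assuming the four-value table from the revision section written as exact fractions. The names `pmf` and `expectation_of` are illustrative.

```python
from fractions import Fraction

# Probability mass function from the revision table, as exact fractions
pmf = {1: Fraction(1, 10), 2: Fraction(2, 10),
       3: Fraction(3, 10), 4: Fraction(4, 10)}

def expectation_of(g, p):
    """E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * prob for x, prob in p.items())

print(expectation_of(lambda x: x, pmf))       # E[X]
print(expectation_of(lambda x: x ** 2, pmf))  # E[X^2]
```

Here $\text{E}[X] = 3$ and $\text{E}[X^2] = 10$, so $\text{E}[X^2]$ is not simply $(\text{E}[X])^2 = 9$.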

**Example**

Find $\text{E}[X^2]$ for the continuous random variable $X$ with probability density function 

$$
f_X(x) = 
  \begin{cases} 
      \frac{3}{4}x(2-x) & \text{ for } 0 \leq x \leq 2 \\
      0 & \text{ otherwise}
  \end{cases}
$$

Taking our general formula

$$
\text{E}[h(Z)] = \int_{-\infty}^{\infty} h(z)\cdot f(z) \mathrm{d}z
$$

we fill in $h(x) = x^2$ and our particular $f_X(x)$

$$
\text{E}[X^2] = \int_0^2 x^2 \cdot \frac{3}{4} x(2-x) \mathrm{d}x = \frac{3}{4}\Big[\frac{x^4}{2} - \frac{x^5}{5}\Big]_0^2 = \frac{3}{4}\Big(8 - \frac{32}{5}\Big) = \frac{6}{5}
$$

where we ignored the parts of the improper integral $\int_{-\infty}^{\infty} x^2 \cdot f(x) \mathrm{d}x$ where $f(x)$ is zero.
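
Because the integrand is a polynomial, the result can be checked exactly by integrating term by term. The following is a minimal sketch using exact fractions. The `{power: coefficient}` representation and the helper `integrate_polynomial` are assumptions made for this example.

```python
from fractions import Fraction

# x^2 * (3/4) x (2 - x) = (3/2) x^3 - (3/4) x^4, stored as {power: coefficient}
integrand = {3: Fraction(3, 2), 4: Fraction(-3, 4)}

def integrate_polynomial(coeffs, a, b):
    """Exact integral of sum(c * x**k) from a to b, term by term."""
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in coeffs.items())

print(integrate_polynomial(integrand, 0, 2))

# Sanity check: the density (3/2) x - (3/4) x^2 should integrate to 1
density = {1: Fraction(3, 2), 2: Fraction(-3, 4)}
print(integrate_polynomial(density, 0, 2))
```

The first value should agree with $\frac{6}{5}$. The second confirms that $f_X$ is a valid density on $0 \leq x \leq 2$.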

$\text{E}[g(X)]$ can be interpreted as the 'average' (mean) value of $g(X)$.

**It is a number not a function.**

#### Example

For any random variable $X$ and any constant $a$, $\text{E}[a] = a$.


This is a special case of 
$$
\text{E}[g(X)] = \sum_{x \in X}g(x) \cdot p(x)
$$
with $g(X) = a$. So

$$
\begin{align}
\text{E}[a] & = \sum_{x \in R_X} g(x) \cdot p_X(x) \\
            & = \sum_{x \in R_X} a \cdot p_X(x) \\
            & = a \sum_{x \in R_X}  p_X(x) \\
            & = a
\end{align}
$$

because the sum of all the $p_X(x)$ must be 1 for a valid probability mass function.
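
The same result can be seen numerically. The following is a minimal sketch using the probability mass function of the dice-game winnings $V$. The constant `a = 7` is an arbitrary choice made for this example.

```python
from fractions import Fraction

# pmf of the winnings V from the dice game
pmf_V = {-1: Fraction(125, 216), 1: Fraction(75, 216),
         2: Fraction(15, 216), 3: Fraction(1, 216)}

a = 7  # any constant; the value 7 is arbitrary
expected_constant = sum(a * prob for prob in pmf_V.values())
print(expected_constant == a)
```

The values of $V$ never enter the sum, only the probabilities, which is why the result does not depend on the random variable.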

# Summary of part 1

We have given the definitions of
- expectation of random variables
- expectation of functions of a random variable