In [None]:
# Slides for Probability and Statistics module, 2015-2016
# Matt Watkins, University of Lincoln

# Summary of relations given in part 1

**Definition**

Given a random variable $X$ with probability mass function $p(x)$ and range $R_X$ (the set of values $X$ can take), we define the expectation of $X$, written $\text{E}[X]$, as

$$
\text{E}[X] = \sum_{x \in R_X} x\cdot p(x)
$$

The expectation of a discrete random variable $X$ is the probability-weighted mean of the values it takes on. It reduces to the ordinary arithmetic mean only when all values are equally likely.

The equivalent for a continuous random variable $Z$ is

$$
\text{E}[Z] =  \int_{-\infty}^{\infty} z\cdot f(z) \mathrm{d}z
$$

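Again we see that the discrete and continuous cases are closely related: the sum over the probability mass function $p(x)$ is replaced by an integral over the probability density function $f(z)$.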
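
As a small illustration of the discrete definition, the sketch below computes $\text{E}[X]$ for two dice. The helper name `expectation`, the fair die, and the loaded die are assumptions made up for this example, not part of the module material.

```python
from fractions import Fraction

def expectation(pmf):
    """Return E[X] = sum of x * p(x) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: every face has probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Hypothetical loaded die: a six comes up half the time.
loaded_die = {x: Fraction(1, 10) for x in range(1, 6)}
loaded_die[6] = Fraction(1, 2)

mean_fair = expectation(fair_die)
mean_loaded = expectation(loaded_die)
print(mean_fair, mean_loaded)  # 7/2 9/2
```

Both dice take the same values 1 to 6, but the expectations differ because the probabilities act as weights.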

---


**Definition**

Let $g(X)$ be any function of a random variable $X$. Then

$$
\text{E}[g(X)] = \sum_{x \in R_X}g(x) \cdot p(x)
$$

or for the continuous random variable $Z$

$$
\text{E}[g(Z)] = \int_{-\infty}^{\infty} g(z)\cdot f(z) \mathrm{d}z
$$
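
A minimal sketch of the discrete formula, assuming a fair six-sided die and $g(x) = x^2$ as the example function (both chosen here for illustration). It also shows that $\text{E}[g(X)]$ is in general not the same as $g(\text{E}[X])$.

```python
from fractions import Fraction

def expectation_of(g, pmf):
    """Return E[g(X)] = sum of g(x) * p(x) for a pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair six-sided die, used purely as an example distribution.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = expectation_of(lambda x: x, fair_die)              # E[X]
e_x_squared = expectation_of(lambda x: x**2, fair_die)   # E[X^2]

print(e_x_squared)  # 91/6
print(e_x**2)       # 49/4, so E[X^2] != (E[X])^2
```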

---

If $X$ is a random variable and $a$ is a constant, then

- $\text{E}[a] = a$
- $\text{E}[aX] = a\text{E}[X]$
- $\text{E}[g_1(X) + g_2(X)] = \text{E}[g_1(X)] + \text{E}[g_2(X)]$, where $g_1(X)$ and $g_2(X)$ are any functions of $X$.

Together these properties mean that expectation is a linear operator.
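
The three properties can be checked numerically on any small distribution. The sketch below assumes a made-up three-point pmf, a constant $a = 3$, and two arbitrary functions `g1` and `g2`; none of these come from the slides.

```python
from fractions import Fraction

def expectation_of(g, pmf):
    """Return E[g(X)] for a pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical pmf on the values 0, 1, 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a = 3

def g1(x):
    return x**2

def g2(x):
    return 5 * x

e_x = expectation_of(lambda x: x, pmf)

# E[a] = a
const_ok = expectation_of(lambda x: a, pmf) == a
# E[aX] = a E[X]
scale_ok = expectation_of(lambda x: a * x, pmf) == a * e_x
# E[g1(X) + g2(X)] = E[g1(X)] + E[g2(X)]
sum_ok = (expectation_of(lambda x: g1(x) + g2(x), pmf)
          == expectation_of(g1, pmf) + expectation_of(g2, pmf))

print(const_ok, scale_ok, sum_ok)  # True True True
```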




### Variance of a random variable

The mean (the expected value) describes where a distribution is centred.

The variance describes how widely the values of the distribution are spread around the mean.

**Definition**

<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">

If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $\text{Var}(X)$ or $\sigma_X^2$, is defined by

$$
\sigma_X^2 = \text{Var}(X) = \text{E}[(X-\mu)^2]
$$

this can also be written

$$
\sigma_X^2 = \text{Var}(X) = \text{E}[X^2] - (\text{E}[X])^2 
$$
</div>

It is the expected squared distance of a value from the centre of the distribution.
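
As a quick check that the two forms of the variance agree, the sketch below evaluates both for a fair six-sided die (an example distribution assumed here, not taken from the slides).

```python
from fractions import Fraction

# Fair six-sided die as {value: probability}.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = E[X]
mu = sum(x * p for x, p in fair_die.items())

# Definition: Var(X) = E[(X - mu)^2]
var_definition = sum((x - mu) ** 2 * p for x, p in fair_die.items())

# Alternative form: Var(X) = E[X^2] - (E[X])^2
var_shortcut = sum(x**2 * p for x, p in fair_die.items()) - mu**2

print(var_definition, var_shortcut)  # 35/12 35/12
```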

## Standard Deviation

The standard deviation of a random variable $X$ is the square root of its variance.

$$
\sigma_X = \sqrt{\text{Var}(X)} = \sqrt{\text{E}[X^2] - (\text{E}[X])^2 } = \sqrt{\text{E}[(X-\mu)^2]}
$$

A major advantage of the standard deviation is that it has the same units, if any, as the variable itself.

It is the root mean squared distance of a value from the centre of the distribution.

The smaller the variance or standard deviation, the more tightly the distribution is concentrated around its mean.
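
A short sketch of the calculation, again assuming a fair six-sided die as the example: the standard deviation is simply the square root of the variance, and so is measured in the same units as the faces of the die.

```python
import math
from fractions import Fraction

# Fair six-sided die as {value: probability}.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in fair_die.items())
variance = sum((x - mu) ** 2 * p for x, p in fair_die.items())

# Standard deviation: square root of the variance (converted to a float).
std_dev = math.sqrt(float(variance))

print(float(mu), float(variance), std_dev)
```

Here the variance is $35/12 \approx 2.92$ and the standard deviation is roughly $1.71$.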

## Example

Show that the two formulae for the variance are equivalent:

$$
\sigma_X^2 = \text{Var}(X) = \text{E}[(X-\mu)^2] =  \text{E}[X^2] - (\text{E}[X])^2 
$$

We can expand the bracket in $\text{E}[(X-\mu)^2]$ to get

$$
\text{E}[(X-\mu)^2] = \text{E}[X^2 - 2\mu X + \mu^2]
$$

and we use the fact that expectation is a linear operator to write this as

$$
\text{E}[X^2 - 2\mu X + \mu^2] = \text{E}[X^2] - \text{E}[2\mu X] + \text{E}[\mu^2]
$$

and then that $\text{E}[a] = a$ and $\text{E}[aX] = a \text{E}[X]$ to write

$$
\begin{align}
\text{E}[X^2 - 2\mu X + \mu^2] & = \text{E}[X^2] - \text{E}[2\mu X] + \text{E}[\mu^2] \\
                               & = \text{E}[X^2] - 2\mu\text{E}[X] + \mu^2 \\
                               & = \text{E}[X^2] - 2\mu \mu + \mu^2 \\
                               & = \text{E}[X^2] - \mu^2 \\
                               & = \text{E}[X^2] - (\text{E}[X])^2 
\end{align}
$$

because by definition $\mu = \text{E}[X]$.

### Useful identity of the variance

$$
\text{Var}(aX + b) = a^2 \text{Var}(X)
$$

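Adding a constant $b$ shifts the distribution but doesn't change its spread, while scaling by $a$ multiplies the variance by $a^2$ (and the standard deviation by $|a|$).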
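
One way to see where this identity comes from is to apply the definition of the variance directly, writing $\mu = \text{E}[X]$ and using the linearity of expectation. First, the mean of the transformed variable is

$$
\text{E}[aX + b] = a\text{E}[X] + b = a\mu + b
$$

and then

$$
\begin{align}
\text{Var}(aX + b) & = \text{E}[(aX + b - (a\mu + b))^2] \\
                   & = \text{E}[a^2 (X - \mu)^2] \\
                   & = a^2 \text{E}[(X - \mu)^2] \\
                   & = a^2 \text{Var}(X)
\end{align}
$$

The constant $b$ cancels in the first line, which is why it does not appear in the result.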

## Moments of a random variable

The mean and variance describe a probability law for a random variable to some extent - the centre of the distribution $\mu_X = \text{E}[X]$ and how far points stray from the centre $\sigma_X^2$.

The $k^{th}$ moment of a random variable $X$ is defined as

$m_k = \text{E}[X^k]$

so 
$$
m_1 = \text{E}[X] = \mu_X
$$

the second moment is

$$
m_2 = \text{E}[X^2] = \sigma_X^2 + \mu_X^2
$$

so the mean and variance can be defined in terms of the first two moments of the distribution.

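Higher moments describe other details of the shape of the distribution. For example, the third moment $m_3$ is related to the skewness, a measure of how asymmetric a distribution is (the third central moment $\text{E}[(X-\mu_X)^3]$ is zero for a symmetric distribution).

For many common distributions a full set of moments fully defines the distribution (though sometimes not all moments will be well defined).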
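
The sketch below computes the first few moments of a fair six-sided die, an example distribution assumed here. The helper names `moment` and `central_moment` are illustrative only.

```python
from fractions import Fraction

def moment(pmf, k):
    """Return the k-th moment m_k = E[X^k]."""
    return sum(x**k * p for x, p in pmf.items())

def central_moment(pmf, k):
    """Return the k-th central moment E[(X - mu)^k]."""
    mu = moment(pmf, 1)
    return sum((x - mu) ** k * p for x, p in pmf.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

m1 = moment(fair_die, 1)                      # mean
m2 = moment(fair_die, 2)                      # second moment
variance = m2 - m1**2                         # sigma^2 = m_2 - mu^2
third_central = central_moment(fair_die, 3)   # zero for a symmetric distribution

print(m1, m2, variance, third_central)  # 7/2 91/6 35/12 0
```

The die is symmetric about its mean, so its third central moment vanishes.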

### Moment generating function 

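There are some shortcuts for calculating the moments of a distribution.

The moment generating function of a random variable $X$ is

$m_X(t) = \text{E}[e^{tX}]$, defined for those values of $t$ for which the expectation exists (for some distributions this is all of $-\infty < t < \infty$).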

This function can be used to generate (hence the name) the moments of a distribution.

Remember that

$$
e^{tx} = 1 + tx + \frac{(tx)^2}{2!} + \frac{(tx)^3}{3!} + \cdots
$$

the Taylor series of $e^{tx}$ about $x = 0$.



Then 

$$
\begin{align}
m_X(t) & = \text{E}[e^{tX}] \\
       & = \text{E}\Bigg[1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \cdots \Bigg]
\end{align}
$$

and again using the linear properties of $\text{E}[]$ we get

$$
\begin{align}
m_X(t) & =  1 + t\text{E}[X] + \frac{t^2}{2!}\text{E}[X^2] + \frac{t^3}{3!}\text{E}[X^3] + \cdots \\
       & = 1 + t m_1 + \frac{t^2}{2!} m_2 +  \frac{t^3}{3!} m_3 + \cdots
\end{align}
$$


If we now take derivatives with respect to $t$ we get

$$
\frac{\text{d}m_X(t)}{\text{d}t}= m_1 +  t m_2 + \frac{t^2}{2!} m_3 + \cdots
$$

$$
\frac{\text{d}^2m_X(t)}{\text{d}t^2}= m_2 +  t m_3 + \frac{t^2}{2!} m_4 + \cdots
$$

Finally, setting $t = 0$ makes every term in each series vanish except the first:

$$
\frac{\text{d}m_X(t)}{\text{d}t} \Bigg|_{t=0} = m_1 
$$

$$
\frac{\text{d}^2m_X(t)}{\text{d}t^2} \Bigg|_{t=0} = m_2 
$$

etc.
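
The recipe can be tried numerically. The sketch below assumes a fair six-sided die, whose moment generating function is a finite sum, and approximates the derivatives at $t = 0$ with central finite differences. The step size `h` and the function name `mgf` are choices made for this illustration, and the results are approximations rather than exact derivatives.

```python
import math

def mgf(t):
    """Moment generating function E[e^(tX)] of a fair six-sided die."""
    return sum(math.exp(t * x) / 6 for x in range(1, 7))

h = 1e-3  # small step for the finite-difference approximations

# First derivative at t = 0 approximates m_1 = E[X] = 3.5
m1_estimate = (mgf(h) - mgf(-h)) / (2 * h)

# Second derivative at t = 0 approximates m_2 = E[X^2] = 91/6
m2_estimate = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

print(m1_estimate, m2_estimate)
```

The estimates should be close to $3.5$ and $91/6 \approx 15.17$, the moments obtained by summing directly over the six faces.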

## Example 

The length of time a transistor will work is a random variable $Y$ with density function

$$
f_Y(y) =
\begin{cases}
0.001e^{-0.001y}, & y > 0 \\
0, & \text{otherwise.}
\end{cases}
$$

The moment generating function of $Y$ is 
$$
\begin{align}
m_Y(t) & = \text{E}[e^{tY}]\\
       & = \int_0^{\infty} e^{ty} 0.001e^{-0.001y} \text{d}y\\
       & = \int_0^{\infty} 0.001e^{-y(0.001-t)} \text{d}y\\
       & = \frac{0.001}{0.001-t}, \quad t < 0.001
\end{align}
$$

The integral converges only for $t < 0.001$, which includes the point $t = 0$ that we need.

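To find the moments we differentiate with respect to $t$ and then set $t = 0$:

$$
m_Y^{(1)}(t) = \frac{0.001}{(0.001-t)^2}, \qquad m_Y^{(2)}(t) = \frac{2 \times 0.001}{(0.001-t)^3}
$$

giving

$$
\mu = m_Y^{(1)}(0) = \frac{1}{0.001} = 1000
$$

$$
m_2 = m_Y^{(2)}(0) = \frac{2}{(0.001)^2} = 2(1000)^2
$$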

so 

$$
\sigma^2 = m_2 - \mu^2 = 2(1000)^2 - (1000)^2 = (1000)^2
$$
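
These values can be sanity-checked by simulation. The sketch below assumes the lifetimes can be sampled with Python's `random.expovariate` at rate $0.001$, which has the density given above. The sample size and seed are arbitrary choices for this illustration, and the estimates will only be approximately equal to the exact values.

```python
import random

random.seed(0)      # fixed seed so the run is repeatable
rate = 0.001        # rate parameter of the exponential density
n = 200_000         # number of simulated transistor lifetimes

lifetimes = [random.expovariate(rate) for _ in range(n)]

# Sample estimates of the mean and variance of Y.
sample_mean = sum(lifetimes) / n
sample_var = sum((y - sample_mean) ** 2 for y in lifetimes) / n

print(sample_mean)  # should be close to 1000
print(sample_var)   # should be close to 1000**2
```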

# Summary

We have given the definitions of

- expectation of random variables
- variance of random variables
- expectation of sums of random variables.

Next we will look at more examples and the expectations of joint random variables.