In [1]:
# Slides for Probability and Statistics module, 2016-2017
# Matt Watkins, University of Lincoln

# Covariance and Correlation

Last week we looked at the properties of expectations of random variables.


<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
By the end of this lecture you should know:
<br><br>
<li> how to calculate the expectation value of a jointly distributed discrete random variable  </li>
<li> how to calculate the covariance and correlation of jointly distributed random variables.</li>
<li> how to calculate expectation values and variances of sums of independent random variables</li>
</div>

# Expectation and Variance revisited

- The expectation value gives the weighted average of the possible values of the random variable. The weighting is the likelihood that that value turns up. This is the **mean** of the random variable, often written as $\mu$.

Given a random variable $X$ with a probability mass function $p(x)$  we define the expectation of $X$, written as $\text{E}[X]$ as 

$$
\text{E}[X] = \sum_{\text{all $x$}} x\cdot p(x)
$$

The expectation of a discrete random variable $X$ is the probability-weighted mean of the values it takes on. It reduces to the plain arithmetic mean only when all values are equally likely.

The equivalent for a continuous random variable $Z$ is

$$
\text{E}[Z] = \int_{\text{all $z$}} z\cdot f(z) \mathrm{d}z
$$

**example**  
Let $X$ be the random variable 'number of heads' when two coins are flipped. What is the expectation of $X$?

Let's write down the probability mass function of $X$.

$p(0)=\frac{1}{4}, p(1)= \frac{1}{2}, p(2)= \frac{1}{4}$, then the expectation of $X$ is given by

$$ \text{E}[X] = \sum_{\text{all $x$}} x\cdot p(x) = \\
0 \cdot p(0) + 1 \cdot p(1) + 2 \cdot p(2) = \\
0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} = 1 = \mu_X
$$

This is the expected number of heads on two coins. From our initial discussions this should be the long-term average number of heads if the experiment were repeated many, many times.
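The sum above can be checked with a few lines of Python. This is a minimal sketch that assumes two fair coins; the names `pmf_X` and `expectation` are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Probability mass function of X = number of heads on two fair coins.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(pmf):
    """Return E[X] = sum over x of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

mu_X = expectation(pmf_X)
print(mu_X)  # 1
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.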

**example**  
In a game the player wins back a sum that is the 'square of the number of heads' on two coins.

We have a new random variable $Y$ with probability mass function

$p(0)=\frac{1}{4}, p(1)= \frac{1}{2}, p(4)= \frac{1}{4}$, then the expectation of $Y$ is given by

$$ \text{E}[Y] = \sum_{\text{all $y$}} y\cdot p(y) = \\
0 \cdot p(0) + 1 \cdot p(1) + 4 \cdot p(4) = \\
0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 4 \cdot \frac{1}{4} = \frac{3}{2} = \mu_Y
$$

Note that $Y = X^2$, so the same expectation can be computed from the probability mass function of $X$:

$$ \text{E}[X^2] = \sum_{\text{all $x$}} x^2\cdot p(x) = \\
0 \cdot p(0) + 1 \cdot p(1) + 2^2 \cdot p(2) = \\
0 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 4 \cdot \frac{1}{4} = \frac{3}{2} = \mu_{X^2}
$$

This is the expected value of the square of the number of heads on two coins. From our initial discussions this should be the long-term average if the experiment were repeated many, many times. Note that it is not itself a possible value of $Y$.

**Definition**

Let $g(X)$ be any function of a random variable $X$. Then

$$
\text{E}[g(X)] = \sum_{\text{all $x$}}g(x) \cdot p(x)
$$

or for the continuous random variable $Z$

$$
\text{E}[g(Z)] = \int_{\text{all $z$}} g(z)\cdot f(z) \mathrm{d}z
$$
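For the discrete case, the definition translates directly into code. The sketch below reuses the two-coin probability mass function from the earlier example; `expectation_of` is a hypothetical helper name introduced only for illustration.

```python
from fractions import Fraction

# Probability mass function of X = number of heads on two fair coins.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation_of(g, pmf):
    """Return E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

E_X = expectation_of(lambda x: x, pmf_X)        # g(x) = x gives the mean
E_X2 = expectation_of(lambda x: x**2, pmf_X)    # g(x) = x^2
print(E_X, E_X2)  # 1 3/2
```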

---

If $X$ is a random variable and $a$ is a constant, then

- $\text{E}[a] = a$
- $\text{E}[aX] = a\text{E}[X]$
- $\text{E}[g_1(X) + g_2(X)] = \text{E}[g_1(X)] + \text{E}[g_2(X)]$, where $g_1(X)$ and $g_2(X)$ are any functions of X. 

These properties show that the expectation is a linear operator.
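As a quick numerical check, the three properties can be verified on the two-coin example. The constant `a = 3` and the functions `g1`, `g2` are arbitrary choices made for this sketch.

```python
from fractions import Fraction

# Probability mass function of X = number of heads on two fair coins.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation_of(g, pmf):
    """Return E[g(X)] = sum over x of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

a = 3
g1 = lambda x: x**2
g2 = lambda x: 5 * x

E_const = expectation_of(lambda x: a, pmf_X)                # E[a]
E_scaled = expectation_of(lambda x: a * x, pmf_X)           # E[aX]
E_sum = expectation_of(lambda x: g1(x) + g2(x), pmf_X)      # E[g1(X) + g2(X)]

print(E_const == a)
print(E_scaled == a * expectation_of(lambda x: x, pmf_X))
print(E_sum == expectation_of(g1, pmf_X) + expectation_of(g2, pmf_X))
```

All three comparisons should print `True`.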


**Definition**

If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $\text{Var}(X)$, is defined by

$$
\text{Var}(X) = \text{E}[(X-\mu)^2]
$$

this can also be written

$$
\text{Var}(X) = \text{E}[X^2] - (\text{E}[X])^2 
$$

---

we saw in the problem class that

$$
\text{Var}(aX + b) = a^2 \text{Var}(X)
$$



**example**  
The variance of $X$, the number of heads on two coins, is given by

$$
\text{Var}(X) = \text{E}[X^2] - \text{E}[X]^2 = \frac{3}{2} - 1^2 = \frac{1}{2}
$$

and we have a standard deviation of $\sqrt{\text{Var}(X)} = \frac{1}{\sqrt{2}} \approx 0.71$.

It is easier to see the meaning of the variance from the alternative form:

$$
\text{Var}(X) = \text{E}[(X-\mu_X)^2] =  \sum_{\text{all $x$}} (x - \mu_X) ^2 \cdot p(x) = \\
(0 - 1)^2 \cdot p(0) + (1 - 1)^2 \cdot p(1) + (2 - 1)^2 \cdot p(2) = \frac{1}{2}
$$

In agreement with the previous value.

**It is the expected value of the square of the distance of $X$ to $\mu_X$.**

- If $X$ only took on its average value, the variance would be 0. 
- The closer the values of $x$ are to $\mu$ the smaller the value of $\text{Var}(X)$. 
- The less likely values of $x$ far from $\mu$ are, the smaller the variance.
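Both forms of the variance can be computed for the two-coin example to confirm that they agree. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Probability mass function of X = number of heads on two fair coins.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf_X.items())

# Definition: Var(X) = E[(X - mu)^2]
var_definition = sum((x - mu) ** 2 * p for x, p in pmf_X.items())

# Alternative form: Var(X) = E[X^2] - (E[X])^2
var_shortcut = sum(x**2 * p for x, p in pmf_X.items()) - mu**2

std = sqrt(var_shortcut)  # standard deviation as a float

print(var_definition, var_shortcut)  # 1/2 1/2
print(round(std, 4))                 # 0.7071
```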

# Joint random variables.

We defined two random variables, $X$ and $Y$. What if we need to consider both at once? Then we need the *joint* probability mass function of $X$ and $Y$, which is written as 


<div style="background-color:Gold; margin-left: 20px; margin-right: 20px; padding-bottom: 8px; padding-left: 8px; padding-right: 8px; padding-top: 8px; border-radius: 25px;">
$$P\{X=x,Y=y\} = p(x,y)$$

for discrete random variables, or

$$P\{x \leq X \leq x + dx,\; y \leq Y \leq y + dy\} = f(x,y)dx dy$$

for continuous ones.
</div>

Remember the capitalised $X$ is the random variable, $x$ is a value it can take on.

How many values can this function take in the discrete case? 


#### Short example

Let's look at some data from last year, and set up two random variables:

- $X=0$ if a student was doing straight maths, and $X=1$ if a student was doing any other course code.  
- $Y=0$ if a student's surname did not begin with M, $Y=1$ if a student's surname began with M.

There are 4 joint probabilities - $X$ is 0 or 1 and $Y$ is 0 or 1. 

So we need to define $p(0,0), p(0,1),p(1,0),p(1,1)$

These are the probabilities that both of the variables have specific values.

| | $X=0$ | $X=1$ | sum |
|-|-|-|-|
|$Y=0$|$\frac{22}{41}$|$\frac{16}{41}$|$\frac{38}{41}$|
|$Y=1$|$\frac{3}{41}$|$\frac{0}{41}$|$\frac{3}{41}$|
|sum|$\frac{25}{41}$|$\frac{16}{41}$|$\frac{41}{41}$|

We found these values by looking through the spreadsheet of student information similar to the one we aggregated a few computational classes ago. 

The marginal values give us the individual probability mass functions of $X$, $p_X(x)$ and $Y$, $p_Y(y)$. 
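The row and column sums in the table can be reproduced programmatically. In this sketch the joint probabilities are copied from the table above and stored in a dictionary keyed by `(x, y)`; the helper `marginal` is a hypothetical name used only here.

```python
from fractions import Fraction

# Joint pmf p(x, y) from the table, keyed by (x, y).
joint = {
    (0, 0): Fraction(22, 41), (1, 0): Fraction(16, 41),
    (0, 1): Fraction(3, 41),  (1, 1): Fraction(0, 41),
}

def marginal(joint, axis):
    """Sum the joint pmf over the other variable.

    axis=0 keeps x (giving p_X), axis=1 keeps y (giving p_Y).
    """
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

p_X = marginal(joint, 0)
p_Y = marginal(joint, 1)

print(p_X[0], p_X[1])  # 25/41 16/41
print(p_Y[0], p_Y[1])  # 38/41 3/41
```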




### Marginal distributions

We can see this property of the marginal values (column or row sums) as each entry in the table is 

$$
P\{X=x,Y=y\} = p(x,y) = P(X=x \cap Y=y).
$$

We then have using the law of total probability

$$
P(A) = \sum_{i=1}^n P(A \cap E_i) = \sum_{i=1}^n P(A \mid E_i) P(E_i)
$$

we can immediately see, specialising to our case where the $E_i$ are the values of $y$

$$
P\{X=x\} = \sum_{\text{all $y$}} P(X=x \cap Y=y) = \sum_{\text{all $y$}}P(X=x \mid Y=y) P(Y=y).
$$

and equally 

$$
P\{Y=y\} = \sum_{\text{all $x$}} P(Y=y \cap X=x) = \sum_{\text{all $x$}}P(Y=y \mid X=x) P(X=x).
$$

#### Continuous random variables

as normal, we replace the sums with integrals:

$$
\begin{align}
P\{ x - \Delta x /2 \leq X \leq x + \Delta x /2\} & \approx \int_{-\infty}^{\infty} f(x,y) dy \, \Delta x  \\
                 & = f_X(x) \Delta x 
\end{align}
$$

and equally 

$$
\begin{align}
P\{ y - \Delta y /2 \leq Y \leq y + \Delta y /2\} & \approx \int_{-\infty}^{\infty} f(x,y) dx \, \Delta y \\
                 & = f_Y(y) \Delta y 
\end{align}
$$

## Expectation of Sums of Random Variables

We now come to a very important general result that will be the basis of a lot of the development we do later. 

What we will seek to show is that 

$$
\text{E}[X_1 + \cdots + X_n] = \text{E}[X_1] + \cdots + \text{E}[X_n],
$$

the expectation of the sum of the random variables $X_1 \ldots X_n$ is just the sum of their individual expectations. 

Let's start with just two variables, $X$ and $Y$. We can use the general expression for the expectation of a function of two random variables

$$
\begin{align}
\text{E}[X+Y] & = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x+y)f(x,y) \mathrm{d}x \mathrm{d}y \\
              & = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x,y) \mathrm{d}x \mathrm{d}y + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x,y) \mathrm{d}x \mathrm{d}y \\
              & = \int_{-\infty}^{\infty} x f_{X}(x) \mathrm{d}x + \int_{-\infty}^{\infty} y f_{Y}(y) \mathrm{d}y \\
              & = \text{E}[X] + \text{E}[Y]
\end{align}
$$

At this point we can use proof by induction to show what we were after:

$$
\text{E}[X_1 + \cdots + X_n] = \text{E}[X_1] + \cdots + \text{E}[X_n],
$$


**example**
In fact we have already seen this result: 

We gave $X$ as the 'number of heads' when two coins were flipped.

Let us define two new random variables, $X_1$ the number of heads on the first coin and $X_2$ the number of heads on the second coin, so that $X = X_1 + X_2$.

We want the expectation value of $X_1 + X_2$. The expectation value of each is given by

$$ 
\begin{align}
\text{E}[X_1] & = \text{E}[X_2] = \sum_{\text{all $x_1$}} x_1\cdot p(x_1) \\
              & = 0 \cdot p(0) + 1 \cdot p(1) \\
              & = 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} \\ 
              & = \frac{1}{2} = \mu_{X_1} \\ 
              & = \mu_{X_2}
\end{align}
$$

we can see that as expected

$$
\text{E}[X] = \text{E}[X_1] + \text{E}[X_2] = 1
$$
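The same check can be done by enumerating the four outcomes of the two coins. The sketch assumes the coins are fair and independent, so that the joint probability of each outcome is the product of the individual probabilities; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# pmf of a single fair coin: 0 = tails, 1 = heads.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Joint pmf of two independent coins: p(x1, x2) = p(x1) * p(x2).
joint = {(x1, x2): coin[x1] * coin[x2] for x1, x2 in product(coin, repeat=2)}

E_X1 = sum(x1 * p for (x1, _), p in joint.items())
E_X2 = sum(x2 * p for (_, x2), p in joint.items())
E_sum = sum((x1 + x2) * p for (x1, x2), p in joint.items())

print(E_X1, E_X2, E_sum)  # 1/2 1/2 1
```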

What about the variances of these variables?

$$ 
\begin{align}
\text{E}[X_1^2] & = \text{E}[X_2^2] = \sum_{\text{all $x_1$}} x_1^2\cdot p(x_1) \\
& = 0 \cdot p(0) + 1 \cdot p(1)  \\
& = 0 \cdot \frac{1}{2} + 1 \cdot \frac{1}{2}  \\
& = \frac{1}{2}
\end{align}
$$

so

$$
\text{Var}(X_1) = \text{Var}(X_2) = \frac{1}{2} - (\frac{1}{2})^2 =  \frac{1}{4}
$$

and we get **in this case** (because $X_1$ and $X_2$ are independent)

$$
\text{Var}(X) = \text{Var}(X_1) + \text{Var}(X_2) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}
$$

It should be clear that this would hold for tossing three coins, four coins $\ldots$ $n$ coins, which again could be proved by induction.
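For small $n$ this can be confirmed by brute-force enumeration rather than induction. The function `head_count_variance` below is a hypothetical helper that assumes fair, independent coins, so every one of the $2^n$ outcomes is equally likely.

```python
from fractions import Fraction
from itertools import product

def head_count_variance(n):
    """Variance of the number of heads on n fair, independent coins."""
    p = Fraction(1, 2**n)  # each of the 2^n outcomes is equally likely
    counts = [sum(outcome) for outcome in product((0, 1), repeat=n)]
    mean = sum(c * p for c in counts)
    return sum((c - mean) ** 2 * p for c in counts)

single = head_count_variance(1)  # variance of one coin, 1/4
for n in (1, 2, 3, 4):
    # Each variance equals n times the single-coin variance.
    print(n, head_count_variance(n), n * single)
```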

## Expectation of Joint random variables.

The joint probability mass/density function contains all the information about the variables it describes.

We've just seen that we can retrieve the individual probability mass/density functions by integrating/summing over the other variables. 

If we have a general function of the variables, $g(X,Y)$, say, then we can obtain the expectation value of that function as

$$
\text{E}[g(X,Y)] = \sum_{y} \sum_{x} g(x,y)p(x,y)
$$

for discrete random variables and 

$$
\text{E}[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)f(x,y) \mathrm{d}x \mathrm{d}y
$$

for continuous random variables.
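In the discrete case the double sum is a single loop over the entries of the joint table. The sketch below applies it to the student data from the short example earlier; `expectation_of` is an illustrative helper name.

```python
from fractions import Fraction

# Joint pmf p(x, y) from the student table, keyed by (x, y).
joint = {
    (0, 0): Fraction(22, 41), (1, 0): Fraction(16, 41),
    (0, 1): Fraction(3, 41),  (1, 1): Fraction(0, 41),
}

def expectation_of(g, joint):
    """Return E[g(X, Y)] = sum over x and y of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_sum = expectation_of(lambda x, y: x + y, joint)   # E[X + Y]
E_prod = expectation_of(lambda x, y: x * y, joint)  # E[XY]

print(E_sum, E_prod)  # 19/41 0
```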


## Variances of Sums of Random variables.


$$
\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)
$$

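**if the two variables are independent.** Independence is sufficient but not necessary: the equality holds exactly when $\text{Cov}(X,Y) = 0$, with the covariance as defined below.

More generally, if $n$ variables are mutually independent, 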

$$
\text{Var}(X_1 + X_2 + \cdots + X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)
$$

This formula is a special case of a more general result, $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y)$, which involves the **covariance** between two random variables.
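To see why some condition on the variables is needed, consider an extreme dependent case, chosen here purely for illustration: a "second coin" that always copies the first, so the sum is $2X_1$. The sketch below compares the variance of that sum with the sum of the individual variances.

```python
from fractions import Fraction

# pmf of a single fair coin: 0 = tails, 1 = heads.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def variance(pmf):
    """Return Var(X) = E[(X - mu)^2] for a discrete pmf."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# X1 + X1 = 2 * X1 takes the values 0 and 2 with probability 1/2 each.
pmf_double = {2 * x: p for x, p in coin.items()}

var_dependent_sum = variance(pmf_double)   # Var(X1 + X1)
sum_of_variances = 2 * variance(coin)      # Var(X1) + Var(X1)

print(var_dependent_sum, sum_of_variances)  # 1 1/2
```

The two numbers differ, so the variances of dependent variables do not simply add.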

---

**Definition**  
the covariance between $X$ and $Y$, denoted by $\text{Cov}(X,Y)$, is defined by
$$
\text{Cov}(X,Y) = \text{E}\Big[ (X - \text{E}[X])\cdot(Y - \text{E}[Y]) \Big]
$$

---

**example**
back to our coins,

$$
\begin{align}
\text{Cov}(X_1,X_2) & = \text{E}\Big[ (X_1 - \text{E}[X_1]) \cdot (X_2 - \text{E}[X_2]) \Big] \\
                    &=  \sum_{\text{all $x_1$}}\sum_{\text{all $x_2$}} (x_1 - \mu_{X_1} ) \cdot (x_2 - \mu_{X_2} )\cdot p(x_1, x_2)
\end{align}
$$

where the second line comes from the general expression

$$
\text{E}[g(X,Y)] = \sum_{y} \sum_{x} g(x,y)p(x,y)
$$

adjusted to our particular case.

In this case we can very reasonably expect that $p(x_1,x_2) = p(x_1)p(x_2)$, since the two coins are independent (remember the definition of independent events). We can now write out all the terms in the double sum:

$$
\begin{align}
\sum_{\text{all $x_1$}}\sum_{\text{all $x_2$}} (x_1 - \mu_{X_1} ) \cdot (x_2 - \mu_{X_2} )\cdot p(x_1, x_2) = \\
(0-\frac{1}{2})\times(0-\frac{1}{2}) \times \frac{1}{2}\times\frac{1}{2} + \\
(1-\frac{1}{2})\times(0-\frac{1}{2}) \times \frac{1}{2}\times\frac{1}{2} + \\
(0-\frac{1}{2})\times(1-\frac{1}{2}) \times \frac{1}{2}\times\frac{1}{2} + \\
(1-\frac{1}{2})\times(1-\frac{1}{2}) \times \frac{1}{2}\times\frac{1}{2} = \\
\frac{1}{16} - \frac{1}{16} -\frac{1}{16} + \frac{1}{16} = 0
\end{align}
$$

This will be the case whenever the two variables are independent.
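The four-term sum can be reproduced directly from the definition of the covariance. This sketch assumes fair, independent coins, so the joint probability is the product of the individual probabilities; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# pmf of a single fair coin: 0 = tails, 1 = heads.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Independence: p(x1, x2) = p(x1) * p(x2).
joint = {(x1, x2): coin[x1] * coin[x2] for x1, x2 in product(coin, repeat=2)}

mu_1 = sum(x1 * p for (x1, _), p in joint.items())
mu_2 = sum(x2 * p for (_, x2), p in joint.items())

# Cov(X1, X2) = E[(X1 - mu_1) * (X2 - mu_2)]
terms = [(x1 - mu_1) * (x2 - mu_2) * p for (x1, x2), p in joint.items()]
cov = sum(terms)

print(terms)  # two terms of +1/16 and two of -1/16
print(cov)    # 0
```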

#### Independent variables have 0 covariance

We can also see that the independence of $X$ and $Y$ guarantees that $\text{Cov}(X,Y)=0$ from the alternative expression $\text{Cov}(X,Y) = \text{E}[XY] - \text{E}[X]\text{E}[Y]$ (derived below), by using the result that

$$
\text{E}[XY] = \text{E}[X]\text{E}[Y]
$$

for independent variables. 

$$
\implies \text{Cov}(X,Y) = \text{E}[XY] - \text{E}[X]\text{E}[Y] = 0
$$

for independent variables. 

Again we can check that this holds for our $X_1$ and $X_2$.

### Useful formula

In a similar way to the variance, the covariance can more easily be calculated using an alternative expression

$$
\begin{align}
\text{Cov}(X,Y) & = \text{E}\Big[(X - \text{E}[X])\cdot(Y - \text{E}[Y])\Big] \\
                & = \text{E}\Big[XY - \text{E}[X]Y - X\text{E}[Y] + \text{E}[X]\text{E}[Y]\Big] \\
                & = \text{E}[XY] - \text{E}[X]\text{E}[Y] - \text{E}[X]\text{E}[Y] + \text{E}[X]\text{E}[Y] \\ 
                & = \text{E}[XY] - \text{E}[X]\text{E}[Y]
\end{align}
$$



**Example**

Calculate $\text{E}[X_1 X_2]$ and $\text{Cov}(X_1,X_2)$ for our two coin example.

$$
\begin{align}
\text{E}[X_1 X_2] = \sum_{\text{all $x_1$}}\sum_{\text{all $x_2$}} (x_1 ) \cdot (x_2 )\cdot p(x_1, x_2) = \\
\sum_{\text{all $x_1$}}\sum_{\text{all $x_2$}} (x_1 ) \cdot (x_2 )\cdot p(x_1) \cdot p(x_2) = \\
(0)\times(0) \times \frac{1}{2}\times\frac{1}{2} + \\
(1)\times(0) \times \frac{1}{2}\times\frac{1}{2} + \\
(0)\times(1) \times \frac{1}{2}\times\frac{1}{2} + \\
(1)\times(1) \times \frac{1}{2}\times\frac{1}{2} = \frac{1}{4}
\end{align}
$$

Looking back we see that this is indeed equal to $\text{E}[X_1]\text{E}[X_2] = \frac{1}{2} \cdot \frac{1}{2}$, so again

$$
\begin{align}
\text{Cov}(X_1,X_2) & = \text{E}\Big[(X_1 - \text{E}[X_1])\cdot(X_2 - \text{E}[X_2])\Big] \\
                    & = \text{E}[X_1 X_2] - \text{E}[X_1]\text{E}[X_2] = 0
\end{align}
$$
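For contrast, the same shortcut can be applied to a pair of variables that are not independent. The sketch below uses the student table from the short example earlier (course code $X$, surname initial $Y$) and computes the covariance both from the definition and from $\text{E}[XY] - \text{E}[X]\text{E}[Y]$; the variable names are illustrative.

```python
from fractions import Fraction

# Joint pmf p(x, y) from the student table, keyed by (x, y).
joint = {
    (0, 0): Fraction(22, 41), (1, 0): Fraction(16, 41),
    (0, 1): Fraction(3, 41),  (1, 1): Fraction(0, 41),
}

E_X = sum(x * p for (x, _), p in joint.items())
E_Y = sum(y * p for (_, y), p in joint.items())
E_XY = sum(x * y * p for (x, y), p in joint.items())

# Definition: E[(X - E[X]) * (Y - E[Y])]
cov_definition = sum((x - E_X) * (y - E_Y) * p for (x, y), p in joint.items())

# Shortcut: E[XY] - E[X] * E[Y]
cov_shortcut = E_XY - E_X * E_Y

print(cov_definition, cov_shortcut)  # -48/1681 -48/1681
```

The two routes agree, and the result is small but non-zero, so in this data set $X$ and $Y$ are not independent.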


## Variance is a special case of covariance

remember

$$
\text{Cov}(X,Y) = \text{E} \Big[(X - \text{E}[X]) \cdot (Y - \text{E}[Y]) \Big]
$$

in the special case that $X=Y$,

$$
\begin{align}
\text{Cov}(X,X) & = \text{E} \Big[(X - \text{E}[X])(X - \text{E}[X]) \Big] \\
                & = \text{E}\Big[(X - \text{E}[X])^2\Big] \\
                & = \text{Var}(X)
\end{align}
$$

So the covariance can have either sign, but the variance is the expectation of a squared quantity, so it must be non-negative ($\geq 0$).
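The identity $\text{Cov}(X,X) = \text{Var}(X)$ can be checked on the two-coin example by building the joint probability mass function of the pair $(X, X)$, which puts all its probability on the diagonal. This is a minimal sketch with illustrative names.

```python
from fractions import Fraction

# Probability mass function of X = number of heads on two fair coins.
pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Joint pmf of (X, X): the two components are always equal.
joint_XX = {(x, x): p for x, p in pmf_X.items()}

mu = sum(x * p for x, p in pmf_X.items())

cov_XX = sum((x - mu) * (y - mu) * p for (x, y), p in joint_XX.items())
var_X = sum((x - mu) ** 2 * p for x, p in pmf_X.items())

print(cov_XX, var_X)  # 1/2 1/2
```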

# Summary

We have 
- defined the covariance of two random variables $X$ and $Y$
- examined the meaning of the covariance
- shown that $\text{Cov}(X,Y) = 0$ for independent random variables.