Covariance Trivial Example

$ n $ - number of points

$ m $ - number of dimensions

$ h, i $ - indices that go over the rows of $X$ (the points)

$ j, k$ - indices that go over the columns of $X$ (the dimensions)

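$X$ is the $n \times m$ data matrix with one point per row, written column by column as $ X = [ x_1, x_2, \dots, x_j, \dots , x_m ] $; $x_{ij}$ is the entry in row $i$, column $j$.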

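We want the covariance matrix of the columns of $X$: the $m \times m$ matrix $C$ whose entry $C_{jk}$ is the sample covariance between dimensions $j$ and $k$. In NumPy terms, this is the covariance matrix of $ X^T $.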
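A quick check of why $X^T$ appears here, using the notebook's 3×2 example data: `np.cov` treats each row as a variable by default (`rowvar=True`), so transposing $X$, or equivalently passing `rowvar=False`, makes the columns the variables.

```python
import numpy as np

# Rows are points (n = 3), columns are dimensions (m = 2).
X = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])

# np.cov treats each row as a variable by default, so transpose X first...
cov_transposed = np.cov(X.T)
# ...or tell np.cov directly that the variables are the columns.
cov_rowvar = np.cov(X, rowvar=False)

print(cov_transposed.shape)                     # (2, 2)
print(np.allclose(cov_transposed, cov_rowvar))  # True
```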


$ \mu_j = \frac{1}{n} \sum^n_{i=1} x_{ij} $

$ C_{jk} = \frac{1}{n-1} \sum^n_{i=1} (x_{ij} - \mu_j)(x_{ik} - \mu_k)$
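These two definitions translate almost line for line into loops. The sketch below is only meant for checking the algebra, not for speed; the function name `covariance_from_definition` is hypothetical, not from any library. It centers each column with its mean and divides by $n-1$.

```python
import numpy as np

def covariance_from_definition(X):
    """Sample covariance of the columns of X, computed directly from
    mu_j = (1/n) sum_i x_ij and C_jk = 1/(n-1) sum_i (x_ij - mu_j)(x_ik - mu_k)."""
    n, m = X.shape
    # Column means mu_j
    mu = [sum(X[i, j] for i in range(n)) / n for j in range(m)]
    C = np.zeros((m, m))
    for j in range(m):
        for k in range(m):
            total = 0.0
            for i in range(n):
                total += (X[i, j] - mu[j]) * (X[i, k] - mu[k])
            C[j, k] = total / (n - 1)
    return C

X = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
C = covariance_from_definition(X)
print(C)
print(np.allclose(C, np.cov(X.T)))  # True
```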

Expanding the product inside the sum separates $C_{jk}$ into four terms:

$$ C_{jk} = \frac{1}{n-1} \left[ \sum^n_{i=1}{x_{ij} x_{ik}} - \mu_j \left(\sum^n_{i=1} {x_{ik}} \right) - \left( \sum^n_{i=1} {x_{ij}} \right) \mu_k  + n \mu_j \mu_k \right] $$

By the definition of the mean, $\sum_{i=1}^n x_{ij} = n \mu_j$, so the two middle sums can be replaced:

$$ C_{jk} = \frac{1}{n-1} \left[\sum_{i=1}^n x_{ij}x_{ik} - \mu_j ( n \mu_k ) - ( n \mu_j ) \mu_{k} + n \mu_{j} \mu_{k} \right] $$

and the three $\mu$ terms collapse into one:

$$ C_{jk} = \frac{1}{n-1} \left[ \sum^n_{i=1} x_{ij} x_{ik} - n \mu_j \mu_k \right] $$
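A numerical sanity check of this expansion: for any data, the centered sum $\sum_i (x_{ij}-\mu_j)(x_{ik}-\mu_k)$ should equal $\sum_i x_{ij}x_{ik} - n\mu_j\mu_k$ up to floating-point rounding. The random test matrix and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
X = rng.normal(size=(50, 4))    # n = 50 points, m = 4 dimensions
n, m = X.shape
mu = X.mean(axis=0)             # column means mu_j

max_diff = 0.0
for j in range(m):
    for k in range(m):
        # Left-hand side: sum of products of centered values
        centered = np.sum((X[:, j] - mu[j]) * (X[:, k] - mu[k]))
        # Right-hand side: raw sum of products minus n * mu_j * mu_k
        expanded = np.sum(X[:, j] * X[:, k]) - n * mu[j] * mu[k]
        max_diff = max(max_diff, abs(centered - expanded))

print(max_diff)  # expected to be tiny (rounding error only)
```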

Next, we rewrite $C_{jk}$ in terms of a matrix $A$ of raw (uncentered) sums of products. We substitute for the summation:

$$ A_{jk} = \sum^n_{i=1} {x_{ij} x_{ik}}$$

where these sums form the symmetric $m \times m$ matrix

\begin{equation} \mathbf{A} = \begin{bmatrix} 
A_{11} & A_{12} & \cdots & A_{1k} & \cdots & A_{1m} \\
A_{21} & A_{22} & \cdots & A_{2k} & \cdots & A_{2m} \\
\vdots & \vdots & \ddots & \vdots &        & \vdots \\
A_{j1} & A_{j2} & \cdots & A_{jk} & \cdots & A_{jm} \\
\vdots & \vdots &        & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mk} & \cdots & A_{mm}
\end{bmatrix}
\end{equation}
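Because $A_{jk}$ is the dot product of column $j$ with column $k$ of $X$, the loops that build $A$ can be replaced by one matrix product, $A = X^T X$. A minimal sketch comparing both on the notebook's example data:

```python
import numpy as np

X = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
n, m = X.shape

# Explicit sums: A[j, k] = sum_i x_ij * x_ik
A_loop = np.zeros((m, m))
for j in range(m):
    for k in range(m):
        A_loop[j, k] = sum(X[i, j] * X[i, k] for i in range(n))

# The same sums as a single matrix product
A_matmul = X.T @ X

print(A_matmul)                        # [[14. 32.]
                                       #  [32. 77.]]
print(np.allclose(A_loop, A_matmul))   # True
```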


Substituting $A_{jk}$ into the last expression for $C_{jk}$ gives:

$$ C_{jk} = \frac{1}{n-1} \left[ A_{jk} - n \mu_j \mu_k \right] $$
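For all $j, k$ at once, this formula reads $C = \frac{1}{n-1}\left(A - n\,\mu\mu^T\right)$, where $\mu\mu^T$ is the outer product of the vector of column means with itself. A vectorized sketch; the helper name `cov_from_raw_sums` is hypothetical:

```python
import numpy as np

def cov_from_raw_sums(X):
    """Covariance of the columns of X via C = (A - n * mu mu^T) / (n - 1)."""
    n = X.shape[0]
    A = X.T @ X          # raw sums of products, A_jk = sum_i x_ij x_ik
    mu = X.mean(axis=0)  # column means mu_j
    return (A - n * np.outer(mu, mu)) / (n - 1)

X = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
C = cov_from_raw_sums(X)
print(C)
print(np.allclose(C, np.cov(X.T)))  # True
```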

In [55]:
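import numpy as np

# Each row of data is one point; each column is one dimension.
data = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
number_pts, dimensions = data.shape  # n = 3 points, m = 2 dimensions
print(data)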

[[1. 4.]
 [2. 5.]
 [3. 6.]]


In [56]:
# Reference result from NumPy's built-in np.cov. np.cov treats each row as a
# variable, so we pass the transpose to get the covariance of the columns.
print(np.cov(data.transpose()))

[[1. 1.]
 [1. 1.]]


In [57]:
# A[j, k] = sum over points i of x_ij * x_ik (raw, uncentered sums of products)
A = np.zeros((dimensions, dimensions))
for j in range(dimensions):
    for k in range(dimensions):
        sum_var = 0.0
        for i in range(number_pts):
            sum_var += data[i, j] * data[i, k]
        A[j, k] = sum_var

print(A)

[[14. 32.]
 [32. 77.]]


In [58]:
# Column means mu_j (averaging over the points, i.e. down each column)
Mu = np.mean(data, axis=0)
print(Mu)

[2. 5.]


In [59]:
# Explicitly compute C_jk = (A_jk - n * mu_j * mu_k) / (n - 1)
Cov = np.zeros((dimensions, dimensions))
for j in range(dimensions):
    for k in range(dimensions):
        Cov[j, k] = (A[j, k] - number_pts * Mu[j] * Mu[k]) / (number_pts - 1)
print(Cov)

[[1. 1.]
 [1. 1.]]

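One caveat about the raw-sums formula: $A_{jk}$ and $n\mu_j\mu_k$ can both be very large while their difference is small, so subtracting them in floating point can lose most of the significant digits (catastrophic cancellation). Subtracting the means first, as in the original definition, avoids this. A small illustration, assuming the example data is shifted by an arbitrary offset of $10^9$, which leaves the true covariance unchanged:

```python
import numpy as np

offset = 1e9  # arbitrary large shift; covariance is invariant to it
X = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]) + offset
n = X.shape[0]
mu = X.mean(axis=0)

# Raw-sums formula: (A - n mu mu^T) / (n - 1)
raw = (X.T @ X - n * np.outer(mu, mu)) / (n - 1)

# Centered (two-pass) formula: subtract the means first
Xc = X - mu
centered = (Xc.T @ Xc) / (n - 1)

print(raw)       # not [[1, 1], [1, 1]]: precision lost to cancellation
print(centered)  # [[1. 1.]
                 #  [1. 1.]]
```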