Covariance

$ n $ - number of points

$ m $ - number of dimensions

$ h, i $ - indices used to go over the rows of $X$

$ j, k$ - indices used to go over the columns of $X$

$ X = [ x_1, x_2, ..., x_j, ... , x_m ] $

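What we want is the covariance matrix of the columns of $ X $. NumPy's `np.cov` treats each row as a variable by default, so this is the covariance matrix of $ X^T $.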


In [34]:
import numpy as np

n = 6
m = 3

X = np.array([ [0.6787,    0.6948,    0.7094],
    [0.7577,    0.3171,    0.7547],
    [0.7431,    0.9502,    0.2760],
    [0.3922,    0.0344,    0.6797],
    [0.6555,    0.4387,    0.6551],
    [0.1712,    0.3816,    0.1626]
    ])
print(X)

[[0.6787 0.6948 0.7094]
 [0.7577 0.3171 0.7547]
 [0.7431 0.9502 0.276 ]
 [0.3922 0.0344 0.6797]
 [0.6555 0.4387 0.6551]
 [0.1712 0.3816 0.1626]]


$ \mu_j = \frac{1}{n} \sum^n_{i=1} x_{ij} $

In [35]:
Mu = np.mean(X, axis=0)
print(Mu)


[0.5664 0.4695 0.5396]


$ C_{jk} = \frac{1}{n-1} \sum^n_{i=1} (x_{ij} - \mu_j)(x_{ik} - \mu_k)$

In [36]:
np.set_printoptions(precision=4)
C = np.matmul(np.subtract(X,Mu).transpose(),np.subtract(X,Mu)) / float(n-1)
print(C)

[[ 0.055   0.0378  0.0297]
 [ 0.0378  0.1006 -0.0305]
 [ 0.0297 -0.0305  0.0639]]


In [37]:
print(np.cov(X.transpose()))

[[ 0.055   0.0378  0.0297]
 [ 0.0378  0.1006 -0.0305]
 [ 0.0297 -0.0305  0.0639]]
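
As a small aside, the transpose can be avoided by passing `rowvar=False` to `np.cov`, which tells NumPy that each column is a variable and each row is an observation. The sketch below repeats the data so that it runs on its own; the names `C_rowvar` and `C_manual` are only illustrative.

```python
import numpy as np

X = np.array([[0.6787, 0.6948, 0.7094],
              [0.7577, 0.3171, 0.7547],
              [0.7431, 0.9502, 0.2760],
              [0.3922, 0.0344, 0.6797],
              [0.6555, 0.4387, 0.6551],
              [0.1712, 0.3816, 0.1626]])

# rowvar=False: columns are variables, rows are observations
C_rowvar = np.cov(X, rowvar=False)

# Manual computation from the centered data, for comparison
Xc = X - X.mean(axis=0)
C_manual = Xc.T @ Xc / (X.shape[0] - 1)

# Both checks should print True
print(np.allclose(C_rowvar, np.cov(X.T)))
print(np.allclose(C_rowvar, C_manual))
```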


In [38]:
C2 = np.zeros((m,m))
for j in range(0,m):
    for k in range(0,m):
        sum_var = 0.0
        for i in range(0,n):
            sum_var = (X[i][j]-Mu[j])*(X[i][k]-Mu[k])+sum_var
        C2[j][k] = (sum_var)/float(n-1)
        
print(C2)

[[ 0.055   0.0378  0.0297]
 [ 0.0378  0.1006 -0.0305]
 [ 0.0297 -0.0305  0.0639]]


If we expand the product inside $ C_{jk} $, we can separate it into different components

$ C_{jk} = \frac{1}{n-1} \left[ \sum^n_{i=1}{x_{ij} x_{ik}} - \mu_j \left(\sum^n_{i=1} {x_{ik}} \right) - \left( \sum^n_{i=1} {x_{ij}} \right) \mu_k  + n \mu_j \mu_k \right] $



We can give names to the summations

$ A_{jk} = \sum^n_{i=1} {x_{ij} x_{ik}} $

Where 

\begin{equation} \mathbf{A} = \begin{matrix} 
A_{11} & A_{12} & \cdots & A_{1k} & \cdots & A_{1m} \\
A_{21} & A_{22} & \cdots & A_{2k} & \cdots & A_{2m} \\
\vdots & \vdots & \ddots & \vdots &        & \vdots \\
A_{j1} & A_{j2} & \cdots & A_{jk} & \cdots & A_{jm} \\
\vdots & \vdots &        & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mk} & \cdots & A_{mm}
\end{matrix}
\end{equation}

$ B_{j} = \sum^n_{i=1} {x_{ij}} $

Where 

$ \mathbf{B} = \left[ B_{1}, B_{2}, ..., B_{j}, ..., B_{m} \right] $

We can reduce the equation to

$ C_{jk} = \frac{1}{n-1} \left[ A_{jk} - \mu_j B_k - B_j \mu_k  + n \mu_j \mu_k \right] $
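
Since $ B_j = \sum^n_{i=1} x_{ij} = n \mu_j $, the two cross terms and the mean product in the expansion of $ C_{jk} $ are all equal to $ n \mu_j \mu_k $ up to sign, so the expression can be collapsed further. This short derivation is added as an illustration and uses only the definitions above:

$ \mu_j B_k = B_j \mu_k = n \mu_j \mu_k $

$ C_{jk} = \frac{1}{n-1} \left[ A_{jk} - n \mu_j \mu_k \right] = \frac{1}{n-1} \left[ A_{jk} - \frac{B_j B_k}{n} \right] $

So storing $ n $, $ \mathbf{A} $ and $ \mathbf{B} $ is enough to recover the covariance matrix.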

In [39]:
A = np.zeros((m,m))
for j in range(0,m):
    for k in range(0,m):
        sum_var = 0.0
        for i in range(0,n):
            sum_var = X[i][j]*X[i][k]+sum_var
        A[j][k] = sum_var
        
B = np.zeros(m)
for j in range(0,m):
    sum_var = 0.0   
    for i in range(0,n):
        sum_var = X[i][j]+sum_var
    B[j] = sum_var
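
The loops above can also be written as matrix operations, and it is worth checking that $ \mathbf{A} $ and $ \mathbf{B} $ really reproduce the covariance matrix. The following is a minimal sketch that repeats the data so it runs on its own; `C_from_sums` is an illustrative name, and the formula used is written out in the comment.

```python
import numpy as np

X = np.array([[0.6787, 0.6948, 0.7094],
              [0.7577, 0.3171, 0.7547],
              [0.7431, 0.9502, 0.2760],
              [0.3922, 0.0344, 0.6797],
              [0.6555, 0.4387, 0.6551],
              [0.1712, 0.3816, 0.1626]])
n = X.shape[0]
Mu = X.mean(axis=0)

A = X.T @ X          # A[j, k] = sum over i of X[i, j] * X[i, k]
B = X.sum(axis=0)    # B[j]    = sum over i of X[i, j]

# C[j, k] = (A[j, k] - Mu[j]*B[k] - B[j]*Mu[k] + n*Mu[j]*Mu[k]) / (n - 1)
C_from_sums = (A - np.outer(Mu, B) - np.outer(B, Mu)
               + n * np.outer(Mu, Mu)) / (n - 1)

# Should print True if the sums reproduce np.cov
print(np.allclose(C_from_sums, np.cov(X, rowvar=False)))
```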
                   

In [40]:
X_new = np.array([[0.0318,    0.7952,    0.4984],
    [0.2769,    0.1869,    0.9597],
    [0.0462,    0.4898,    0.3404],
    [0.0971,    0.4456,    0.5853],
    [.8235,    0.6463,    0.2238]])
print(X_new)

n_new = 5

[[0.0318 0.7952 0.4984]
 [0.2769 0.1869 0.9597]
 [0.0462 0.4898 0.3404]
 [0.0971 0.4456 0.5853]
 [0.8235 0.6463 0.2238]]


In [41]:
X_full = np.concatenate([X,X_new])
print(X_full)
print("\nCovariance matrix")
print(np.cov(X_full.transpose()))

[[0.6787 0.6948 0.7094]
 [0.7577 0.3171 0.7547]
 [0.7431 0.9502 0.276 ]
 [0.3922 0.0344 0.6797]
 [0.6555 0.4387 0.6551]
 [0.1712 0.3816 0.1626]
 [0.0318 0.7952 0.4984]
 [0.2769 0.1869 0.9597]
 [0.0462 0.4898 0.3404]
 [0.0971 0.4456 0.5853]
 [0.8235 0.6463 0.2238]]

Covariance matrix
[[ 0.0981  0.0173  0.0037]
 [ 0.0173  0.0717 -0.0344]
 [ 0.0037 -0.0344  0.0639]]


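Now suppose the new batch of points $ X' $ (`X_new`) arrives, and we want the covariance matrix of the combined data without going over the original points again. We will start by updating $ \mu $ 

$ n' $ - the number of points in the new batch

$ n^\dagger $ - the combined total number of points

$ n^\dagger = n' + n $

$ \mu_j^\dagger = \frac{1}{n^\dagger} \left( \mu_j n + \sum^{n'}_{i=1} x'_{ij} \right)$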

In [42]:
# Step 1: calculate the combined Mu from the old Mu and the new batch
Mu_full = 1/(n+n_new) * ( np.sum(X_new,axis=0) + Mu * n )
print(Mu_full)
print("Should be equal to")
print(np.mean(X_full,axis=0))

[0.4249 0.4891 0.5314]
Should be equal to
[0.4249 0.4891 0.5314]


In [43]:
n_full  = n_new + n

The next step is to adjust $ \mathbf{A} $

$ A^\dagger_{jk} = A_{jk} +  \sum^{n'}_{i=1} {x'_{ij} x'_{ik}} $

And $\mathbf{B}$

$ B^\dagger_{j} = B_j + \sum^{n'}_{i=1} x'_{ij} $

In [50]:
A_full = np.zeros((m,m))
for j in range(0,m):
    for k in range(0,m):
        sum_var = 0.0
        for i in range(0,n_new):
            sum_var = X_new[i][j]*X_new[i][k]+sum_var
        A_full[j][k] = A[j][k] + sum_var
        
B_full = np.zeros(m)
for j in range(0,m):
    sum_var = 0.0   
    for i in range(0,n_new):
        sum_var = X_new[i][j]+sum_var
    B_full[j] = B[j] + sum_var

In [53]:
# Now to calculate the new covariance matrix

C_full = np.zeros((m,m))
for j in range(0,m):
    for k in range(0,m):
        # C = (A - mu_j*B_k - B_j*mu_k + n*mu_j*mu_k) / (n - 1), using the updated quantities
        C_full[j,k] = 1.0/(n_full-1) * ( A_full[j,k] - Mu_full[j]*B_full[k] - B_full[j]*Mu_full[k] + Mu_full[j]*Mu_full[k]*n_full)
print(C_full)

[[ 0.0981  0.0173  0.0037]
 [ 0.0173  0.0717 -0.0344]
 [ 0.0037 -0.0344  0.0639]]
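
The whole update can be wrapped into a few small functions that keep only $ n $, $ \mathbf{A} $ and $ \mathbf{B} $ between batches. The sketch below is one possible arrangement, not the only one: the function names `init_stats`, `update_stats` and `covariance` are hypothetical, and it uses the identity $ B_j = n \mu_j $ so that the mean does not have to be stored separately.

```python
import numpy as np

def init_stats(m):
    """Return empty running statistics (n, A, B) for m-dimensional points."""
    return 0, np.zeros((m, m)), np.zeros(m)

def update_stats(stats, X_batch):
    """Add a batch of points (one point per row) to the running sums."""
    n, A, B = stats
    X_batch = np.asarray(X_batch, dtype=float)
    return (n + X_batch.shape[0],
            A + X_batch.T @ X_batch,      # sums of products
            B + X_batch.sum(axis=0))      # sums of each column

def covariance(stats):
    """Covariance from the running sums: (A - B B^T / n) / (n - 1)."""
    n, A, B = stats
    return (A - np.outer(B, B) / n) / (n - 1)

X = np.array([[0.6787, 0.6948, 0.7094],
              [0.7577, 0.3171, 0.7547],
              [0.7431, 0.9502, 0.2760],
              [0.3922, 0.0344, 0.6797],
              [0.6555, 0.4387, 0.6551],
              [0.1712, 0.3816, 0.1626]])
X_new = np.array([[0.0318, 0.7952, 0.4984],
                  [0.2769, 0.1869, 0.9597],
                  [0.0462, 0.4898, 0.3404],
                  [0.0971, 0.4456, 0.5853],
                  [0.8235, 0.6463, 0.2238]])

stats = init_stats(X.shape[1])
stats = update_stats(stats, X)       # first batch
stats = update_stats(stats, X_new)   # second batch, old points are not revisited
C_stream = covariance(stats)

# Should print True if the batched result matches the full computation
print(np.allclose(C_stream, np.cov(np.concatenate([X, X_new]), rowvar=False)))
```

One caveat on this design: accumulating raw sums of products can lose precision when the means are large compared with the spread of the data, in which case an update based on centered sums may behave better.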
