# Table of Contents
 <p><div class="lev1"><a href="#Variance"><span class="toc-item-num">1&nbsp;&nbsp;</span>Variance</a></div><div class="lev1"><a href="#Covariance"><span class="toc-item-num">2&nbsp;&nbsp;</span>Covariance</a></div><div class="lev1"><a href="#Covariance-Matrix"><span class="toc-item-num">3&nbsp;&nbsp;</span>Covariance Matrix</a></div><div class="lev2"><a href="#Generate-some-sample-data"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Generate some sample data</a></div><div class="lev2"><a href="#Centering-data---Deviation-Scores-($x_{i}---\bar{x}$)"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Centering data - Deviation Scores ($x_{i} - \bar{x}$)</a></div><div class="lev2"><a href="#Centering-paramter_1"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Centering paramter_1</a></div><div class="lev3"><a href="#Step-1:-Calculate-the-mean-value-of-the-parameter_1-"><span class="toc-item-num">3.3.1&nbsp;&nbsp;</span><strong>Step 1</strong>: Calculate the mean value of the <strong>parameter_1 </strong></a></div><div class="lev3"><a href="#Step-2:-Remove-the-mean-value-from-the-parameter_1"><span class="toc-item-num">3.3.2&nbsp;&nbsp;</span><strong>Step 2</strong>: Remove the mean value from the <strong>parameter_1</strong></a></div><div class="lev2"><a href="#Center-data-using-Matrix-Algebra"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Center data using Matrix Algebra</a></div><div class="lev3"><a href="#Define-raw-data-as-a-Matrix"><span class="toc-item-num">3.4.1&nbsp;&nbsp;</span>Define raw data as a Matrix</a></div><div class="lev3"><a href="#Define-a-column-vector-of-ones"><span class="toc-item-num">3.4.2&nbsp;&nbsp;</span>Define a column vector of ones</a></div><div class="lev3"><a href="#Build-a-square-matrix-of-ones"><span class="toc-item-num">3.4.3&nbsp;&nbsp;</span>Build a square matrix of ones</a></div><div class="lev3"><a href="#Center-raw-data-Matrix"><span class="toc-item-num">3.4.4&nbsp;&nbsp;</span>Center raw data Matrix</a></div><div 
class="lev3"><a href="#Centering-Example-Data"><span class="toc-item-num">3.4.5&nbsp;&nbsp;</span>Centering Example Data</a></div><div class="lev2"><a href="#Calculate-Covariance-Matrix"><span class="toc-item-num">3.5&nbsp;&nbsp;</span>Calculate Covariance Matrix</a></div><div class="lev3"><a href="#Using-Matrix-Algebra"><span class="toc-item-num">3.5.1&nbsp;&nbsp;</span>Using Matrix Algebra</a></div><div class="lev3"><a href="#With-sample-data"><span class="toc-item-num">3.5.2&nbsp;&nbsp;</span>With sample data</a></div><div class="lev1"><a href="#Calculate-Covariance-Matrix-using-DataFrame"><span class="toc-item-num">4&nbsp;&nbsp;</span>Calculate Covariance Matrix using DataFrame</a></div><div class="lev1"><a href="#Calculate-Covariance-Matrix-using-numpy-arrays"><span class="toc-item-num">5&nbsp;&nbsp;</span>Calculate Covariance Matrix using numpy arrays</a></div>

In [55]:
import pandas as pd
import numpy as np
from IPython.display import display, HTML, Math
from sympy import init_printing, Matrix, symbols, sqrt
init_printing(use_latex = 'mathjax')

# Variance
Variance is the average squared deviation from the mean. It measures the variability, or spread, of a set of data.


$$\sigma^{2}(x) = \frac{\Sigma{(x_{i} - \bar{x})^2}}{N - 1}$$


- $N$ - Number of observations 
- $\bar{x}$ - mean of the given variable
- $x_{i}$ - value of the variable in the $i^{th}$ row (observation)
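
As a quick check of the formula above, the following minimal sketch computes the variance by hand and compares it with NumPy. The sample values in `x` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values, used only to illustrate the formula
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])

# Sum of squared deviations from the mean, divided by N - 1
n = len(x)
variance_manual = np.sum((x - x.mean()) ** 2) / (n - 1)

# ddof=1 tells NumPy to divide by N - 1 instead of N
variance_numpy = np.var(x, ddof=1)

print(variance_manual, variance_numpy)
```

Note that `np.var` divides by $N$ unless `ddof=1` is passed, so the two values agree only with that argument.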




# Covariance

**Covariance** is a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction. $X$ and $Y$ are two vectors.
 $$V = \sigma^{2}_{XY} = \frac{1}{N-1}\sum(X_i - \bar{X})(Y_i-\bar{Y})$$
 
 Dividing by $N-1$ gives us the unbiased estimator ([read more](https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation)).
 
 
 

The **covariance** measures the degree of the linear relationship between two variables.

- $\sigma^2_{XY} >> 0$, $X$ and $Y$ are positively correlated
- $\sigma^2_{XY} = 0$, $X$ and $Y$ are NOT correlated
- $\sigma^2_{XY} << 0$, $X$ and $Y$ are negatively correlated
- $|\sigma^2_{XY}|$, the absolute magnitude of the covariance measures the degree of redundancy
- $\sigma^2_{XY} = \sigma^2_{X}$ if $X=Y$
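
The covariance formula and the last property in the list can be sketched as follows. The vectors `x` and `y` are hypothetical values picked for illustration, not data from this notebook.

```python
import numpy as np

# Hypothetical paired measurements, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

n = len(x)

# Covariance: average product of the paired deviations (N - 1 denominator)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Covariance of a vector with itself reduces to its variance
cov_xx = np.sum((x - x.mean()) * (x - x.mean())) / (n - 1)

print(cov_xy)                # covariance of x and y
print(np.cov(x, y)[0, 1])    # same value from NumPy's 2x2 covariance matrix
print(cov_xx, np.var(x, ddof=1))
```

Here `cov_xy` is positive, which matches the intuition that `y` tends to increase together with `x`.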


# Covariance Matrix 

## Generate some sample data

Let us assume there is an experiment (observing stars in a galaxy, running an experiment in the lab, or watching the stock market). Each observation measures some parameters. In the case of observing stars, we can record each star's mass, radius, flux, and distance. An experiment in the lab could be a chemical reaction with its temperature, reaction rate, color changes, etc. Watching the stock market, we can record the high value, low value, buying price, selling price, etc. We can pick any number of parameters, depending on the experiment and on the quantities we are interested in. For practical purposes, let us assume all the parameters are recorded as floating point numbers.

We can store these observations in a matrix (table format).

In [56]:
# random normal values
d1 = np.random.randn(3)
d2 = np.random.randn(3)
d3 = np.random.randn(3)
d4 = np.random.randn(3)
d5 = np.random.randn(3)

Observations = pd.DataFrame(
    [d1, d2, d3, d4, d5],
    index = [
        'Observation_1', 
        'Observation_2', 
        'Observation_3',
        'Observation_4',
        'Observation_5'
    ],
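    # Use a list: a set is unordered, so the column order would not be
    # guaranteed. The order below matches the output shown in this notebook.
    columns=[
        'parameter_3',
        'parameter_2',
        'parameter_1'
    ]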
)

The code snippet above generates random values and puts them into a data frame. In this dataset, we have five different **observations (rows)**, and each observation records three separate **parameters (columns)**.

In [57]:
display(Observations)

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| Observation_1 | 1.354721 | 0.723148 | 1.074798 |
| Observation_2 | 0.108113 | 0.226227 | 0.495319 |
| Observation_3 | 0.687069 | -0.791294 | 0.890422 |
| Observation_4 | 1.256354 | 0.599203 | -1.71036 |
| Observation_5 | -0.575724 | -1.970987 | 0.100809 |


## Centering data - Deviation Scores ($x_{i} - \bar{x}$)
Centering a variable means subtracting its mean from every value, which yields the deviation scores. Centering data is important because it makes the interpretation of parameter estimates easier.


The dataset above has 3 parameters and 5 observations, and all three parameters need to be centered. We will look at how to center **parameter_1** in detail.

## Centering parameter_1

### **Step 1**: Calculate the mean value of the **parameter_1 **

$$\mu_{parameter\_1} = \frac{1}{5}\Big(\sum^{5}_{observation=1}{\big(parameter\_1_{observation}\big)}\Big)$$

In [58]:
mu_parameter_1 = Observations[['parameter_1']].mean()

### **Step 2**: Remove the mean value from the **parameter_1**
**Raw Data**

In [59]:
Observations[['parameter_1']]

| | parameter_1 |
|---|---|
| Observation_1 | 1.074798 |
| Observation_2 | 0.495319 |
| Observation_3 | 0.890422 |
| Observation_4 | -1.71036 |
| Observation_5 | 0.100809 |


**parameter_1** mean

In [60]:
display(mu_parameter_1)

parameter_1    0.170197
dtype: float64

**Centered Data**

In [61]:
Observations[['parameter_1']] - mu_parameter_1

| | parameter_1 |
|---|---|
| Observation_1 | 0.904601 |
| Observation_2 | 0.325121 |
| Observation_3 | 0.720224 |
| Observation_4 | -1.880558 |
| Observation_5 | -0.069389 |


## Center data using Matrix Algebra 

Calculating deviation scores and centering all the parameters can be done with matrix operations from linear algebra. This is a very handy way to manipulate large amounts of data.


### Define raw data as a Matrix 
Let us assume the raw data is in the Matrix $X$. Each row is an observation, and each column is a parameter.

$$X = 
\begin{bmatrix}
\vec{x}_{1} \\
\vec{x}_{2} \\
. \\
\vec{x}_{m}
\end{bmatrix} = 
\begin{bmatrix} 
x_{11} & x_{12} & ... & x_{1n} \\ 
x_{21} & x_{22} & ... & x_{2n} \\
. & . & ... & . \\
x_{m1} & x_{m2} & ... & x_{mn} \\
\end{bmatrix}_{m\times n}$$
 

### Define a column vector of ones
$$L = 
\begin{bmatrix}
1 \\
1 \\
. \\
1
\end{bmatrix}_{m\times 1}$$

### Build a square matrix of ones
$$LL^{T} = \begin{bmatrix}
1 \\
1 \\
. \\
1
\end{bmatrix}
\begin{bmatrix}
1 & 1 & . & 1 \\
\end{bmatrix} = \begin{bmatrix} 
1 & 1 & ... & 1 \\ 
1 & 1 & ... & 1 \\
. & . & ... & . \\
1 & 1 & ... & 1 \\
\end{bmatrix}_{m\times m}$$

### Center raw data Matrix

Transform the raw scores from matrix $X$ into deviation scores for matrix $D$.

$$D = X-\frac{1}{m}(LL^{T})X$$
 
$$D =
\begin{bmatrix} 
x_{11} & x_{12} & ... & x_{1n} \\ 
x_{21} & x_{22} & ... & x_{2n} \\
. & . & ... & . \\
x_{m1} & x_{m2} & ... & x_{mn} \\
\end{bmatrix}_{m\times n} - \frac{1}{m}\begin{bmatrix} 
1 & 1 & ... & 1 \\ 
1 & 1 & ... & 1 \\
. & . & ... & . \\
1 & 1 & ... & 1 \\
 \end{bmatrix}_{m\times m}\begin{bmatrix} 
x_{11} & x_{12} & ... & x_{1n} \\ 
x_{21} & x_{22} & ... & x_{2n} \\
. & . & ... & . \\
x_{m1} & x_{m2} & ... & x_{mn} \\
\end{bmatrix}_{m\times n}$$
 

$$D = 
\begin{bmatrix} 
x_{11} & x_{12} & ... & x_{1n} \\ 
x_{21} & x_{22} & ... & x_{2n} \\
. & . & ... & . \\
x_{m1} & x_{m2} & ... & x_{mn} \\
\end{bmatrix}_{m\times n} -
\begin{bmatrix} 
\frac{1}{m}\sum_{i=1}^{m}x_{i1} & \frac{1}{m}\sum_{i=1}^{m}x_{i2} & ... & \frac{1}{m}\sum_{i=1}^{m}x_{in} \\ 
\frac{1}{m}\sum_{i=1}^{m}x_{i1} & \frac{1}{m}\sum_{i=1}^{m}x_{i2} & ... & \frac{1}{m}\sum_{i=1}^{m}x_{in} \\ 
. & . & ... & . \\
\frac{1}{m}\sum_{i=1}^{m}x_{i1} & \frac{1}{m}\sum_{i=1}^{m}x_{i2} & ... & \frac{1}{m}\sum_{i=1}^{m}x_{in} \\ 
\end{bmatrix}_{m\times n}$$

$$D = 
\begin{bmatrix} 
x_{11} & x_{12} & ... & x_{1n} \\ 
x_{21} & x_{22} & ... & x_{2n} \\
. & . & ... & . \\
x_{m1} & x_{m2} & ... & x_{mn} \\
\end{bmatrix}_{m\times n} -
\begin{bmatrix} 
\mu_{param_1} & \mu_{param_2} & ... & \mu_{param_n} \\ 
\mu_{param_1} & \mu_{param_2} & ... & \mu_{param_n} \\ 
. & . & ... & . \\
\mu_{param_1} & \mu_{param_2} & ... & \mu_{param_n} \\ 
\end{bmatrix}_{m\times n}$$


$$D = \begin{bmatrix} 
(x_{11} - \mu_{param_1}) & (x_{12} - \mu_{param_2}) & ... & (x_{1n} - \mu_{param_n})\\ 
(x_{21} - \mu_{param_1}) & (x_{22} - \mu_{param_2}) & ... & (x_{2n} - \mu_{param_n})\\
. & . & ... & . \\
(x_{m1} - \mu_{param_1}) & (x_{m2} - \mu_{param_2}) & ... & (x_{mn} - \mu_{param_n})\\
\end{bmatrix}_{m\times n}$$

**Centered data**
$$D = 
\begin{bmatrix} 
d_{11} & d_{12} & ... & d_{1n} \\ 
d_{21} & d_{22} & ... & d_{2n} \\
. & . & ... & . \\
d_{m1} & d_{m2} & ... & d_{mn} \\
\end{bmatrix}_{m\times n}$$
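
The derivation above can be checked numerically. The sketch below builds $L$, forms $D = X-\frac{1}{m}(LL^{T})X$, and compares the result with subtracting the column means directly. The matrix `X` is random illustrative data (an assumption, not the notebook's dataset), generated with a fixed seed.

```python
import numpy as np

# Illustrative raw data: m = 5 observations (rows), n = 3 parameters (columns)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
m = X.shape[0]

# Column vector of ones, shape (m, 1)
L = np.ones((m, 1))

# D = X - (1/m) (L L^T) X; every row of (L L^T X) / m holds the column means
D = X - (L @ L.T @ X) / m

# The same centering done directly with the column means
D_direct = X - X.mean(axis=0)

print(np.allclose(D, D_direct))   # both routes give the same matrix
print(D.mean(axis=0))             # column means of D are numerically zero
```

For large $m$ the $m\times m$ matrix $LL^{T}$ becomes expensive to build, so subtracting `X.mean(axis=0)` is usually the more practical route; the matrix form is mainly useful for the algebra.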



### Centering Example Data 

**Raw Data**

In [62]:
display(Observations)

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| Observation_1 | 1.354721 | 0.723148 | 1.074798 |
| Observation_2 | 0.108113 | 0.226227 | 0.495319 |
| Observation_3 | 0.687069 | -0.791294 | 0.890422 |
| Observation_4 | 1.256354 | 0.599203 | -1.71036 |
| Observation_5 | -0.575724 | -1.970987 | 0.100809 |


**Centering data**

In [63]:
ObsCenterd = Observations - Observations.mean(axis=0)
display(ObsCenterd)

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| Observation_1 | 0.788614 | 0.965889 | 0.904601 |
| Observation_2 | -0.457994 | 0.468967 | 0.325121 |
| Observation_3 | 0.120963 | -0.548553 | 0.720224 |
| Observation_4 | 0.690247 | 0.841944 | -1.880558 |
| Observation_5 | -1.141831 | -1.728246 | -0.069389 |


## Calculate Covariance Matrix
### Using Matrix Algebra

$$V = \frac{1}{m-1}D^{T}D$$

$$V = \frac{1}{m-1}
\begin{bmatrix} 
d_{11} & d_{21} & ... & d_{m1} \\
d_{12} & d_{22} & ... & d_{m2} \\
. 	   & 	.	& ... &	. \\
d_{1n} & d_{2n} & ... & d_{mn} \\
\end{bmatrix}_{n\times m} \times
\begin{bmatrix} 
d_{11} & d_{12} & ... & d_{1n} \\ 
d_{21} & d_{22} & ... & d_{2n} \\
. & . & ... & . \\
d_{m1} & d_{m2} & ... & d_{mn} \\
\end{bmatrix}_{m\times n}
$$




$$V = 
\begin{bmatrix} 
\frac{1}{m-1}\sum_{i=1}^{m}d^{2}_{i1} & \frac{1}{m-1}\sum_{i=1}^{m}d_{i1}d_{i2} & ... & \frac{1}{m-1}\sum_{i=1}^{m}d_{i1}d_{in} \\ 
\frac{1}{m-1}\sum_{i=1}^{m}d_{i2}d_{i1} & \frac{1}{m-1}\sum_{i=1}^{m}d^{2}_{i2} & ... & \frac{1}{m-1}\sum_{i=1}^{m}d_{i2}d_{in} \\ 
. & . & ... & . \\
\frac{1}{m-1}\sum_{i=1}^{m}d_{in}d_{i1} & \frac{1}{m-1}\sum_{i=1}^{m}d_{in}d_{i2} & ... & \frac{1}{m-1}\sum_{i=1}^{m}d^{2}_{in} \\ 
\end{bmatrix}_{n\times n}$$


$$V = 
\begin{bmatrix} 
\sigma^{2}_{d_{i1}d_{i1}} & \sigma^{2}_{d_{i1}d_{i2}} & ... & \sigma^{2}_{d_{i1}d_{in}} \\ 
\sigma^{2}_{d_{i2}d_{i1}} & \sigma^{2}_{d_{i2}d_{i2}} & ... & \sigma^{2}_{d_{i2}d_{in}} \\ 
. & . & ... & . \\
\sigma^{2}_{d_{in}d_{i1}} & \sigma^{2}_{d_{in}d_{i2}} & ... & \sigma^{2}_{d_{in}d_{in}} \\ 
\end{bmatrix}_{n\times n}$$

The $ij^{th}$ element of $V$ ($V_{ij}$) is the dot product of the centered vector of the $i^{th}$ parameter with the centered vector of the $j^{th}$ parameter, divided by $m-1$.

- $V$ is a square symmetric $n\times n$ matrix 
- The diagonal terms of $V$ are the **variances** of the individual parameters
- The off-diagonal terms of $V$ are the **covariances** between pairs of parameters
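
These properties can be verified numerically. The following minimal sketch uses random illustrative data (an assumption, not the notebook's dataset) to build $V = \frac{1}{m-1}D^{T}D$ and checks its shape, its symmetry, and its diagonal.

```python
import numpy as np

# Illustrative data: m = 5 observations (rows), n = 3 parameters (columns)
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
m, n = X.shape

# Deviation scores, then V = D^T D / (m - 1)
D = X - X.mean(axis=0)
V = D.T @ D / (m - 1)

# V is square (n x n) and symmetric
print(V.shape, np.allclose(V, V.T))

# The diagonal holds the sample variance of each parameter
print(np.allclose(np.diag(V), X.var(axis=0, ddof=1)))

# An off-diagonal term is a scaled dot product of two centered columns
v_01 = np.dot(D[:, 0], D[:, 1]) / (m - 1)
print(np.isclose(V[0, 1], v_01))
```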



$V$ captures the covariance between every possible pair of parameters in the observations. The covariance values reflect the noise and redundancy in the parameters.

- Diagonal terms: by assumption, large values correspond to interesting structure.
- Off-diagonal terms: large magnitudes correspond to high redundancy.


Suppose we could manipulate this covariance matrix. Which features would we want to optimize? (This will be covered in another post.)


### With sample data

$D^{T}$

In [64]:
display(ObsCenterd.T)

| | Observation_1 | Observation_2 | Observation_3 | Observation_4 | Observation_5 |
|---|---|---|---|---|---|
| parameter_3 | 0.788614 | -0.457994 | 0.120963 | 0.690247 | -1.141831 |
| parameter_2 | 0.965889 | 0.468967 | -0.548553 | 0.841944 | -1.728246 |
| parameter_1 | 0.904601 | 0.325121 | 0.720224 | -1.880558 | -0.069389 |


$D$

In [65]:
display(ObsCenterd)

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| Observation_1 | 0.788614 | 0.965889 | 0.904601 |
| Observation_2 | -0.457994 | 0.468967 | 0.325121 |
| Observation_3 | 0.120963 | -0.548553 | 0.720224 |
| Observation_4 | 0.690247 | 0.841944 | -1.880558 |
| Observation_5 | -1.141831 | -1.728246 | -0.069389 |


In [66]:
# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dot.html
DtD = ObsCenterd.T.dot(ObsCenterd)/(len(ObsCenterd) - 1)

$V = \frac{1}{m-1}D^{T}D$

In [67]:
display(DtD)

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| parameter_3 | 0.65663 | 0.758772 | -0.141805 |
| parameter_2 | 0.758772 | 1.287371 | -0.208067 |
| parameter_1 | -0.141805 | -0.208067 | 1.24601 |


# Calculate Covariance Matrix using DataFrame

The sections above explain the nuts and bolts of the covariance matrix. It is important to understand each step of this process; however, for practical purposes, you do not need to carry out all these steps on your dataset. Instead of applying each step by hand, you can use the built-in methods to generate the covariance matrix.


In [68]:
ObsCenterd.cov()

| | parameter_3 | parameter_2 | parameter_1 |
|---|---|---|---|
| parameter_3 | 0.65663 | 0.758772 | -0.141805 |
| parameter_2 | 0.758772 | 1.287371 | -0.208067 |
| parameter_1 | -0.141805 | -0.208067 | 1.24601 |
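
Because the covariance is computed from deviations, `DataFrame.cov()` should give the same result on the raw data as on the centered data. The sketch below illustrates this with a small random data frame; the name `raw` and its values are assumptions for illustration, not the notebook's `Observations`.

```python
import numpy as np
import pandas as pd

# Illustrative data frame: 5 observations (rows), 3 parameters (columns)
rng = np.random.default_rng(2)
raw = pd.DataFrame(
    rng.standard_normal((5, 3)),
    columns=['parameter_1', 'parameter_2', 'parameter_3'],
)

# Manual route: center, then D^T D / (m - 1)
centered = raw - raw.mean(axis=0)
manual = centered.T.dot(centered) / (len(centered) - 1)

# Built-in route, applied to the raw and to the centered data
builtin_raw = raw.cov()
builtin_centered = centered.cov()

print(np.allclose(manual.to_numpy(), builtin_raw.to_numpy()))
print(np.allclose(builtin_raw.to_numpy(), builtin_centered.to_numpy()))
```

In practice this means the explicit centering step can be skipped when only the covariance matrix is needed.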


# Calculate Covariance Matrix using numpy arrays

It is important to arrange the data (observations) in the proper format before calculating the covariance matrix. By default, `np.cov` expects the vectors (observations) arranged as column vectors: each **column represents an observation and each row represents a parameter**.

In [69]:
P = np.column_stack([d1, d2, d3, d4, d5])
P

array([[ 1.3547205 ,  0.10811275,  0.68706947,  1.25635385, -0.57572443],
       [ 0.72314792,  0.22622653, -0.79129412,  0.59920295, -1.97098655],
       [ 1.07479824,  0.49531879,  0.89042182, -1.71036014,  0.10080855]])

In [70]:
np.cov(P)

array([[ 0.65663041,  0.75877213, -0.14180543],
       [ 0.75877213,  1.2873712 , -0.20806728],
       [-0.14180543, -0.20806728,  1.24601032]])
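
If the data is stored the other way round, with observations in rows as in the data frame earlier, `np.cov` can be told so with `rowvar=False` instead of transposing. A minimal sketch, using random illustrative data rather than the notebook's values:

```python
import numpy as np

# Illustrative data: 5 observations (rows), 3 parameters (columns)
rng = np.random.default_rng(3)
X = rng.standard_normal((5, 3))

# Default layout: variables in rows, so transpose first
cov_transposed = np.cov(X.T)

# rowvar=False treats each column as a variable, no transpose needed
cov_columns = np.cov(X, rowvar=False)

print(cov_columns.shape)                        # one row/column per parameter
print(np.allclose(cov_transposed, cov_columns))
```

Passing the observations-in-rows matrix without `rowvar=False` would instead return a $5\times 5$ matrix of covariances between observations, which is easy to miss because no error is raised.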