# Effect of Covariance Matrix on Bivariate Gaussian
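A minimal NumPy sketch of how the covariance matrix $\Sigma$ shapes a bivariate Gaussian. The density's contours are ellipses whose axes point along the eigenvectors of $\Sigma$, and the spread (standard deviation) along each axis is the square root of the corresponding eigenvalue. Off-diagonal entries tilt those axes and correlate the two coordinates. The three covariance matrices below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical covariance matrices, chosen for illustration only
covariances = {
    "isotropic": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "axis-aligned, stretched in x": np.array([[4.0, 0.0], [0.0, 1.0]]),
    "positively correlated": np.array([[1.0, 0.8], [0.8, 1.0]]),
}

rng = np.random.default_rng(0)
mean = np.zeros(2)

for name, cov in covariances.items():
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Standard deviation along each principal axis of the elliptical contours
    axis_std = np.sqrt(eigvals)
    # Draw samples and measure the correlation between the two coordinates
    samples = rng.multivariate_normal(mean, cov, size=50_000)
    corr = np.corrcoef(samples, rowvar=False)[0, 1]
    print(f"{name}: axis std devs = {axis_std}")
    print(f"  principal axes (columns):\n{eigvecs}")
    print(f"  sample correlation = {corr:.3f}")
```

For the correlated case the eigenvalues are 0.2 and 1.8, so the cloud of samples is elongated along the diagonal direction $[1, 1]/\sqrt{2}$.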



# Linear Transformation of a multivariate normal distribution

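Given a random vector $\bar{x} = [x_1, x_2, \dots, x_n]^T$ and a linear transformation $\bar{y} = f(\bar{x}) = A\bar{x}+\bar{b}$,

and the assumptions that $\bar{x} \sim N(\bar{\mu_{x}},\Sigma_{x})$ and that $\bar{b} \sim N(\mathbf{0},\Sigma_b)$ is a Gaussian noise vector independent of $\bar{x}$,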

The parameters of $y$ can be found as follows:

$$\mathbf{y} = A\mathbf{x}+ \mathbf{b}$$

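For the mean, $$E(\mathbf{y}) = E(\mathbf{Ax} + \mathbf{b})$$
$$E(\mathbf{y}) = \mathbf{A}E(\mathbf{x}) + E(\mathbf{b}) = \mathbf{A}\bar{\mu_{{x}}}$$
where the last step uses $E(\mathbf{b}) = \mathbf{0}$, since $\mathbf{b}$ is zero-mean noise. (For a constant $\mathbf{b}$ the mean would instead be $\mathbf{A}\bar{\mu_{x}} + \mathbf{b}$.)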

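For the covariance, the cross terms between $\mathbf{Ax}$ and $\mathbf{b}$ vanish because $\mathbf{x}$ and $\mathbf{b}$ are independent, so

$$\mathbf{\Sigma_y} = Var(\mathbf{Ax + b}) = Var(\mathbf{Ax})+ Var(\mathbf{b}) = \mathbf{A\Sigma_xA^T}+ \mathbf{\Sigma_b}$$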
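A quick Monte Carlo check of the mean and covariance of $\mathbf{y}$, as a sketch: the parameters `mu_x`, `Sigma_x`, `A` and `Sigma_b` are hypothetical, and $\mathbf{b}$ is drawn as zero-mean Gaussian noise independent of $\mathbf{x}$, matching the assumptions the derivation relies on. The empirical mean and covariance of the simulated $\mathbf{y}$ should approach $\mathbf{A}\bar{\mu_x}$ and $\mathbf{A\Sigma_xA^T} + \mathbf{\Sigma_b}$ as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example parameters
mu_x = np.array([1.0, -2.0])
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0], [-1.0, 1.0]])  # maps R^2 -> R^3
Sigma_b = 0.1 * np.eye(3)                             # covariance of the zero-mean noise b

n = 200_000
x = rng.multivariate_normal(mu_x, Sigma_x, size=n)         # shape (n, 2)
b = rng.multivariate_normal(np.zeros(3), Sigma_b, size=n)  # independent of x, shape (n, 3)
y = x @ A.T + b                                            # row-wise y = A x + b

# Closed-form parameters from the derivation
mu_y = A @ mu_x
Sigma_y = A @ Sigma_x @ A.T + Sigma_b

print("empirical mean :", y.mean(axis=0))
print("predicted mean :", mu_y)
print("max |cov error|:", np.abs(np.cov(y, rowvar=False) - Sigma_y).max())
```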

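For the joint vector $\bar{\mathbf{Z}} = [{\bar{x}}^T,{\bar{y}}^T]^T$: since $\bar{x}$ and $\bar{b}$ are independent Gaussian vectors and $\bar{\mathbf{Z}}$ is a linear function of them, $\bar{\mathbf{Z}}$ is itself Gaussian, with mean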

$$\mathbf{\mu_{xy}} = \bigg[\begin{array}{c}\bar{\mu_{x}}\\ \mathbf{A}\bar{\mu_{{x}}} \end{array}\bigg]$$

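$$
\begin{aligned}
Cov(\mathbf{x},\mathbf{y}) &= E[(\mathbf{x}-\mu_x)(\mathbf{y}-\mu_y)^T] \\
&= E[\mathbf{xy^T}]-E[\mathbf{x}]E[\mathbf{y}]^T \\
&= E[\mathbf{xx^T}A^T+\mathbf{xb}^T]-\mathbf{\mu_x\mu_x^T}A^T \\
&= E[\mathbf{xx^T}]A^T-\mathbf{\mu_x\mu_x^T}A^T \\
&= \Sigma_x A^T
\end{aligned}
$$
The third line substitutes $\mathbf{y}^T = \mathbf{x}^TA^T + \mathbf{b}^T$ and $\mu_y = A\mu_x$; the fourth drops $E[\mathbf{xb}^T] = E[\mathbf{x}]E[\mathbf{b}]^T = \mathbf{0}$, because $\mathbf{b}$ is independent of $\mathbf{x}$ and zero-mean; the last uses $\Sigma_x = E[\mathbf{xx}^T] - \mu_x\mu_x^T$. By symmetry, $Cov(\mathbf{y},\mathbf{x}) = Cov(\mathbf{x},\mathbf{y})^T = A\Sigma_x$. The covariance matrix of the joint vector is then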
$$
       \Sigma_{xy} = \bigg[\begin{array}{c c}
       Var(x) & Cov(x,y)\\
       Cov(y,x) & Var(y)
       \end{array}\bigg]
$$

$$
       \Sigma_{xy} = \bigg[\begin{array}{c c}
       \Sigma_x & \Sigma_x A^T\\
       A\Sigma_x  & {A\Sigma_xA^T}+ {\Sigma_b}\\
       \end{array}\bigg]
$$
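A small sketch that assembles the joint mean and covariance and checks the block matrix against an independent route. The helper `joint_params` and the example parameters are hypothetical. Writing $\mathbf{z} = M\mathbf{w}$ with $\mathbf{w} = [\mathbf{x}^T, \mathbf{b}^T]^T$ and $M = \begin{bmatrix} I & 0 \\ A & I \end{bmatrix}$ gives $Cov(\mathbf{z}) = M\,\mathrm{blockdiag}(\Sigma_x, \Sigma_b)\,M^T$, which is also why $\mathbf{z}$ is jointly Gaussian. As in the derivation, $\mathbf{b}$ is assumed zero-mean and independent of $\mathbf{x}$.

```python
import numpy as np

def joint_params(mu_x, Sigma_x, A, Sigma_b):
    """Mean and covariance of z = [x; y] with y = A x + b, b ~ N(0, Sigma_b) independent of x."""
    mu_z = np.concatenate([mu_x, A @ mu_x])
    Sigma_z = np.block([
        [Sigma_x,     Sigma_x @ A.T],
        [A @ Sigma_x, A @ Sigma_x @ A.T + Sigma_b],
    ])
    return mu_z, Sigma_z

# Hypothetical example parameters
mu_x = np.array([1.0, -2.0])
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0], [-1.0, 1.0]])
Sigma_b = 0.1 * np.eye(3)

mu_z, Sigma_z = joint_params(mu_x, Sigma_x, A, Sigma_b)

# Independent route: z = M w with w = [x; b], so Cov(z) = M Cov(w) M^T
n, m = A.shape[1], A.shape[0]
M = np.block([[np.eye(n), np.zeros((n, m))], [A, np.eye(m)]])
Sigma_w = np.block([[Sigma_x, np.zeros((n, m))], [np.zeros((m, n)), Sigma_b]])

print(np.allclose(Sigma_z, M @ Sigma_w @ M.T))  # True
print(np.allclose(Sigma_z, Sigma_z.T))          # True: a covariance matrix is symmetric
```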

# Mahalanobis Distance

The Mahalanobis distance measures how far an observation of a multivariate random variable lies from the mean of its distribution. It is analogous to the Euclidean distance from a point $x_i$ to the mean (centroid) $\mu_x$, but it accounts for the covariance of the data. The Euclidean distance is measured along the original basis vectors in which the data was recorded, whereas the Mahalanobis distance first re-expresses the data in the basis formed by the eigenvectors of the covariance matrix (the principal axes of the distribution) and then rescales each of these directions so that the variance along every eigenvector becomes 1.
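A minimal example of the difference, assuming a hypothetical zero-mean distribution with four times more variance along the first axis. The points $(2, 0)$ and $(0, 2)$ are both at Euclidean distance 2.0 from the mean, but their Mahalanobis distances are 1.0 and 2.0: moving 2 units along the high-variance axis is only one standard deviation, while the same move along the low-variance axis is two. The helper `mahalanobis_from_params` is hypothetical and takes the mean and covariance as known parameters.

```python
import numpy as np

# Hypothetical distribution: variance 4 along x, variance 1 along y
mu = np.zeros(2)
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])

def mahalanobis_from_params(x, mu, Sigma):
    """Mahalanobis distance of x from a distribution with known mean and covariance."""
    d = x - mu
    # Solve Sigma z = d rather than forming Sigma^{-1} explicitly
    return float(np.sqrt(d @ np.linalg.solve(Sigma, d)))

for point in (np.array([2.0, 0.0]), np.array([0.0, 2.0])):
    euclid = np.linalg.norm(point - mu)
    mahal = mahalanobis_from_params(point, mu, Sigma)
    print(point, "Euclidean:", euclid, "Mahalanobis:", mahal)
```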

![eigens_cov](images/eigens_covariance.png)

Distribution showing the covariance and its eigenvalues

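Along each eigenvector $v_i$ (with eigenvalue $\lambda_i$), the component of **each data point's** deviation from the mean is scaled by a factor of $1/\sqrt{\lambda_i}$. Because $\lambda_i$ is the variance along $v_i$, this gives every direction unit variance and removes the outsized influence that high-variance directions would otherwise have on the distance. In these rescaled coordinates, the ordinary Euclidean distance from $x_i$ to the mean $\mu_x$ is the Mahalanobis distance.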
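The rescaling described above can be sketched directly with NumPy, using a hypothetical correlated 2-D distribution (`mu`, `Sigma` and the query point `x` are illustrative values). The sketch checks two things: whitened samples have approximately identity covariance, and the Euclidean norm of a whitened point equals the quadratic-form distance $\sqrt{(x-\mu)^T\Sigma^{-1}(x-\mu)}$.

```python
import numpy as np

# Hypothetical correlated 2-D distribution (illustrative parameters only)
mu = np.array([1.0, -1.0])
Sigma = np.array([[3.0, 1.2], [1.2, 1.0]])

# Eigendecomposition Sigma = V diag(lam) V^T; column V[:, i] is eigenvector v_i
lam, V = np.linalg.eigh(Sigma)

def whiten(points):
    """Project deviations from mu onto each v_i and divide by sqrt(lambda_i)."""
    return ((points - mu) @ V) / np.sqrt(lam)

# 1) Whitened samples have (approximately) identity covariance
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
print(np.round(np.cov(whiten(samples), rowvar=False), 2))

# 2) Euclidean norm in whitened coordinates equals the Mahalanobis distance
x = np.array([3.0, -2.0])
d_whitened = np.linalg.norm(whiten(x))
diff = x - mu
d_direct = np.sqrt(diff @ np.linalg.solve(Sigma, diff))
print(d_whitened, d_direct)
```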


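**Calculating the distance**

The Mahalanobis distance can be calculated as 

$$D^2 = (x - \bar{x})^TS^{-1}(x - \bar{x})$$

where $\bar{x}$ is the sample mean and $S$ is the sample covariance matrix of the data (the population mean and covariance $\Sigma$ can be used instead when they are known).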
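Substituting the eigendecomposition $S = V\Lambda V^T$ (orthonormal eigenvectors $v_i$ as the columns of $V$, eigenvalues $\lambda_i$ on the diagonal of $\Lambda$) shows why this formula matches the eigenvector rescaling described earlier. Since $V$ is orthogonal, $S^{-1} = V\Lambda^{-1}V^T$, and

$$
\begin{aligned}
D^2 &= (x - \bar{x})^T V \Lambda^{-1} V^T (x - \bar{x}) \\
&= \sum_i \frac{\left(v_i^T (x - \bar{x})\right)^2}{\lambda_i} \\
&= \left\lVert \Lambda^{-1/2} V^T (x - \bar{x}) \right\rVert_2^2
\end{aligned}
$$

Each term is the coordinate of $x - \bar{x}$ along $v_i$, scaled by $1/\sqrt{\lambda_i}$ and squared, so $D$ is the Euclidean length of the deviation in the rescaled basis.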


### Python Implementation

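```python
import numpy as np

def mahalanobis(x, input_data):
    """
    Compute the Mahalanobis distance from a single data point to the
    mean of the points in the given dataset.

    Args:
        x: the data point (1-D array of shape (n_features,)) whose distance
            to the mean is to be measured
        input_data: the dataset, a 2-D array of shape (n_samples, n_features)
    Returns:
        The Mahalanobis distance as a NumPy scalar.
    Raises:
        numpy.linalg.LinAlgError: if the sample covariance matrix is singular.
    """
    # Deviation of x from the sample mean (mean over samples, per feature)
    mu_x_distance = x - np.mean(input_data, axis=0)
    # Inverse sample covariance; np.cov expects variables in rows, hence .T
    inv_cov = np.linalg.inv(np.cov(input_data.T))
    # Squared distance: (x - mean)^T S^{-1} (x - mean)
    mahal_distance_sq = mu_x_distance @ inv_cov @ mu_x_distance
    return np.sqrt(mahal_distance_sq)
```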

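```python
# 10 observations of 4 features, uniform on [0, 10); unseeded, so values change per run
a = np.random.rand(10,4).astype(np.float32)*10
print("input data\n", a)
# Use row 4 of the dataset as the query point
x = a[4]
print("random observation", x)
m = mahalanobis(x,a)
print("mahalanobis distance",m)
```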

```
input data
 [[6.997874   3.6564732  0.81739104 8.629433  ]
 [5.6590056  0.11363751 9.655211   7.5297422 ]
 [2.6685863  8.615622   7.300787   1.2548863 ]
 [1.1316918  6.5021687  5.6171536  7.090973  ]
 [9.776507   1.8921409  9.894703   2.3003526 ]
 [2.5225303  5.9856806  4.772582   4.413095  ]
 [7.636512   1.9576786  4.7321267  1.3114064 ]
 [0.52352417 8.240083   3.927919   7.2479157 ]
 [2.0298588  1.5984614  7.1859426  2.35942   ]
 [4.984607   2.8390322  2.3046315  1.2996609 ]]
random observation [9.776507  1.8921409 9.894703  2.3003526]
mahalanobis distance 2.2837965697856344
```
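The `mahalanobis` function above handles one query point at a time. A vectorized sketch (the helper name `mahalanobis_all` is hypothetical) computes the same quantity for every row of a dataset with `np.einsum`. It also checks a property that holds for any dataset: when the mean and the sample covariance (NumPy's `np.cov` default, normalized by $n-1$) are estimated from the same $n$ points in $d$ dimensions, the squared distances of those points sum exactly to $(n-1)d$. The seeded random data here will not reproduce the table printed above.

```python
import numpy as np

def mahalanobis_all(data):
    """Mahalanobis distance of every row of `data` to the sample mean,
    using the sample covariance (np.cov default normalization, n - 1)."""
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    # Row-wise quadratic form: d_i^2 = c_i^T S^{-1} c_i
    sq = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(sq)

rng = np.random.default_rng(0)
data = rng.random((10, 4)) * 10
dists = mahalanobis_all(data)
print(dists)

# Sanity check: sum_i d_i^2 = trace(S^{-1} (n - 1) S) = (n - 1) * d
n, d = data.shape
print(np.isclose(np.sum(dists**2), (n - 1) * d))  # True
```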
