# __future__ imports must come before any other statement in the cell
from __future__ import print_function

import math
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# required for interactive plotting
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
import numpy.polynomial as np_poly

from IPython.display import Math
from IPython.display import Latex
from IPython.display import HTML

initialization  
$ \newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$  
$ \newcommand{\V}[1]{\mathbb{V}\left[#1\right]}$
$ \newcommand{\cov}[1]{\text{cov}\left[#1\right]}$
$ \newcommand{\EXP}[1]{\exp\left(#1\right)}$  
$ \newcommand{\P}{\mathbb{P}}$
$\newcommand{\mat}[1]{
\left[
\begin{matrix}
#1
\end{matrix}
\right]
}$
$\newcommand{\commentgray}[1]{\color{gray}{\text{#1}}}$

$\newcommand{\Nl}[3]{\mathcal{N}\left(#1 \mid #2, #3\right)}$
$\newcommand{\Nstdx}{\Nl{\mathbf{x}}{\mathbf{\mu}}{\Sigma}}$
$\newcommand{\xb}{\mathbf{x}}$
$\newcommand{\yb}{\mathbf{y}}$
$\newcommand{\zb}{\mathbf{z}}$
$\newcommand{\Ub}{\mathbf{U}}$
$\newcommand{\mub}{\mathbf{\mu}}$
$\newcommand{\multivarcoeff}{\frac{1}{(2\pi)^{D/2}}
\frac{1}{\left| \mathbf{\Sigma}\right|^{1/2}}}$
$\newcommand{\multivarexp}[2]
{
\left\{
 -\frac{1}{2} 
 {#1}^T 
 #2
 {#1}
\right\}
}$
$\newcommand{\multivarexpx}[1]{\multivarexp{#1}{\Sigma^{-1}}}$
$\newcommand{\multivarexpstd}{\multivarexpx{(\xb-\mub)}}$


<div id='StandardForm' \>
Standard Form  
$$
\Nstdx
=
\multivarcoeff \exp \multivarexpstd
$$

Quadratic Form:
<div id='QuadraticForm'\>
$$
\Delta^2
=
(\xb - \mub)^T
\mathbf{\Sigma}^{-1}
(\xb - \mub)
$$
$\Delta$ is called the Mahalanobis distance from $\mub$ to $\xb$; it reduces to the Euclidean distance when $\Sigma$ is the identity matrix.
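
A minimal numerical sketch of the quadratic form, assuming NumPy and SciPy are available. The values of `mu`, `Sigma` and `x` are illustrative and not taken from these notes. It evaluates $\Delta^2$ with a linear solve instead of an explicit inverse, then cross-checks $\Delta$ against `scipy.spatial.distance.mahalanobis`.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Illustrative values (assumed for this sketch only)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([2.0, 0.0])

diff = x - mu
# Delta^2 = (x - mu)^T Sigma^{-1} (x - mu), using solve() rather than inv()
delta_sq = diff @ np.linalg.solve(Sigma, diff)
delta = np.sqrt(delta_sq)

# SciPy expects the inverse covariance matrix as its third argument
delta_scipy = mahalanobis(x, mu, np.linalg.inv(Sigma))
print(delta_sq, delta, delta_scipy)
```

Using `np.linalg.solve` avoids forming $\Sigma^{-1}$ explicitly, which is usually preferable numerically.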

$\Sigma$
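* Can be taken to be symmetric without loss of generality: if $\Sigma^{-1}$ is written as a symmetric part plus an antisymmetric part, the antisymmetric part contributes nothing to the quadratic form
* Real valued and symmetric
  * Hence the eigenvalues are real
  * The eigenvectors can be chosen to be orthonormal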

init
$\newcommand{\u}{\mathbf{u}}$

* Eigen: If $\left\{\u_i\right\}, \left\{\lambda_i\right\}$ are the eigenvectors and eigenvalues of $\Sigma$, then
$$
\Sigma ~ \u_i = \lambda_i \u_i
$$

* Orthonormality
$$
\u_i^T \u_j
=
\begin{cases}
1 & \text{if } i = j\\
0 & \text{otherwise}
\end{cases}
$$

* <div id='SpectralDecomposition' \>Spectral Decomposition  
$
\begin{array}{ll}
\Sigma      &= \sum_{i=1}^{D} \lambda_i \u_i \u_i^T\\
\Sigma^{-1} &= \sum_{i=1}^{D} \frac{1}{\lambda_i} \u_i \u_i^T
\end{array}
$

* Determinant
$$
\left\lvert
\Sigma
\right\rvert
^{1/2}
=
\prod_{i=1}^{D} \lambda_i^{1/2}
$$
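
The eigenvector identities above can be checked numerically. This is a small sketch, assuming NumPy and an illustrative $2 \times 2$ covariance matrix that is not from these notes. `np.linalg.eigh` is used because it is intended for symmetric matrices and returns real eigenvalues with orthonormal eigenvectors.

```python
import numpy as np

# Illustrative symmetric positive-definite matrix (assumed for this sketch)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Columns of U_cols are the eigenvectors u_i; lam holds the eigenvalues
lam, U_cols = np.linalg.eigh(Sigma)

# Orthonormality: u_i^T u_j = 1 if i = j, else 0
ortho_ok = np.allclose(U_cols.T @ U_cols, np.eye(2))

# Spectral decomposition of Sigma and of its inverse
Sigma_rebuilt = sum(l * np.outer(u, u) for l, u in zip(lam, U_cols.T))
Sigma_inv_rebuilt = sum(np.outer(u, u) / l for l, u in zip(lam, U_cols.T))

# |Sigma|^{1/2} = prod_i lambda_i^{1/2}
sqrt_det = np.prod(np.sqrt(lam))

print(ortho_ok)
print(np.allclose(Sigma_rebuilt, Sigma))
print(np.allclose(Sigma_inv_rebuilt, np.linalg.inv(Sigma)))
print(np.isclose(sqrt_det, np.sqrt(np.linalg.det(Sigma))))
```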

Multivariate to Independent Univariates
=====================

Substituting <a href="#SpectralDecomposition">Spectral Decomposition</a> into <a href="#QuadraticForm">Quadratic Form</a>, we get
\begin{array}{llr}
\Delta^2
&=
(\xb - \mub)^T
\left(
  \sum_{i=1}^{D} \frac{1}{\lambda_i} \u_i \u_i^T
\right)
(\xb - \mub)
\\
&=
\sum_{i=1}^{D} \frac{y_i^2}{\lambda_i}
\end{array}
where $y_i = \u_i^T (\xb - \mub)$. Further,
\begin{array}{ll}
\text{If } \yb
= 
\left[
\begin{matrix}
y_1 \\ \vdots \\ y_D
\end{matrix}
\right] 
&
\text{Then }
\yb = \Ub (\xb - \mub)
\\
\text{where }
\Ub
&=
\mat{\u_1^T \\ \vdots \\ \u_D^T}
\end{array}

Hence, in the $y_j$ coordinate system, <a href="#StandardForm">Standard Form</a> becomes
$$
p(\yb)
=
p(\xb)
\lvert \mathbf{J} \rvert
=
\prod_{i=1}^{D}
\frac{1}{(2\pi\lambda_i)^{1/2}}
\exp \left(-\frac{y_i^2}{2\lambda_i}\right)
$$
Here $\lvert \mathbf{J} \rvert = 1$ because $\Ub$ is orthogonal. Thus, the eigenvectors define a new, shifted and rotated coordinate system
in which the joint density factorizes into $D$ independent univariate Gaussians, where each
$y_i = \u_i^T (\xb - \mub)$ has mean $0$ and variance $\lambda_i$.
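
As a sanity check on this factorization, the sketch below compares the joint density at one point with the product of univariate densities in the rotated coordinates. It assumes SciPy's `multivariate_normal` and `norm`; the values of `mu`, `Sigma` and `x` are illustrative only.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative parameters and evaluation point (assumed for this sketch)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 2.7])

lam, U_cols = np.linalg.eigh(Sigma)
U = U_cols.T                 # rows are u_i^T, matching the definition of U
y = U @ (x - mu)             # shifted and rotated coordinates

# Joint density in the original coordinates
p_x = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Product of independent univariate Gaussians with mean 0 and variance lambda_i
p_y = np.prod(norm.pdf(y, loc=0.0, scale=np.sqrt(lam)))

print(p_x, p_y)
```

The two values agree because the Jacobian determinant of an orthogonal transformation has magnitude 1.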

Moments
====

Mean
----
\begin{array}{llr}
\E{\xb}
&=
\multivarcoeff
\int \exp \multivarexpstd \xb ~d\xb
\\
&=
\multivarcoeff
\int \exp \multivarexpx{\zb} ~(\zb + \mub) ~d\zb
&
\commentgray{[1]}
\\
&=
\mub
\multivarcoeff
\int \exp \multivarexpx{\zb} ~d\zb
&
\commentgray{[2]}
\\
\implies
\E{\xb}
&=
\mub
\end{array}

1. Sub $\zb = (\xb - \mub)$
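1. Since the exponent is an even function of $\zb$ and the integral ranges over $(-\infty, \infty)$,  
the $\zb$ term in the factor $(\zb + \mub)$ gives an odd integrand and integrates to zero, leaving just $\mub$. The remaining integral, together with the coefficient, is the normalization of the Gaussian and equals 1.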


Second Moment
------------

\begin{array}{llr}
\E{\xb \xb^T}
&=
\multivarcoeff
\int \exp \multivarexpstd \xb \xb^T ~d\xb
\\
&=
\multivarcoeff
\int \exp \multivarexpx{\zb} (\zb+\mub)(\zb+\mub)^T ~d\zb
&
\commentgray{[1]}
\\
&=
\multivarcoeff
\int \exp \multivarexpx{\zb} (\zb \zb^T + \mub \mub^T) ~d\zb
&
\commentgray{[2]}
\\
&=
\mub \mub^T
+
\multivarcoeff
\int \exp \multivarexpx{\zb} \zb \zb^T ~d\zb
\\
&=
\mub \mub^T
+
\multivarcoeff
\sum_{i=1}^{D} \sum_{j=1}^{D}
\u_i \u_j^T
\int \exp
\multivarexpx{\zb}
y_i y_j ~d\yb
&
\commentgray{[3]}
\\
&=
\mub \mub^T
+
\multivarcoeff
\sum_{i=1}^{D} \sum_{j=1}^{D}
\u_i \u_j^T
\int \exp
\left\{
  -\sum_{k=1}^{D}
     \frac{y_k^2}
          {2\lambda_k}
\right\}
y_i y_j ~d\yb
&
\commentgray{[4]}
\\
&=
\mub \mub^T
+
\multivarcoeff
\sum_{i=1}^{D}
\u_i \u_i^T
\int \exp
\left\{
  -\sum_{k=1}^{D}
     \frac{y_k^2}
          {2\lambda_k}
\right\}
y_i^2 ~d\yb
&
\commentgray{[5]}
\\
&=
\mub \mub^T
+
\sum_{i=1}^{D} \u_i \u_i^T \lambda_i
&
\commentgray{[6]}
\\
\E{\xb \xb^T}
&=
\mub \mub^T
+
\Sigma
&
\commentgray{[7]}
\end{array}

1. Sub $\zb = (\xb - \mub)$
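2. Exponent is even in $\zb \implies$ the cross terms $\zb \mub^T$ and $\mub \zb^T$ are odd and vanish
3. $\zb = \sum_{j=1}^{D} y_j \u_j$, so $\zb \zb^T = \sum_{i=1}^{D} \sum_{j=1}^{D} y_i y_j \u_i \u_j^T$
4. <a href='#SpectralDecomposition'>SD</a> of $\Sigma^{-1}$:
   $\left( \sum_{j=1}^{D} y_j \u_j^T \right)
    \left( \sum_{k=1}^{D} \frac{1}{\lambda_k} \u_k \u_k^T \right)
    \left( \sum_{l=1}^{D} y_l \u_l \right)
    = \sum_{k=1}^{D} \frac{y_k^2}{\lambda_k}
   $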
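5. By symmetry, for $i \ne j$ the integrand is odd in $y_i$, so only the $i = j$ terms survive:
$$
\int \exp \left\{ -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right\} y_i y_j ~d\yb
=
\begin{cases}
  \int \exp \left\{ -\sum_{k=1}^{D} \frac{y_k^2}{2\lambda_k} \right\} y_i^2 ~d\yb & \text{if } i = j\\
  0 & \text{if } i \ne j
\end{cases}
$$
6. $\E{y_i^2} = \lambda_i$, from the univariate result $\E{x^2} = \mu^2 + \sigma^2$ with mean $0$ and variance $\lambda_i$; the coefficient is absorbed because $\lvert\Sigma\rvert^{1/2} = \prod_{i=1}^{D} \lambda_i^{1/2}$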
7. <a href='#SpectralDecomposition'>SD</a> of $\Sigma = \sum_{i=1}^{D} \lambda_i \u_i \u_i^T$

Covariance
---------
\begin{array}{llr}
\cov{\xb}
&=
\E{\left(\xb - \E{\xb}\right)\left(\xb - \E{\xb}\right)^T}
\\
&=
\E{\left(\xb - \mub\right)\left(\xb - \mub\right)^T}
&
\commentgray{$\E{\xb} = \mub$}
\\
&=
\E{\xb \xb^T} - \mub \mub^T
&
\commentgray{expanding}
\\
&=
\Sigma
&
\commentgray{$\E{\xb\xb^T} = \mub \mub^T + \Sigma$}
\end{array}
Hence the parameter $\Sigma$ is called the covariance matrix.
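
The three moment results can be checked empirically. The following is a rough Monte Carlo sketch, assuming NumPy's `default_rng` sampler; the parameters, the seed and the sample size are arbitrary choices for illustration, and the estimates match the formulas only up to sampling noise.

```python
import numpy as np

# Illustrative parameters (assumed for this sketch)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=200_000)   # one sample per row

# E[x] should be close to mu
mean_hat = X.mean(axis=0)

# E[x x^T] should be close to mu mu^T + Sigma
second_moment_hat = X.T @ X / len(X)

# cov[x] = E[x x^T] - E[x] E[x]^T should be close to Sigma
cov_hat = second_moment_hat - np.outer(mean_hat, mean_hat)

print(mean_hat)
print(second_moment_hat)
print(cov_hat)
```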

Limitations
======

1. Complexity
  1. Number of parameters =
     Parameters in
     $\mu$ + Parameters in $\Sigma = D + \frac{D(D+1)}{2} = \frac{D(D+3)}{2}$ 
  1. Inverting a large $\Sigma$ becomes prohibitive, since the cost of standard methods grows roughly as $O(D^3)$.
1. Always Unimodal
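
To make the quadratic growth concrete, here is a small sketch that evaluates the parameter count for a few dimensions. The helper name `gaussian_param_count` is hypothetical and introduced only for this illustration.

```python
def gaussian_param_count(D):
    """Free parameters of a D-dimensional Gaussian with a full covariance matrix."""
    mean_params = D
    # A symmetric D x D matrix is fixed by its upper triangle, diagonal included
    cov_params = D * (D + 1) // 2
    return mean_params + cov_params   # equals D * (D + 3) / 2

for D in (2, 10, 100, 1000):
    print(D, gaussian_param_count(D))
```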

Cure
----
1. Continuous Latent Variables: Tractable no. of parameters
  1. Markov random fields (MRFs)
  1. Linear dynamical systems
1. Mixture of Gaussians: Multimodal

Conditional Gaussian Distributions
===================


Marginal Gaussian Distributions
=================