# Principal Component Analysis
Suppose we have N data points, each D-dimensional, so our dataset is $X \in \mathbb{R}^{D \times N}$ with covariance $\Sigma \in \mathbb{R}^{D \times D}$. We want a good way to reduce the dimensionality of the dataset so that in the end we have $X' \in \mathbb{R}^{q \times N}$ with $q \ll D$.

One idea is to project all the points onto the direction that retains the most variance in the dataset. That was the original idea behind **PCA**.
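The claim that the best single direction is an eigenvector of the covariance follows from a short Lagrange-multiplier argument. This is the standard derivation, sketched here assuming the columns of $X$ are mean-centered and $U \in \mathbb{R}^{D \times 1}$ is a unit vector:

$$\operatorname{Var}(U^T x) = U^T \Sigma U, \qquad \max_{U} \; U^T \Sigma U \quad \text{subject to} \quad U^T U = 1$$

$$\mathcal{L}(U, \lambda) = U^T \Sigma U - \lambda \left(U^T U - 1\right), \qquad \nabla_U \mathcal{L} = 2\Sigma U - 2\lambda U = 0 \;\Rightarrow\; \Sigma U = \lambda U$$

$$U^T \Sigma U = \lambda\, U^T U = \lambda$$

So every stationary direction is an eigenvector of $\Sigma$, the variance captured along it equals its eigenvalue, and the maximum is reached at the eigenvector with the largest eigenvalue.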

If you are comfortable with the covariance matrix and the linear transformation it applies to the data, read on. If you would like a visual picture of the idea first, take a look [here](http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/).

In 2D, any dataset can be obtained by transforming *white* data, i.e. samples from a 2D normal distribution with zero mean and identity covariance (unit variances, no correlation). The transformation consists of a scaling **S** followed by a rotation **R**, so $T = RS$. If the white data $D$ has covariance $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $D' = TD$, then

$$\Sigma' = T \Sigma T^T = RSSR^T = RSSR^{-1},$$

because $S$ is diagonal ($S^T = S$) and $R$ is a rotation ($R^T = R^{-1}$). On the other hand, the eigendecomposition gives $\Sigma' = VLV^{-1}$, where $L$ is the diagonal matrix of eigenvalues and each column of $V$ is the corresponding eigenvector. Equating the two expressions shows that the eigenvectors $V$ play the role of the rotation $R$, while the eigenvalues $L = SS$ are the squared scales, i.e. the variance of the data along each eigenvector direction.

Hence, if we want to capture the largest amount of variance with a single direction, that unit vector is the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the original dataset. If this direction is the vector $U \in \mathbb{R}^{D \times 1}$ (q = 1), then $X' = U^TX$ and $X' \in \mathbb{R}^{1 \times N}$. This statement is developed more formally in *Strang, Linear Algebra, page 475; ISBN: 9780980232714*.

To preserve more than one dimension we can either find the eigenvector with the largest eigenvalue, flatten the data in that direction by subtracting each data point's component along it, and repeat this procedure up to the desired dimension, or, equivalently, take the q largest eigenvalues and their eigenvectors and project the data onto those directions at once.
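A quick numerical check of the 2D argument, as a minimal NumPy sketch. The scales (standard deviations 3 and 0.5), the 30° rotation angle, the sample size and the seed are arbitrary choices for illustration. Eigendecomposing the empirical covariance of $D' = TD$ should return the squared scales as eigenvalues and the first column of $R$ (up to sign) as the top eigenvector; projecting onto that eigenvector is the q = 1 case $X' = U^TX$.

```python
import numpy as np

rng = np.random.default_rng(0)

# White data D: N samples from a 2D standard normal (zero mean, identity covariance),
# stored with one data point per column, shape (2, N)
N = 100_000
white = rng.standard_normal((2, N))

# Scaling S (standard deviations 3 and 0.5) and a rotation R by 30 degrees
S = np.diag([3.0, 0.5])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
T = R @ S
data = T @ white                          # D' = T D

sigma_theory = R @ S @ S @ R.T            # Sigma' = R S S R^T = R S S R^{-1}
sigma_emp = np.cov(data)                  # rows are variables, columns are samples

# Sigma' = V L V^{-1}; eigh returns ascending eigenvalues, so reverse the order
eigvals, V = np.linalg.eigh(sigma_emp)
eigvals, V = eigvals[::-1], V[:, ::-1]

print(np.round(eigvals, 2))               # close to the squared scales 9 and 0.25
print(abs(V[:, 0] @ R[:, 0]))             # close to 1: top eigenvector is R's first column up to sign

# q = 1: project onto the top eigenvector, X' = U^T X with shape (1, N)
U = V[:, :1]
projected = U.T @ data
print(projected.shape, projected.var())   # (1, 100000) and a variance close to 9
```

Any other angle or pair of scales works the same way; the eigenvalues track the squared scales, not the scales themselves.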

Let's look at an example:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

from scipy.linalg import det
# pinv computes the pseudo-inverse via SVD in SciPy >= 1.7; the old SVD variant pinv2 has since been removed
from scipy.linalg import pinv as inv
from scipy.stats import norm

%matplotlib inline
%load_ext autoreload
%autoreload 2
%autosave 0
```

Autosave disabled
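A minimal PCA sketch in NumPy for a $D \times N$ data matrix as defined above. It implements both routes described earlier: taking the q leading eigenvectors of the covariance at once (`pca_eig`), and finding them one at a time by subtracting each found direction from the data (`pca_deflation`). The function names, the synthetic data (five dimensions with standard deviations 5, 3, 2, 1 and 0.5 under a random rotation) and the seed are illustrative assumptions. The data are mean-centered first, since the covariance is measured around the mean.

```python
import numpy as np


def pca_eig(X, q):
    """Project X (D x N, one data point per column) onto its q leading eigenvectors."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center every dimension
    cov = Xc @ Xc.T / (X.shape[1] - 1)          # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:q]         # indices of the q largest eigenvalues
    U = eigvecs[:, top]                         # D x q, unit-norm columns
    return U, U.T @ Xc                          # X' = U^T X, shape q x N


def pca_deflation(X, q):
    """Find the q directions one at a time, removing each from the data before the next."""
    Xc = X - X.mean(axis=1, keepdims=True)
    residual = Xc.copy()
    directions = []
    for _ in range(q):
        cov = residual @ residual.T / (X.shape[1] - 1)
        eigvals, eigvecs = np.linalg.eigh(cov)
        u = eigvecs[:, np.argmax(eigvals)]      # direction of largest remaining variance
        directions.append(u)
        # Flatten the data along u: subtract each point's component u (u^T x)
        residual = residual - np.outer(u, u @ residual)
    U = np.column_stack(directions)
    return U, U.T @ Xc


# Hypothetical synthetic data: 5 dimensions with different spreads, randomly rotated
rng = np.random.default_rng(1)
D, N, q = 5, 500, 2
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))          # random orthogonal matrix
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5])
X = Q @ (scales[:, None] * rng.standard_normal((D, N)))    # D x N dataset

U_eig, X_eig = pca_eig(X, q)
U_def, X_def = pca_deflation(X, q)

print(X_eig.shape)                                          # (2, 500)
# Both routes should give the same directions, up to the sign of each eigenvector
print(np.allclose(np.abs(U_eig.T @ U_def), np.eye(q), atol=1e-8))
```

Eigenvectors are only defined up to sign, which is why the comparison uses absolute values. In practice an SVD of the centered data (`np.linalg.svd`) is often preferred over forming the covariance explicitly, but the eigendecomposition mirrors the derivation above.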


## Gaussian Process Latent Variable Model
A probabilistic model originally proposed for dimensionality reduction; it can be viewed as a non-linear generalization of probabilistic PCA.
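A minimal GPLVM sketch, under several assumptions that do not come from this notebook: observed points are stored as rows (Y is N × D, the transpose of the PCA convention above), X here denotes the N × q latent points, the kernel is an RBF with variance and lengthscale fixed to 1, the noise variance is fixed to 0.1, the latent points are initialised with PCA, and they are optimised with L-BFGS-B using finite-difference gradients. The toy data (a noisy curve in 3D) are hypothetical. The objective is the negative log marginal likelihood $\frac{DN}{2}\log 2\pi + \frac{D}{2}\log|K| + \frac{1}{2}\operatorname{tr}(K^{-1}YY^T)$ with $K = k(X, X) + \sigma^2 I$.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """RBF (squared exponential) kernel matrix between the rows of X (N x q)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def gplvm_nll(x_flat, Y, q, noise=0.1):
    """Negative log marginal likelihood -log p(Y | X) of a GPLVM.

    Y is N x D (one observed point per row) and X = x_flat.reshape(N, q) holds
    the latent points. Each of the D output dimensions is an independent GP
    sharing the same kernel matrix K = k(X, X) + noise * I.
    """
    N, D = Y.shape
    X = x_flat.reshape(N, q)
    K = rbf_kernel(X) + noise * np.eye(N)
    c_and_lower = cho_factor(K)                     # Cholesky factor of K
    log_det_K = 2.0 * np.sum(np.log(np.diag(c_and_lower[0])))
    K_inv_Y = cho_solve(c_and_lower, Y)             # K^{-1} Y without forming K^{-1}
    return (0.5 * D * N * np.log(2.0 * np.pi)
            + 0.5 * D * log_det_K
            + 0.5 * np.sum(Y * K_inv_Y))            # 0.5 * tr(K^{-1} Y Y^T)


# Hypothetical toy data: a 1D curve embedded in 3D, plus a little noise
rng = np.random.default_rng(0)
N, q = 30, 2
t = np.linspace(-1.0, 1.0, N)
Y = np.column_stack([np.sin(np.pi * t), np.cos(np.pi * t), t])
Y = Y + 0.05 * rng.standard_normal(Y.shape)
Y = Y - Y.mean(axis=0)

# Initialise the latent points with PCA (top-q left singular vectors, scaled to unit std)
U_svd, s, _ = np.linalg.svd(Y, full_matrices=False)
X_init = U_svd[:, :q] * s[:q]
X_init = X_init / X_init.std(axis=0)

f_init = gplvm_nll(X_init.ravel(), Y, q)
# Gradients are approximated by finite differences here; a fuller implementation
# would supply the analytic gradient and usually learn the kernel hyperparameters too
result = minimize(gplvm_nll, X_init.ravel(), args=(Y, q), method="L-BFGS-B",
                  options={"maxiter": 200})
X_opt = result.x.reshape(N, q)
print(f_init, result.fun)
```

With a linear kernel $k(X, X) = XX^T$ the maximum-likelihood latent points correspond to (probabilistic) PCA; the RBF kernel is what makes the mapping from latent to observed space non-linear.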

```python
# input_data ships with the TensorFlow 1.x tutorials and is not available in TensorFlow 2.x
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
