# The Mathematics behind Principal Component Analysis

## Introduction

The central idea of principal component analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set. 

This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables.

## Mathematics Behind PCA

PCA can be thought of as an unsupervised learning problem. The whole process of obtaining principal components from a raw dataset can be broken down into six steps:

- Take the whole dataset consisting of d+1 dimensions and ignore the labels such that our new dataset becomes d dimensional.
- Compute the mean for every dimension of the whole dataset.
- Compute the covariance matrix of the whole dataset.
- Compute eigenvectors and the corresponding eigenvalues.
- Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with the largest eigenvalues to form a d × k dimensional matrix W.
- Use this d × k eigenvector matrix to transform the samples onto the new subspace.
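The six steps above can be sketched end to end in a few lines of NumPy. This is a minimal sketch rather than a reference implementation: the function name `pca_project`, the random input data, and the choice to subtract the mean before projecting are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n samples x d features) onto the top-k principal components."""
    X = np.asarray(X, dtype=float)             # step 1: X is assumed to hold features only, no labels
    mean = X.mean(axis=0)                      # step 2: mean of every dimension
    cov = np.cov(X, rowvar=False)              # step 3: d x d covariance matrix (n - 1 denominator)
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 4: eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]          # step 5: indices sorted by decreasing eigenvalue
    W = eigvecs[:, order[:k]]                  #         d x k matrix of the top-k eigenvectors
    Y = (X - mean) @ W                         # step 6: project (centering first is an assumption)
    return Y, W, eigvals[order]

# Hypothetical data: 20 samples with d = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

Y, W, eigvals = pca_project(X, k=2)
print(Y.shape)  # (20, 2)
print(W.shape)  # (3, 2)
```

Because the columns of W are eigenvectors of the covariance matrix, the projected columns of `Y` are uncorrelated with each other, which is the property described in the introduction.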

So, let’s unpack the maths behind each of these steps one by one.

    Take the whole dataset consisting of d+1 dimensions and ignore the labels such that our new dataset becomes d dimensional.

Let’s say we have a dataset which is d+1 dimensional, where the d feature columns can be thought of as X_train and the one remaining column as y_train (the labels) in modern machine learning terms. So X_train and y_train together make up our complete training dataset.

After we drop the labels we are left with a d-dimensional dataset, and this is the dataset we will use to find the principal components. Also, let’s assume we are left with a three-dimensional dataset after ignoring the labels, i.e. d = 3.

We will assume that the samples stem from two different classes, where one half of the samples in our dataset are labeled class 1 and the other half class 2.
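As a small illustration of dropping the labels, the sketch below separates a label column from the features with pandas. The DataFrame, its column names (`f1`, `f2`, `f3`, `label`), and its values are hypothetical and are not part of the dataset used in the rest of this walkthrough.

```python
import pandas as pd

# Hypothetical labelled dataset: d = 3 feature columns plus one label column (d + 1 = 4).
labelled = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [0.5, 0.1, 0.9, 0.3],
    "f3": [10.0, 12.0, 9.0, 11.0],
    "label": [1, 1, 2, 2],          # two classes, half of the samples each
})

y_train = labelled["label"]                # labels, set aside and not used by PCA
X_train = labelled.drop(columns="label")   # the d-dimensional data that PCA works on

print(X_train.shape)  # (4, 3)
```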

Let our data matrix X hold the scores of five students in three courses:

In [12]:
import pandas as pd
import numpy as np

ss_df = pd.DataFrame([[90,60,90],
                      [90,90,30],
                      [60,60,60],
                      [60,60,90],
                      [30,30,30]
                     ])
ss_df.columns = ['Math','English','Art']
ss_df.index += 1                # number the students from 1
ss_df.index.name = 'Student'    # Index.rename returns a new index, so set the name directly
ss_df

Student,Math,English,Art
1,90,60,90
2,90,90,30
3,60,60,60
4,60,60,90
5,30,30,30


Compute the mean for every dimension of the whole dataset.

In [14]:
ss_means = ss_df.mean()  # already a Series with one mean per course
ss_means

Math       66.0
English    60.0
Art        60.0
dtype: float64
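The means matter because covariance is defined in terms of deviations from them. As a minimal sketch, the block below rebuilds the same scores (under the assumed name `scores`, so it runs on its own) and subtracts each course mean from its column.

```python
import pandas as pd

scores = pd.DataFrame(
    [[90, 60, 90], [90, 90, 30], [60, 60, 60], [60, 60, 90], [30, 30, 30]],
    columns=["Math", "English", "Art"],
)

means = scores.mean()        # column-wise means: Math 66, English 60, Art 60
centered = scores - means    # pandas aligns the means with the columns and subtracts row by row

print(centered)
```

After centering, every column has mean zero, and the first student's row becomes (24, 0, 30).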

Compute the covariance matrix of the whole dataset (sometimes also called the variance-covariance matrix)

So, we can compute the covariance of two variables X and Y using the following formula


$$\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (X_{i} - \bar{x})(Y_{i} - \bar{y})$$


Applying the above formula to every pair of columns gives the covariance matrix of our data matrix. The result is a square matrix of d × d dimensions (3 × 3 here).

In [15]:
ss_df.cov()

Unnamed: 0,Math,English,Art
Math,630.0,450.0,225.0
English,450.0,450.0,0.0
Art,225.0,0.0,900.0
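The same matrix can be reproduced by hand from the sample covariance definition, which divides by n − 1 (the convention `DataFrame.cov()` uses by default). The following is a minimal NumPy sketch; the variable names are chosen here for illustration.

```python
import numpy as np

X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

n = X.shape[0]
deviations = X - X.mean(axis=0)              # X_i - x_bar for every column
cov = deviations.T @ deviations / (n - 1)    # d x d matrix of pairwise covariances

print(cov)
# [[630. 450. 225.]
#  [450. 450.   0.]
#  [225.   0. 900.]]
```

The diagonal entries are the variances of Math, English and Art, and the zero in the English/Art position says those two sets of scores are uncorrelated in this sample.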


Compute Eigenvectors and corresponding Eigenvalues

    Intuitively, an eigenvector is a vector whose direction 
    remains unchanged when a linear transformation is applied to
    it.
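In symbols (the standard definition, not something specific to this dataset), a nonzero vector v is an eigenvector of a square matrix A, with eigenvalue λ, when

$$A v = \lambda v$$

so applying A only rescales v by the factor λ and keeps it on the same line.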

Now, we can easily compute the eigenvalues and eigenvectors from the covariance matrix that we have above.
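One way to do this, sketched here with NumPy, is `np.linalg.eigh`, which is designed for symmetric matrices such as a covariance matrix. The matrix is retyped from the output above so the block runs on its own; using `eigh` rather than another routine is an assumption of this sketch.

```python
import numpy as np

cov = np.array([[630.0, 450.0, 225.0],
                [450.0, 450.0,   0.0],
                [225.0,   0.0, 900.0]])

# eigh returns the eigenvalues in ascending order; the eigenvectors are the columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(cov)

print(eigvals)
print(eigvecs)

# Check the defining property C v = lambda v for every column at once.
print(np.allclose(cov @ eigvecs, eigvecs * eigvals))
```

The eigenvalues sum to the trace of the covariance matrix (630 + 450 + 900 = 1980), which is the total variance that the principal components divide among themselves.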