# Principal Component Analysis (PCA)

Principal component analysis (PCA) is a technique for reducing the dimensionality of a dataset, improving interpretability while minimizing information loss. It does so by constructing new, mutually uncorrelated variables (the principal components) that successively maximize variance.
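
A compact way to state this, as a sketch: the symbols $X$, $C$, $w_k$, and $\lambda_k$ are notation introduced here for illustration and do not appear in the text above. Let $X$ be the $n \times p$ data matrix with each column centered to zero mean.

$$
C = \frac{1}{n-1} X^\top X, \qquad
w_1 = \arg\max_{\|w\|=1} w^\top C\, w, \qquad
C\, w_k = \lambda_k w_k, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
$$

The $k$-th principal component is the projection $X w_k$. Its variance is $\lambda_k$, and components with different $k$ are uncorrelated because the eigenvectors of the symmetric matrix $C$ are orthogonal.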

![PCA lecture slide](https://image.slidesharecdn.com/lecture6-pca-151214043230/95/lecture6-pca-12-638.jpg?cb=1450067576)



In [1]:
from numpy import mean, cov
from numpy.linalg import eig
import numpy as np
from sklearn.decomposition import PCA

In [2]:
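# Toy dataset: 5 samples (rows) x 4 features (columns)
x = np.array([[1,2,3,4],[3,4,5,6],[5,6,7,8],[7,8,9,0],[9,0,1,2]])

# Keep the first 3 principal components
pca = PCA(n_components=3)
pca.fit(x)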

PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [3]:
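# Each row of components_ is a principal axis (a unit vector in feature space);
# explained_variance_ is the variance of the data along each of those axes
print(pca.components_)
print(pca.explained_variance_)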

[[ 0.00000000e+00 -7.07106781e-01 -7.07106781e-01  0.00000000e+00]
 [ 7.07106781e-01  1.11022302e-16  9.93013661e-17 -7.07106781e-01]
 [-7.07106781e-01  1.11022302e-16  9.93013661e-17 -7.07106781e-01]]
[20. 15.  5.]
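
The variances above are easier to interpret as fractions of the total variance. A minimal sketch using scikit-learn's `explained_variance_ratio_` attribute on the same toy data; the variable name `ratio` is chosen here for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 0], [9, 0, 1, 2]])
pca = PCA(n_components=3).fit(x)

# Fraction of the total variance captured by each kept component
ratio = pca.explained_variance_ratio_
print(ratio)

# Running total: how much variance the first k components retain together
print(np.cumsum(ratio))
```

For this dataset the three variances 20, 15, and 5 sum to the total variance of 40, so the three kept components should account for essentially all of it.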


In [4]:
# Project the (mean-centered) data onto the 3 principal axes
B = pca.transform(x)
print(B)

[[ 2.82842712e+00 -2.82842712e+00  2.82842712e+00]
 [ 0.00000000e+00 -2.82842712e+00  2.22044605e-16]
 [-2.82842712e+00 -2.82842712e+00 -2.82842712e+00]
 [-5.65685425e+00  4.24264069e+00  1.41421356e+00]
 [ 5.65685425e+00  4.24264069e+00 -1.41421356e+00]]
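
What `transform` does can be checked by hand. The sketch below assumes the default `whiten=False`: it centers the data with the fitted `mean_`, projects onto `components_`, and then maps the scores back with `inverse_transform`. The names `B_manual` and `x_rec` are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 0], [9, 0, 1, 2]])
pca = PCA(n_components=3).fit(x)
B = pca.transform(x)

# transform() subtracts the fitted mean and projects onto the principal axes
B_manual = (x - pca.mean_) @ pca.components_.T
print(np.allclose(B, B_manual))

# inverse_transform() maps the scores back to the original 4-feature space
x_rec = pca.inverse_transform(B)
print(np.allclose(x_rec, x))
```

The reconstruction is exact up to floating-point error here only because the discarded fourth direction carries zero variance in this toy dataset; in general, dropping components loses some information.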


# PCA from Scratch

In [5]:
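# Same toy dataset: 5 samples (rows) x 4 features (columns)
A = np.array([[1,2,3,4],[3,4,5,6],[5,6,7,8],[7,8,9,0],[9,0,1,2]])

# Center each feature by subtracting its column mean
M = mean(A.T, axis=1)
A = A - M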

In [6]:
# Covariance matrix of the features (cov expects one variable per row, hence A.T)
C = cov(A.T)
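
For reference, the covariance matrix can also be written out directly from the centered data. A minimal sketch, assuming the usual unbiased estimator with `n - 1` in the denominator (which is `numpy.cov`'s default); `C_manual` is a name introduced here for illustration:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 0], [9, 0, 1, 2]], dtype=float)
A = A - A.mean(axis=0)  # center each feature

# Sample covariance: sum of outer products of centered rows, divided by n - 1
n = A.shape[0]
C_manual = A.T @ A / (n - 1)
print(C_manual)

# Should agree with NumPy's built-in estimator
print(np.allclose(C_manual, np.cov(A, rowvar=False)))
```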

In [7]:
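# Eigendecomposition of the covariance matrix.
# Eigenvectors are the COLUMNS of `vectors`; eig() returns them in no particular order.
values, vectors = eig(C)

print(vectors)
print(values)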

[[ 0.70710678  0.70710678  0.          0.        ]
 [ 0.          0.          0.70710678  0.70710678]
 [ 0.          0.          0.70710678 -0.70710678]
 [-0.70710678  0.70710678  0.          0.        ]]
[1.50000000e+01 5.00000000e+00 2.00000000e+01 1.77635684e-15]
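
Note that the eigenvalues printed above are not sorted. Since a covariance matrix is symmetric, one possible alternative is `numpy.linalg.eigh`, followed by an explicit sort into descending order. This is a sketch of that variant, not the approach used in the cells of this notebook; the name `order` is introduced for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 0], [9, 0, 1, 2]], dtype=float)
A = A - A.mean(axis=0)
C = np.cov(A, rowvar=False)

# eigh is intended for symmetric matrices and returns real eigenvalues in ascending order
values, vectors = np.linalg.eigh(C)

# Reorder so the largest-variance direction comes first
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]
print(values)

# Sanity checks: each column satisfies C v = lambda v, and the columns are orthonormal
print(np.allclose(C @ vectors, vectors * values))
print(np.allclose(vectors.T @ vectors, np.eye(4)))
```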


In [8]:
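# Project the centered data onto the first three eigenvector columns. eig() does not sort,
# so these columns follow eigenvalues 15, 5, 20 rather than sklearn's descending order.
B = np.dot(vectors[:,:3].T, A.T)
print(B.T)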

[[-2.82842712e+00 -2.82842712e+00 -2.82842712e+00]
 [-2.82842712e+00  2.22044605e-16  0.00000000e+00]
 [-2.82842712e+00  2.82842712e+00  2.82842712e+00]
 [ 4.24264069e+00 -1.41421356e+00  5.65685425e+00]
 [ 4.24264069e+00  1.41421356e+00 -5.65685425e+00]]
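
The from-scratch scores match scikit-learn's only up to column order and sign, because eigenvectors are defined up to a factor of -1. The sketch below shows one way to make the comparison explicit: sort the eigenpairs by descending eigenvalue, then align signs column by column. The names `W`, `B_scratch`, `B_sklearn`, and `signs` are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 0], [9, 0, 1, 2]], dtype=float)

# From scratch: center, eigendecompose the covariance, keep the top 3 directions
A = x - x.mean(axis=0)
values, vectors = np.linalg.eigh(np.cov(A, rowvar=False))
order = np.argsort(values)[::-1][:3]
W = vectors[:, order]   # shape (4, 3), one principal axis per column
B_scratch = A @ W       # shape (5, 3), scores of each sample

# scikit-learn for comparison
B_sklearn = PCA(n_components=3).fit_transform(x)

# Flip each from-scratch column so that it points the same way as sklearn's
signs = np.sign(np.sum(B_scratch * B_sklearn, axis=0))
print(np.allclose(B_scratch * signs, B_sklearn))
```

This alignment works cleanly here because the three kept eigenvalues are distinct; with repeated eigenvalues the corresponding axes are not unique and a sign flip alone would not be enough.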
