### Principal Component Analysis Through SVD, From Scratch

In [112]:
import numpy as np
import math
np.random.seed(123)

In [2]:
mu_vec1 = np.array([0,0,0])
cov_mat1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class1_sample = np.random.multivariate_normal(mu_vec1, cov_mat1, 20).T

mu_vec2 = np.array([1,1,1])
cov_mat2 = np.array([[1,0,0],[0,1,0],[0,0,1]])
class2_sample = np.random.multivariate_normal(mu_vec2, cov_mat2, 20).T

In [8]:
class1_sample.shape

(3, 20)

#### Computing the Covariance Matrix - Example

In [164]:
# For the below Array
X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30],])
print(X.shape)

(5, 3)


In [61]:

# Covariance matrix V (N samples, c variables):
#
#       | Σ x1² / N      Σ x1 x2 / N    . . .    Σ x1 xc / N |
#   V = | Σ x2 x1 / N    Σ x2² / N      . . .    Σ x2 xc / N |
#       | . . .          . . .          . . .    . . .       |
#       | Σ xc x1 / N    Σ xc x2 / N    . . .    Σ xc² / N   |

# Create the deviation matrix ----->  x = X - 11'X ( 1 / n )

# ones_array is an n x 1 vector of ones
ones_array = np.ones((5, 1), dtype=np.int32)
ones_one_t = ones_array.dot(ones_array.transpose())

# 11'X / n repeats the column means on every row, so x holds deviations from the means
x = X - ones_one_t.dot(X) / 5

# Multiply the deviation matrix by its transpose to get the sums of squares and cross products
dev_matrix = x.transpose().dot(x)
print("Shape of deviation Matrix is ", dev_matrix.shape)


#Covariance matrix    ---- >   V = x'x ( 1 / n )
V = dev_matrix*(1/5)
print("Covariance matrix is\n", V)

Shape of deviation Matrix is  (3, 3)
Covariance matrix is
 [[504. 360. 180.]
 [360. 360.   0.]
 [180.   0. 720.]]
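As a cross-check, the same population covariance (dividing by n rather than n - 1) can be obtained by centering the columns directly, or with `np.cov`. This is a minimal sketch; the names `X_centered`, `V_manual`, and `V_numpy` are illustrative and not part of the original notebook.

```python
import numpy as np

X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]])
n = X.shape[0]

# Center each column by subtracting its mean (equivalent to X - 11'X / n)
X_centered = X - X.mean(axis=0)

# Population covariance: x'x / n
V_manual = X_centered.T @ X_centered / n

# np.cov treats rows as variables by default, so pass rowvar=False;
# bias=True divides by n instead of n - 1
V_numpy = np.cov(X, rowvar=False, bias=True)

print(V_manual)
print(np.allclose(V_manual, V_numpy))
```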


### PCA Steps

1. Take the whole dataset consisting of d+1 dimensions and ignore the labels such that our new dataset becomes d dimensional.
2. Compute the mean of every dimension of the whole dataset.
3. Compute the covariance matrix of the whole dataset (sometimes also called the variance-covariance matrix).
4. Compute Eigenvectors and corresponding Eigenvalues
5. Sort the eigenvectors by decreasing eigenvalues and choose k eigenvectors with the largest eigenvalues to form a d × k dimensional matrix W.
6. Transform the samples onto the new subspace using the d × k matrix W, where x is a d × 1 sample and y is its k × 1 projection:
    y = W′ × x
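The steps above can be sketched on the small example array from the covariance section. This is a minimal illustration, not the notebook's own implementation: the choice `k = 2`, the use of `np.linalg.eigh`, and projecting the mean-centered samples are assumptions, and the names `mean_vec`, `X_centered`, `cov`, `order`, and `Y` are illustrative.

```python
import numpy as np

# Step 1: unlabeled data, rows are samples and columns are the d dimensions
X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)
k = 2  # number of components to keep (assumed for illustration)

# Step 2: mean of every dimension
mean_vec = X.mean(axis=0)

# Step 3: covariance matrix (population form, dividing by n)
X_centered = X - mean_vec
cov = X_centered.T @ X_centered / X.shape[0]

# Step 4: eigenvalues and eigenvectors; eigh is intended for symmetric matrices
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Step 5: sort by decreasing eigenvalue and keep the top k eigenvectors
order = np.argsort(eig_vals)[::-1]
W = eig_vecs[:, order[:k]]   # d x k

# Step 6: project the (centered) samples, i.e. y = W'x for every sample x
Y = X_centered @ W           # n x k

print(W.shape, Y.shape)
```

Each column of `Y` has variance equal to the corresponding eigenvalue, which is one way to sanity-check the projection.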

#### Demonstrate that the S matrix in the SVD is a diagonal matrix of singular values (square roots of the eigenvalues of A × Aᵀ) in descending order


In [241]:
# Note: np.linalg.svd returns the third factor already transposed, so X = U · S · V here
U, s, V = np.linalg.svd(X)
S = np.diag(s)
print("The S matrix is \n ", S)

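# Eigenvalues of X · Xᵀ (np.linalg.eig does not return them sorted)
mat = X.dot(X.transpose())
w, v = np.linalg.eig(mat)
print("\nThe Eigen values of the A X A_transpose are ", w)

print("\nThe Square root of the Eigen values of the A X A_transpose are ")
for eig_val in w:  # separate loop variable so the array w is not overwritten
    print(math.sqrt(eig_val))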

The S matrix is 
  [[249.53284904   0.           0.        ]
 [  0.          56.2059804    0.        ]
 [  0.           0.          16.56034466]]

The Eigen values of the A X A_transpose are  [6.22666428e+04 2.74245015e+02 3.15911223e+03 1.62665088e-13
 1.72168893e-28]

The Square root of the Eigen values of the A X A_transpose are 
249.53284904341118
16.56034465876612
56.20598040298784
4.0331760235843843e-07
1.312131443737577e-14
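The two near-zero values above are round-off: X · Xᵀ is 5 × 5 but has rank 3, so only three eigenvalues are nonzero. A hedged sketch of the same comparison with sorted output follows. It assumes `np.linalg.eigvalsh` (for symmetric matrices) in place of `np.linalg.eig`, and clips tiny negative round-off before the square root; the names `eig_small` and `eig_large` are illustrative.

```python
import numpy as np

X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

# Singular values only, returned in descending order
s = np.linalg.svd(X, compute_uv=False)

# eigvalsh returns eigenvalues in ascending order, so reverse them
eig_small = np.linalg.eigvalsh(X.T @ X)[::-1]   # 3 eigenvalues
eig_large = np.linalg.eigvalsh(X @ X.T)[::-1]   # 5 eigenvalues, 2 of them ~0

# Clip round-off negatives to zero before taking square roots
sqrt_small = np.sqrt(np.clip(eig_small, 0, None))
sqrt_large = np.sqrt(np.clip(eig_large, 0, None))

print(s)
print(sqrt_small)       # all three match s
print(sqrt_large[:3])   # the three nonzero ones match s
```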


In [244]:
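# Getting back the original matrix by using all the SVD components

# Pad S with zero rows to the shape of X so that U (5x5) · Z (5x3) · V (3x3) is defined
Z = np.zeros((X.shape[0], X.shape[1]))
Z[:S.shape[0], :] = S
U.dot(Z.dot(V))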

array([[90., 60., 90.],
       [90., 90., 30.],
       [60., 60., 60.],
       [60., 60., 90.],
       [30., 30., 30.]])

In [266]:
# Getting back the original matrix from only the top 3 singular values and their singular vectors
U[:, :3].dot(S[:3, :3].dot(V[:3, :]))

array([[90., 60., 90.],
       [90., 90., 30.],
       [60., 60., 60.],
       [60., 60., 90.],
       [30., 30., 30.]])

In [267]:
# Approximating the original matrix from only the top 2 singular values and their singular vectors
U[:, :2].dot(S[:2, :2].dot(V[:2, :]))

array([[80.77681996, 68.78746566, 91.53548819],
       [90.38374858, 89.63438018, 29.93611299],
       [63.69648152, 56.47814481, 59.38460447],
       [66.27696838, 54.01956334, 88.95500134],
       [31.84824076, 28.23907241, 29.69230224]])
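How close the truncated product is to X can be quantified: the Frobenius norm of the difference equals the square root of the sum of the squared discarded singular values. The sketch below checks this for k = 3, 2, 1. The name `Vt` (for the transposed factor that the cells above call `V`) and the helper variables `X_k`, `err`, and `expected` are illustrative assumptions.

```python
import numpy as np

X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

# Thin SVD: U is 5x3, s has 3 entries, Vt is 3x3 (already transposed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

for k in (3, 2, 1):
    # Keep the k largest singular values and their vectors
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Frobenius norm of the reconstruction error
    err = np.linalg.norm(X - X_k)
    # Error predicted from the discarded singular values
    expected = np.sqrt(np.sum(s[k:] ** 2))
    print(k, err, expected)
```

With k = 2 the error equals the third singular value, which is small relative to the first, which is why the rank-2 product above stays close to X.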

#### From the above example we can infer that the top k components approximate the original matrix

In [308]:
U[:3,:2]

array([[-0.55752795,  0.29800626],
       [-0.48940614, -0.83179009],
       [-0.41590784,  0.02929798]])
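To connect the SVD back to the PCA steps listed earlier, the sketch below applies the SVD to the mean-centered data. Under that assumption (the cells above decompose the uncentered X), the squared singular values divided by n equal the eigenvalues of the covariance matrix, and the projected samples can be read off either from U or from the rows of the transposed factor. The names `Vt`, `scores_from_U`, and `scores_from_V`, and the choice `k = 2`, are illustrative.

```python
import numpy as np

X = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)
n = X.shape[0]
k = 2  # number of components to keep (assumed for illustration)

# Center the columns, as in the PCA steps
X_centered = X - X.mean(axis=0)

# Thin SVD; Vt holds the right singular vectors as rows
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Eigenvalues of the population covariance matrix, largest first
cov = X_centered.T @ X_centered / n
eig_vals = np.linalg.eigvalsh(cov)[::-1]

print(s ** 2 / n)   # agrees with eig_vals up to round-off
print(eig_vals)

# Coordinates of the samples in the top-k subspace, computed two ways
scores_from_U = U[:, :k] * s[:k]
scores_from_V = X_centered @ Vt[:k, :].T
print(np.allclose(scores_from_U, scores_from_V))
```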