# Types of PCA

There are three types of PCA:
1. PCA
    disadvantage:-
        A particular disadvantage of ordinary PCA is that the principal components are usually linear 
        combinations of all input variables, which makes them hard to interpret.
2. RandomizedPCA
    advantage:-
        Linear dimensionality reduction using an approximate Singular Value Decomposition of the data, 
        keeping only the most significant singular vectors to project the data to a lower-dimensional space.

        This implementation uses a randomized SVD and can handle both scipy.sparse and numpy dense 
        arrays as input. In current scikit-learn the separate RandomizedPCA class no longer exists; 
        the same solver is selected with PCA(svd_solver='randomized').
3. SparsePCA
    advantage:-
        Sparse PCA overcomes the disadvantage of ordinary PCA noted above by finding linear 
        combinations that contain just a few input variables.

        It finds the set of sparse components that can optimally reconstruct the data. The amount of 
        sparseness is controlled by the coefficient of the L1 penalty, given by the parameter alpha.

        Contemporary datasets often have a number of input variables (p) comparable with, or even much 
        larger than, the number of samples (n). It has been shown that if p/n does not converge to zero, 
        classical PCA is not consistent, but sparse PCA can retain consistency even if p >> n.
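
The difference between dense and sparse loadings can be seen on a small synthetic dataset. The following is a minimal sketch, assuming scikit-learn is installed; the random data, its size, and alpha=1 are illustrative choices and are not taken from this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Synthetic data (an assumption for illustration): 100 samples, 10 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))

pca = PCA(n_components=3).fit(X)
spca = SparsePCA(n_components=3, alpha=1, random_state=0).fit(X)

# Count the loadings that are exactly zero in each set of components.
dense_zeros = int(np.sum(pca.components_ == 0))
sparse_zeros = int(np.sum(spca.components_ == 0))

print("zero loadings, PCA      :", dense_zeros)
print("zero loadings, SparsePCA:", sparse_zeros)
```

Ordinary PCA normally leaves no loading exactly at zero, while the L1 penalty in SparsePCA can push some of them to zero.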



In [67]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits, fetch_lfw_people
from sklearn.model_selection import train_test_split
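# PCA is used directly in the cells below, so it must be imported under its own name.
from sklearn.decomposition import PCA, SparsePCA
# RandomizedPCA was removed from scikit-learn; the name is kept here as an alias of PCA.
from sklearn.decomposition import PCA as RandomizedPCA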

In [39]:
faces = fetch_lfw_people(min_faces_per_person=60)

In [40]:
xtrain,xtest, ytrain,ytest = train_test_split(faces.data, faces.target, test_size=.3, random_state=13)

In [41]:
print('Shape of xtrain :',xtrain.shape)
print('Shape of xtest :',xtest.shape)
print('Shape of ytrain :',ytrain.shape)
print('Shape of ytest :',ytest.shape)

Shape of xtrain : (943, 2914)
Shape of xtest : (405, 2914)
Shape of ytrain : (943,)
Shape of ytest : (405,)


# Reducing Dimension using PCA

In [42]:
pca = PCA(n_components=3)
pca.fit(xtrain)
new_dim = pca.transform(xtrain)

In [43]:
print('Original dimension is :',xtrain.shape)
print('After reducing dimension is :',new_dim.shape)

Original dimension is : (943, 2914)
After reducing dimension is : (943, 3)


Previously the data had 943 rows and 2914 columns; now it has 943 rows and 3 columns.
Note: PCA reduces the number of features, not the number of rows.
The n_components parameter has a limited range: an integer value cannot exceed min(n_samples, n_features).

In [44]:
pca.components_

array([[-0.00669957, -0.0069887 , -0.00749792, ..., -0.00766871,
        -0.00617269, -0.00570053],
       [ 0.01656931,  0.0156669 ,  0.01580789, ..., -0.03752427,
        -0.03711444, -0.03550629],
       [-0.01995536, -0.01839831, -0.01683864, ..., -0.03024678,
        -0.02691727, -0.02500657]], dtype=float32)

In [45]:
pca.explained_variance_

array([768107.25, 609143.6 , 293751.56], dtype=float32)

The rows of components_ define the direction of each principal axis,
and explained_variance_ gives the variance of the data along that axis, i.e. the squared length of the vector.
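
The relationship between these two attributes and the SVD of the centered data can be checked directly. This is a small sketch on synthetic data (the array sizes and the per-feature scales are assumptions made for illustration); it uses svd_solver="full" so that the comparison with NumPy's SVD is exact up to floating-point error.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples, 5 features with different spreads.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

pca = PCA(n_components=3, svd_solver="full").fit(X)

# SVD of the centered data: X_centered = U @ diag(S) @ Vt
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Each row of components_ is a unit-length direction vector.
row_norms = np.linalg.norm(pca.components_, axis=1)

# The variance along each direction is S**2 / (n_samples - 1).
variance_from_svd = S[:3] ** 2 / (X.shape[0] - 1)

print("row norms          :", row_norms)
print("explained_variance_:", pca.explained_variance_)
print("from SVD           :", variance_from_svd)
```

The rows of components_ agree with the leading rows of Vt up to sign.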

In [24]:
pca = PCA(n_components=405)
pca.fit(xtest)
new_dim = pca.transform(xtest)

In [26]:
print('Original dimension is :',xtest.shape)
print('After reducing dimension :',new_dim.shape)

Original dimension is : (405, 2914)
After reducing dimension : (405, 405)


The original data has 405 rows and 2914 columns;
after reduction it has 405 rows and 405 columns.

In [None]:
pca = PCA(n_components=1000)
pca.fit(xtrain)

This fails with a ValueError because n_components=1000 is larger than min(n_samples, n_features) = min(943, 2914) = 943.
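
The limit can be reproduced on a small array without downloading the faces dataset. This is a sketch with made-up data (20 samples, 50 features); svd_solver="full" is set explicitly so the behaviour does not depend on the automatic solver choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 20 samples, 50 features, so min(n_samples, n_features) is 20.
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 50))

max_components = min(X.shape)

# The largest allowed value works.
PCA(n_components=max_components, svd_solver="full").fit(X)

# One more component than allowed is rejected.
try:
    PCA(n_components=max_components + 1, svd_solver="full").fit(X)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```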

# Using RandomizedPCA

In [51]:
rpca = RandomizedPCA(n_components=500)
rpca.fit(xtrain)

PCA(copy=True, iterated_power='auto', n_components=500, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [52]:
new_dim = rpca.transform(xtrain)

In [53]:
print('Original dimension :',xtrain.shape)
print('After reducing :',new_dim.shape)

Original dimension : (943, 2914)
After reducing : (943, 500)


In [56]:
rpca.components_

array([[-0.00669958, -0.00698873, -0.00749796, ..., -0.00766872,
        -0.00617271, -0.00570054],
       [ 0.01656925,  0.01566691,  0.0158079 , ..., -0.03752425,
        -0.03711441, -0.03550625],
       [-0.01995537, -0.01839848, -0.01683853, ..., -0.03024708,
        -0.02691753, -0.02500677],
       ...,
       [-0.01952291,  0.00492571,  0.00922741, ...,  0.01322585,
         0.0483762 , -0.05301105],
       [-0.00608023,  0.00840851,  0.03278473, ...,  0.02613602,
         0.01521712, -0.03184588],
       [-0.03551473, -0.00502403,  0.02582608, ..., -0.00517617,
         0.01218924, -0.03521325]], dtype=float32)

In [59]:
a = rpca.explained_variance_
a.shape

(500,)
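
Because RandomizedPCA here is only an alias of PCA, whether the randomized solver is actually used depends on svd_solver='auto'. To request it explicitly, pass svd_solver='randomized'. The sketch below compares it with the exact solver on synthetic low-rank data; the data, its sizes, and the noise level are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: a rank-5 signal in 100 dimensions plus a little noise.
rng = np.random.RandomState(0)
latent = rng.normal(size=(300, 5)) * np.array([10.0, 8.0, 6.0, 4.0, 2.0])
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 100))

# Exact SVD versus randomized (approximate) SVD.
full = PCA(n_components=5, svd_solver="full").fit(X)
rand = PCA(n_components=5, svd_solver="randomized", random_state=0).fit(X)

print("full      :", full.explained_variance_)
print("randomized:", rand.explained_variance_)
```

Setting random_state makes the randomized result reproducible between runs.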

In [None]:
rpca = RandomizedPCA(n_components=500)
rpca.fit(xtest)

This fails for the same reason: n_components=500 is larger than min(n_samples, n_features) = min(405, 2914) = 405.

# Going Back to the Original Dimension Using PCA

In [60]:
pca = PCA(n_components=100)
pca.fit(xtrain)

PCA(copy=True, iterated_power='auto', n_components=100, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)

In [62]:
new_dim = pca.transform(xtrain)

In [63]:
print('Original dimension is :',xtrain.shape)
print('New dimension :',new_dim.shape)

Original dimension is : (943, 2914)
New dimension : (943, 100)


In [65]:
# map the reduced data back to the original feature space
back_dimension = pca.inverse_transform(new_dim)
back_dimension.shape

(943, 2914)

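The data is now back in its original dimension. Note that inverse_transform only approximates the original data, because the information in the discarded components is lost.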
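
How much is lost depends on the number of components kept. The following sketch measures the reconstruction error for a few values of n_components on synthetic data (the array size and the chosen values of k are assumptions for illustration).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 60 samples, 20 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(60, 20))

errors = {}
for k in (2, 10, 20):
    pca = PCA(n_components=k, svd_solver="full").fit(X)
    # Project to k dimensions, then map back to the original 20.
    X_back = pca.inverse_transform(pca.transform(X))
    # Mean squared reconstruction error.
    errors[k] = float(np.mean((X - X_back) ** 2))
    print(k, X_back.shape, errors[k])
```

The shape is always restored, but the error only drops to (numerically) zero when all components are kept.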

# Using SparsePCA

In [69]:
spca = SparsePCA(n_components=3)
spca.fit(xtest)



SparsePCA(U_init=None, V_init=None, alpha=1, max_iter=1000, method='lars',
          n_components=3, n_jobs=None, normalize_components=False,
          random_state=None, ridge_alpha=0.01, tol=1e-08, verbose=False)

In [70]:
new_dim = spca.transform(xtest)

In [71]:
print('Original dimension :', xtest.shape)
print('New dimension :', new_dim.shape)

Original dimension : (405, 2914)
New dimension : (405, 3)


In [72]:
spca.components_

array([[-1589.90063164, -1648.4207122 , -1761.23919366, ...,
        -1912.62062276, -1819.55761923, -1711.36998192],
       [ -138.5949215 ,  -129.16881982,  -118.91519592, ...,
          684.82019589,   644.82532294,   623.97369934],
       [ -490.34455389,  -482.77221579,  -497.45419975, ...,
         -686.93611031,  -606.84059791,  -551.16987113]])
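
The effect of the alpha parameter on sparsity can be explored with a quick sweep. This is a minimal sketch on synthetic data; the data and the two alpha values are illustrative assumptions, and the exact fractions will differ on the faces dataset.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# Synthetic data: 100 samples, 10 features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))

zero_fraction = {}
for alpha in (0.01, 10.0):
    spca = SparsePCA(n_components=3, alpha=alpha, random_state=0).fit(X)
    # Fraction of loadings that are exactly zero.
    zero_fraction[alpha] = float(np.mean(spca.components_ == 0))
    print("alpha =", alpha, "-> fraction of zero loadings:", zero_fraction[alpha])
```

A larger alpha puts a stronger L1 penalty on the loadings, so it typically yields more zeros.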