# Principal Component Analysis

This notebook demonstrates how to use the `learning` module. The first step is __to put the `learning` folder in your working directory__. You can then import the class used for principal component analysis (PCA) as follows:

In [1]:
from learning.pca import PrincipalCA

The documentation for the `PrincipalCA` class can be accessed with:

In [2]:
PrincipalCA?

Init signature: PrincipalCA(n_components)
Docstring:
A class used to implement Principal Component Analysis

Attributes
----------
n_components : int
    the number of desired dimensions
X : float
    the input array
X_mean : float
    the standardized input array
components : float
    the sorted eigenvectors array with [:, n_components] elements
explained_variance : float
    the sorted eigenvalues array with [:n_components] elements
x_cov_matrix : float
    the covariance matrix of X_mean
X_transform : float
    the dimension-reduced result array
    
Methods
-------
fit(X, n_components)
    Train the model and compute the components and explained_variance
    The process inside this method is as follows:
        1. Compute the standardized input array (X_mean)
        2. Form the covariance matrix from X_mean
        3. Find the eigenvalues and eigenvectors of the covariance matrix
        4. Sort desce
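
The steps listed in the docstring can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the module's actual code: `pca_sketch` is a hypothetical helper, "standardized" is taken to mean mean-centering only (which is consistent with the outputs shown later in this notebook), and since the docstring is cut off after step 4, the final projection step is assumed.

```python
import numpy as np

def pca_sketch(X, n_components):
    """Hypothetical helper illustrating the docstring steps (not part of `learning`)."""
    X = np.asarray(X, dtype=float)

    # 1. Center each column (assumption: no scaling by the standard deviation)
    X_centered = X - X.mean(axis=0)

    # 2. Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)

    # 3. Eigenvalues and eigenvectors; eigh is suited to symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)

    # 4. Sort in descending order of eigenvalue and keep the leading ones
    order = np.argsort(eigvals)[::-1]
    explained_variance = eigvals[order][:n_components]
    components = eigvecs[:, order][:, :n_components].T  # one component per row

    # Assumed final step: project the centered data onto the components
    X_transform = X_centered @ components.T
    return X_transform, explained_variance, components

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
scores, variances, components = pca_sketch(A, 2)
print(variances)
print(scores)
```

The signs of the components (and therefore of the projected columns) are arbitrary, so a result that differs only by a sign flip is equally valid.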

### Module implementation

Now, I will show you how to use the module on a small example.

In [3]:
import numpy as np

# Define a matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
A

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [4]:
# Define the desired output dimension
dim = 2

# Create the PCA instance
model = PrincipalCA(dim)

# Fit the data
model.fit(A)

# Transform data
B = model.transform()
B

array([[ 7.79422863e+00, -4.44089210e-16],
       [ 2.59807621e+00, -2.22044605e-16],
       [-2.59807621e+00,  2.22044605e-16],
       [-7.79422863e+00,  4.44089210e-16]])

You can also access the eigenvalues (`explained_variance`) and the eigenvectors (`components`):

In [5]:
model.explained_variance

array([4.50000000e+01, 9.74548454e-17])

In [6]:
model.components

array([[-0.57735027, -0.57735027, -0.57735027],
       [ 0.        , -0.70710678,  0.70710678]])

### Benchmarking with `sklearn`

In [7]:
from sklearn.decomposition import PCA

# create the PCA instance
pca = PCA(2)

# fit on data
pca.fit(A)

# transform data
B_sklearn = pca.transform(A)
B_sklearn

array([[ 7.79422863e+00, -1.69309011e-15],
       [ 2.59807621e+00, -6.38378239e-16],
       [-2.59807621e+00,  6.38378239e-16],
       [-7.79422863e+00,  1.69309011e-15]])

In [8]:
pca.explained_variance_

array([4.50000000e+01, 1.18244258e-31])

In [9]:
pca.components_

array([[-0.57735027, -0.57735027, -0.57735027],
       [ 0.81649658, -0.40824829, -0.40824829]])
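
The two implementations agree on the projected data, the leading eigenvalue, and the first component, but the second rows of `components` and `components_` differ. This is expected here: the second eigenvalue is numerically zero, so any unit vector orthogonal to the first component is an equally valid second component. The sketch below checks this using the values copied from the outputs above; the variable names are only illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
A_centered = A - A.mean(axis=0)

# Values copied from the printed outputs above
first = np.array([-0.57735027, -0.57735027, -0.57735027])
second_candidates = {
    "learning": np.array([0.0, -0.70710678, 0.70710678]),
    "sklearn": np.array([0.81649658, -0.40824829, -0.40824829]),
}

results = {}
for name, v in second_candidates.items():
    results[name] = {
        # A valid component has unit length...
        "unit_length": bool(np.isclose(np.linalg.norm(v), 1.0)),
        # ...is orthogonal to the first component...
        "orthogonal": bool(np.isclose(v @ first, 0.0, atol=1e-7)),
        # ...and here captures (numerically) zero variance
        "variance": float(np.var(A_centered @ v, ddof=1)),
    }
    print(name, results[name])
```

Both candidates pass the same checks, so the difference between the two second components reflects a degenerate direction rather than an error in either implementation.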