### References

* https://machinelearningmastery.com/calculate-principal-component-analysis-scratch-python/
* https://www.datacamp.com/community/tutorials/principal-component-analysis-in-python
* https://sebastianraschka.com/Articles/2014_pca_step_by_step.html
* https://arxiv.org/pdf/1404.1100.pdf

In [1]:
import numpy as np

In [21]:
A = np.array([[1, 2], [3, 4], [5, 6]])

In [23]:
print(A)

[[1 2]
 [3 4]
 [5 6]]


In [32]:
mean = np.mean(A, axis=0)

In [33]:
mean

array([3., 4.])

In [34]:
# center the data by subtracting each column's mean
C = A - mean

In [35]:
C

array([[-2., -2.],
       [ 0.,  0.],
       [ 2.,  2.]])

In [36]:
# calculate the covariance matrix (np.cov expects variables in rows, hence the transpose)

cov_mat = np.cov(C.T)

In [37]:
cov_mat

array([[4., 4.],
       [4., 4.]])
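
To make the covariance step less of a black box, here is a minimal sketch that builds the same matrix directly from the centered data. It assumes the default `np.cov` normalization, which divides by `n - 1`; the name `cov_manual` is only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
C = A - A.mean(axis=0)  # centered data, one row per observation

n_samples = C.shape[0]

# sample covariance: C^T C / (n - 1), the same normalization np.cov uses by default
cov_manual = C.T @ C / (n_samples - 1)

print(cov_manual)
```

For this toy matrix both columns move together exactly, which is why every entry of the covariance matrix is the same.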

In [38]:
# eigendecomposition of the covariance matrix (eigenvectors are the columns of `vectors`)

values, vectors = np.linalg.eig(cov_mat)

In [39]:
values

array([8., 0.])

In [40]:
vectors

array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
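
`np.linalg.eig` does not promise any particular ordering of the eigenvalues; they happen to come out largest-first here. A hedged alternative, not what the cells above use, is `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, followed by an explicit sort:

```python
import numpy as np

cov_mat = np.array([[4.0, 4.0], [4.0, 4.0]])

# eigh is designed for symmetric matrices and returns eigenvalues in ascending order
values, vectors = np.linalg.eigh(cov_mat)

# reorder so the direction with the largest variance comes first
order = np.argsort(values)[::-1]
values = values[order]
vectors = vectors[:, order]  # eigenvectors are stored as columns

print(values)
print(vectors)
```

The signs of the eigenvectors may differ from the output above; an eigenvector multiplied by -1 is still an eigenvector.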

In [50]:
# project the centered data onto the eigenvectors

P = C.dot(vectors)

In [51]:
P

array([[-2.82842712,  0.        ],
       [ 0.        ,  0.        ],
       [ 2.82842712,  0.        ]])
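
The second column of the projection is all zeros because the second eigenvalue is zero. As a rough sketch of how that can be used for dimensionality reduction, the block below computes the share of variance per component and keeps only the first one. The values are copied from the cells above, and `k`, `P_reduced` and `C_restored` are illustrative names.

```python
import numpy as np

# values copied from the cells above
s = 1 / np.sqrt(2)
values = np.array([8.0, 0.0])
vectors = np.array([[s, -s], [s, s]])
C = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])

# fraction of the total variance carried by each component
explained_ratio = values / values.sum()
print(explained_ratio)

# keep only the first k components
k = 1
P_reduced = C @ vectors[:, :k]             # shape (3, 1)
C_restored = P_reduced @ vectors[:, :k].T  # map back to the original axes
print(C_restored)
```

Since the first component carries all of the variance in this example, the centered data is recovered from a single column.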

In [20]:

# Principal Component Analysis
from numpy import array
from sklearn.decomposition import PCA


# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)

# create the PCA instance
pca = PCA(n_components=2)

# fit on data
pca.fit(A)

# principal axes (one per row) and the variance explained by each
print(pca.components_)
print(pca.explained_variance_)
# transform data
B = pca.transform(A)
print(B)


[[1 2]
 [3 4]
 [5 6]]
[[ 0.70710678  0.70710678]
 [ 0.70710678 -0.70710678]]
[8.00000000e+00 2.25080839e-33]
[[-2.82842712e+00  2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 2.82842712e+00 -2.22044605e-16]]
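
scikit-learn's `PCA` works from a singular value decomposition of the centered data rather than from an explicit covariance matrix. The sketch below reproduces the same quantities with `np.linalg.svd`; it illustrates the relationship and is not a copy of scikit-learn's implementation, and the names `explained_variance` and `scores` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
C = A - A.mean(axis=0)

# thin SVD of the centered data: C = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(C, full_matrices=False)

# rows of Vt play the role of pca.components_ (up to sign)
print(Vt)

# squared singular values divided by (n - 1) give the explained variances
explained_variance = S ** 2 / (C.shape[0] - 1)
print(explained_variance)

# projected data, comparable to pca.transform(A) up to sign
scores = C @ Vt.T
print(scores)
```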


In [55]:
# eigenvector signs are arbitrary; this check passes because the signs happen to agree here
np.testing.assert_almost_equal(P, B)
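
A direct comparison like this can fail on other data even when both projections are correct, because each component is only defined up to a sign. One possible way to make the check sign-insensitive is sketched below; `match_signs` is a hypothetical helper, not part of NumPy or scikit-learn, and `B_flipped` stands in for a projection whose axes came out negated.

```python
import numpy as np

def match_signs(reference, scores):
    """Flip columns of `scores` so each points the same way as in `reference`."""
    signs = np.sign(np.sum(reference * scores, axis=0))
    signs[signs == 0] = 1.0  # leave all-zero columns untouched
    return scores * signs

P = np.array([[-2.82842712, 0.0], [0.0, 0.0], [2.82842712, 0.0]])
B_flipped = -P  # hypothetical: the same projection with both axes negated

np.testing.assert_almost_equal(P, match_signs(P, B_flipped))
```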

### Summary of assumptions

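1. Linearity: PCA is restricted to a change of basis, so each principal component is a linear combination of the original features.
2. Large variances carry the important structure: directions with high variance are treated as signal, directions with low variance as noise.
3. The principal components are orthogonal, which is what makes the problem solvable with linear algebra decomposition techniques.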
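
Two of these points can be checked numerically on the toy data. The sketch below uses `np.linalg.eigh` as an assumption in place of `np.linalg.eig`, and the names `gram` and `cov_projected` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
C = A - A.mean(axis=0)
values, vectors = np.linalg.eigh(np.cov(C.T))

# orthogonality: the eigenvectors form an orthonormal basis, so V^T V is the identity
gram = vectors.T @ vectors
print(np.round(gram, 10))

# change of basis: in the new coordinates the covariance matrix is diagonal,
# with the eigenvalues (the variances along each component) on the diagonal
P = C @ vectors
cov_projected = np.cov(P.T)
print(np.round(cov_projected, 10))
```

The diagonal covariance is the sense in which the components are uncorrelated, and the size of each diagonal entry is what the second assumption uses to rank them.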