# Introduction to Principal Component Analysis (PCA) for Machine Learning.

1. Basics of PCA.
2. Calculate PCA from scratch with NumPy.
3. Using PCA from sklearn.decomposition.
4. Some applications of PCA.
5. What are the assumptions and limitations of PCA?

## 1. Basics of PCA.

* Principal Component Analysis, or PCA for short, is a method for reducing the dimensionality of data.
* It can be thought of as a projection method where data with m-columns (features) is projected into a subspace with m or fewer columns, whilst retaining the essence of the original data.

* The first step is to **calculate the mean values of each column**.
* Next, we need to **center the values in each column by subtracting the mean** column value.
* The next step is to **calculate the covariance matrix of the centered matrix**.
    * **Correlation is a normalized measure** of how strongly, and in which direction (positive or negative), two columns change together.
    * **Covariance is the unnormalized version** of correlation: it measures the same joint change, but in the columns' original units.
    * A **covariance matrix** holds the covariance of every column with every other column, including itself (the diagonal entries are the column variances).
* Finally, we calculate the **eigendecomposition of the covariance matrix**. This results in a list of eigenvalues and a list of eigenvectors.
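To make the correlation/covariance distinction concrete, here is a minimal sketch with NumPy on a small made-up matrix (the values in `X` are illustrative only). It assumes rows are samples and columns are features, which is why `rowvar=False` is passed to `np.cov` and `np.corrcoef`. Dividing each covariance by the product of the two columns' standard deviations should recover the correlation matrix.

```python
import numpy as np

# Illustrative data: rows are samples, columns are features
X = np.array([[1.0, 2.0], [3.0, 5.0], [5.0, 4.0], [7.0, 9.0]])

# Covariance matrix; rowvar=False tells NumPy that columns are the variables
cov = np.cov(X, rowvar=False)

# Correlation = covariance normalized by the product of standard deviations
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(cov)
print(np.allclose(corr_from_cov, np.corrcoef(X, rowvar=False)))  # True
```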

* The eigenvectors can be sorted by their eigenvalues in descending order to rank the components, or axes, of the new subspace for the original data A.
* **If all eigenvalues have a similar value**, the existing representation may already be reasonably compressed or dense, and the projection may offer little.
* If there are **eigenvalues close to zero**, they represent components or axes along which the data barely varies; these may be discarded.
* A total of **m or fewer components is selected** to comprise the chosen subspace. Typically, we **select the k eigenvectors with the k largest eigenvalues**; these are **called principal components**.
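The sorting and selection step can be sketched as follows, assuming the common heuristic of keeping just enough components to explain a chosen share of the total variance. The function name `choose_components`, its `threshold` parameter and the 3x3 covariance matrix are hypothetical, used only for illustration; `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

def choose_components(cov, threshold=0.95):
    """Hypothetical helper: sort eigenpairs by descending eigenvalue and keep the
    smallest k whose eigenvalues explain at least `threshold` of the variance."""
    values, vectors = np.linalg.eigh(cov)         # ascending order for symmetric input
    order = np.argsort(values)[::-1]              # indices for descending order
    values, vectors = values[order], vectors[:, order]
    explained = np.cumsum(values) / values.sum()  # cumulative share of variance
    k = int(np.searchsorted(explained, threshold) + 1)
    return values[:k], vectors[:, :k]

# Hypothetical covariance matrix with one near-zero-variance direction
cov = np.array([[4.0, 2.0, 0.0],
                [2.0, 3.0, 0.0],
                [0.0, 0.0, 0.01]])
values, B = choose_components(cov, threshold=0.95)
print(values, B.shape)
```

With this matrix the first two eigenvalues account for nearly all of the variance, so the near-zero third direction is dropped and `B` has two columns.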

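Other matrix decomposition methods, such as **Singular-Value Decomposition** (SVD), can also be used. When SVD is applied to the centered data, the values it produces are called singular values (each squared singular value divided by n - 1, where n is the number of rows, equals the corresponding covariance eigenvalue), and the vectors of the subspace are still referred to as principal components.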
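As a rough check of the SVD route, the sketch below applies `np.linalg.svd` to the centered version of the small matrix used in section 2. It assumes the sample covariance uses the n - 1 denominator (NumPy's `np.cov` default), under which the squared singular values divided by n - 1 should match the covariance eigenvalues, and the rows of `Vt` should match the eigenvectors up to sign.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
C = A - A.mean(axis=0)  # center each column

# Thin SVD of the centered data: C = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# Rows of Vt are the principal directions; singular values relate to the
# covariance eigenvalues by lambda = s**2 / (n - 1)
n = C.shape[0]
eigenvalues = s**2 / (n - 1)
print(eigenvalues)
print(Vt)
```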

Once chosen, data can be projected into the subspace via matrix multiplication:

$$P = B^T A$$

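* A is the centered data that we wish to project, arranged with one sample per column (`C.T` in the NumPy code of section 2).
* B^T is the transpose of B, the matrix whose columns are the chosen principal components (eigenvectors).
* P is the projection of A, again with one sample per column.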

This is called the **covariance method for calculating PCA**, although there are alternative ways to calculate it.
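The formula uses a column-per-sample layout, while most code (including the NumPy and scikit-learn examples in this notebook) stores one sample per row. A minimal sketch of the shapes involved, using random data as a stand-in (the data, the seed and the choice of k = 2 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A_rows = rng.normal(size=(100, 3))           # n = 100 samples, m = 3 features
C = A_rows - A_rows.mean(axis=0)             # centered data, one sample per row

values, vectors = np.linalg.eigh(np.cov(C, rowvar=False))
B = vectors[:, np.argsort(values)[::-1][:2]] # m x k matrix of the top k = 2 components

# Column-per-sample convention of the formula: A = C.T has shape (m, n)
P = B.T @ C.T                                # (k x m) @ (m x n) -> (k x n)

# Equivalent row-per-sample form, as returned by most libraries
P_rows = C @ B                               # (n x m) @ (m x k) -> (n x k)
print(P.shape, P_rows.shape)                 # (2, 100) (100, 2)
print(np.allclose(P, P_rows.T))              # True
```

Both forms contain the same numbers; the row-per-sample version is simply the transpose.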

## 2. Calculate PCA from scratch with NumPy.

```python
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig

A = array([[1, 2], [3, 4], [5, 6]])
print(A)
```

```
[[1 2]
 [3 4]
 [5 6]]
```


```python
# calculate the mean of each column
M = mean(A, axis=0)

# center columns by subtracting column means
C = A - M

# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
```

```
[[4. 4.]
 [4. 4.]]
```


```python
# eigendecomposition of covariance matrix
# (eig does not guarantee any particular order of the eigenvalues)
values, vectors = eig(V)
print(vectors)
print(values)
```

```
[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
[8. 0.]
```


```python
# project data
P = vectors.T.dot(C.T)
print(P.T)
```

```
[[-2.82842712  0.        ]
 [ 0.          0.        ]
 [ 2.82842712  0.        ]]
```
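`eig` works for general square matrices and does not promise any particular order of eigenvalues. A variant that may be more robust for a covariance matrix, sketched here as an alternative rather than a correction, uses `np.linalg.eigh`, which assumes a symmetric input and returns eigenvalues in ascending order. The sketch reverses that order, checks that the eigenvectors and eigenvalues rebuild the covariance matrix, and keeps only the first component:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
C = A - A.mean(axis=0)                 # centered data, one sample per row
V = np.cov(C, rowvar=False)            # same matrix as cov(C.T) above

# eigh assumes a symmetric matrix and returns eigenvalues in ascending order;
# reverse the order so the largest-variance component comes first
values, vectors = np.linalg.eigh(V)
values, vectors = values[::-1], vectors[:, ::-1]

# Sanity check: V = Q diag(lambda) Q^T
print(np.allclose(vectors @ np.diag(values) @ vectors.T, V))  # True

# Keep only the first principal component (k = 1) and project
P1 = C @ vectors[:, :1]                # shape (3, 1)
print(P1.ravel())
```

The projected values should match the first column of the output above, possibly with the sign flipped, since an eigenvector and its negation are equally valid.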


## 3. Using PCA from sklearn.decomposition.

```python
from numpy import array
from sklearn.decomposition import PCA

A = array([[1, 2], [3, 4], [5, 6]])

# create a PCA instance and fit the data
pca = PCA(n_components=2)
pca.fit(A)
# access vectors and values
print(pca.components_)
print(pca.explained_variance_)
```

```
[[ 0.70710678  0.70710678]
 [ 0.70710678 -0.70710678]]
[8.00000000e+00 2.25080839e-33]
```


```python
# transform data (scikit-learn centers it internally with the fitted mean)
P = pca.transform(A)
print(P)
```

```
[[-2.82842712e+00  2.22044605e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 2.82842712e+00 -2.22044605e-16]]
```
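The scikit-learn results can be cross-checked against the manual method. This sketch assumes the default `whiten=False`, in which case `transform` subtracts the fitted `mean_` and multiplies by `components_.T`. It also shows `explained_variance_ratio_` and passing a float to `n_components`, which asks scikit-learn to keep enough components to explain that fraction of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

A = np.array([[1, 2], [3, 4], [5, 6]])

pca = PCA(n_components=2).fit(A)
print(pca.explained_variance_ratio_)   # share of total variance per component

# Manual projection onto the same directions
C = A - A.mean(axis=0)
P_manual = C @ pca.components_.T

# With whiten=False this matches pca.transform(A)
print(np.allclose(P_manual, pca.transform(A)))  # True

# A float n_components keeps enough components to explain that share of variance
pca_95 = PCA(n_components=0.95).fit(A)
print(pca_95.n_components_)
```

For this matrix the first component explains essentially all of the variance, so a 0.95 target keeps a single component.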


http://scikit-learn.sourceforge.net/dev/modules/decomposition.html#principal-component-analysis-pca

## 4. Some applications of PCA.

* PCA is predominantly used as a **dimensionality reduction** technique in domains like **facial recognition, computer vision and image compression**. 
* It is also used for **finding patterns in data of high dimension** in the field of **finance, data mining, bioinformatics, psychology**, etc.

https://www.dezyre.com/data-science-in-python-tutorial/principal-component-analysis-tutorial

1. Visualize multidimensional data.
2. Compress information (images, signal processing).
3. Simplify complex business decisions (Finance: risk management of interest rate derivative portfolios).
4. Clarify convoluted scientific processes (related to neural ensembles).
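For the compression use case, a minimal sketch could look like the following. The data is synthetic (10-dimensional points generated near a 2-D plane, plus a little noise), so the numbers are illustrative only. `inverse_transform` maps the compressed coordinates back to the original space, and the relative reconstruction error indicates how much information was lost.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples in 10 dimensions lying close to a 2-D plane
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Compress: keep 2 of the 10 coordinates per sample
pca = PCA(n_components=2).fit(X)
X_compressed = pca.transform(X)               # shape (200, 2)

# Decompress: map back to the original 10-D space (lossy)
X_restored = pca.inverse_transform(X_compressed)

rel_error = np.linalg.norm(X - X_restored) / np.linalg.norm(X)
print(X_compressed.shape, round(rel_error, 3))
print(pca.explained_variance_ratio_.sum())
```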

https://glowingpython.blogspot.com/2011/07/principal-component-analysis-with-numpy.html

https://glowingpython.blogspot.com/2011/07/pca-and-image-compression-with-numpy.html

## 5. What are the assumptions and limitations of PCA?

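1. PCA assumes the features are correlated; if they are largely uncorrelated, the eigenvalues are similar and little dimensionality reduction is possible.
2. PCA is sensitive to the scale of the features, so features are usually standardized first.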
3. PCA is not robust against outliers.
4. PCA assumes a linear relationship between features.
5. Technical implementations often assume no missing values.
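Limitation 2 can be demonstrated with two independent, synthetic features whose scales differ by a factor of 1000 (an assumption chosen to exaggerate the effect). Without scaling, the first component is essentially just the large-scale feature; standardizing with `StandardScaler` first, here via `make_pipeline`, puts both features on an equal footing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two independent synthetic features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(scale=1.0, size=500),
                     rng.normal(scale=1000.0, size=500)])

# Without scaling, the large-scale feature dominates the first component
raw = PCA(n_components=2).fit(X)
print(raw.explained_variance_ratio_)

# Standardizing first gives each feature unit variance before PCA
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)
print(scaled.named_steps["pca"].explained_variance_ratio_)
```

Because the two features are independent, the standardized ratios should both be close to 0.5, which also illustrates limitation 1: with no correlation to exploit, PCA cannot concentrate the variance in fewer components.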

https://www.keboola.com/blog/pca-machine-learning

https://www.simplilearn.com/tutorials/machine-learning-tutorial/principal-component-analysis

https://www.mygreatlearning.com/blog/understanding-principal-component-analysis/