# PCA

## What does PCA do?

PCA uses an orthogonal transformation to convert a set of observations of possibly correlated variables (features) into a set of values of linearly uncorrelated variables called **principal components**.

This transformation is defined so that the first principal component has the largest possible **variance** (that is, it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest possible **variance** under the constraint that it is orthogonal to the preceding components.
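One common way to write this down, assuming $C$ denotes the covariance matrix of the mean-centered data and $w_j$ the unit direction of the $j$-th principal component (both symbols are introduced here for illustration):

$$
w_1 = \arg\max_{\|w\| = 1} w^T C w, \qquad
w_j = \arg\max_{\|w\| = 1,\; w \perp w_1, \dots, w_{j-1}} w^T C w
$$

The maximizers are the eigenvectors of $C$ ordered by decreasing eigenvalue, and each attained maximum is the corresponding eigenvalue, which equals the variance of the data along $w_j$.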

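PCA is sensitive to the relative scaling of the original variables: a feature measured in larger units has a larger variance and therefore dominates the leading components, so features are often standardized before PCA is applied.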
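A small sketch of this sensitivity, assuming NumPy is available. The helper `first_component`, the random data, and the factor of 1000 are all made up for illustration:

```python
import numpy as np

def first_component(X):
    """Return the unit eigenvector of the covariance matrix with the largest eigenvalue."""
    Xc = X - X.mean(axis=0)               # center each column
    C = Xc.T @ Xc / Xc.shape[0]           # covariance matrix (m x m)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]                 # column belonging to the largest eigenvalue

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
X = rng.normal(size=(500, 2))             # two independent features with similar spread

X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0                # e.g. metres converted to millimetres

# After rescaling, the first component points almost entirely along feature 0.
w = first_component(X_rescaled)
dominant_feature = int(np.argmax(np.abs(w)))

# Standardizing each column (z-scores) removes the dependence on units.
Z = (X_rescaled - X_rescaled.mean(axis=0)) / X_rescaled.std(axis=0)
w_std = first_component(Z)
```

After standardization both features carry equal weight in the first component, whatever units they were originally measured in.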

## What are the application scenarios of PCA?

1. Dimensionality reduction (while preserving as much of the data's variability/structure as possible);

2. Elimination of redundancy in the data;

3. Noise reduction.
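As a rough illustration of the first and third scenarios, the sketch below builds synthetic 3-feature data that really has only one underlying factor, keeps a single principal component, and maps the result back to the original space. It assumes NumPy and follows the procedure described in the pseudocode section; the `direction` vector and the noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)                     # arbitrary seed
t = rng.normal(size=(200, 1))                      # one hidden factor
direction = np.array([[2.0, -1.0, 0.5]])           # hypothetical loading on 3 features
signal = t @ direction                             # noise-free data lies on a line in 3-D
X = signal + 0.05 * rng.normal(size=signal.shape)  # add small isotropic noise

mean = X.mean(axis=0)
Xc = X - mean                                      # center each column
C = Xc.T @ Xc / Xc.shape[0]                        # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)               # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                  # reorder from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()          # share of total variance per component

k = 1
P = eigvecs[:, :k].T                               # k x m, one eigenvector per row
Y = Xc @ P.T                                       # n x k reduced representation
X_denoised = Y @ P + mean                          # map back to the m original features

err_before = np.mean((X - signal) ** 2)            # noise in the raw data
err_after = np.mean((X_denoised - signal) ** 2)    # noise left after keeping 1 component
```

Here `explained_ratio` shows how much variability each component carries, and comparing `err_before` with `err_after` shows how much of the noise is discarded together with the dropped components.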

## PCA pseudo code

1. Let's say we have $n$ observations, each of which has $m$ features:

$$
X = \
\begin{pmatrix}
  x_1^{(1)} & x_2^{(1)} & \dots & x_m^{(1)} \\
  x_1^{(2)} & x_2^{(2)} & \dots & x_m^{(2)} \\
  \vdots & \vdots & \vdots & \vdots \\
  x_1^{(n)} & x_2^{(n)} & \dots & x_m^{(n)}
\end{pmatrix}
$$

2. Apply mean normalization (centering) to each column of $X$, that is, subtract the column's mean from each element of that column, so that every column has mean 0 (this simplifies the variance and covariance calculations):

$$
X' = \
\begin{pmatrix}
  x_1^{(1)} - \sum\limits_{i=1}^n x_1^{(i)} \Big/ n & x_2^{(1)} - \sum\limits_{i=1}^n x_2^{(i)} \Big/ n & \dots & x_m^{(1)} - \sum\limits_{i=1}^n x_m^{(i)} \Big/ n \\
  x_1^{(2)} - \sum\limits_{i=1}^n x_1^{(i)} \Big/ n & x_2^{(2)} - \sum\limits_{i=1}^n x_2^{(i)} \Big/ n & \dots & x_m^{(2)} - \sum\limits_{i=1}^n x_m^{(i)} \Big/ n \\
  \vdots & \vdots & \vdots & \vdots \\
  x_1^{(n)} - \sum\limits_{i=1}^n x_1^{(i)} \Big/ n & x_2^{(n)} - \sum\limits_{i=1}^n x_2^{(i)} \Big/ n & \dots & x_m^{(n)} - \sum\limits_{i=1}^n x_m^{(i)} \Big/ n
\end{pmatrix}
$$
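In NumPy the centering step is a one-liner thanks to broadcasting. A minimal sketch with a made-up $3 \times 2$ matrix:

```python
import numpy as np

# Toy data: n = 3 observations (rows), m = 2 features (columns).
X = np.array([[2.0, 0.0],
              [4.0, 1.0],
              [6.0, 5.0]])

col_means = X.mean(axis=0)   # per-column means: [4., 2.]
X_centered = X - col_means   # broadcasting subtracts each column's mean from that column
# X_centered is [[-2., -2.], [0., -1.], [2., 3.]]
```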

3. Calculate the covariance matrix $C$ of $X'$:

$$
C = \frac{1}{n} X'^T X'
$$
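A minimal sketch of this step, assuming NumPy and reusing a made-up $3 \times 2$ matrix. It also checks the result against `np.cov`, which needs `rowvar=False` because observations are stored as rows, and `bias=True` to divide by $n$ rather than $n - 1$:

```python
import numpy as np

X = np.array([[2.0, 0.0],
              [4.0, 1.0],
              [6.0, 5.0]])
n = X.shape[0]

X_centered = X - X.mean(axis=0)               # centered data X'
C = X_centered.T @ X_centered / n             # m x m covariance matrix

# Cross-check with NumPy's built-in estimator.
C_numpy = np.cov(X, rowvar=False, bias=True)
```

Dividing by $n - 1$ instead (the unbiased estimate) only rescales the eigenvalues; the eigenvectors, and therefore the principal components, stay the same.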

4. Calculate the eigenvalues of the covariance matrix $C$ and their corresponding eigenvectors;


5. Stack the eigenvectors as rows, ordered from top to bottom by their eigenvalues from largest to smallest, and take the first $k$ rows to construct a new $k \times m$ matrix $P$;


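6. Then $Y = X' P^T$ (an $n \times k$ matrix) is the representation of the original data $X$ in the lower $k$-dimensional space. Because each row of $X$ is an observation here, the projection multiplies by $P^T$ on the right; with observations stored as columns it would instead read $Y = PX'$.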
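The six steps can be sketched end to end as follows. This is one possible NumPy implementation, not a reference one: the function name `pca`, its return values, and the random test data are assumptions made for the example. With observations stored as rows, as in $X$ above, the projection is computed as the centered data times $P^T$.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n observations x m features) onto the top-k components.

    Returns (Y, P, eigvals): Y is n x k, P is k x m with one eigenvector per row,
    and eigvals holds all m eigenvalues sorted from largest to smallest.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    X_centered = X - X.mean(axis=0)        # step 2: center each column
    C = X_centered.T @ X_centered / n      # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # step 4: eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]      # step 5: sort from largest to smallest
    eigvals = eigvals[order]
    P = eigvecs[:, order[:k]].T            #         top-k eigenvectors as rows
    Y = X_centered @ P.T                   # step 6: project onto k dimensions
    return Y, P, eigvals

# Usage on synthetic correlated data (the mixing matrix A is arbitrary).
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(300, 3)) @ A
Y, P, eigvals = pca(X, k=2)

# The covariance of the projected data is diagonal: the new variables are
# uncorrelated, and their variances are the two largest eigenvalues.
C_Y = Y.T @ Y / Y.shape[0]
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, so the eigenvalues come back real and the eigenvectors orthonormal.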

## PCA mathematical explanations

A full mathematical explanation of PCA can be found in this long image: [PCA mathematical explanation](https://github.com/lnshi/ml-exercises/blob/master/ml_basics/rdm012_principal_components_analysis/pca_mathematical_explanation.png). The original post can be found here: [The Mathematical Principles of PCA (repost, in Chinese)](https://zhuanlan.zhihu.com/p/21580949).

# References

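- [The Mathematical Principles of PCA (repost, in Chinese)](https://zhuanlan.zhihu.com/p/21580949)

- [PCA (Principal Component Analysis) Study Summary (in Chinese)](https://zhuanlan.zhihu.com/p/32412043)

- [How to explain the concepts of "covariance" and "correlation coefficient" in plain terms? (in Chinese)](https://www.zhihu.com/question/20852004/answer/134902061)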