#### Principal Component Analysis (PCA)
The role of PCA is to preserve as much of the information (variance) in the data as possible while greatly reducing the dimensionality of the problem, which simplifies the analysis. One technique used by Gavrilov et al. is to represent each stock's time series as a single point in $d$-dimensional space, where $d$ is the number of data points in the series. Using PCA, we can greatly reduce the number of dimensions needed to describe each stock, making our analysis much easier. Keep in mind that the principal components do not necessarily have a physical meaning.
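
As a rough illustration of the "one point per stock" view, the sketch below builds a purely synthetic data set (the two hidden factors, the noise level, and the choice of `k = 2` are all assumptions made for the example, not values from Gavrilov et al.) and projects each $d$-dimensional series onto its first two principal directions using an SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 stocks, each a series of d = 250 daily returns,
# driven by two hidden common factors plus a little noise.
n_stocks, d = 50, 250
factors = rng.normal(size=(2, d))
exposures = rng.normal(size=(n_stocks, 2))
series = exposures @ factors + 0.1 * rng.normal(size=(n_stocks, d))

# Each row is one stock, i.e. one point in d-dimensional space.
# Center every coordinate (time step) across the stocks.
centered = series - series.mean(axis=0)

# The rows of vt are the principal directions of the centered data.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

k = 2  # assumed number of components to keep
reduced = centered @ vt[:k].T  # each stock is now a point in k dimensions
retained = (s[:k] ** 2).sum() / (s ** 2).sum()

print(series.shape, "->", reduced.shape)
print(f"share of variance retained by {k} components: {retained:.3f}")
```

Because the synthetic series really are driven by two factors, two components keep almost all of the variance; real price data will usually need more.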

PCA has other applications as well, for example determining which features (e.g. P/E ratio, sales volume, etc.) are most important to a stock's performance.
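
One common way to read feature importance out of PCA is to look at the loadings, i.e. the entries of the leading eigenvector. The sketch below uses made-up data and hypothetical feature names (`pe_ratio`, `sales_volume`, `dividend_yield`); note that loadings show how much each feature contributes to the dominant direction of variance, which is only a proxy for importance to performance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature names and synthetic values, for illustration only.
names = ["pe_ratio", "sales_volume", "dividend_yield"]
n = 500
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.1 * rng.normal(size=n),  # moves with the common driver
    common + 0.1 * rng.normal(size=n),  # moves with the common driver
    rng.normal(size=n),                 # unrelated to the common driver
])

corr = np.corrcoef(X, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last column of
# eigvec belongs to the largest eigenvalue.
eigval, eigvec = np.linalg.eigh(corr)
first_pc = eigvec[:, -1]
loadings = np.abs(first_pc)

for name, weight in sorted(zip(names, loadings), key=lambda p: -p[1]):
    print(f"{name:15s} {weight:.3f}")
```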

You can compute the principal components as follows:
1. Calculate the correlation (or covariance) matrix. The correlation matrix is safer: if your variables are on different scales, those with the largest variance will dominate a covariance-based PCA and hide the true components.
2. Get the eigenvectors and eigenvalues of your correlation matrix. Each eigenvector is a column vector with as many elements as the number of variables in the original dataset.
3. Each eigenvalue represents the amount of variance that its eigenvector accounts for. Arrange the eigenvectors in decreasing order of their eigenvalues and keep the top 2, 3, or as many eigenvectors as needed, depending on how much variance you want to capture in the model.
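
To see why step 1 favors the correlation matrix, the sketch below compares the leading eigenvector obtained from the covariance matrix with the one obtained from the correlation matrix. The two features and their scales are invented for the example, and `top_component` is a hypothetical helper, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=n)

# Two hypothetical, positively related features on very different scales.
small = z + 0.5 * rng.normal(size=n)          # e.g. a ratio, of order 1
large = 1e6 * (z + 0.5 * rng.normal(size=n))  # e.g. a volume, of order 1e6
X = np.column_stack([small, large])

def top_component(matrix):
    """Return the unit eigenvector with the largest eigenvalue of a symmetric matrix."""
    eigval, eigvec = np.linalg.eigh(matrix)  # eigenvectors are the columns
    return eigvec[:, np.argmax(eigval)]

cov_pc = top_component(np.cov(X, rowvar=False))
corr_pc = top_component(np.corrcoef(X, rowvar=False))

print("covariance  PC1 (absolute loadings):", np.abs(cov_pc))
print("correlation PC1 (absolute loadings):", np.abs(corr_pc))
```

With the covariance matrix, the first component points almost entirely along the large-scale feature; with the correlation matrix, both features receive equal weight.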

In [None]:
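import numpy as np
import numpy.linalg as la

# eigh suits a symmetric matrix such as corr: real eigenvalues, eigenvectors in the columns of eigvec
eigval, eigvec = la.eigh(corr)
# Sort by decreasing eigenvalue, reordering the columns (not the rows) of eigvec to match
order = np.argsort(eigval)[::-1]
eigval = eigval[order]
eigvec = eigvec[:, order]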
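
A possible next step is to decide how many components to keep and to project the data onto them. The sketch below is self-contained and uses synthetic stand-in data; the 90% variance threshold and the names `data`, `z`, and `scores` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in data: 200 observations of 5 variables that are
# driven mostly by two shared factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.2 * rng.normal(size=(200, 5))

# Standardize the columns so that the covariance of z equals the
# correlation matrix of data.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = np.corrcoef(data, rowvar=False)

# Eigen-decomposition, sorted by decreasing eigenvalue.
eigval, eigvec = np.linalg.eigh(corr)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Share of total variance per component, and the running total.
ratio = eigval / eigval.sum()
cumulative = np.cumsum(ratio)

# Smallest k whose components capture at least 90% of the variance
# (the threshold is an assumption, not a rule).
k = int(np.searchsorted(cumulative, 0.90) + 1)

# Project the standardized data onto the top k eigenvectors.
scores = z @ eigvec[:, :k]

print("explained variance ratio:", np.round(ratio, 3))
print("components kept:", k, "scores shape:", scores.shape)
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is a handy sanity check that the eigenvectors were sorted and applied column-wise.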

In [None]:
#### PCA
https://www.riskprep.com/all-tutorials/36-exam-22/132-understanding-principal-component-analysis-pca