# Neu 560 (2018-02-15): SVD and PCA
## SVD and Rank-1 Matrices
We can re-express a matrix $A$ in the following way:

> $A = USV^T = (s_1u_1v_1^T) + (s_2u_2v_2^T) + \ldots + (s_nu_nv_n^T)$

In other words, we can express the matrix $A$ as a sum of **rank-1 matrices**, each the outer product of a left and a right singular vector scaled by the corresponding singular value. As such, the SVD gives the optimal low-rank approximation of $A$: among all matrices $B$ of rank $k$, it finds the one that minimizes the sum of squared errors:

> $\min_B \sum_{i,j} (A_{ij} - B_{ij})^2, \text{ where } \text{rank}(B) = k$

Concretely, we keep only the first $k$ terms of the sum (those with the $k$ largest singular values and their singular vectors) and add them together. The result is the best rank-$k$ approximation of $A$ in the squared-error sense.
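The rank-1 expansion and its truncation can be checked numerically. The following is a minimal sketch in NumPy, assuming a small random matrix as a stand-in for $A$; the names `A_sum`, `A_k`, `err`, and `tail` are illustrative and not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # toy matrix (an assumption for illustration)

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted largest first
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-1 matrices s_i * u_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

# Keep only the first k terms to get a rank-k approximation
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Sum of squared errors of the approximation,
# and the sum of the squared singular values that were dropped
err = np.sum((A - A_k) ** 2)
tail = np.sum(s[k:] ** 2)

print(np.allclose(A_sum, A))
print(err, tail)
```

In this sketch the squared error of the truncated sum matches the sum of the squared discarded singular values, which is one way to see why dropping the smallest singular values costs the least.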

## Determinants
The determinant of a square matrix quantifies how that matrix changes the volume of a unit hypercube. The absolute value of the determinant of a square matrix $A$ is equal to the product of its singular values:

> $\lvert \det(A) \rvert = \prod_i s_i$

where $\{s_i\}$ are the singular values of $A$.

More generally, the determinant itself (including its sign) is equal to the product of the eigenvalues of the matrix:

> $\det(A) = \prod_i e_i$

where $\{e_i\}$ are the eigenvalues of $A$. For symmetric, positive semi-definite matrices, this is also equal to the product of singular values.
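Both determinant identities are easy to verify numerically. The sketch below assumes a random square matrix `A` as a toy example and uses `A.T @ A` as a convenient symmetric, positive semi-definite matrix; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))  # toy square matrix (an assumption for illustration)

det_A = np.linalg.det(A)

# |det(A)| compared with the product of singular values
s = np.linalg.svd(A, compute_uv=False)
abs_det_from_s = np.prod(s)

# det(A) compared with the product of eigenvalues
# (eigenvalues of a non-symmetric matrix may be complex; their product is real
# up to floating-point error)
e = np.linalg.eigvals(A)
det_from_e = np.prod(e)

# Symmetric PSD case: eigenvalues and singular values coincide
C = A.T @ A
s_C = np.linalg.svd(C, compute_uv=False)
e_C = np.linalg.eigvalsh(C)

print(abs(det_A), abs_det_from_s)
print(det_A, det_from_e)
print(np.sort(s_C), np.sort(e_C))
```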

## Principal Components Analysis
The goal of principal components analysis (PCA) is to find the best subspace (i.e. projection of the data) that captures the largest amount of variability. This can be formalized as the problem of finding the **unit vector** $\overrightarrow v$ that maximizes the sum of squared linear projections of the data vectors:

> $\sum_i (\overrightarrow x_i \cdot \overrightarrow v)^2 = \lVert X \overrightarrow v \rVert^2$

> $ = (X \overrightarrow v)^T (X \overrightarrow v) $

> $ = \overrightarrow v^T X^T X \overrightarrow v$
 
> $ = \overrightarrow v^T (X^T X) \overrightarrow v$

> $ = \overrightarrow v^T C \overrightarrow v$

Here $C = X^TX$. The solution corresponds to the top eigenvector of $C$. Because $C$ is symmetric and positive semi-definite (as the product of a matrix with its own transpose), its SVD is also its eigendecomposition:

> $ C = USU^T$

where the columns of $U$ are its eigenvectors and the diagonal entries of $S$ are its eigenvalues. If we set $\overrightarrow v$ to one of these eigenvectors, i.e. $\overrightarrow v = \overrightarrow u_j$, then, because the columns of $U$ are orthonormal, $U^T\overrightarrow u_j$ is a vector of zeros with a 1 in the $j$th component, such that:

> $\overrightarrow u_j^T C \overrightarrow u_j = \overrightarrow u_j^T(USU^T)\overrightarrow u_j$

> $ = (\overrightarrow u_j^TU) S (U^T \overrightarrow u_j)$

> $=s_j$

So, plugging in the $j$th eigenvector of $C$ gives us $s_j$ as the sum of squared projections. To maximize this quantity, we select the eigenvector with the largest eigenvalue (i.e. the first one, since the SVD orders them from largest to smallest). As such, the first eigenvector of $C$ is its first principal component.
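The argument above can be sketched numerically. This example assumes toy data `X` whose rows are the data vectors, and, following the notes, uses $C = X^TX$ directly without subtracting the mean (in practice the data are often centered first, which is an assumption beyond these notes). The random unit vectors are only there for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 200 data vectors (rows) in 3 dimensions with unequal spread
M = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.3]])
X = rng.standard_normal((200, 3)) @ M

C = X.T @ X

# eigh is for symmetric matrices and returns eigenvalues in ascending order,
# so reorder to put the largest first
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# First principal component and its sum of squared projections
v1 = evecs[:, 0]
top = np.sum((X @ v1) ** 2)

# Sum of squared projections onto 1000 random unit vectors, for comparison
V = rng.standard_normal((3, 1000))
V /= np.linalg.norm(V, axis=0)
rand_scores = np.sum((X @ V) ** 2, axis=0)

print(top, evals[0])
print(rand_scores.max() <= top)
```

The sum of squared projections onto `v1` equals the top eigenvalue of `C`, and none of the random unit vectors exceed it. The squared singular values of `X` also match the eigenvalues of `C`, which ties this back to the SVD.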