# PCA

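Given a collection of $m$ points $\{ x^{(1)}, \dots, x^{(m)} \}$ in $\pmb{R}^n$, encode each point in a lower-dimensional space $\pmb{R}^l$ with $l < n$, losing as little information as possible. An encoder $f$ maps a point to its code and a decoder $g$ maps the code back: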

$$x^{(i)} \in \pmb{R}^n \rightarrow c^{(i)} \in \pmb{R}^l$$
$$f(x)=c \Rightarrow x \approx g(f(x))$$

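Represent functions as matrices: $g(c)= Dc$ with $D\in \pmb{R}^{n\times l}$. To get a unique solution we constrain the columns of $D$ to be orthonormal, i.e. $D^TD = I_l$. ($D$ is not square unless $l = n$, so it is not an orthogonal matrix in the strict sense.)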

We now want to find $c^*$:

$$
c^* = \underset{c}{argmin} \big|\big| x - g(c) \big|\big|_2 \\
\Leftrightarrow
c^* = \underset{c}{argmin} \big|\big| x - g(c) \big|\big|^2_2 
$$

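The squared $L^2$ norm is an inner product of the error vector with itself:

$$
\big|\big| x - g(c) \big|\big|^2_2 = (x-g(c))^T(x-g(c))
$$

Expanding the product, and using that $g(c)^Tx$ is a scalar and therefore equals $x^Tg(c)$:
$$
(x-g(c))^T(x-g(c)) = x^Tx-x^Tg(c)-g(c)^Tx+g(c)^Tg(c) \\
= x^Tx-2x^Tg(c)+g(c)^Tg(c)
$$

The term $x^Tx$ does not depend on $c$ and can be dropped. Substituting $g(c)=Dc$ and $D^TD=I_l$:
$$
c^* = \underset{c}{argmin} -2x^T g(c)+g(c)^Tg(c) \\
= \underset{c}{argmin} -2x^TDc + c^TD^TDc \\
= \underset{c}{argmin} -2x^TDc + c^TI_lc \\
= \underset{c}{argmin} -2x^TDc + c^Tc
$$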

Solve the optimization problem with vector calculus, setting the gradient with respect to $c$ to zero:
$$
\nabla_c (-2x^TDc + c^Tc) = 0 \\
-2D^Tx + 2c = 0 \\
\Rightarrow c = D^Tx
$$

Thus: $f(x) = D^Tx$

Reconstruction operation: $r(x) = g(f(x)) = DD^Tx$
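
As a quick numerical check of the result above, the following sketch builds a matrix with orthonormal columns and compares $c = D^Tx$ with an independent least-squares solve. The dimensions, the random data, and the helper names `encode` and `decode` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 5, 2  # illustrative dimensions (assumed, not from the text)

# Build a matrix D with orthonormal columns via a reduced QR factorization.
D, _ = np.linalg.qr(rng.standard_normal((n, l)))

def encode(x, D):
    """Encoder f(x) = D^T x."""
    return D.T @ x

def decode(c, D):
    """Decoder g(c) = D c."""
    return D @ c

x = rng.standard_normal(n)
c_closed_form = encode(x, D)

# Independent check: solve min_c ||x - D c||_2 as a least-squares problem.
c_lstsq, *_ = np.linalg.lstsq(D, x, rcond=None)

# Reconstruction r(x) = D D^T x
reconstruction = decode(c_closed_form, D)
```

Both routes should give the same code vector up to floating-point error, because the closed form relies only on $D^TD = I_l$.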

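To find the optimal matrix $D^*$ we can no longer look at a single point. Instead we minimize the reconstruction error summed over all points $i$ and all components $j$, which is the Frobenius norm of the matrix of errors: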

$$
D^* = \underset{D}{argmin} \sqrt{
\sum_{i,j} \big(x_j^{(i)} - r(x^{(i)})_j\big)^2
}
$$

such that: $D^TD=I_l$
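
The objective above can be evaluated directly for any candidate $D$. The sketch below assumes the points are stacked as the rows of a matrix `X`, so that all reconstructions are the rows of $XDD^T$, and it uses synthetic data. It compares a random orthonormal $D$ with the eigenvectors of $X^TX$ belonging to the $l$ largest eigenvalues; that this choice is optimal for general $l$ is a standard result that is not derived in this section.

```python
import numpy as np

def reconstruction_error(X, D):
    """Frobenius norm of X - X D D^T, with one point per row of X."""
    return np.linalg.norm(X - X @ D @ D.T, ord="fro")

rng = np.random.default_rng(1)
m, n, l = 50, 4, 2  # illustrative sizes (assumed)

# Synthetic data with a different spread along each axis (assumed).
X = rng.standard_normal((m, n)) * np.array([5.0, 2.0, 1.0, 0.5])

# Candidate 1: a random matrix with orthonormal columns.
D_random, _ = np.linalg.qr(rng.standard_normal((n, l)))

# Candidate 2: eigenvectors of X^T X with the l largest eigenvalues.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
D_top = eigvecs[:, -l:]

err_random = reconstruction_error(X, D_random)
err_top = reconstruction_error(X, D_top)
```

Both candidates satisfy $D^TD=I_l$, so the comparison stays inside the feasible set of the constrained problem.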

---

## Example with $l = 1$

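With $l = 1$ the matrix $D$ reduces to a single vector $d$, and everything below is subject to the constraint $\big|\big|d\big|\big|_2=1$.

Since $d^Tx^{(i)}$ is a scalar, it equals $x^{(i)T}d$ and can be moved in front of $d$:
$$
d^* = \underset{d}{argmin} \sum_i \big|\big| x^{(i)} - dd^Tx^{(i)}\big|\big|_2^2 \\
= \underset{d}{argmin} \sum_i \big|\big| x^{(i)} - \big(x^{(i)T}d\big)d\big|\big|_2^2
$$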

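Stack the points as the rows of a matrix: $X\in \pmb{R}^{m \times n}$ with $X_{i,:} = x^{(i)T}$. The sum over points then becomes a single Frobenius norm: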

$$
d^* = \underset{d}{argmin} \big|\big| X - Xdd^T\big|\big|_F^2\\
= \underset{d}{argmin} Tr\big( (X-Xdd^T)^T (X-Xdd^T) \big)\\
= \underset{d}{argmax} Tr\big( d^TX^TXd \big)
$$
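
The step from the trace of the squared error to the $argmax$ can be filled in as follows. This is a sketch that uses only the cyclic property of the trace and the constraint $d^Td=1$:

$$
Tr\big( (X-Xdd^T)^T (X-Xdd^T) \big) \\
= Tr(X^TX) - Tr(X^TXdd^T) - Tr(dd^TX^TX) + Tr(dd^TX^TXdd^T) \\
= Tr(X^TX) - 2\,Tr(X^TXdd^T) + Tr(X^TXdd^Tdd^T) \\
= Tr(X^TX) - 2\,Tr(X^TXdd^T) + Tr(X^TXdd^T) \\
= Tr(X^TX) - Tr(d^TX^TXd)
$$

$Tr(X^TX)$ does not depend on $d$, so minimizing the whole expression is the same as maximizing $Tr(d^TX^TXd)$.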

This can be solved with an eigendecomposition of $X^TX$: the optimal $d$ is the eigenvector that belongs to the largest eigenvalue.

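```python
# In a Jupyter notebook, enable inline plots first with: %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Sample data: m = 7 points in R^2, one point per row
X = np.array([
    [1, 2],
    [3, 3],
    [0, 3],
    [5, 2],
    [6, 2],
    [7, 5],
    [10, 3]
])

# Scatter plot of the raw points
fig = plt.figure(figsize=(16, 16))
ax = fig.add_subplot(211)

ax.scatter(X[:, 0], X[:, 1])
plt.show()
```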
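
The eigendecomposition step for the $l = 1$ case can be sketched on the same seven points. The variable names `d`, `codes`, and `X_rec` are chosen here for illustration. As in the derivation, the sketch works with $X^TX$ directly and does not mean-center the data.

```python
import numpy as np

# The same seven sample points, one per row.
X = np.array([
    [1, 2],
    [3, 3],
    [0, 3],
    [5, 2],
    [6, 2],
    [7, 5],
    [10, 3],
], dtype=float)

# X^T X is symmetric, so np.linalg.eigh applies.
# It returns eigenvalues in ascending order and unit-length eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)

d = eigvecs[:, -1]          # eigenvector of the largest eigenvalue
codes = X @ d               # c^(i) = d^T x^(i), one scalar per point
X_rec = np.outer(codes, d)  # r(x^(i)) = d d^T x^(i), one row per point
```

`np.outer` is used for the reconstruction because `codes` and `d` are both one-dimensional arrays; a plain matrix product of the two would fail on mismatched shapes.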

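Eigenvalues of $X^TX$ for the sample data: $267.25174649$ and $16.74825351$. The eigenvector that belongs to the larger one is $d^*$.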

