$\def \dot #1#2{\left\langle #1, #2 \right\rangle}$
$\def \adot #1#2{\left\langle #1, #2 \right\rangle}$
$\def \cD {\mathcal{D}}$
$\def \cW {\mathcal{W}}$
$\def \bc {\mathbf{c}}$
$\def \bv {\mathbf{v}}$
$\def \bG {\mathbf{G}}$
$\def \bC {\mathbf{C}}$
$\def \bS {\mathbf{S}}$
$\def \bT {\mathbf{T}}$
$\def \bU {\mathbf{U}}$
$\def \bV {\mathbf{V}}$
$\def \bW {\mathbf{W}}$
$\def \bPhi {\mathbf{\Phi}}$
$\def \bPsi {\mathbf{\Psi}}$
$\def \bGamma {\mathbf{\Gamma}}$
$\def \bSigma {\mathbf{\Sigma}}$
$\def \bTheta {\mathbf{\Theta}}$
$\def \bOmega {\mathbf{\Omega}}$
$\def \bbE {\mathbb{E}}$
$\def \bbP {\mathbb{P}}$
$\def \bbR {\mathbb{R}}$
$\def \bbN {\mathbb{N}}$

# Demonstrating my PCA decomposition

From a snapshot set $\{ u_i \}_{i=1}^N$, how do we derive the proper PCA fit, noting that the covariance is properly defined as

$$
\langle v, C w \rangle := \mathbb{E}(\langle u, v \rangle \langle u, w \rangle) 
$$

but here we use the approximate (empirical) covariance:
$$
\langle v, C w \rangle = \frac{1}{N} \sum_{i=1}^N \langle u_i, v \rangle \langle u_i, w \rangle
$$

What I actually did was the eigen-decomposition of the Gram matrix $\mathbf{G}$, where $G_{i,j} = \langle u_i, u_j \rangle$, and used this to build the "PCA", but apparently that wasn't right... Well, let's see.

Note firstly that $C: V \to V$ and $\bG : \mathbb{R}^N \to \mathbb{R}^N$. Let's take the case $V=\mathbb{R}^K$ to make things a bit simpler. Then we have $\bC: \bbR^K \to \bbR^K$ and our empirical covariance is

$$
\langle v, \bC w \rangle = v^T \bC w = \frac{1}{N} \sum_{i=1}^N \langle u_i, v \rangle \langle u_i, w \rangle = \frac{1}{N} \sum_{i=1}^N (u_i^T v)^T ( u_i^T w)
$$

Let us write $\bU = [u_1, u_2, \ldots, u_N] \in \bbR^{K\times N}$, so we have from the above

$$
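v^T \bC w = \frac{1}{N} v^T \bU \bU^T w,
$$

so indeed, as we wrote on the board the other day, $\bC$ is (up to the factor $1/N$) the outer-product matrix $\bU \bU^T$. The factor $1/N$ only rescales the eigenvalues and leaves the eigenvectors unchanged, so we drop it from here on and write $\bC = \bU \bU^T$. Note that in this case the Gram matrix is $\bG = \bU^T \bU$.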
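As a quick numerical check of this identity, the following sketch compares the sum form of the empirical covariance with the matrix form, keeping the $1/N$ factor explicit in both. The sizes, the seed and the variable names are illustrative assumptions, not part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 6, 4                      # illustrative sizes
U = rng.standard_normal((K, N))  # columns are the snapshots u_i
v = rng.standard_normal(K)
w = rng.standard_normal(K)

# Sum form: (1/N) * sum_i <u_i, v> <u_i, w>
sum_form = sum((U[:, i] @ v) * (U[:, i] @ w) for i in range(N)) / N

# Matrix form: v^T (U U^T / N) w
matrix_form = v @ (U @ U.T) @ w / N

forms_agree = bool(np.isclose(sum_form, matrix_form))
print(forms_agree)
```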

We have the SVD $\bU = \bPhi \bSigma \bTheta^T$, with $\bPhi\in\bbR^{K\times K}$, $\bSigma\in\bbR^{K\times N}$ and $\bTheta \in \bbR^{N \times N}$. There are at most $\min(K, N)$ non-zero singular values. Both $\bC$ and $\bG$ are evidently symmetric matrices, and they decompose as

$$
\bC =  \bU \bU^T  = \bPhi \bSigma \bSigma^T \bPhi^T \quad\text{and}\quad \bG = \bU^T \bU = \bTheta \bSigma^T \bSigma \bTheta^T
$$

Note that $\bSigma \bSigma^T$ is a $K\times K$ matrix, while $\bSigma^T \bSigma$ is $N\times N$. Both are diagonal with $\sigma_i^2$ along the diagonal, padded with zeros wherever the dimension exceeds the number of singular values.
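The following sketch checks these factorisations numerically with `np.linalg.svd`, taking $\bSigma$ to be the rectangular matrix with the same shape as $\bU$. The sizes, seed and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 6, 4
U = rng.standard_normal((K, N))

# Full SVD: Phi is K x K, s holds min(K, N) singular values, ThetaT is N x N
Phi, s, ThetaT = np.linalg.svd(U, full_matrices=True)

# Embed the singular values in a rectangular "diagonal" matrix shaped like U
Sigma = np.zeros((K, N))
Sigma[:len(s), :len(s)] = np.diag(s)

C = U @ U.T   # K x K outer-product matrix
G = U.T @ U   # N x N Gram matrix

svd_ok = np.allclose(U, Phi @ Sigma @ ThetaT)
C_ok = np.allclose(C, Phi @ (Sigma @ Sigma.T) @ Phi.T)
G_ok = np.allclose(G, ThetaT.T @ (Sigma.T @ Sigma) @ ThetaT)
print(svd_ok, C_ok, G_ok)

print(np.diag(Sigma @ Sigma.T))  # sigma_i^2 followed by K - N zeros
print(np.diag(Sigma.T @ Sigma))  # sigma_i^2
```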

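Now, we have that $\bPhi = \bU \bTheta \bSigma^{-1}$. Since $\bSigma$ is rectangular, $\bSigma^{-1}$ is to be understood as the pseudo-inverse (the $N \times K$ matrix with $1/\sigma_i$ on the diagonal), so the identity holds for those columns of $\bPhi$ with $\sigma_i > 0$. I'm reasonably sure this all applies if we consider a more general $V$, with of course the addition of an operator $E : V \to \bbR^K$ that maps an element of $V$ to its coordinates in a canonical orthonormal basis. But this doesn't complicate things too much.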
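The relation can be read off directly from the SVD. A short derivation, writing $\varphi_i$ and $\vartheta_i$ for the columns of $\bPhi$ and $\bTheta$ and assuming $\sigma_i > 0$ for the indices considered:

$$
\begin{aligned}
\bU &= \bPhi \bSigma \bTheta^T \\
\Rightarrow\quad \bU \bTheta &= \bPhi \bSigma \bTheta^T \bTheta = \bPhi \bSigma \\
\Rightarrow\quad \bU \vartheta_i &= \sigma_i \varphi_i, \qquad i = 1, \ldots, \min(K, N) \\
\Rightarrow\quad \varphi_i &= \frac{1}{\sigma_i} \bU \vartheta_i .
\end{aligned}
$$

The vectors obtained this way are orthonormal, since $\varphi_i^T \varphi_j = \vartheta_i^T \bG \vartheta_j / (\sigma_i \sigma_j) = \sigma_j^2 \delta_{ij} / (\sigma_i \sigma_j) = \delta_{ij}$.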

First let us test this all in $\bbR^K$ for some moderate $K$.

In [1]:
import numpy as np
import scipy as sp
import scipy.stats  # explicit import so that sp.stats is available on all SciPy versions
import importlib
import seaborn as sns
import matplotlib.pyplot as plt
import pdb

import sys
sys.path.append("../../")
import pyApproxTools as pat
importlib.reload(pat)

%matplotlib inline

In [2]:
K = 10
N = 4

# First make a random orthonormal basis (the columns of an orthogonal matrix)
Phi_orig = sp.stats.ortho_group.rvs(dim=K)
sigma = np.sort(np.random.random(K))[::-1]
D_orig = np.diag(sigma**2)

# This is the original covariance matrix!
# Note: @ is the matrix product; * would multiply the arrays element-wise.
Cov_orig = Phi_orig @ D_orig @ Phi_orig.T

points = np.random.multivariate_normal(np.zeros(K), Cov_orig, N)
U = points.T
print('K={0}, N={1}, U is dim {2}'.format(K,N,U.shape))

K=10, N=4, U is dim (10, 4)


In the above code we generate $N$ random points in $\bbR^K$ that are distributed according to a randomly generated "PCA construction", that is, a random orthonormal basis ```Phi_orig``` and a randomly generated sequence ```sigma``` of ordered numbers between 0 and 1, from which ```Cov_orig``` is calculated as $\bPhi \, \mathrm{diag}(\sigma_i^2) \, \bPhi^T$. The columns of $\bU$ are the resulting multivariate normal samples.
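As a sanity check on this construction, the sketch below rebuilds a covariance of the same form (using the matrix product `@`), confirms that its eigenvalues are `sigma**2`, and checks that the empirical covariance of many samples approaches it. The sample count `N_big`, the seeds and the smaller `K` are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(2)
K = 5

# Random orthonormal basis and decreasing sigma in (0, 1)
Phi_orig = ortho_group.rvs(dim=K, random_state=2)
sigma = np.sort(rng.random(K))[::-1]
Cov_orig = Phi_orig @ np.diag(sigma**2) @ Phi_orig.T

# The spectrum of Cov_orig should be exactly sigma^2 (in decreasing order)
eigvals = np.linalg.eigvalsh(Cov_orig)[::-1]
spectrum_ok = np.allclose(eigvals, sigma**2)

# With many samples the empirical covariance (1/N) U U^T approaches Cov_orig
N_big = 200_000
samples = rng.multivariate_normal(np.zeros(K), Cov_orig, N_big)
Cov_emp = samples.T @ samples / N_big
max_err = np.abs(Cov_emp - Cov_orig).max()
print(spectrum_ok, max_err)
```

With only $N = 4$ samples, as in the cell above, the empirical covariance has rank at most 4 and is a much cruder approximation of ```Cov_orig```.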

### Now we calculate the PCA in two ways: first by factoring $\bU^T \bU$, second by factoring $\bU\bU^T$. Let's make sure we get the same quantities
Recall $\bU \in \bbR^{K\times N}$. We are doing:

$$
\bC =  \bU \bU^T  = \bPhi \bSigma \bSigma^T \bPhi^T \quad\text{and}\quad \bG = \bU^T \bU = \bTheta \bSigma^T \bSigma \bTheta^T
$$

and as $\bU = \bPhi \bSigma \bTheta^T$ we should be able to recover the first $N$ columns of $\bPhi$ from the calculation 

$$\bPhi = \bU \bTheta \bSigma^{-1}$$ 

Recall $\bPhi\in\bbR^{K\times K}$, $\bSigma\in\bbR^{K\times N}$ and $\bTheta \in \bbR^{N \times N}$.

In [9]:
G = U.T @ U
C = U @ U.T

# The eigenvalues of G and C are the squared singular values sigma_i^2 of U
sigma_1, Theta = np.linalg.eigh(G)
sigma_2, Phi = np.linalg.eigh(C)

# np.linalg.eigh returns eigenvalues in increasing order, so we reverse to decreasing order
sigma_1 = sigma_1[::-1]
sigma_2 = sigma_2[::-1]
Theta = Theta[:,::-1]
Phi = Phi[:,::-1]

# Embed the inverse singular values 1/sigma_i diagonally in a (K x N) matrix
Sigma_inv = np.pad(np.diag(1.0/np.sqrt(sigma_1)), ((0,K-N), (0, 0)), 'constant')

print('Phi (first N={0} columns):\n\n'.format(N), Phi[:,:N], 
      '\n\nU Theta Sigma_inv (first N columns, rest are 0):\n\n', U @ Theta @ Sigma_inv.T[:,:N])

Phi (first N=4 columns):

 [[-8.06115876e-01  1.21752261e-02 -5.29221262e-02 -4.17588959e-01]
 [-4.08308917e-01  6.57459470e-01 -1.25476584e-01  4.05830109e-01]
 [ 2.60923814e-01  4.30203315e-01 -1.63681295e-01  2.10125703e-01]
 [-2.27530694e-03  9.14011556e-03  7.89784069e-03 -2.44751114e-03]
 [ 2.74740015e-02 -3.01082793e-02  6.14697612e-01 -7.38292424e-02]
 [-1.99519167e-01 -3.68705967e-03  6.63342282e-01  2.37323012e-01]
 [-2.68586850e-01 -5.75330991e-01 -1.76901225e-01  7.00593946e-01]
 [-4.32472380e-03 -4.64953273e-03 -3.69273383e-02 -1.28714648e-01]
 [-5.15198121e-02 -2.24706244e-01 -3.22594311e-01 -2.18151732e-01]
 [ 6.26435124e-06 -9.62260285e-05 -1.86469656e-04 -3.11497290e-04]] 

U Theta Sigma_inv (first N columns, rest are 0):

 [[ 8.06115876e-01 -1.21752261e-02 -5.29221262e-02 -4.17588959e-01]
 [ 4.08308917e-01 -6.57459470e-01 -1.25476584e-01  4.05830109e-01]
 [-2.60923814e-01 -4.30203315e-01 -1.63681295e-01  2.10125703e-01]
 [ 2.27530694e-03 -9.14011556e-03  7.89784069e-0

Thus we see that $\bPhi = \bU \bTheta \bSigma^{-1}$ for the first $N$ columns, up to a difference of sign. 

The difference of sign is due to the sign ambiguity in the SVD: for each $i$ we have $\varphi_i \sigma_i \vartheta_i^T = (-\varphi_i) \sigma_i (-\vartheta_i)^T$, so flipping the signs of $\varphi_i$ and $\vartheta_i$ together leaves $\bU$ unchanged.
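To compare the two computations without inspecting the signs by eye, one option is to flip each column so that it points the same way as its counterpart. The helper `align_signs` below is a hypothetical name introduced for this sketch, and the data are freshly generated rather than taken from the cells above.

```python
import numpy as np

def align_signs(A, B):
    """Flip columns of A so each has a non-negative inner product with the same column of B."""
    signs = np.sign(np.sum(A * B, axis=0))
    signs[signs == 0] = 1.0
    return A * signs

rng = np.random.default_rng(3)
K, N = 6, 4
U = rng.standard_normal((K, N))

lam_G, Theta = np.linalg.eigh(U.T @ U)
lam_C, Phi = np.linalg.eigh(U @ U.T)

# eigh returns ascending eigenvalues; reorder to descending
lam_G, Theta = lam_G[::-1], Theta[:, ::-1]
Phi = Phi[:, ::-1]

# phi_i = U theta_i / sigma_i, with sigma_i = sqrt(lambda_i)
Phi_from_G = U @ Theta / np.sqrt(lam_G)

Phi_aligned = align_signs(Phi_from_G, Phi[:, :N])
match = np.allclose(Phi_aligned, Phi[:, :N], atol=1e-6)
print(match)
```

This relies on the non-zero eigenvalues being distinct, which holds almost surely for random data. With repeated eigenvalues the eigenvectors are only determined up to a rotation within the eigenspace, and a sign flip is not enough.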

My point in showing this is that the first $N$ vectors $[\varphi_1,\ldots,\varphi_N]$ of the best-fit PCA basis can be found purely from the matrix $G_{i,j} = \langle u_i - \bar{u}, u_j - \bar{u} \rangle$ (noting that above we've assumed $\bar{u} = 0$). This is a much smaller $N\times N$ calculation and doesn't require some pre-built orthonormal basis of $V$. Now, the problem is of course extending to the rest of the columns of $\bPhi$, so that we can do the calculations of the sub-matrices of $\bS$ or $\bT$.
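A minimal sketch of this Gram-matrix route with the mean subtracted explicitly, assuming $V = \bbR^K$ with the Euclidean inner product. The data, sizes and the relative tolerance used to count non-zero eigenvalues are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
K, N = 8, 5
U = rng.standard_normal((K, N)) + 3.0   # snapshots with a non-zero mean

u_bar = U.mean(axis=1, keepdims=True)
Uc = U - u_bar                           # centred snapshots u_i - u_bar

# N x N Gram matrix: only inner products between snapshots are needed
G = Uc.T @ Uc
lam, Theta = np.linalg.eigh(G)
lam, Theta = lam[::-1], Theta[:, ::-1]   # descending order

# Centring removes one degree of freedom, so at most N - 1 eigenvalues are non-zero
r = int(np.sum(lam > 1e-10 * lam[0]))

# phi_i = (centred U) theta_i / sigma_i for the non-zero sigma_i
Phi_r = Uc @ Theta[:, :r] / np.sqrt(lam[:r])

orthonormal = np.allclose(Phi_r.T @ Phi_r, np.eye(r))
# The recovered basis spans the centred snapshots, so projecting loses nothing
reconstructs = np.allclose(Phi_r @ (Phi_r.T @ Uc), Uc)
print(r, orthonormal, reconstructs)
```

In a more general $V$, the only change would be how the entries of `G` and the linear combinations `Uc @ Theta` are formed. No orthonormal basis of $V$ is needed for either step.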

### Now we do just a few more sanity checks (e.g. that $\sigma_j$ are the same from the decomposition of both $\bG$ and $\bC$)

In [4]:
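# A few further sanity checks here.
# Note: sigma_1 and sigma_2 are eigenvalues of G and C, i.e. squared singular values,
# so they are compared against Sig*Sig from the SVD.
U1,Sig,V1 = np.linalg.svd(U)
Sigma = np.pad(np.diag(np.sqrt(sigma_1)), ((0,K-N), (0, 0)), 'constant')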

print('Sigma from G:       ', sigma_1[:5], '...')
print('Sigma from C:       ', sigma_2[:5], '...')
print('Sigma from SVD:     ', Sig*Sig)
print('(Phi.T @ U @ Theta)^2:', np.diag(Phi.T @ U @ Theta)**2)
print('')
print('Phi is dim   ', Phi.shape)
print('Theta is dim   ', Theta.shape)
print('Sigma is dim ', Sigma.shape, '\n')

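# U = Phi Sigma Theta^T with Theta orthogonal, so the two printed matrices,
# U @ Theta and Phi @ Sigma, should agree up to the sign of each column
print('U:                  \n', U @ Theta)
print('Phi Sigma Theta^T:    \n', (Phi) @ Sigma, '\n')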

Sigma from G:        [6.06564709 0.79465209 0.21296188 0.11724135] ...
Sigma from C:        [6.06564709e+00 7.94652090e-01 2.12961884e-01 1.17241354e-01
 2.23382199e-16] ...
Sigma from SVD:      [6.06564709 0.79465209 0.21296188 0.11724135]
(Phi.T @ U @ Theta)^2: [6.06564709 0.79465209 0.21296188 0.11724135]

Phi is dim    (10, 10)
Theta is dim    (4, 4)
Sigma is dim  (10, 4) 

U:                  
 [[-2.34624947e+00 -2.57194070e-02 -1.16881295e-01  3.89291844e-02]
 [ 2.20005333e-01 -3.18789073e-01  5.15253195e-03  7.71556582e-02]
 [-3.08373492e-01 -4.18901068e-01  1.72311121e-01 -2.00571928e-01]
 [ 3.33481448e-02  3.57466481e-03 -8.16242592e-03  3.65388113e-02]
 [-7.83196588e-02  7.08234735e-01  6.82380712e-02 -5.58631334e-02]
 [-3.66133333e-01  1.14965595e-01  5.70304594e-02 -2.13776661e-01]
 [ 5.08524518e-01 -4.02369507e-02 -3.97832961e-01 -1.39117890e-01]
 [-1.26567769e-01  5.87211359e-03 -5.60247799e-02  5.39141646e-03]
 [ 2.34781089e-04 -1.62438253e-04 -4.71005816e-04 -7.31323150