# 1.4. Principal Component Analysis
_________________________________
**Used for dimensionality reduction**

### Key Concepts:
* Singular Value Decomposition
* Low-Rank Matrix Approximations
* Principal Component Analysis

In [1]:
import numpy as np
from IPython.display import Image

#### Global Variables

In [2]:
# Generate a random m x n integer matrix with entries in [0, 100)
m = 4
n = 5
A = np.random.randint(100, size=(m, n))

print(f"Random {m}x{n} Matrix A:")
print(A)

Random 4x5 Matrix A:
[[ 1  7 56 29 44]
 [84 20 91 32 87]
 [80 33 23  4 97]
 [78 63 90 11 84]]


## Singular Value Decomposition

Eigendecomposition is defined only for square matrices.
Singular Value Decomposition (**SVD**) exists for every matrix, regardless of its shape,
and it applies even when the eigenvalues would be complex or the eigenvectors would not be orthogonal.

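SVD Model:
$A = U \Sigma V^T$

where, for an $m \times n$ input:
* $A$: is the Input Matrix ($m \times n$). Also referred to using $M$.
* $U$: is an Orthogonal Matrix ($m \times m$) whose columns are the left singular vectors.
* $\Sigma$: is a rectangular Diagonal Matrix ($m \times n$) holding the non-negative singular values. Also referred to using $D$.
* $V$: is an Orthogonal Matrix ($n \times n$) whose columns are the right singular vectors.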
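
The factorization can be checked numerically by rebuilding $A$ from its three factors. The sketch below hard-codes the 4x5 matrix printed earlier in this section (so it runs without the random generator) and uses `np.linalg.svd`; the names `Sigma` and `A_rebuilt` are chosen here for illustration only.

```python
import numpy as np

# The 4x5 matrix printed earlier in this section, hard-coded for reproducibility.
A = np.array([[ 1,  7, 56, 29, 44],
              [84, 20, 91, 32, 87],
              [80, 33, 23,  4, 97],
              [78, 63, 90, 11, 84]], dtype=float)
m, n = A.shape

# np.linalg.svd returns U (m x m), the singular values as a 1-D array,
# and V^T (n x n), not V.
U, s, Vt = np.linalg.svd(A)

# Embed the singular values in an m x n rectangular diagonal matrix.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

A_rebuilt = U @ Sigma @ Vt

print(np.allclose(A, A_rebuilt))          # does U Sigma V^T reproduce A?
print(np.allclose(U.T @ U, np.eye(m)))    # is U orthogonal?
print(np.allclose(Vt @ Vt.T, np.eye(n)))  # is V orthogonal?
```

Building `Sigma` explicitly is only needed for the full decomposition; with `full_matrices=False`, NumPy returns factors that multiply directly as `U @ np.diag(s) @ Vt`.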


In [3]:
# Singular Value Decomposition Visualization
Image(url="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Singular_value_decomposition_visualisation.svg/1024px-Singular_value_decomposition_visualisation.svg.png", height=175)

In [4]:
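def svd(A: np.ndarray):
    """
    Singular Value Decomposition via eigendecomposition of A A^T and A^T A.
    :param A: Matrix A of shape (m, n)
    :return: (U, D, V), where D is a 1-D array of singular values and the
             columns of V (not V^T) are eigenvectors of A^T A
    """
    At = A.transpose()

    # Eigenvalues L and eigenvectors U (as columns) of A A^T
    L, U = np.linalg.eig(np.dot(A, At))

    # Singular values are the square roots of the eigenvalues of A A^T
    D = np.sqrt(L)
    # Remove exactly-zero singular values
    D = D[D != 0]
    # Sort D in place in descending order.
    # Note: the columns of U and V keep the order returned by eig,
    # so they are not guaranteed to line up with the sorted D.
    D[::-1].sort()

    # Eigenvectors V (as columns) of A^T A; its eigenvalues are not needed
    _, V = np.linalg.eig(np.dot(At, A))

    return U, D, V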

In [5]:
UA, DA, VA = svd(A)

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[ 0.23977918  0.63156756  0.64293134 -0.3609261 ]
 [ 0.59382623  0.25356739 -0.67303466 -0.36069148]
 [ 0.47311447 -0.72922924  0.33668698 -0.36197968]
 [ 0.60501162  0.07106836  0.14249742  0.78013122]]

Diagonal Matrix D:  
[258.60998183  64.63405842  36.5922661   25.89057453]

Orthogonal Matrix V:
[[-0.52264512 -0.47751572 -0.15090458  0.68913968 -0.03369155]
 [-0.26017359 -0.15618604 -0.39991233 -0.42980252 -0.75050353]
 [-0.51350879  0.74366823  0.30359556  0.180507   -0.24189511]
 [-0.13341953  0.37587692 -0.82198367 -0.00085435  0.40651923]
 [-0.61454026 -0.23077862  0.22241969 -0.55477273  0.46025884]]
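
Because the function above sorts only `D`, and computes `U` and `V` from two independent eigendecompositions, its three outputs do not generally multiply back to $A$. One way to keep them consistent is sketched below. This is a variant written for illustration, not the notebook's method: the name `svd_via_eigh` and the `tol` parameter are assumptions, it uses `np.linalg.eigh` (for symmetric matrices) instead of `np.linalg.eig`, and it returns a thin decomposition with $V^T$ rather than $V$.

```python
import numpy as np

def svd_via_eigh(A: np.ndarray, tol: float = 1e-10):
    """Thin SVD sketch: returns (U, s, Vt) with A ~= U @ np.diag(s) @ Vt."""
    A = np.asarray(A, dtype=float)
    # A A^T is symmetric, so eigh yields real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)
    # Sort eigenvalues and eigenvectors together, largest eigenvalue first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny negative round-off before taking the square root.
    s = np.sqrt(np.clip(eigvals, 0.0, None))
    # Keep only the numerically non-zero singular values.
    keep = s > tol * s.max()
    s, U = s[keep], eigvecs[:, keep]
    # Derive V from U so the signs agree: v_i = A^T u_i / s_i.
    Vt = (A.T @ U / s).T
    return U, s, Vt

# The 4x5 matrix printed earlier in this section.
A = np.array([[ 1,  7, 56, 29, 44],
              [84, 20, 91, 32, 87],
              [80, 33, 23,  4, 97],
              [78, 63, 90, 11, 84]], dtype=float)

U, s, Vt = svd_via_eigh(A)
print(s)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # reconstruction check
```

Forming $AA^T$ squares the condition number of the problem, so this approach loses accuracy for ill-conditioned matrices; it is meant to illustrate the relationship, not to replace `np.linalg.svd`.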



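### Verify with np.linalg.svd

Note that `np.linalg.svd` returns $V^T$ rather than $V$, so the third array printed below is the transpose of the one printed by the custom `svd`. Singular vectors are unique only up to sign, so sign flips are expected. The different column order in $U$ arises because the custom `svd` sorts the singular values without reordering its eigenvectors to match.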

In [6]:
UA, DA, VA = np.linalg.svd(A)

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[-0.23977918 -0.63156756  0.3609261  -0.64293134]
 [-0.59382623 -0.25356739  0.36069148  0.67303466]
 [-0.47311447  0.72922924  0.36197968 -0.33668698]
 [-0.60501162 -0.07106836 -0.78013122 -0.14249742]]

Diagonal Matrix D:  
[258.60998183  64.63405842  36.5922661   25.89057453]

Orthogonal Matrix V:
[[-0.52264512 -0.26017359 -0.51350879 -0.13341953 -0.61454026]
 [ 0.47751572  0.15618604 -0.74366823 -0.37587692  0.23077862]
 [-0.03369155 -0.75050353 -0.24189511  0.40651923  0.46025884]
 [ 0.68913968 -0.42980252  0.180507   -0.00085435 -0.55477273]
 [-0.15090458 -0.39991233  0.30359556 -0.82198367  0.22241969]]
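
Comparing the two printed `U` matrices by eye is error-prone, since matching columns may be negated or appear in a different position. The sketch below copies both printed matrices and computes their column overlaps; the names `U_custom`, `U_numpy`, `overlap`, `partner` and `sign` are illustrative. If the two results agree up to sign and order, the absolute overlap should be close to a permutation matrix.

```python
import numpy as np

# U as printed by the custom svd(A) in this section.
U_custom = np.array([
    [0.23977918,  0.63156756,  0.64293134, -0.3609261 ],
    [0.59382623,  0.25356739, -0.67303466, -0.36069148],
    [0.47311447, -0.72922924,  0.33668698, -0.36197968],
    [0.60501162,  0.07106836,  0.14249742,  0.78013122],
])

# U as printed by np.linalg.svd(A).
U_numpy = np.array([
    [-0.23977918, -0.63156756,  0.3609261 , -0.64293134],
    [-0.59382623, -0.25356739,  0.36069148,  0.67303466],
    [-0.47311447,  0.72922924,  0.36197968, -0.33668698],
    [-0.60501162, -0.07106836, -0.78013122, -0.14249742],
])

# Entry (i, j) is the dot product of column i of U_custom with column j of U_numpy.
overlap = U_custom.T @ U_numpy

# For unit-length columns, |overlap| rounds to 1 where two columns coincide up to sign.
matching = np.round(np.abs(overlap)).astype(int)
print(matching)

# For each custom column: the index of its NumPy counterpart and the relative sign.
partner = np.argmax(np.abs(overlap), axis=1)
sign = np.sign(overlap[np.arange(len(partner)), partner])
print(partner)
print(sign)
```

A `partner` array that is not `[0, 1, 2, 3]` indicates that the custom columns are not in the same order as the sorted singular values.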



### Performance comparison

In [7]:
%%timeit -n 100
svd(A)

106 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [8]:
%%timeit -n 100
np.linalg.svd(A)

22.6 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


**On this 4x5 matrix, np.linalg.svd is ~5x faster (22.6 µs vs. 106 µs per call) than the custom svd implementation.**
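
The `%%timeit` magic is only available inside IPython. Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. The helper `time_call` is hypothetical, and the two `np.linalg.eig` calls serve as a stand-in for the custom `svd`, whose cost they dominate. It reports the best run rather than the mean, so the numbers are not directly comparable to those above and will vary by machine.

```python
import timeit
import numpy as np

# The 4x5 matrix printed earlier in this section.
A = np.array([[ 1,  7, 56, 29, 44],
              [84, 20, 91, 32, 87],
              [80, 33, 23,  4, 97],
              [78, 63, 90, 11, 84]], dtype=float)

def time_call(fn, repeat: int = 7, number: int = 100) -> float:
    """Return the best per-call time in microseconds over `repeat` runs."""
    runs = timeit.repeat(fn, repeat=repeat, number=number)
    return min(runs) / number * 1e6

def eig_based():
    # Stand-in for the custom svd: two eigendecompositions, as in the function above.
    np.linalg.eig(A @ A.T)
    np.linalg.eig(A.T @ A)

t_numpy = time_call(lambda: np.linalg.svd(A))
t_eig = time_call(eig_based)

print(f"np.linalg.svd: {t_numpy:.1f} µs per call")
print(f"eig-based:     {t_eig:.1f} µs per call")
```

For a matrix this small, fixed call overhead accounts for much of the measured time, so the ratio may change considerably for larger inputs.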