# 1.4. Principal Component Analysis
_________________________________
**Used for dimensionality reduction**

### Key Concepts:
* Singular Value Decomposition
* Low-Rank Matrix Approximations
* Principal Component Analysis

In [1]:
import numpy as np
from IPython.display import Image

#### Global Variables

In [2]:
# Generate a random mxn matrix
m = np.random.randint(3, 6)
n = np.random.randint(2, 5)
A = np.random.randint(100, size=(m, n))

print(f"Random {m}x{n} Matrix A:")
print(A)

Random 3x4 Matrix A:
[[ 7 79 46 57]
 [93  6 28 20]
 [55 90 33 35]]


## Singular Value Decomposition

Eigendecomposition is only defined for square matrices.
Singular Value Decomposition (**SVD**) exists for every real matrix, regardless of its shape,
even when the eigenvalues of a square matrix would be complex or its eigenvectors would not be orthogonal.
The singular values themselves are always real and non-negative.
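
To make the contrast concrete, here is a minimal sketch. The matrices `B` and `R` are hypothetical examples chosen for illustration and are not part of the notebook's data. It assumes NumPy rejects a non-square input to `np.linalg.eig` with a `LinAlgError`, while `np.linalg.svd` accepts any shape.

```python
import numpy as np

# Hypothetical 2x3 (non-square) matrix, used only for this illustration
B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Eigendecomposition needs a square matrix, so NumPy refuses this input
try:
    np.linalg.eig(B)
    eig_ok = True
except np.linalg.LinAlgError:
    eig_ok = False

# SVD accepts any shape; request only the singular values here
s_B = np.linalg.svd(B, compute_uv=False)

# Hypothetical 90-degree rotation matrix: square, but with complex eigenvalues
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
eigvals_R = np.linalg.eigvals(R)
s_R = np.linalg.svd(R, compute_uv=False)

print(f"eig succeeded on the 2x3 matrix: {eig_ok}")
print(f"singular values of B: {s_B}")
print(f"eigenvalues of R: {eigvals_R}")
print(f"singular values of R: {s_R}")
```

The rotation matrix has no real eigenvalues, yet its singular values are real (both equal to 1), which is the property the paragraph above relies on.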

SVD Model:
$A = U \Sigma V^T$

where:
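* $A$: the $m \times n$ input matrix. Also referred to as $M$.
* $U$: an $m \times m$ orthogonal matrix whose columns are the left singular vectors.
* $\Sigma$: an $m \times n$ rectangular diagonal matrix with the non-negative singular values on its diagonal. Also referred to as $D$.
* $V$: an $n \times n$ orthogonal matrix whose columns are the right singular vectors.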
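
The shapes and properties listed above can be checked numerically. The following is a small sketch, assuming the 3x4 matrix printed earlier in this notebook; the `*_demo` names are introduced here for illustration so that the notebook's own variables are left untouched.

```python
import numpy as np

# The 3x4 matrix printed earlier, copied into a separate (hypothetical) variable
A_demo = np.array([[ 7, 79, 46, 57],
                   [93,  6, 28, 20],
                   [55, 90, 33, 35]], dtype=float)
m_demo, n_demo = A_demo.shape
k_demo = min(m_demo, n_demo)

# np.linalg.svd returns U (m x m), the singular values as a 1-D array,
# and V already transposed (n x n)
U_demo, s_demo, Vt_demo = np.linalg.svd(A_demo)

# Embed the singular values in an m x n rectangular diagonal matrix Sigma
Sigma_demo = np.zeros((m_demo, n_demo))
Sigma_demo[:k_demo, :k_demo] = np.diag(s_demo)

# Orthogonality: the transpose of U (and of V) is its inverse
u_orthogonal = np.allclose(U_demo.T @ U_demo, np.eye(m_demo))
v_orthogonal = np.allclose(Vt_demo @ Vt_demo.T, np.eye(n_demo))

# The product U Sigma V^T reproduces the input matrix
reconstructs = np.allclose(U_demo @ Sigma_demo @ Vt_demo, A_demo)

print(U_demo.shape, Sigma_demo.shape, Vt_demo.shape)
print(u_orthogonal, v_orthogonal, reconstructs)
```

Note that $\Sigma$ has to be built explicitly, because NumPy returns only its diagonal.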


In [3]:
# Singular Value Decomposition Visualization
Image(url="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Singular_value_decomposition_visualisation.svg/1024px-Singular_value_decomposition_visualisation.svg.png", height=175)

In [4]:
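def svd(A: np.ndarray):
    """
    Singular Value Decomposition via two eigendecompositions.

    AAt and AtA are decomposed independently, so the column order and the
    signs of U and V are not guaranteed to match each other.

    :param A: Matrix A (m x n)
    :return: (U, D, V) where the columns of U are eigenvectors of AAt,
             D holds the non-zero singular values in the order returned by
             np.linalg.eig (not sorted), and the columns of V are
             eigenvectors of AtA (V is not transposed)
    """
    At = A.transpose()

    # Compute the eigenvalues L and eigenvectors U (as columns) of AAt
    L, U = np.linalg.eig(np.dot(A, At))

    # Keep the real part (round-off can add a tiny imaginary component) and
    # drop eigenvalues that are numerically zero relative to the largest one;
    # an exact `!= 0` test would miss round-off values
    L = L.real
    L = L[L > 1e-10 * L.max()]

    # D holds the square roots of the remaining eigenvalues
    D = np.sqrt(L)

    # Compute the eigenvectors V (as columns) of AtA; its eigenvalues are not needed
    _, V = np.linalg.eig(np.dot(At, A))

    return U, D, V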

In [5]:
UA, DA, VA = svd(A)

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[-0.58898407 -0.61241028  0.52730581]
 [-0.41405373 -0.33166424 -0.84767821]
 [-0.69401533  0.71760189  0.0582258 ]]

Diagonal Matrix D:  
[164.69292856  26.08445512  84.60993136]

Orthogonal Matrix V:
[[ 0.49061444  0.85026086 -0.16624682  0.09335917]
 [ 0.67686843 -0.49416672 -0.54491354 -0.02678842]
 [ 0.37396431 -0.02886811  0.528154   -0.76182064]
 [ 0.40161835 -0.17894791  0.62967022  0.64046527]]



### Verify with np.linalg.svd

In [6]:
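# np.linalg.svd returns the singular values sorted in descending order
# and the third result already transposed (V^T, not V)
UA, DA, VA = np.linalg.svd(A)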

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[ 0.58898407 -0.52730581  0.61241028]
 [ 0.41405373  0.84767821  0.33166424]
 [ 0.69401533 -0.0582258  -0.71760189]]

Diagonal Matrix D:  
[164.69292856  84.60993136  26.08445512]

Orthogonal Matrix V:
[[ 0.49061444  0.67686843  0.37396431  0.40161835]
 [ 0.85026086 -0.49416672 -0.02886811 -0.17894791]
 [-0.16624682 -0.54491354  0.528154    0.62967022]
 [ 0.09335917 -0.02678842 -0.76182064  0.64046527]]
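
Comparing the two outputs above: the singular values agree, but the custom `svd` lists them in the order `np.linalg.eig` happened to return them, whereas `np.linalg.svd` sorts them in descending order. The columns of `U` match only up to order and sign, and `np.linalg.svd` returns $V^T$ where the custom function returns $V$. Because the custom function decomposes $AA^T$ and $A^TA$ independently, the product $U \Sigma V^T$ built from its results will not, in general, reproduce $A$. The following is one possible sketch of a variant that avoids this. The name `svd_sorted` and its `rtol` parameter are assumptions made for illustration. It swaps in `np.linalg.eigh`, which targets symmetric matrices, and derives each left singular vector from its right partner via $u_i = A v_i / \sigma_i$, so it returns a reduced SVD rather than the full one.

```python
import numpy as np

def svd_sorted(A, rtol=1e-10):
    """Reduced SVD via the eigendecomposition of A^T A (illustrative sketch)."""
    A = np.asarray(A, dtype=float)

    # eigh is meant for symmetric matrices: real eigenvalues, orthonormal eigenvectors
    L, V = np.linalg.eigh(A.T @ A)

    # eigh returns eigenvalues in ascending order; reorder to descending
    order = np.argsort(L)[::-1]
    L, V = L[order], V[:, order]

    # Drop eigenvalues that are numerically zero relative to the largest one
    keep = L > rtol * L.max()
    s = np.sqrt(L[keep])
    V = V[:, keep]

    # u_i = A v_i / s_i ties the order and sign of U to those of V
    # (broadcasting divides column i by s[i])
    U = (A @ V) / s

    return U, s, V.T

# The 3x4 matrix printed earlier, copied into a separate (hypothetical) variable
A_check = np.array([[ 7, 79, 46, 57],
                    [93,  6, 28, 20],
                    [55, 90, 33, 35]])

U_s, s_s, Vt_s = svd_sorted(A_check)
s_ref = np.linalg.svd(A_check, compute_uv=False)

# (U_s * s_s) scales column i of U by s[i], which equals U @ diag(s)
print(np.allclose((U_s * s_s) @ Vt_s, A_check))
print(np.allclose(s_s, s_ref))
```

Forming $A^TA$ squares the condition number, so this approach loses accuracy for small singular values; it is meant to illustrate the relationship, not to replace `np.linalg.svd`.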



### Performance comparison

In [7]:
%%timeit -n 100
svd(A)

105 µs ± 49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [8]:
%%timeit -n 100
np.linalg.svd(A)

22.1 µs ± 8.85 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


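**In this run, np.linalg.svd was roughly 5x faster than the custom svd implementation (22.1 µs vs. 105 µs per loop). Timings on a matrix this small are noisy, as the large standard deviations show.**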
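
To see how the gap behaves beyond a single tiny matrix, the measurement can be repeated outside the notebook with the standard `timeit` module. This is a rough sketch: `svd_via_eig` is a hypothetical stand-in that mirrors the two-eigendecomposition approach, and the matrix sizes, seed, and repeat counts are arbitrary choices. Absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def svd_via_eig(M):
    # Hypothetical stand-in mirroring the custom approach: two eigendecompositions
    L, U = np.linalg.eig(M @ M.T)
    _, V = np.linalg.eig(M.T @ M)
    # abs() guards against tiny negative or complex round-off in the eigenvalues
    return U, np.sqrt(np.abs(L)), V

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
number, repeat = 10, 3
results = {}

for shape in [(3, 4), (20, 30), (60, 80)]:
    M = rng.integers(0, 100, size=shape).astype(float)
    # Take the best of several repeats to reduce the influence of noise
    t_eig = min(timeit.repeat(lambda: svd_via_eig(M), number=number, repeat=repeat)) / number
    t_np = min(timeit.repeat(lambda: np.linalg.svd(M), number=number, repeat=repeat)) / number
    results[shape] = (t_eig, t_np)
    print(f"{shape}: eig-based {t_eig * 1e6:.1f} us, np.linalg.svd {t_np * 1e6:.1f} us")
```

Taking the minimum over repeats is a common way to reduce timing noise, which matters here given the large spread reported above.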