# 1.4. Principal Component Analysis
_________________________________
**Used for dimensionality reduction**

### Key Concepts:
* Singular Value Decomposition
* Low-Rank Matrix Approximations
* Principal Component Analysis

In [1]:
import numpy as np
from IPython.display import display, Image

#### Global Variables

In [2]:
# Generate a random mxn matrix
m = 4
n = 5
A = np.random.randint(100, size=(m, n))

print(f"Random {m}x{n} Matrix A:")
print(A)

Random 4x5 Matrix A:
[[84 97 94 91 41]
 [10 99 20 70 27]
 [22 51  7 35 82]
 [73 47 70 84  9]]


## Singular Value Decomposition

Eigendecomposition is only defined for square matrices, and even then it can produce complex eigenvalues or non-orthogonal eigenvectors.
Singular Value Decomposition (**SVD**) exists for every real matrix, regardless of shape,
and always yields real, non-negative singular values and orthonormal singular vectors.
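
As a quick illustration of the shape restriction, the sketch below tries both decompositions on a non-square matrix. The matrix `B` is a made-up example (it is not the notebook's `A`), and the snippet only assumes NumPy's documented behavior of raising `LinAlgError` when `np.linalg.eig` receives a non-square input.

```python
import numpy as np

# Hypothetical 2x3 example matrix (not the notebook's A)
B = np.array([[3.0, 1.0, 2.0],
              [0.0, 4.0, 1.0]])

# Eigendecomposition needs a square matrix, so NumPy raises LinAlgError here
try:
    np.linalg.eig(B)
    eig_ok = True
except np.linalg.LinAlgError:
    eig_ok = False

# SVD accepts any shape
U, s, Vt = np.linalg.svd(B)

print(f"eig succeeded: {eig_ok}")
print(f"shapes: U={U.shape}, s={s.shape}, Vt={Vt.shape}")
```

The singular values come back as a 1-D array `s` rather than as a full matrix.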

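SVD model:
$A = U \Sigma V^T$

where:
* $A$ is the $m \times n$ input matrix, also written $M$.
* $U$ is an $m \times m$ orthogonal matrix whose columns are the left singular vectors.
* $\Sigma$ is an $m \times n$ rectangular diagonal matrix holding the singular values, also written $D$.
* $V$ is an $n \times n$ orthogonal matrix whose columns are the right singular vectors.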
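
The model can be checked numerically. The following is a minimal sketch that uses `np.linalg.svd` on a hypothetical seeded random matrix `M` (the seed and the name are assumptions chosen for reproducibility), rebuilds the rectangular diagonal matrix from the returned singular values, and tests orthogonality and reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
M = rng.integers(0, 100, size=(4, 5)).astype(float)

# np.linalg.svd returns the singular values as a 1-D array
# and the third factor already transposed (V^T)
U, s, Vt = np.linalg.svd(M)

# Embed the singular values in an m x n rectangular diagonal matrix
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

u_orthogonal = np.allclose(U @ U.T, np.eye(M.shape[0]))
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(M.shape[1]))
reconstructs = np.allclose(U @ Sigma @ Vt, M)

print(u_orthogonal, v_orthogonal, reconstructs)
```

All three checks are expected to print `True`, up to floating-point tolerance.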


In [3]:
# Singular Value Decomposition Visualization
display(Image(url="https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Singular_value_decomposition_visualisation.svg/1024px-Singular_value_decomposition_visualisation.svg.png", width=300, unconfined=True))

In [4]:
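def svd(A: np.ndarray):
    """
    Singular Value Decomposition via eigendecomposition (illustrative only)
    :param A: Matrix A of shape (m, n)
    :return: (U, D, V), where D is a 1-D array of singular values and
             V is returned untransposed (np.linalg.svd returns V^T instead)
    """
    At = A.transpose()

    # Columns of U are the eigenvectors of A At
    L, U = np.linalg.eig(np.dot(A, At))

    # Singular values are the square roots of the eigenvalues of A At
    D = np.sqrt(L)
    # Remove 0 elements
    D = D[D != 0]
    # Sort in place, in descending order.
    # Caveat: the columns of U and V are NOT reordered to match,
    # and their signs are not made consistent with each other.
    D[::-1].sort()

    # Columns of V are the eigenvectors of At A
    _, V = np.linalg.eig(np.dot(At, A))

    return U, D, V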

In [5]:
UA, DA, VA = svd(A)

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[ 0.69814187  0.21918453 -0.66725473  0.13902224]
 [ 0.42082704 -0.47109243  0.12619556 -0.7648864 ]
 [ 0.30060372 -0.68676929  0.21903929  0.62450552]
 [ 0.49511607  0.50830897  0.70061943  0.0749298 ]]

Diagonal Matrix D:  
[266.75333316  86.46820069  53.31660005  19.34553381]

Orthogonal Matrix V:
[[-0.39590487 -0.4128478   0.43584927 -0.69220812  0.06082235]
 [-0.55475552  0.42225886 -0.50391649 -0.28877342 -0.42026403]
 [-0.41538086 -0.48521583  0.13854948  0.5705056  -0.49734469]
 [-0.54394703 -0.06511469 -0.23893548  0.26595559  0.75634593]
 [-0.25900969  0.64154487  0.69268858  0.20309803  0.01636721]]



### Verify with np.linalg.svd

In [6]:
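# np.linalg.svd returns V transposed, so VA holds V^T here:
# its rows correspond to the columns of V from the custom svd.
UA, DA, VA = np.linalg.svd(A)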

print(f"Orthogonal Matrix U:\n{UA}\n")
print(f"Diagonal Matrix D:  \n{DA}\n")
print(f"Orthogonal Matrix V:\n{VA}\n")

Orthogonal Matrix U:
[[ 0.69814187  0.21918453 -0.13902224 -0.66725473]
 [ 0.42082704 -0.47109243  0.7648864   0.12619556]
 [ 0.30060372 -0.68676929 -0.62450552  0.21903929]
 [ 0.49511607  0.50830897 -0.0749298   0.70061943]]

Diagonal Matrix D:  
[266.75333316  86.46820069  53.31660005  19.34553381]

Orthogonal Matrix V:
[[ 0.39590487  0.55475552  0.41538086  0.54394703  0.25900969]
 [ 0.4128478  -0.42225886  0.48521583  0.06511469 -0.64154487]
 [-0.43584927  0.50391649 -0.13854948  0.23893548 -0.69268858]
 [ 0.06082235 -0.42026403 -0.49734469  0.75634593  0.01636721]
 [-0.69220812 -0.28877342  0.5705056   0.26595559  0.20309803]]
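
Comparing the two outputs, the singular values agree, but the columns of `U` and `V` from the custom function appear in a different order and with different signs, and NumPy returns $V^T$ rather than $V$. One possible way to get a self-consistent result from the eigendecomposition route is sketched below. The helper name `svd_via_eigh` and the tolerance `tol` are hypothetical. The sketch swaps in `np.linalg.eigh` (intended for symmetric matrices), sorts eigenvalues and eigenvectors together, and derives `U` from `V` so the signs match. It assumes a nonzero real input and returns a reduced SVD.

```python
import numpy as np

def svd_via_eigh(A: np.ndarray, tol: float = 1e-10):
    """Hypothetical helper: reduced SVD from the eigendecomposition of A^T A."""
    A = np.asarray(A, dtype=float)
    # eigh targets symmetric matrices: real eigenvalues, orthonormal eigenvectors
    eigvals, V = np.linalg.eigh(A.T @ A)
    # eigh returns ascending order; reorder eigenpairs together, largest first
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Drop eigenvalues that are numerically zero (assumes A is not all zeros)
    keep = eigvals > tol * eigvals[0]
    s = np.sqrt(eigvals[keep])
    V = V[:, keep]
    # Derive U from V so each pair (u_i, v_i) has consistent signs: u_i = A v_i / s_i
    U = (A @ V) / s
    return U, s, V.T

# The matrix A printed earlier in this notebook
A = np.array([[84, 97, 94, 91, 41],
              [10, 99, 20, 70, 27],
              [22, 51,  7, 35, 82],
              [73, 47, 70, 84,  9]])

U, s, Vt = svd_via_eigh(A)
print(np.allclose((U * s) @ Vt, A))                         # reconstruction check
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))   # singular values check
```

Squaring the matrix via $A^T A$ also squares its condition number, so this route loses accuracy for small singular values. That is one reason to prefer `np.linalg.svd` outside of teaching examples.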



### Performance comparison

In [7]:
%%timeit -n 100
svd(A)

78.1 µs ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [8]:
%%timeit -n 100
np.linalg.svd(A)

21.2 µs ± 7.59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


**In this run, np.linalg.svd is ~3.7x faster than the custom svd implementation (21.2 µs vs. 78.1 µs per loop).**
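
On a 4x5 matrix, both timings are small and noisy, so the ratio may not carry over to other sizes. Outside a notebook, a similar comparison could be sketched with the standard `timeit` module on a larger input. The matrix size, the seed, and the function names below are arbitrary assumptions, and no particular result is implied.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((100, 150))  # hypothetical larger test matrix

def eig_route():
    # The same two eigendecompositions the custom svd performs
    np.linalg.eig(B @ B.T)
    np.linalg.eig(B.T @ B)

def lapack_route():
    np.linalg.svd(B)

# Take the best of a few repeats to reduce the effect of background noise
t_eig = min(timeit.repeat(eig_route, number=5, repeat=3)) / 5
t_svd = min(timeit.repeat(lapack_route, number=5, repeat=3)) / 5

print(f"eig route:     {t_eig * 1e3:.2f} ms per call")
print(f"np.linalg.svd: {t_svd * 1e3:.2f} ms per call")
```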