### Singular Value Decomposition (SVD)
   
Both PCA and SVD are widely used for dimensionality reduction. However, SVD is a general matrix factorization and, differently from PCA, SVD-based methods can be adapted to handle cases where data is missing. The SVD of a matrix $X$ of shape $(n,d)$ is:

$X = UDV^T$

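where $U$, of shape $(n,n)$, and $V$, of shape $(d,d)$, are square orthogonal matrices (i.e., $U^TU\,=\,I_n$ and $V^TV\,=\,I_d$, with $I_k$ the identity matrix of shape $(k,k)$), and $D$ is a rectangular diagonal matrix of shape $(n,d)$ whose diagonal entries are the non-negative singular values of $X$.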

Note that:
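- $D$ is not necessarily a square matrix: it has the same shape $(n,d)$ as $X$
- Every real matrix has an SVD, whatever its shape or rank
- SVD is different from the PCA decomposition: PCA corresponds to the SVD of the mean-centered data matrix.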

More information can be found in the [Wikipedia article on singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition).
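The properties above can be checked numerically. The following is a minimal sketch, assuming SciPy's `svd` and `diagsvd` helpers; the names `X`, `s`, and `D` and the random test matrix are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy.linalg import svd, diagsvd

# Illustrative data: n = 4 samples, d = 10 features (an assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.random((4, 10))
n, d = X.shape

# Full SVD: U is (n, n), VT is (d, d), s holds the min(n, d) singular values
U, s, VT = svd(X)

# Build the rectangular (n, d) matrix D with s on its main diagonal
D = diagsvd(s, n, d)

print("U orthogonal:", np.allclose(U.T @ U, np.eye(n)))
print("V orthogonal:", np.allclose(VT @ VT.T, np.eye(d)))
print("X == U D V^T:", np.allclose(U @ D @ VT, X))
```

Note that `svd` returns the singular values as a 1-D array, so the rectangular matrix $D$ has to be rebuilt explicitly before multiplying the factors back together.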

#### SVD Calculation

In [1]:
# libraries
import numpy as np
from scipy.linalg import svd

# define a matrix
np.random.seed(42)
A = np.random.rand(4,10)
print(f"Original matrix (of shape: {A.shape}):")
print(A)
print()

# SVD: D is returned as a 1-D array of singular values (in descending order), not as a matrix
U, D, VT = svd(A)
print(f"U matrix (of shape: {U.shape}):")
print(U)
print()
print(f"D matrix (of shape: {D.shape}):")
print(D)
print()
print(f"V Transpose matrix (of shape: {VT.shape}):")
print(VT)

Original matrix (of shape: (4, 10)):
[[0.37454012 0.95071431 0.73199394 0.59865848 0.15601864 0.15599452
  0.05808361 0.86617615 0.60111501 0.70807258]
 [0.02058449 0.96990985 0.83244264 0.21233911 0.18182497 0.18340451
  0.30424224 0.52475643 0.43194502 0.29122914]
 [0.61185289 0.13949386 0.29214465 0.36636184 0.45606998 0.78517596
  0.19967378 0.51423444 0.59241457 0.04645041]
 [0.60754485 0.17052412 0.06505159 0.94888554 0.96563203 0.80839735
  0.30461377 0.09767211 0.68423303 0.44015249]]

U matrix (of shape: (4, 4)):
[[-0.58200243 -0.46018451  0.28177358  0.60836423]
 [-0.43344428 -0.50797901 -0.15132486 -0.72882382]
 [-0.42826635  0.31259847 -0.82207696  0.20750806]
 [-0.53850456  0.65762431  0.47105616 -0.23590082]]

D matrix (of shape: (4,)):
[2.98081051 1.511071   0.58704473 0.47296868]

V Transpose matrix (of shape: (10, 10)):
[[-0.273787   -0.3775109  -0.31769415 -0.37182421 -0.29687593 -0.31597947
  -0.13929981 -0.33695382 -0.38890373 -0.26678946]
 [ 0.26999823 -0.51251825 

#### SVD for Dimensionality Reduction

Datasets with a large number of features (i.e., matrices of shape $(n, d)$ with a large $d$) can be reduced to a smaller set of derived features, each a linear combination of the original ones. The result is a matrix with a lower rank that is said to approximate the original matrix.

To do this, we can compute the SVD of the original data and keep only the $k$ largest singular values. In practice, this means keeping the first $k$ columns of the diagonal matrix $D$ and the first $k$ rows of $V^T$.
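As a compact illustration of this truncation, here is a minimal sketch that uses the reduced form of the decomposition (`full_matrices=False`) and checks how good the rank-$k$ approximation is. The names `X`, `k`, and `X_k` and the random test matrix are assumptions made for this example. The check relies on a standard result: the Frobenius-norm error of the truncated reconstruction equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np
from scipy.linalg import svd

# Illustrative data: 4 samples, 10 features (an assumption for this sketch)
rng = np.random.default_rng(0)
X = rng.random((4, 10))
k = 2  # number of singular values to keep

# Reduced SVD: U is (4, 4), s is (4,), VT is (4, 10)
U, s, VT = svd(X, full_matrices=False)

# Rank-k approximation from the first k columns of U, the k largest
# singular values, and the first k rows of V^T
X_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

# Approximation error versus the discarded singular values
err = np.linalg.norm(X - X_k, ord="fro")
expected = np.sqrt(np.sum(s[k:] ** 2))

print("shape of X_k:", X_k.shape)
print("rank of X_k:", np.linalg.matrix_rank(X_k))
print("error matches discarded singular values:", np.isclose(err, expected))
```

With the reduced form there is no need to pad the singular values into a rectangular matrix, which keeps the slicing simpler when only the top $k$ components are needed.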

In [3]:
# SVD
U, D, VT = svd(A)

# create an (n, d) matrix of zeros, the same shape as A
diagonal = np.zeros((A.shape[0], A.shape[1]))

# populate its leading (n, n) block with the singular values (assumes n <= d)
diagonal[:A.shape[0], :A.shape[0]] = np.diag(D)

# select the number of singular values (components) to retain
n_elements = 2
diagonal = diagonal[:, :n_elements]
VT = VT[:n_elements, :]

# reconstruct
B = U.dot(diagonal.dot(VT))
print(f"Reconstructed matrix (of shape: {B.shape}):")
print(B)
print()

# transform: project the data onto the retained components (computed in two equivalent ways)
T = U.dot(diagonal)
print(f"T matrix (reduced dimensionality) [U.dot(diagonal)] (of shape: {T.shape}):")
print(T)
print()

T = A.dot(VT.T)
print(f"T matrix (reduced dimensionality) [A.dot(VT.T)] (of shape: {T.shape}):")
print(T)

Reconstructed matrix (of shape: (4, 10)):
[[0.28722729 1.01131118 0.83904522 0.48160808 0.23274129 0.26648781
  0.20417551 0.78712722 0.61066753 0.54097931]
 [0.14648841 0.88115494 0.72826355 0.29998    0.07195905 0.0973089
  0.13859685 0.65895454 0.4318029  0.43095446]
 [0.4770472  0.23983084 0.20999685 0.58569151 0.57074318 0.59471945
  0.20329214 0.29254743 0.53995367 0.28749691]
 [0.70777927 0.09667485 0.09853853 0.83041795 0.87994532 0.90974542
  0.27717219 0.25139477 0.71574483 0.3165759 ]]

T matrix (reduced dimensionality) [U.dot(diagonal)] (of shape: (4, 2)):
[[-1.73483895 -0.69537147]
 [-1.29201526 -0.76759235]
 [-1.27658085  0.47235849]
 [-1.60518005  0.99371702]]

T matrix (reduced dimensionality) [A.dot(VT.T)] (of shape: (4, 2)):
[[-1.73483895 -0.69537147]
 [-1.29201526 -0.76759235]
 [-1.27658085  0.47235849]
 [-1.60518005  0.99371702]]
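
The two transformed matrices agree because the columns of $V$ are orthonormal. As a short derivation, write $V_k$ for the first $k$ columns of $V$, $U_k$ for the first $k$ columns of $U$, and $D_k$ for the leading $(k,k)$ block of $D$ (these subscripted names are introduced here only for illustration, and $k \leq n$ is assumed). Then:

$A V_k = U D V^T V_k = U D \begin{bmatrix} I_k \\ 0 \end{bmatrix} = U_k D_k$

The middle step uses $V^T V = I_d$, so $V^T V_k$ is simply the first $k$ columns of the identity matrix. Multiplying $D$ by it keeps the first $k$ columns of $D$, which is what `diagonal[:, :n_elements]` does in the code, so `U.dot(diagonal)` and `A.dot(VT.T)` compute the same reduced representation.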
