
**1.4.1 Singular Value Decomposition**

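Singular value decomposition (SVD) provides another way of factoring a matrix, this time into singular vectors and singular values. Every $m \times n$ matrix $A$ can be written as $A = U\Sigma V^T$, where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal and $\Sigma$ is an $m \times n$ diagonal matrix whose diagonal entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ are the singular values. Among other things, the factorization reveals the rank of $A$ (the number of nonzero singular values) and orthonormal bases for its column and row spaces.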

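Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can therefore be orthogonally diagonalized. Let $v_1, \dots, v_n$ be an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, and let $\lambda_1, \dots, \lambda_n$ be the associated eigenvalues of $A^TA$. For each $i$,

$$\|Av_i\|^2 = (Av_i)^T A v_i = v_i^T A^TA v_i = \lambda_i v_i^T v_i = \lambda_i,$$

so the eigenvalues of $A^TA$ are all nonnegative. Numbering them so that $\lambda_1 \ge \dots \ge \lambda_n \ge 0$, the singular values of $A$ are $\sigma_i = \sqrt{\lambda_i} = \|Av_i\|$.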
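A quick numerical check of this argument, written as a small NumPy sketch. It uses the same $3 \times 2$ matrix as the cell below; the choice of `np.linalg.eigh` (NumPy's eigensolver for symmetric matrices) is ours, not part of the original notes. It confirms that $\|Av_i\|^2 = \lambda_i \ge 0$ and that the square roots of the eigenvalues of $A^TA$ match the singular values returned by an SVD routine.

```python
import numpy as np

# Same example matrix as the SVD cell in this section
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# A^T A is symmetric, so eigh returns real eigenvalues (ascending)
# and an orthonormal set of eigenvectors (as columns)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

# Each eigenvalue equals ||A v_i||^2, hence is nonnegative
norms_sq = np.linalg.norm(A @ eigvecs, axis=0) ** 2
print(eigvals)    # eigenvalues of A^T A
print(norms_sq)   # ||A v_i||^2, should match eigvals

# Singular values are the square roots of these eigenvalues, largest first
sigma_from_eig = np.sqrt(eigvals[::-1])
sigma_from_svd = np.linalg.svd(A, compute_uv=False)
print(sigma_from_eig)
print(sigma_from_svd)
```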

```python
from numpy import array
from scipy.linalg import svd

# A 3 x 2 example matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)

# svd returns U (3 x 3), the singular values s as a 1-D array in
# decreasing order, and V^T (2 x 2), so that A = U @ Sigma @ VT
U, s, VT = svd(A)
print(U)
print(s)
print(VT)
```

Output:

```
[[1 2]
 [3 4]
 [5 6]]
[[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
[9.52551809 0.51430058]
[[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]
```
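The factorization can be checked by multiplying the three factors back together. Because `scipy.linalg.svd` returns `s` as a 1-D array and, by default, a square $3 \times 3$ `U`, the $3 \times 2$ matrix $\Sigma$ has to be assembled first; this sketch uses `scipy.linalg.diagsvd` for that step, which is one convenient option rather than the only way.

```python
import numpy as np
from scipy.linalg import svd, diagsvd

A = np.array([[1, 2], [3, 4], [5, 6]])
U, s, VT = svd(A)

# Embed the singular values in an m x n diagonal matrix Sigma
m, n = A.shape
Sigma = diagsvd(s, m, n)

# U and V are orthogonal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(m)), np.allclose(VT @ VT.T, np.eye(n)))  # True True

# Multiplying the factors recovers A (up to floating-point rounding)
A_rebuilt = U @ Sigma @ VT
print(np.allclose(A_rebuilt, A))  # True
```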


**1.4.2 Low-Rank Matrix Approximations**

The goal of a low-rank approximation is to replace a matrix with one that takes less memory to store and is faster to compute with, while behaving approximately like the original matrix.

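To do this, we compute the SVD $A = U\Sigma V^T$ and keep only the $k$ largest singular values together with their singular vectors: the first $k$ columns of $U$, the first $k$ diagonal entries of $\Sigma$, and the first $k$ columns of $V$ (equivalently, the first $k$ rows of $V^T$). The rank-$k$ approximation is then

$$A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T,$$

which needs only $k(m + n + 1)$ stored numbers instead of $mn$. By the Eckart–Young theorem, $A_k$ is the closest rank-$k$ matrix to $A$ in the spectral and Frobenius norms, and $\|A - A_k\|_2 = \sigma_{k+1}$.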
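A minimal sketch of the truncation in NumPy. The helper name `low_rank_approx` and the random $50 \times 40$ test matrix are hypothetical, chosen only for illustration. The sketch checks that $A_k$ has rank $k$, compares the storage counts, and confirms that the spectral-norm error equals the next singular value.

```python
import numpy as np

def low_rank_approx(A, k):
    """Hypothetical helper: rank-k truncated SVD approximation of A."""
    # full_matrices=False gives the "thin" SVD, which is all we need here
    U, s, VT = np.linalg.svd(A, full_matrices=False)
    # Keep the k leading left singular vectors, singular values,
    # and right singular vectors (rows of V^T)
    return U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

# Hypothetical test matrix: random 50 x 40, fixed seed for reproducibility
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
m, n = A.shape
k = 5

A_k = low_rank_approx(A, k)
s = np.linalg.svd(A, compute_uv=False)

print("rank of A_k:", np.linalg.matrix_rank(A_k))
print("storage, full vs rank-k:", m * n, "vs", k * (m + n + 1))
# Eckart-Young: the spectral-norm error equals sigma_{k+1}, i.e. s[k] (0-indexed)
print(np.linalg.norm(A - A_k, 2), s[k])
```

In practice one stores the three truncated factors rather than the product `A_k`, since the product is again a full $m \times n$ array; the savings come from keeping `U[:, :k]`, `s[:k]`, and `VT[:k, :]`.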

**1.4.3 Principal Component Analysis**

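This refers to a dimensionality-reduction method used to reduce the number of variables in a large data set while preserving as much of its information (variance) as possible. Its underlying mathematics can be explained with singular value decomposition: the principal components are the right singular vectors of the centered (or standardized) data matrix.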

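First, standardize the variables (subtract each variable's mean and divide by its standard deviation) so that they all contribute on a comparable scale. Then, compute the covariance matrix, which captures how the variables vary together and so identifies closely related (redundant) variables. Third, identify the principal components by computing the eigenvectors and eigenvalues of the covariance matrix; the eigenvectors are the directions of greatest variance, and each eigenvalue is the variance along its eigenvector. Then, form the feature vector, a matrix whose columns are the eigenvectors of the components we choose to keep (those with the largest eigenvalues). Finally, project the data onto these axes by multiplying the standardized data by the feature vector.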
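The five steps above can be sketched directly in NumPy. This is an illustrative implementation under our own assumptions, not scikit-learn's algorithm: the data set (100 samples of 3 variables, two of them strongly correlated) is hypothetical, and keeping $k = 2$ components is an arbitrary choice. The last lines check the SVD connection: the eigenvalues of the covariance matrix equal $\sigma_i^2/(n-1)$ for the singular values $\sigma_i$ of the standardized data.

```python
import numpy as np

# Hypothetical data set: 100 samples of 3 variables, the first two strongly correlated
rng = np.random.default_rng(1)
x = rng.standard_normal(100)
X = np.column_stack([x, 2 * x + 0.1 * rng.standard_normal(100), rng.standard_normal(100)])

# Step 1: standardize each variable (zero mean, unit standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix of the standardized variables (3 x 3)
C = np.cov(Z, rowvar=False)

# Step 3: eigen-decomposition, sorted by decreasing eigenvalue (variance)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: feature vector = eigenvectors of the k components we keep
k = 2
W = eigvecs[:, :k]

# Step 5: project the standardized data onto the principal axes
scores = Z @ W
print("explained variance ratio:", eigvals / eigvals.sum())
print("projected shape:", scores.shape)

# Same components via SVD: eigenvalues of C are s**2 / (n - 1)
s = np.linalg.svd(Z, compute_uv=False)
print(np.allclose(s**2 / (len(Z) - 1), eigvals))
```

Because the two correlated variables are nearly redundant, most of the variance should land in the first two components, which is what makes dropping the third one reasonable here.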

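```python
from sklearn.decomposition import PCA
import numpy as np

# Random 4 x 3 integer data (4 samples, 3 variables); no seed is set,
# so the printed values change from run to run
X = np.random.randint(10, size=(4, 3))

# Keep all 3 components; svd_solver="full" runs an exact SVD.
# Note: scikit-learn's PCA centers the data but does not scale it.
pca = PCA(n_components=3, svd_solver="full")
pca.fit(X)

print(pca.explained_variance_ratio_)  # fraction of total variance per component
print(pca.singular_values_)           # singular values of the centered data
```

Output from one run:

```
[0.5929196  0.38406524 0.02301515]
[9.06197357 7.29335562 1.78538477]
```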
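To connect these outputs to the SVD, the sketch below refits the same kind of model on a seeded $4 \times 3$ integer matrix (hypothetical data, seeded only so the check is reproducible) and recomputes the reported quantities from an SVD of the centered data. It relies on scikit-learn's documented behavior that `explained_variance_` is $\sigma_i^2/(n_{\text{samples}} - 1)$; the ratio check assumes all components are kept, as here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: a seeded 4 x 3 integer matrix like the one above
rng = np.random.default_rng(0)
X = rng.integers(10, size=(4, 3)).astype(float)

pca = PCA(n_components=3, svd_solver="full").fit(X)

# PCA centers X (without scaling) and takes its SVD
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)

# singular_values_ are the singular values of the centered data
print(np.allclose(pca.singular_values_, s))
# explained_variance_ is s**2 / (n_samples - 1)
print(np.allclose(pca.explained_variance_, s**2 / (len(X) - 1)))
# with all components kept, the ratios are the variances normalized to sum to 1
print(np.allclose(pca.explained_variance_ratio_, s**2 / np.sum(s**2)))
```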
