<a href="https://colab.research.google.com/github/hanhanwu/Hanhan_COLAB_Experiemnts/blob/master/SVD_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# About SVD

* If the dimensions of A are m x n:
  * U is an m x m matrix of Left Singular Vectors
  * S is an m x n rectangular diagonal matrix of Singular Values arranged in decreasing order
  * V is an n x n matrix of Right Singular Vectors
  

* Decomposition: `A = U*S*VT`, where VT is the transpose of V and `*` denotes matrix multiplication
  * The decomposition allows us to express our original matrix as a weighted sum of rank-1 matrices, one per singular value.
  
* SVD in Dimensionality Reduction
  * Using SVD, we factor our large matrix A into 3 matrices U, S and V. The full factors are not smaller than A; the savings come from truncating them.
  * This is helpful in large computations. We can obtain a rank-k approximation of A by keeping the first k singular values and truncating the 3 matrices accordingly.
  * The Rank of a Matrix - the maximum number of linearly INDEPENDENT columns, i.e. columns that cannot be expressed as a linear combination of the other columns.
    * <b>The rank of a matrix can be thought of as a measure of the amount of unique information in the matrix. The higher the rank, the more information.</b>

* The code below shows 3 ways to compute SVD in Python


* Reference: https://www.analyticsvidhya.com/blog/2019/08/5-applications-singular-value-decomposition-svd-data-science/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29
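
Before turning to the three approaches, the rank-1 expansion and the rank-k truncation described in the list above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the helper name `rank_k_approx` is hypothetical, and the matrix is the same small example used in the cells below.

```python
import numpy as np

def rank_k_approx(A, k):
    """Rank-k approximation of A by SVD truncation (hypothetical helper)."""
    U, S, VT = np.linalg.svd(A, full_matrices=False)
    # Keep the first k singular values, the first k columns of U
    # and the first k rows of VT
    return U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)
U, S, VT = np.linalg.svd(A)

# A written as a sum of rank-1 matrices: sigma_i * u_i * v_i^T
rank1_sum = sum(S[i] * np.outer(U[:, i], VT[i, :]) for i in range(len(S)))
print(np.allclose(rank1_sum, A))

# The spectral-norm error of the rank-1 approximation equals
# the first singular value that was dropped
A1 = rank_k_approx(A, 1)
print(np.linalg.norm(A - A1, 2), S[1])
```

The two numbers printed on the last line should agree, which is one way to see how the singular values measure what a truncation throws away.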

## Type 1 Python SVD

* This method (`numpy.linalg.svd`) returns the complete U and VT, plus the singular values S as a 1-D array

In [0]:
import numpy as np
from numpy.linalg import svd

In [0]:
# this matrix has rank=2, since row3 = row1 + row2 (equivalently col3 = 2*col2 - col1),
## but row1 and row2 are independent from each other

A = np.array([[1,2,3], [4,5,6], [5,7,9]])

U, S, VT = svd(A)

In [0]:
print("Left Singular Vectors:")
print(U)
print("Singular Values:") 
print(np.diag(S))
print("Right Singular Vectors:") 
print(VT)

Left Singular Vectors:
[[-0.2354116   0.78182354 -0.57735027]
 [-0.55937325 -0.5947842  -0.57735027]
 [-0.79478485  0.18703934  0.57735027]]
Singular Values:
[[1.56633231e+01 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 8.12593979e-01 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 1.17807579e-15]]
Right Singular Vectors:
[[-0.41158755 -0.56381288 -0.71603821]
 [-0.8148184  -0.12429146  0.56623547]
 [-0.40824829  0.81649658 -0.40824829]]
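
The third singular value above (about 1e-15) is zero up to floating-point error, which is how the rank of 2 shows up numerically. A small sketch of reading the rank off the singular values; the tolerance formula is an assumption chosen here to mirror the default documented for `np.linalg.matrix_rank`.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)
S = np.linalg.svd(A, compute_uv=False)  # singular values only

# Count singular values above a small tolerance (assumed threshold:
# largest singular value * max dimension * machine epsilon)
tol = S.max() * max(A.shape) * np.finfo(A.dtype).eps
rank_from_svd = int(np.sum(S > tol))

print(rank_from_svd)             # 2
print(np.linalg.matrix_rank(A))  # 2
```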


In [0]:
# Reconstruct the original matrix A from U, S and VT
# @ is the matrix multiplication operator in Python 3.5+; use np.matmul on older versions
print(U @ np.diag(S) @ VT)

[[1. 2. 3.]
 [4. 5. 6.]
 [5. 7. 9.]]
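
For a square matrix `np.diag(S)` is enough, but since `svd` returns the singular values as a 1-D array, the m x n matrix S has to be built explicitly when A is not square. A minimal sketch, using an arbitrary 2 x 3 matrix `B` chosen only for illustration:

```python
import numpy as np
from numpy.linalg import svd

B = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # m=2, n=3 (illustrative)
m, n = B.shape

# Full SVD: U is m x m, VT is n x n, S has min(m, n) entries
U, S, VT = svd(B)
S_full = np.zeros((m, n))
S_full[:len(S), :len(S)] = np.diag(S)  # singular values on the diagonal
print(np.allclose(U @ S_full @ VT, B))

# Reduced ("economy") SVD: shapes already line up with np.diag(S)
U_r, S_r, VT_r = svd(B, full_matrices=False)
print(U_r.shape, S_r.shape, VT_r.shape)  # (2, 2) (2,) (2, 3)
print(np.allclose(U_r @ np.diag(S_r) @ VT_r, B))
```

The reduced form is usually the more convenient one when the goal is reconstruction or truncation rather than a full orthogonal basis.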


## Type 2 Python SVD

* Sklearn Truncated SVD - This is used directly for dimensionality reduction

In [0]:
import numpy as np
from sklearn.decomposition import TruncatedSVD

In [12]:
A = np.array([[1,2,3], [4,5,6], [5,7,9]])
print("Original Matrix:")
A

Original Matrix:


array([[1, 2, 3],
       [4, 5, 6],
       [5, 7, 9]])

In [13]:
# named tsvd so it does not shadow the svd function imported from numpy.linalg above
tsvd = TruncatedSVD(n_components=2)  # reduce to 2 features
A_transf = tsvd.fit_transform(A)

print("Singular values:")
print(tsvd.singular_values_)
print()

print("Transformed Matrix after reducing to 2 features:")
print(A_transf)

Singular values:
[15.66332312  0.81259398]

Transformed Matrix after reducing to 2 features:
[[ 3.68732795  0.6353051 ]
 [ 8.76164389 -0.48331806]
 [12.44897184  0.15198704]]
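
The transformed matrix is the projection of A onto the top right singular vectors, which `TruncatedSVD` stores in `components_`. A hedged sketch of that relationship; `random_state=0` and the variable name `tsvd` are choices made here for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)
tsvd = TruncatedSVD(n_components=2, random_state=0)
A_transf = tsvd.fit_transform(A)

# Rows of components_ are the top-2 right singular vectors (2 x 3)
print(tsvd.components_.shape)

# The reduced features are A projected onto those vectors
print(np.allclose(A_transf, A @ tsvd.components_.T))

# inverse_transform maps the 2 features back to the original 3 columns
A_back = tsvd.inverse_transform(A_transf)
print(np.allclose(A_back, A))  # exact here only because A has rank 2
```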


## Type 3 Python SVD

* Randomized SVD
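  * It returns U, S and VT too
  * On this example it gives essentially the same results as Truncated SVD
    * In scikit-learn, `TruncatedSVD` uses the randomized solver by default (`algorithm="randomized"`); the ARPACK solver is available through `algorithm="arpack"`. Randomized SVD uses approximation techniques and is typically faster on large matrices.
    * ARPACK, the ARnoldi PACKage, is a numerical software library written in FORTRAN 77 for solving large-scale eigenvalue problems in a matrix-free fashion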
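
In scikit-learn the solver is selected through the `algorithm` parameter of `TruncatedSVD` (`"randomized"` or `"arpack"`). A small sketch comparing the two on the notebook's matrix; `random_state=0` is an arbitrary choice, and speed is not measured here since the difference only matters for large matrices.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)

sv = {}
for algo in ("randomized", "arpack"):
    model = TruncatedSVD(n_components=2, algorithm=algo, random_state=0)
    model.fit(A)
    sv[algo] = model.singular_values_
    print(algo, sv[algo])

# Both solvers should agree on the singular values for this small matrix
print(np.allclose(sv["randomized"], sv["arpack"]))
```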

In [0]:
import numpy as np
from sklearn.utils.extmath import randomized_svd

In [16]:
A = np.array([[1,2,3], [4,5,6], [5,7,9]])
u, s, vt = randomized_svd(A, n_components = 2)  # reduce to 2 features

print("Left Singular Vectors:")
print(u)

print("Singular Values:") 
print(np.diag(s))

print("Right Singular Vectors:") 
print(vt)

Left Singular Vectors:
[[ 0.2354116   0.78182354]
 [ 0.55937325 -0.5947842 ]
 [ 0.79478485  0.18703934]]
Singular Values:
[[15.66332312  0.        ]
 [ 0.          0.81259398]]
Right Singular Vectors:
[[ 0.41158755  0.56381288  0.71603821]
 [-0.8148184  -0.12429146  0.56623547]]
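
Compared with the `numpy.linalg.svd` output in Type 1, the first left and right singular vectors here have opposite signs. That is expected: singular vectors are only determined up to a simultaneous sign flip of a left vector and its matching right vector. A minimal NumPy sketch of why the flip does not change the product:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)
U, S, VT = np.linalg.svd(A)

# Flip the sign of the first left AND the first right singular vector
U_flip, VT_flip = U.copy(), VT.copy()
U_flip[:, 0] *= -1
VT_flip[0, :] *= -1

# The product is unchanged because (-u) * sigma * (-v^T) = u * sigma * v^T
print(np.allclose(U_flip @ np.diag(S) @ VT_flip, A))

# Flipping only one of the two does change the product
U_only = U.copy()
U_only[:, 0] *= -1
print(np.allclose(U_only @ np.diag(S) @ VT, A))
```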


In [17]:
# Reconstruct the matrix from the rank-2 factors
# @ is the matrix multiplication operator in Python 3.5+; use np.matmul on older versions
print('Reduced matrix:')
print(u @ np.diag(s) @ vt)

Reduced matrix:
[[1. 2. 3.]
 [4. 5. 6.]
 [5. 7. 9.]]


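As we can see from above, even though randomized SVD has reduced the number of components to 2, the generated U, S and VT still reproduce the original matrix. This is because A has rank 2 (its third singular value is effectively zero), so the rank-2 truncation loses nothing. For a full-rank matrix the reconstruction would only be an approximation.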
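
The exact reconstruction is a property of this particular matrix, not of randomized SVD in general. A hedged sketch contrasting it with a full-rank matrix; `A_full` is a hypothetical example made by changing one entry of A, `reconstruct` is an illustrative helper, and `random_state=0` is an arbitrary seed.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def reconstruct(M, k):
    """Rank-k reconstruction via randomized SVD (illustrative helper)."""
    u, s, vt = randomized_svd(M, n_components=k, random_state=0)
    return u @ np.diag(s) @ vt

A = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 9]], dtype=float)        # rank 2
A_full = np.array([[1, 2, 3], [4, 5, 6], [5, 7, 10]], dtype=float)  # rank 3

err_rank2 = np.linalg.norm(A - reconstruct(A, 2))
err_full = np.linalg.norm(A_full - reconstruct(A_full, 2))

print(err_rank2)  # close to zero: nothing is lost
print(err_full)   # clearly non-zero: the third singular value is dropped
```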