# Singular Value Decomposition (SVD)

## Overview

The Singular Value Decomposition of an $m \times n$ matrix $\mathbf{A}$ is a factorization of the form:

$\mathbf{A}_{m \times n} = \mathbf{U} \Sigma \mathbf{V}^T$

where:

* $\mathbf{U}$ is an $m \times r$ matrix with orthonormal columns (the left singular vectors), where $r$ is the rank of $\mathbf{A}$.
* $\Sigma$ is an $r \times r$ diagonal matrix containing the singular values of $\mathbf{A}$.
* $\mathbf{V}$ is an $n \times r$ matrix with orthonormal columns (the right singular vectors).

**Notable Properties**

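* The singular values in $\Sigma$ are uniquely determined by $\mathbf{A}$. The columns of $\mathbf{U}$ and $\mathbf{V}$ are unique only up to sign (a matching pair of columns can be negated together), and only up to a change of basis within the subspace of any repeated singular value.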
* The singular values in $\Sigma$ are non-negative and sorted in descending order: $\sigma_1 \geq \ldots \geq \sigma_r > 0$.
* The matrices $\mathbf{U}$ and $\mathbf{V}$ have orthonormal columns: $\mathbf{U}^T \mathbf{U} = \mathbf{I}$ and $\mathbf{V}^T 
\mathbf{V} = \mathbf{I}$.
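
As a quick numerical check of these properties, here is a minimal sketch using `np.linalg.svd`. The matrix `M`, the seed, and the `_m` variable names are illustrative assumptions and are not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))  # arbitrary example matrix (assumption)

# full_matrices=False returns the reduced factors: U_m is 6x3, Vt_m is 3x3
U_m, s_m, Vt_m = np.linalg.svd(M, full_matrices=False)

print(np.allclose(U_m.T @ U_m, np.eye(3)))            # columns of U are orthonormal
print(np.allclose(Vt_m @ Vt_m.T, np.eye(3)))          # columns of V are orthonormal
print(np.all(s_m[:-1] >= s_m[1:]), np.all(s_m >= 0))  # descending and non-negative
print(np.allclose(U_m @ np.diag(s_m) @ Vt_m, M))      # U Sigma V^T reproduces M
```

Note that `np.linalg.svd` returns the singular values as a 1-D array rather than as a diagonal matrix, which is why `np.diag` is needed for the reconstruction.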

The SVD and its singular values matter because they play a role analogous to eigenvalues. Eigenvalues are a fundamental concept in linear algebra, but they are only defined for square matrices. 
In contrast, singular values exist for matrices of any size (notice that above I have specified that $\mathbf{A}$ is $m \times n$ in size). This is particularly relevant in the Data Science world, where data rarely comes in perfectly square matrices. As a result, singular values offer a more flexible alternative to eigenvalues for extracting structure from data sets. Below are just a few examples of real world applications.

**Real World Applications of SVD**

* Dimensionality reduction: SVD can be used to reduce the dimensionality of a dataset while preserving most of the information.
* Computing the pseudoinverse of a matrix
* Data compression: SVD can be used to compress a matrix by retaining only the top few singular values and their corresponding columns in 
$\mathbf{U}$ and $\mathbf{V}$.
* Image processing: SVD is often used in image processing algorithms, such as image denoising and deblurring.
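
Two of these applications, compression by truncation and the pseudoinverse, can be sketched in a few lines. This is only an illustration under stated assumptions: the matrix `X`, the rank `k = 2`, and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))  # stand-in data matrix (assumption)

U_x, s_x, Vt_x = np.linalg.svd(X, full_matrices=False)

# Rank-k approximation: keep the k largest singular values and their vectors
k = 2
X_k = U_x[:, :k] @ np.diag(s_x[:k]) @ Vt_x[:k, :]

# The spectral-norm error of this truncation equals the first discarded singular value
err = np.linalg.norm(X - X_k, ord=2)
print(np.isclose(err, s_x[k]))

# Pseudoinverse from the SVD: V Sigma^{-1} U^T (X has full column rank here)
X_pinv = Vt_x.T @ np.diag(1.0 / s_x) @ U_x.T
print(np.allclose(X_pinv, np.linalg.pinv(X)))
```

Storing `U_x[:, :k]`, `s_x[:k]`, and `Vt_x[:k, :]` takes fewer numbers than storing `X` itself whenever `k` is small relative to the matrix dimensions, which is the idea behind SVD-based compression.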

So if the singular values and the $\Sigma$ matrix in particular are so useful, let's start there and discuss how we can compute them.

## Computing the Singular Values

**Step 1: Compute the Eigenvalues of $\mathbf{A}^T \mathbf{A}$**

The first step is to compute the eigenvalues ($\lambda$'s) of the matrix $\mathbf{A}^T \mathbf{A}$, where $\mathbf{A}$ is an $m \times n$ matrix. This can be done because $\mathbf{A}^T \mathbf{A}$ is always square (i.e. $n \times n$ dimensions). It is also symmetric and positive semidefinite, so its eigenvalues are real and non-negative. There are a variety of methods for computing eigenvalues, and I'll refer you to the notebook on computing eigenvalues for examples of such methods. 

**Step 2: From the Eigenvalues Compute the Singular Values**

After computing the eigenvalues, we can compute the singular values of $\mathbf{A}$ by simply taking the square roots of the eigenvalues.

$$\sigma_i = \sqrt{\lambda_i},$$ for $i=1,\ldots,r$.

The matrix of singular values ($\Sigma$) can then be constructed by the following:

$$\Sigma = \begin{bmatrix}
\sigma_1 & 0 & 0 & 0 \\
0 & \sigma_2 & 0 & 0 \\
0 & 0 & \ddots & 0 \\
0 & 0 & 0 & \sigma_r
\end{bmatrix}$$

To see that this is the case, let's test it out using the `numpy.linalg.eig()` and `numpy.linalg.svd()` functions in the following code.

In [12]:
import numpy as np

# Generate a random 10x4 matrix A with integer values in [-100, 100)
A = np.array(np.random.randint(-100, 100, (10, 4)), dtype=np.float64)
print(f"A:\n {A}")
# Compute A transpose A
ATA = np.dot(A.T, A)
print(f"AT A:\n {ATA}")
# Compute the eigenvalues and eigenvectors of A transpose A
ATA_eigvals, ATA_eigvecs = np.linalg.eig(ATA)
# Take the square root of the eigenvalues to compute the singular values
ATA_sigmas = np.sqrt(ATA_eigvals)
print(f"Singular Values for A transpose A: \n {ATA_sigmas}")

# Confirm the singular values
U, Sigma, VT = np.linalg.svd(A, full_matrices=False)
print(f"Diagonal of Sigma Matrix from decomposition: \n {Sigma}")

A:
 [[ 52. -64.  -4. -62.]
 [ 49.  64.  47.   7.]
 [ 95.  38.  52.  76.]
 [-75. -21. -34. -45.]
 [ 25. -85.  72. -67.]
 [-39. -60.  77.  59.]
 [ 35.  47. -34.  33.]
 [-76. -10. -23. -61.]
 [-91.  87. -75. -35.]
 [ 46. -55.  23.  24.]]
AT A:
 [[ 39299.  -2834.  16823.  13818.]
 [ -2834.  33805. -13944.   8200.]
 [ 16823. -13944.  25037.   9236.]
 [ 13818.   8200.   9236.  26275.]]
Singular Values for A transpose A: 
 [245.40540072 204.23670993 125.96413851  81.31784199]
Diagonal of Sigma Matrix from decomposition: 
 [245.40540072 204.23670993 125.96413851  81.31784199]


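Notice that taking the square roots of the eigenvalues of $\mathbf{A}^T \mathbf{A}$ gives the same values that `numpy.linalg.svd()` returns along the diagonal of $\Sigma$. In other words, the singular values of $\mathbf{A}$ are the square roots of the eigenvalues of $\mathbf{A}^T \mathbf{A}$. One caveat: `numpy.linalg.eig()` does not promise any particular ordering of the eigenvalues, so in general they may need to be sorted in descending order before comparing them against the SVD.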
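
Because $\mathbf{A}^T \mathbf{A}$ is symmetric, one order-robust variant of the same check is to use `np.linalg.eigvalsh`, which returns real eigenvalues in ascending order. The sketch below is an alternative to the cell above, not a replacement for it; `A_demo` and the other names are assumptions chosen to avoid clobbering the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(2)
A_demo = rng.integers(-100, 100, size=(10, 4)).astype(np.float64)

# eigvalsh assumes a symmetric matrix and returns real eigenvalues, ascending
lam = np.linalg.eigvalsh(A_demo.T @ A_demo)

# Reverse to descending order; clip guards against tiny negative round-off
sigmas_from_eig = np.sqrt(np.clip(lam[::-1], 0.0, None))

sigmas_from_svd = np.linalg.svd(A_demo, compute_uv=False)
print(np.allclose(sigmas_from_eig, sigmas_from_svd))
```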

## Computing $\mathbf{U}$ and $\mathbf{V}$

To get $\mathbf{V}$, just compute the eigenvectors of $\mathbf{A}^T \mathbf{A}$ and use them as the columns of $\mathbf{V}$ (equivalently, the rows of $\mathbf{V}^T$). This works because if $\mathbf{A}_{m \times n} = \mathbf{U} \Sigma \mathbf{V}^T$, then...

$\mathbf{A}^T \mathbf{A} = \mathbf{V} {\Sigma}^T \mathbf{U}^T \mathbf{U} \Sigma \mathbf{V}^T = \mathbf{V} {\Sigma}^T \Sigma \mathbf{V}^T = \mathbf{V} {\Sigma}^2 \mathbf{V}^T$ 

Remember that $\mathbf{U}$ and $\mathbf{V}$ have orthonormal columns, which is why $\mathbf{U}^T \mathbf{U} = \mathbf{I}$ drops out. $\Sigma$ is diagonal, so ${\Sigma}^T \Sigma = {\Sigma}^2$. The ${\Sigma}^2$ also helps explain the relationship between the eigenvalues and singular values. We can see this by examining the derivation of the Eigendecomposition for some square $A$:

$A \vec{v} = \lambda \vec{v}$

$AQ = Q\Lambda$ (stacking the eigenvectors as the columns of $Q$)

$A = Q \Lambda Q^{-1}$ (Where $\Lambda$ is a diagonal matrix with the eigenvalues along the diagonal)

Even though the notes on computing eigenvalues did not discuss the Eigendecomposition directly, they worked around the same idea (if you've seen those notes). Note also that $A$ and $\Lambda$ are similar matrices, so they share the same eigenvalues. 
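
The two matrix identities above can be checked numerically on a small example. This is a minimal sketch; the $2 \times 2$ matrix `S` is an arbitrary assumption chosen only because it is easy to inspect.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # small square example (assumption)

eigvals, Q = np.linalg.eig(S)  # columns of Q are the eigenvectors
Lam = np.diag(eigvals)         # eigenvalues along the diagonal

print(np.allclose(S @ Q, Q @ Lam))                 # S Q = Q Lambda
print(np.allclose(S, Q @ Lam @ np.linalg.inv(Q)))  # S = Q Lambda Q^{-1}
```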

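But remember, for this to work we need a square $A$. We accomplish that for the SVD by computing $A^T A$, so we can get the eigenvalues of $A^T A$. Because $A^T A$ is symmetric, its eigenvectors can be chosen to be orthonormal, which makes $Q^{-1} = Q^T$. So if we stick everything next to one another we can see the following: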

$A^T A = Q \Lambda Q^{-1}$

$\mathbf{A}^T \mathbf{A} = \mathbf{V} {\Sigma}^2 \mathbf{V}^T$

Our orthonormal $Q$ and $V$ matrices line up nicely, and so do $\Lambda$ and ${\Sigma}^2$. This tells us that the eigenvalues of $A^T A$ satisfy $\Lambda = {\Sigma}^2$. So if we just want the singular values, or $\Sigma$, then we just need to take the square root of each eigenvalue of $A^T A$. Similarly, if we want $V$, we just need the eigenvectors of $A^T A$, which form its columns.
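
The identity $\mathbf{A}^T \mathbf{A} = \mathbf{V} {\Sigma}^2 \mathbf{V}^T$ can also be confirmed directly from the output of `np.linalg.svd`. The following is a sketch on an assumed random matrix `A_chk`, not on the notebook's `A`.

```python
import numpy as np

rng = np.random.default_rng(3)
A_chk = rng.standard_normal((7, 3))  # arbitrary example matrix (assumption)

_, s_chk, Vt_chk = np.linalg.svd(A_chk, full_matrices=False)
V_chk = Vt_chk.T

# A^T A = V Sigma^2 V^T
print(np.allclose(A_chk.T @ A_chk, V_chk @ np.diag(s_chk**2) @ V_chk.T))

# Column i of V is an eigenvector of A^T A with eigenvalue sigma_i^2;
# broadcasting V_chk * s_chk**2 scales column i by sigma_i^2
print(np.allclose(A_chk.T @ A_chk @ V_chk, V_chk * s_chk**2))
```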

Then once we have $\Sigma$ and $V^T$ we can compute $U$ easily by the following (where $U_i$ and $V_i$ denote the $i$-th columns of $U$ and $V$):

$U_i = \frac{A V_i} {\sigma_i}$

Or more efficiently:

$U = A V \Sigma^{-1}$

Normally inverting matrices should be avoided unless the matrices are orthonormal or diagonal. In this case $\Sigma$ is diagonal, so inverting it is easy: the inverse is also diagonal, and its diagonal entries are just $\frac{1} {\sigma_i}$. 
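
Since $\Sigma^{-1}$ is diagonal, multiplying by it amounts to dividing each column of $A V$ by the matching $\sigma_i$, so no explicit inverse or identity matrix is needed. The sketch below illustrates this with NumPy broadcasting, using `np.linalg.eigh` for the symmetric matrix $A^T A$; the matrix `A_u` and the other names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
A_u = rng.standard_normal((10, 4))  # arbitrary tall matrix (assumption)

# eigh returns ascending eigenvalues; reorder to descending to match the SVD
lam_u, V_u = np.linalg.eigh(A_u.T @ A_u)
order = np.argsort(lam_u)[::-1]
sig_u = np.sqrt(lam_u[order])
V_u = V_u[:, order]

# Dividing column i of A V by sigma_i is the same as multiplying by Sigma^{-1}
U_u = (A_u @ V_u) / sig_u

print(np.allclose(U_u.T @ U_u, np.eye(4)))             # orthonormal columns
print(np.allclose(U_u @ np.diag(sig_u) @ V_u.T, A_u))  # reconstructs A
```

This only works when every $\sigma_i$ is nonzero, i.e. when the matrix has full column rank, which is the case for this random example.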

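With the following code, we'll test out computing $V^T$ and $U$, and also ensure that when we compute $U \Sigma V^T$ we get back $A$. Keep in mind that eigenvectors are only determined up to sign, so individual rows of $V^T$ (and the matching columns of $U$) may come out negated relative to `np.linalg.svd`; the product $U \Sigma V^T$ is unaffected.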

In [13]:
print(f"Eigenvectors of A transpose A (the columns): \n{ATA_eigvecs.T}\n")
print(f" V transpose matrix from np.linalg.svd: \n{VT}\n")
comp_U = np.dot(np.dot(A, ATA_eigvecs), (1/ Sigma) * np.identity(len(Sigma)))
print(f"U from direct computation:\n {comp_U}\n")
print(f"U from np.linalg.svd:\n {U}\n")
comp_svd = np.dot(np.dot(comp_U, np.identity(len(Sigma)) * Sigma), ATA_eigvecs.T)
print(f"Recomputing A to check SVD:\n {comp_svd}")

Eigenvectors of A transpose A (the columns): 
[[ 0.71414381 -0.24257887  0.53681873  0.37812665]
 [-0.1966874  -0.82506629  0.22620385 -0.47896921]
 [-0.65634953 -0.12468739  0.35693412  0.65288313]
 [ 0.14324097 -0.49484762 -0.73024346  0.44872302]]

 V transpose matrix from np.linalg.svd: 
[[-0.71414381  0.24257887 -0.53681873 -0.37812665]
 [ 0.1966874   0.82506629 -0.22620385  0.47896921]
 [-0.65634953 -0.12468739  0.35693412  0.65288313]
 [ 0.14324097 -0.49484762 -0.73024346  0.44872302]]

U from direct computation:
 [[ 0.11030482  0.34943656 -0.5402861   0.17485615]
 [ 0.19292715 -0.27009409 -0.14920941 -0.68658759]
 [ 0.46974461 -0.36563888  0.00864028 -0.11148879]
 [-0.34120751  0.22493816  0.08200072  0.05268793]
 [ 0.21103554  0.55617359 -0.18937313 -0.45499116]
 [ 0.20516108  0.22686078  0.78659615 -0.06947588]
 [ 0.03186633 -0.33862223 -0.15419593  0.26306322]
 [-0.35558181  0.2311697   0.02456321 -0.20308387]
 [-0.56880283 -0.26480832 -0.00588234 -0.20934787]
 [ 0.27552093 

## SVD Considerations for Non-Square Matrices

What if we try to use $\mathbf{A} \mathbf{A}^T$ to compute an SVD? Let's experiment with what happens when we try to use it to compute the same singular values with the code below:

In [14]:
# Compute A A transpose
AAT = np.dot(A, A.T)
print(f"A AT:\n {AAT}\n")
# Compute the eigenvalues and eigenvectors of A A transpose
AAT_eigvals, AAT_eigvecs = np.linalg.eig(AAT)
# Take the square root of the eigenvalues to compute the singular values
AAT_sigmas = np.sqrt(AAT_eigvals)
print(f"Singular Values for A A transpose: \n {AAT_sigmas}")

A AT:
 [[ 1.0660e+04 -2.1700e+03 -2.4120e+03  3.7000e+02  1.0606e+04 -2.1540e+03
  -3.0980e+03  5.6200e+02 -7.8300e+03  4.3320e+03]
 [-2.1700e+03  8.7550e+03  1.0063e+04 -6.9320e+03 -1.3000e+03 -1.7190e+03
   3.3560e+03 -5.8720e+03 -2.6610e+03 -1.7000e+01]
 [-2.4120e+03  1.0063e+04  1.8949e+04 -1.3111e+04 -2.2030e+03  2.5030e+03
   5.8510e+03 -1.3432e+04 -1.1899e+04  5.3000e+03]
 [ 3.7000e+02 -6.9320e+03 -1.3111e+04  9.2470e+03  4.7700e+02 -1.0880e+03
  -3.9410e+03  9.4370e+03  9.1230e+03 -4.1570e+03]
 [ 1.0606e+04 -1.3000e+03 -2.2030e+03  4.7700e+02  1.7523e+04  5.7160e+03
  -7.7790e+03  1.3810e+03 -1.2725e+04  5.8730e+03]
 [-2.1540e+03 -1.7190e+03  2.5030e+03 -1.0880e+03  5.7160e+03  1.4531e+04
  -4.8560e+03 -1.8060e+03 -9.5110e+03  4.6930e+03]
 [-3.0980e+03  3.3560e+03  5.8510e+03 -3.9410e+03 -7.7790e+03 -4.8560e+03
   5.6790e+03 -4.3610e+03  2.2990e+03 -9.6500e+02]
 [ 5.6200e+02 -5.8720e+03 -1.3432e+04  9.4370e+03  1.3810e+03 -1.8060e+03
  -4.3610e+03  1.0126e+04  9.9060e+03 -4.939

That does not work out nicely. First, notice how large $\mathbf{A} \mathbf{A}^T$ is. It is a $10 \times 10$ matrix, so right away this is more computationally intensive than the $4 \times 4$ matrix $\mathbf{A}^T \mathbf{A}$. More importantly, notice that there are 10 values instead of 4. Since $\mathbf{A}$ has only 4 columns, $\mathbf{A} \mathbf{A}^T$ has rank at most 4, so its other 6 eigenvalues are zero in exact arithmetic. In floating point they come out as tiny nonzero values that can be negative or even complex, which is why some of the computed $\sigma_i$ contain an imaginary component. Even though the first 4 $\sigma_i$ agree with the 4 singular values we computed before from $\mathbf{A}^T \mathbf{A}$ and we could throw away all the complex $\sigma_i$, we don't want to complicate things with complex numbers if we can avoid it. From the computational/numerical perspective, we also don't want to compute more than we have to.
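
The rank argument can be illustrated with a short sketch. It uses `np.linalg.eigvalsh`, which exploits symmetry and always returns real eigenvalues, in place of the `np.linalg.eig` call in the cell above; `A_tall` and the other names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
A_tall = rng.integers(-100, 100, size=(10, 4)).astype(np.float64)
AAT_demo = A_tall @ A_tall.T             # 10 x 10, but rank at most 4

print(np.linalg.matrix_rank(AAT_demo))   # numerical rank of the 10 x 10 product

# eigvalsh returns real eigenvalues in ascending order; reverse to descending
lam_big = np.linalg.eigvalsh(AAT_demo)[::-1]

# The trailing six are zero up to round-off; clip them before the square root
sig_big = np.sqrt(np.clip(lam_big, 0.0, None))
print(np.allclose(sig_big[:4], np.linalg.svd(A_tall, compute_uv=False)))
print(np.allclose(sig_big[4:], 0.0, atol=1e-3 * sig_big[0]))
```

Even with real-valued output, six of the ten eigenvalues carry no information, so the extra work of the larger matrix is wasted.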

This brings up an important point, one which has been addressed in previous notes/examples when working with non-square matrices: paying attention to the dimensions of the matrices we work with matters. Specifically, when computing a singular value decomposition (SVD) for a non-square matrix, the choice of whether to use $\mathbf{A}^T \mathbf{A}$ or $\mathbf{A} \mathbf{A}^T$ depends on the dimensions of the matrix we want to decompose. If we have a matrix with more rows than columns ($m > n$), as is often the case in machine learning applications (data tables), it is more efficient to use $\mathbf{A}^T \mathbf{A}$, since this gives the smaller $n \times n$ matrix and avoids the spurious near-zero eigenvalues seen above. On the other hand, if we have a matrix with more columns than rows ($m < n$), as may be the case in other applications, it is better to use $\mathbf{A} \mathbf{A}^T$ for the same reasons. Put more bluntly, if we have the option to work with the smaller matrix, we generally want to take it. (Library routines such as `np.linalg.svd` avoid forming either product at all, because doing so squares the condition number; the eigenvalue route here is for understanding.)
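
The rule of thumb above can be written as a small helper. `gram_singular_values` is a hypothetical name introduced here for illustration, and the sketch assumes a real-valued input matrix.

```python
import numpy as np

def gram_singular_values(M):
    """Singular values of M via the smaller of M^T M and M M^T (illustrative helper)."""
    m, n = M.shape
    G = M.T @ M if m >= n else M @ M.T   # min(m, n) x min(m, n)
    lam = np.linalg.eigvalsh(G)[::-1]    # real eigenvalues, descending
    return np.sqrt(np.clip(lam, 0.0, None))

rng = np.random.default_rng(6)
tall = rng.standard_normal((10, 4))
wide = rng.standard_normal((4, 10))

for M in (tall, wide):
    print(np.allclose(gram_singular_values(M), np.linalg.svd(M, compute_uv=False)))
```

In both cases the eigenvalue problem being solved is only $4 \times 4$, regardless of which dimension of the input is the long one.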

Let's see this work for a new $4 \times 10$ matrix $B$ with the following code:

In [9]:
# Generate a random 4x10 matrix B with integer values in [-100, 100)
B = np.array(np.random.randint(-100, 100, (4, 10)), dtype=np.float64)
print(f"B:\n {B}")
BBT = np.dot(B, B.T)
print(f"B BT:\n {BBT}")
BBT_eigvals, BBT_eigvecs = np.linalg.eig(BBT)
BBT_sigmas = np.sqrt(BBT_eigvals)
print(f"Singular Values from B B transpose: \n {BBT_sigmas}")

# Confirm the singular values
B_U, B_Sigma, B_VT = np.linalg.svd(B, full_matrices=False)
print(f"Diagonal of Sigma Matrix from decomposition: \n {B_Sigma}")
# Confirm U
print(f"Eigenvectors of B B transpose (the columns): \n{BBT_eigvecs}\n")
print(f"U matrix from np.linalg.svd: \n{B_U}\n")
# Confirm V^T
B_VT_comp = np.dot(np.dot((1/ B_Sigma) * np.identity(len(B_Sigma)), B_U.T), B)
print(f"Computed V transpose matrix: \n{B_VT_comp}\n")
print(f"V transpose matrix from np.linalg.svd: \n{B_VT}\n")

B:
 [[ 90.   3.   8.  60. -67.  36. -43. -34. -66.  49.]
 [ 45. -73. -80.  46.   0. -43.  18.  62. -41.  64.]
 [-33.  43. -92.  67.  97.  78.  94.  13.  21.  81.]
 [-46.   9. -81.  88.  48.  69.  25. -15.  90.  90.]]
B BT:
 [[27320.  7363. -5149. -2308.]
 [ 7363. 27664.  9285.  6424.]
 [-5149.  9285. 47391. 36626.]
 [-2308.  6424. 36626. 40617.]]
Singular Values from B B transpose: 
 [288.6381152  185.83475462 135.60516204  82.19928502]
Diagonal of Sigma Matrix from decomposition: 
 [288.6381152  185.83475462 135.60516204  82.19928502]
Eigenvectors of B B transpose (the columns): 
[[ 0.06923258  0.74043592 -0.65347792 -0.14116692]
 [-0.18797977  0.66775647  0.70710513  0.1369936 ]
 [-0.72754955 -0.06332772 -0.00126341 -0.68312491]
 [-0.65615707 -0.04295978 -0.27012469  0.70331004]]

U matrix from np.linalg.svd: 
[[-0.06923258  0.74043592  0.65347792 -0.14116692]
 [ 0.18797977  0.66775647 -0.70710513  0.1369936 ]
 [ 0.72754955 -0.06332772  0.00126341 -0.68312491]
 [ 0.65615707 -0.0429597

As you can see, computing the singular values using $A A^T$ for a matrix with more columns than rows works out nicely since it's only a $4 \times 4$ matrix, and thus we get back exactly 4 singular values. We don't have to deal with any complex $\sigma_i$, or throw any away, and most importantly we take the easier route computationally by working with a smaller matrix. We can also see that the eigenvectors we get through `np.linalg.eig()` make up the columns of $U$ (up to sign) instead of the columns of $V$, as was the case for $A^T A$. 

This is crucial to note because the methods for obtaining the $\mathbf{U}$ and $\mathbf{V}^{\mathrm{T}}$ matrices differ depending on the shape of the original matrix. Specifically, if we start with an $m \times n$ matrix $\mathbf{A}$, where $m \geq n$, then using $\mathbf{A}^{\mathrm{T}} \mathbf{A}$ gives us the columns of $V$ (the rows of $V^T$) as the eigenvectors of $\mathbf{A}^{\mathrm{T}} \mathbf{A}$, while $U$ is computed with the method shown previously, $U = A V \Sigma^{-1}$. 

On the other hand, when starting with a matrix $\mathbf{A}$ where $n > m$ and using $A A^T$, the methods for computing $\mathbf{U}$ and $\mathbf{V}^{\mathrm{T}}$ swap. The eigenvectors we compute make up the columns of our $U$ matrix, and we compute $V^T$ with the counterpart of the formula used for $U$ in the $A^T A$ case, namely $V^T = \Sigma^{-1} U^T A$. This subtle distinction affects how much work it takes to compute an SVD and how clean the result is, which highlights the importance of considering the matrix dimensions when computing an SVD.
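
For the wide case, the swap can be sketched end to end as follows. This is an illustration under assumptions: `B_demo` is a fresh random matrix (not the notebook's `B`), and `np.linalg.eigh` is used in place of `np.linalg.eig` because $B B^T$ is symmetric.

```python
import numpy as np

rng = np.random.default_rng(7)
B_demo = rng.standard_normal((4, 10))  # more columns than rows (assumption)

# Eigenvectors of B B^T give the columns of U (reordered to descending sigma)
lam_b, U_b = np.linalg.eigh(B_demo @ B_demo.T)
order_b = np.argsort(lam_b)[::-1]
sig_b = np.sqrt(lam_b[order_b])
U_b = U_b[:, order_b]

# V^T = Sigma^{-1} U^T B: divide row i of U^T B by sigma_i
Vt_b = (U_b.T @ B_demo) / sig_b[:, None]

print(np.allclose(Vt_b @ Vt_b.T, np.eye(4)))             # rows of V^T are orthonormal
print(np.allclose(U_b @ np.diag(sig_b) @ Vt_b, B_demo))  # reconstructs B
```

The individual columns of `U_b` may differ in sign from those returned by `np.linalg.svd`, but the matching rows of `Vt_b` flip with them, so the reconstruction still holds.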

So that is why it is so important to pay attention to dimensions.

## Reference Links

https://pages.mtu.edu/~struther/Courses/OLD/Other/Sp2012/5627/SVD/Report/Singular%20Value%20Decomposition%20and%20its%20numerical%20computations.pdf

https://youtu.be/mBcLRGuAFUk?si=3jrTEC-5jr2tvgYr

https://youtu.be/rYz83XPxiZo?si=iWHsUpyBR9w4-CSZ

https://en.wikipedia.org/wiki/Singular_value_decomposition#Applications_of_the_SVD