# Singular Value Decomposition #

## ***Vocabulary***

**outer product**: multiplying a column vector by a row vector, resulting in a matrix.\
**inner product**: multiplying a row vector by a column vector of the same dimension, resulting in a single scalar value.\
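**orthogonal**: two vectors are orthogonal if their inner product is 0; a square matrix $Q$ is orthogonal if its columns are orthonormal, i.e. $Q^TQ = I$.\
**principal axes**: the orthogonal directions along which a matrix (or a dataset) has the greatest stretch or variance, ordered by importance.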

# Lecture Notes #

## ***Introduction***

### **Introduction**

SVD is probably the most used and most important method for decomposing a matrix into a simpler structure, which helps you understand structural properties of the given matrix.

### **Netflix Challenge Problem**

This is equivalent to a "matrix completion" problem.

**Imagine that you're Netflix, and you'd like to predict which users will like certain movies.**

Consider a giant matrix where the rows are people (Netflix subscribers) and the columns are movies. The entry in row $i$, column $j$ is the rating that person $i$ would give movie $j$. This is a real number, and higher numbers are better. Of course, this is a massive matrix, and we know only *some* of its entries, since not all users watch or rate all movies. Netflix wants to use the available information to predict whether or not you'll like future movies.

So the goal is to replace the missing entries with numbers that represent true preferences. In most cases, there is not much you can do to fill in the empty entries of a matrix. But in certain circumstances, with certain extra information, you can begin to fill them in.

Consider the following matrix:

$$ \begin{pmatrix} 1 & ? & ? \\ ? & 2 & ? \\ ? & 6 & 9 \\ ? & ? & 3 \\ 4 & 4 & ? \end{pmatrix} $$

And the following additional information about the matrix: Each row is a multiple of the other rows.

With that information, we can begin to deduce some of the values. For example (indexing rows and columns from 0), since entry [0,0] is 1 and entry [4,0] is 4, row 0 must be $\frac{1}{4}$ of row 4. Thus, [0,1] $= \frac{1}{4} \cdot 4 = 1$.

Using this fact we can fill in the entire matrix:

$$ \begin{pmatrix} 1 & 1 & \frac{3}{2} \\ 2 & 2 & 3 \\ 6 & 6 & 9 \\ 2 & 2 & 3 \\ 4 & 4 & 6 \end{pmatrix} $$
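The deduction above can be automated. The following is a minimal sketch, assuming NumPy, that missing entries are stored as `NaN`, and that the matrix is exactly rank 1; the helper name `complete_rank1` is hypothetical. It repeatedly finds two rows that share a known, nonzero column, uses that column to get the ratio between the rows, and copies scaled values into the gaps until nothing changes.

```python
import numpy as np

def complete_rank1(M):
    """Fill NaN entries of an (assumed) rank-1 matrix by propagating row ratios."""
    A = np.array(M, dtype=float)
    changed = True
    while changed:
        changed = False
        for i in range(A.shape[0]):
            for k in range(A.shape[0]):
                if i == k:
                    continue
                # Columns known in both rows, with a nonzero entry in row k
                shared = ~np.isnan(A[i]) & ~np.isnan(A[k]) & (A[k] != 0)
                if not shared.any():
                    continue
                j = np.flatnonzero(shared)[0]
                ratio = A[i, j] / A[k, j]               # row i = ratio * row k
                fill = np.isnan(A[i]) & ~np.isnan(A[k])
                if fill.any():
                    A[i, fill] = ratio * A[k, fill]
                    changed = True                      # each fill removes at least one NaN
    return A

nan = np.nan
M = [[1, nan, nan],
     [nan, 2, nan],
     [nan, 6, 9],
     [nan, nan, 3],
     [4, 4, nan]]
print(complete_rank1(M))
```

On this example the sketch reproduces the completed matrix shown above.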

### **Matrix Ranks**

The extra information we used to solve the previous matrix, "Each row is a multiple of the other rows," is equivalent to saying "The matrix has rank 1."

**Rank-0 Matrix**: The all-0s matrix\
**Rank-1 Matrix**: A nonzero matrix whose rows are all multiples of each other (equivalently, whose columns are all multiples of each other)

If we have a rank-1 $m \times n$ matrix $A$, then

$$ A = u \otimes v^T $$

Where $u$ is an $m \times 1$ vector, and $v$ is an $n \times1$ vector. (Note $\otimes$ denotes an outer product operation).

This means the $(i,j)^{th}$ entry of $A$ is $A_{ij} = u_i v_j$.

### **Matrix Notation**

$$A = u \otimes v^T 
= uv^T
= \begin{pmatrix} u_1*v^T \\ u_2*v^T \\ \vdots \\ u_m*v^T \end{pmatrix} 
= \begin{pmatrix} u_1 v_1 & u_1 v_2 & \dots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \dots & u_2 v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_m v_1 & u_m v_2 & \dots & u_m v_n \end{pmatrix} $$
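As a quick check of this notation, here is a small NumPy sketch (assuming NumPy is available). The vectors `u` and `v` were chosen by hand so that $uv^T$ equals the completed matrix from the Netflix example: `u` holds one scale factor per row and `v` holds the shared row pattern.

```python
import numpy as np

u = np.array([1, 2, 6, 2, 4], dtype=float)   # m = 5: one scale factor per row
v = np.array([1, 1, 1.5])                     # n = 3: the shared row pattern

A = np.outer(u, v)                            # same as u[:, None] @ v[None, :], i.e. u v^T
print(np.allclose(A, u[:, None] @ v[None, :]))  # True
print(A)
print(np.linalg.matrix_rank(A))               # 1
print(A[2, 1] == u[2] * v[1])                 # True: A_ij = u_i * v_j
```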

## ***Rank Matrices***

Consider the case where $A$ is a rank-2 matrix. This means that $A$ can be written as the sum of two rank-1 matrices (but not as a single rank-1 matrix):

$$A = uv^T + wz^T$$

Thus, 

$$ A 
= \begin{pmatrix} u_1 v^T + w_1 z^T \\ u_2 v^T + w_2 z^T\\ \vdots \\ u_m v^T + w_m z^T\end{pmatrix}
= \begin{pmatrix} u_1 v_1 + w_1 z_1 & u_1 v_2 + w_1 z_2 & \dots & u_1 v_n + w_1 z_n \\ u_2 v_1 + w_2 z_1 & u_2 v_2 + w_2 z_2 & \dots & u_2 v_n + w_2 z_n \\ \vdots & \vdots & \ddots & \vdots \\ u_m v_1 + w_m z_1 & u_m v_2 + w_m z_2 & \dots & u_m v_n + w_m z_n \end{pmatrix}$$

This can also be thought of as multiplying an $m \times 2$ matrix with columns $u$ and $w$ by a $2 \times n$ matrix with rows $v^T$ and $z^T$:

<br>
<center>
    <img src="images/1.12.1.png" alt="Professor Notes" />
</center>
<br>
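A small NumPy sketch of the same identity, assuming randomly drawn vectors (the sizes and seed are arbitrary): the sum of two outer products equals the product of the $m \times 2$ matrix $[u \; w]$ with the $2 \times n$ matrix whose rows are $v^T$ and $z^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
u, w = rng.standard_normal(m), rng.standard_normal(m)
v, z = rng.standard_normal(n), rng.standard_normal(n)

A = np.outer(u, v) + np.outer(w, z)   # sum of two rank-1 matrices
L = np.column_stack([u, w])           # m x 2, columns u and w
R = np.vstack([v, z])                 # 2 x n, rows v^T and z^T

print(np.allclose(A, L @ R))          # True
print(np.linalg.matrix_rank(A))       # 2 for generic vectors like these
```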

## ***Defining the Singular Value Decomposition***

### **What is SVD**

A matrix $A$ can be factorized as:

$$A = USV^T$$ 

Where $U$ is an $m \times m$ orthogonal matrix whose columns are the **left singular vectors**, $V$ is an $n \times n$ orthogonal matrix whose columns (the rows of $V^T$) are the **right singular vectors**, and $S$ is an $m \times n$ diagonal matrix. The diagonal entries of $S$ are the **singular values of $A$**; they are all nonnegative and can be ordered so that $s_1 \ge s_2 \ge \dots \ge 0$.

<br>
<center>
    <img src="images/1.12.2.png" alt="Professor Notes" />
</center>
<br>

The singular values of $A$ are unique. The singular vectors are not: for example, negating both $u_i$ and $v_i$ gives another valid SVD.

Algebraically, $A$ can be written as:

$$A = \sum_{i=1}^{\min(m,n)} s_i u_i \otimes v_i^T $$
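These definitions can be checked numerically. The sketch below assumes NumPy, whose `np.linalg.svd` returns $U$, the vector of singular values (already sorted in decreasing order), and $V^T$ rather than $V$; the random $5 \times 3$ matrix is only an example.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))

# full_matrices=True gives U (m x m), s (min(m, n),), Vt (n x n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)            # (5, 5) (3,) (3, 3)

# Build the m x n diagonal matrix S from the vector of singular values
S = np.zeros((m, n))
S[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ S @ Vt))            # True

# The same matrix as a sum of rank-1 terms s_i * u_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_sum))                 # True

# Singular values are sorted and nonnegative; U and V are orthogonal
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))                          # True True
print(np.allclose(U.T @ U, np.eye(m)), np.allclose(Vt @ Vt.T, np.eye(n)))  # True True
```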

### **What is SVD Doing**

SVD essentially breaks down the matrix $A$ into three components. Applied to a vector $x$, $Ax = USV^Tx$ performs three steps:
- Rotation/Reflection via $V^T$: The matrix $V^T$ rotates or reflects the input space so that the right singular vectors (the principal axes of $A$) line up with the coordinate axes.
- Scaling via $S$: The diagonal matrix $S$ stretches each coordinate by the corresponding singular value (dropping or adding dimensions when $m \ne n$). The larger the singular value, the more that direction contributes to the behavior of $A$.
- Rotation/Reflection via $U$: Finally, $U$ rotates or reflects the result into the output space, mapping each coordinate axis onto the corresponding left singular vector.
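The three steps can be traced one at a time. This is a minimal NumPy sketch with an arbitrary random $3 \times 2$ matrix and vector: it applies $V^T$, then $S$, then $U$, and checks that the result equals $Ax$, that the first step preserves length, and that each right singular vector $v_i$ is sent to $s_i u_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))
x = rng.standard_normal(2)

U, s, Vt = np.linalg.svd(A)        # U: 3x3, s: (2,), Vt: 2x2
S = np.zeros((3, 2))
S[:2, :2] = np.diag(s)

step1 = Vt @ x                     # rotate/reflect the input: length is unchanged
step2 = S @ step1                  # scale each axis by s_i (and embed into R^3)
step3 = U @ step2                  # rotate/reflect into the output space

print(np.isclose(np.linalg.norm(step1), np.linalg.norm(x)))  # True
print(np.allclose(step3, A @ x))                             # True

# Each right singular vector v_i (row i of Vt) is sent to s_i * u_i
print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))                # True
```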

### **Computational Complexity**

The SVD of an $m \times n$ matrix can be computed in time $O(m^2n)$ or $O(n^2m)$, whichever is smaller.

### **Summary**
SVD decomposes a matrix into orthogonal directions (via $U$ and $V^T$) and scales those directions by singular values (via $S$). It’s a powerful tool for analyzing matrices, especially in dimensionality reduction, noise reduction, and understanding the underlying structure of data.

## ***Defining the Frobenius Norm***

The Frobenius norm of an $m \times n$ matrix $A$ is:

$$ ||A||_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}A_{ij}^2} $$

Given a matrix $A$, we want to find a matrix $A'$ of rank $k$ that minimizes $||A-A'||_F$ over all rank-$k$ matrices.

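How can we achieve this goal? Compute the SVD of $A$ and keep only the top $k$ singular values and their corresponding singular vectors: $A' = \sum_{i=1}^{k} s_i u_i \otimes v_i^T$ (this is the Eckart-Young theorem). Note that $A'$ is still $m \times n$, even though its rank is only $k$.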

<br>
<center>
    <img src="images/1.12.3.png" alt="Professor Notes" />
</center>
<br>
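A minimal NumPy sketch of the truncation, with hypothetical helper names `frobenius` and `best_rank_k`. It also checks a standard consequence of the theorem: the error $||A - A'||_F$ equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def frobenius(M):
    """Square root of the sum of squared entries (same as np.linalg.norm(M, 'fro'))."""
    return np.sqrt(np.sum(M ** 2))

def best_rank_k(A, k):
    """Keep the top k singular values and vectors of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
k = 2
A_k = best_rank_k(A, k)
s = np.linalg.svd(A, compute_uv=False)        # singular values only

print(A_k.shape)                              # (6, 4): same size as A
print(np.linalg.matrix_rank(A_k))             # 2
print(np.isclose(frobenius(A - A_k), np.sqrt(np.sum(s[k:] ** 2))))  # True
```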

## ***Matrix Completion***

### **Completing Matrices with Missing Values**

What are the options to complete matrices with missing entries?

There are some naive options, such as filling each missing entry with:
- 0
- The average of all known values
- The average of the known values in that entry's row or column

But a better approach is to:
1. Fill in the unknown values with some initial guess
2. Find the best rank-$k$ approximation to $A$ after filling in the unknown values
3. Output this best rank-$k$ approximation as the completed matrix
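The three steps can be sketched in a few lines of NumPy. This assumes missing entries are `NaN` and uses the column average as the initial guess (one of the naive fills listed above); the helper name `complete_low_rank` and the synthetic rank-1 "ratings" matrix are hypothetical.

```python
import numpy as np

def complete_low_rank(M, k):
    """Fill NaNs with column means, then return the best rank-k approximation."""
    A = np.array(M, dtype=float)
    missing = np.isnan(A)
    col_means = np.nanmean(A, axis=0)                 # step 1: initial guess per column
    A = np.where(missing, col_means[None, :], A)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # step 2: best rank-k approximation
    return (U[:, :k] * s[:k]) @ Vt[:k, :]             # step 3: output (same shape as M)

# Toy example: a hidden rank-1 "ratings" matrix with about 30% of entries missing
rng = np.random.default_rng(4)
truth = np.outer(rng.uniform(1, 5, 20), rng.uniform(0.5, 1.5, 8))
M = truth.copy()
hidden = rng.random(truth.shape) < 0.3
M[hidden] = np.nan

est = complete_low_rank(M, k=1)
print("mean abs error on hidden entries:", np.abs(est - truth)[hidden].mean())
```

Note that the output replaces the known entries too, since the rank-$k$ approximation generally does not pass exactly through them.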

### **Intuition Behind This Approach**

Even if the matrix $A$ we need to complete is huge, as long as it is not totally unstructured (for example, if it has low rank), we can hope that the entries we did observe carry enough information for the best rank-$k$ approximation to be a good estimate of the missing entries.

## ***Choosing K***

$k$ is a hyperparameter, so we must experiment with different values. One typical heuristic is to choose $k$ large enough that the sum of the remaining (discarded) singular values is at most $\frac{1}{10}$ of the sum of the singular values you kept.
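The heuristic can be written as a short loop. This is a sketch assuming the singular values are already sorted in decreasing order (as `np.linalg.svd` returns them); the function name `choose_k` and its `ratio` parameter are hypothetical, with `ratio=0.1` matching the $\frac{1}{10}$ rule.

```python
import numpy as np

def choose_k(s, ratio=0.1):
    """Smallest k with sum(s[k:]) <= ratio * sum(s[:k]); s sorted in decreasing order."""
    for k in range(1, len(s) + 1):
        if s[k:].sum() <= ratio * s[:k].sum():
            return k
    return len(s)    # only reached when s is empty

s = np.array([10.0, 5.0, 1.0, 0.2, 0.1])
print(choose_k(s))   # 2: the tail 1.3 is at most 0.1 * 15
```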

## ***Applying SVD***

### **Using SVD for Linear Regression**

**SVD for the Linear Regression Problem when A is diagonal (Easy Case)**

To illustrate how SVD works, let's consider the linear regression problem:

$$ \underset{x \in \mathbf{R}^n}{\min} ||Ax-b||^2 $$

Where $b \in \mathbf{R}^m$ and $A$ is an $m \times n$ matrix. The easy case is when $A = D$ is diagonal:

$$ D = \begin{pmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & 0 \end{pmatrix} $$

**What is the optimal value for $x_1$?**

$x_1 = \frac{b_1}{d_1}$, $x_2 = \frac{b_2}{d_2}$, etc. \
If $d_j = 0$, then $x_j$ has no effect on the objective, so we set $x_j = 0$ (this gives the smallest-norm solution).

So the solution is to multiply $b$ by $D^\dagger$ (the pseudoinverse of $D$), which inverts the nonzero diagonal entries and leaves the zero entries as 0:
$$D^\dagger = \begin{pmatrix} \frac{1}{d_1} & & & \\ & \frac{1}{d_2} & & \\ & & \ddots & \\ & & & 0 \end{pmatrix} $$

**Solution:**

$$ x = D^\dagger * b $$
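A small NumPy sketch of the diagonal case, with made-up values for the diagonal `d` and for `b` (including one zero diagonal entry). It builds $D^\dagger$ by hand and compares the result with NumPy's `pinv` and `lstsq`, which also return the minimum-norm least-squares solution.

```python
import numpy as np

d = np.array([2.0, 0.5, 0.0])          # diagonal of D (last entry is 0)
D = np.zeros((4, 3))                   # m = 4, n = 3
D[:3, :3] = np.diag(d)
b = np.array([4.0, 1.0, 7.0, 3.0])

# D^dagger is n x m: invert the nonzero diagonal entries, keep zeros as 0
D_pinv = np.zeros((3, 4))
nz = np.flatnonzero(d != 0)
D_pinv[nz, nz] = 1.0 / d[nz]

x = D_pinv @ b
print(x)                                                    # [2. 2. 0.]
print(np.allclose(x, np.linalg.pinv(D) @ b))                # True
print(np.allclose(x, np.linalg.lstsq(D, b, rcond=None)[0])) # True
```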

### **Solving A in General**

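Now let's solve the problem for a general $A$ that is not necessarily diagonal. Substituting the SVD $A = USV^T$:

$$ \underset{x}{\min} ||Ax-b||^2 = \underset{x}{\min} ||USV^Tx-b||^2 $$

Since $U$ is orthogonal, multiplying by $U^T$ does not change lengths ($||U^Tw|| = ||w||$ for any $w$), so:

$$ = \underset{x}{\min} ||SV^Tx-U^Tb||^2 $$

Substituting $y = V^Tx$ (equivalently $x = Vy$, since $V$ is orthogonal):

$$ = \underset{y}{\min} ||Sy-U^Tb||^2 $$

This is the diagonal case with $D = S$ and right-hand side $U^Tb$, so:

$$ y = S^\dagger U^Tb \implies x = Vy = VS^\dagger U^Tb $$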

So, for the linear regression problem:

$$ x = VS^\dagger U^Tb $$
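A minimal NumPy sketch of $x = VS^\dagger U^Tb$. The function name `lstsq_svd` and the cutoff `rtol=1e-10` (singular values below `rtol` times the largest are treated as 0) are assumptions for illustration. The result is compared against `np.linalg.lstsq`, including a rank-deficient matrix whose third column is the sum of the first two.

```python
import numpy as np

def lstsq_svd(A, b, rtol=1e-10):
    """Least squares via the SVD: x = V S^dagger U^T b (minimum-norm solution)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # S^dagger: invert singular values above rtol * s_max, treat the rest as exactly 0
    s_pinv = np.zeros_like(s)
    keep = s > rtol * s.max()
    s_pinv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_pinv * (U.T @ b))   # V @ (S^dagger @ (U^T b)), with S^dagger diagonal

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
print(np.allclose(lstsq_svd(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))    # True

# Rank-deficient case: third column is the sum of the first two
A2 = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
print(np.allclose(lstsq_svd(A2, b), np.linalg.lstsq(A2, b, rcond=1e-10)[0])) # True
```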

### **Using SVD to Perform PCA**

Recall that in PCA we wanted to find the eigendecomposition of the covariance matrix, which (for centered data $X$) is proportional to $X^TX$. Taking the SVD of the data matrix, $X = USV^T$, and substituting it into $X^TX$:

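$$X^TX = (USV^T)^TUSV^T$$

Transposing the left factor (which reverses the order of the product) gives:

$$X^TX = VS^TU^TUSV^T$$

Since $U^TU$ is the identity matrix:

$$X^TX = VS^TSV^T = VS^2V^T$$

where $S^2 = S^TS$ is the $n \times n$ diagonal matrix with entries $s_i^2$.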

#### **Information Gained From SVD**

Since $V$ is orthogonal and $S^2$ is diagonal, the SVD of $X$ directly yields the eigendecomposition of $X^TX$.

And, the right singular vectors of $X$ (the columns of $V$, i.e. the rows of $V^T$) **are** the principal components (the top eigenvectors of $X^TX$). In this way, the SVD provides more information than PCA alone, since it also produces the left singular vectors $U$.

And, the singular values of $X$ are the square roots of the eigenvalues of $X^TX$.
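A NumPy sketch comparing the two routes on synthetic, centered data (the mixing matrix is arbitrary). `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed before comparing; eigenvectors are compared up to sign, since singular vectors are not unique. The last check shows the extra information in $U$: the data projected onto the principal components equals $US$.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 3)) @ np.array([[3.0, 0, 0], [1.0, 1.0, 0], [0, 0, 0.2]])
X = X - X.mean(axis=0)                   # center so X^T X is proportional to the covariance

U, s, Vt = np.linalg.svd(X, full_matrices=False)
evals, evecs = np.linalg.eigh(X.T @ X)   # ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

print(np.allclose(s ** 2, evals))                   # True: s_i^2 are the eigenvalues
print(np.allclose(np.abs(Vt), np.abs(evecs.T)))     # True: same directions up to sign
print(np.allclose(X @ Vt.T, U * s))                 # True: projections X V = U S
```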

### **Using SVD to Perform Image Compression**

See the paper "A Singularly Valuable Decomposition: The SVD of a Matrix".

<br>
<center>
    <img src="images/1.12.4.png" alt="Professor Notes" />
</center>
<br>
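The idea behind SVD image compression is to store only a rank-$k$ approximation. The sketch below assumes NumPy and uses a synthetic array as a stand-in for a grayscale image (loading a real image file is left out); the helper name `compress` is hypothetical. A rank-$k$ approximation needs $k(m + n + 1)$ numbers instead of $mn$.

```python
import numpy as np

def compress(img, k):
    """Best rank-k approximation of a 2D grayscale image stored as a float array."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]   # scale column i of U by s_i, then multiply by V^T

# Synthetic stand-in for a grayscale image: smooth pattern plus a little noise
rng = np.random.default_rng(7)
m, n = 120, 160
rows, cols = np.mgrid[0:m, 0:n]
img = np.sin(cols / 15.0) + np.cos(rows / 10.0) + 0.1 * rng.standard_normal((m, n))

for k in (1, 2, 5, 20):
    approx = compress(img, k)
    stored = k * (m + n + 1)                # k columns of U, k rows of V^T, k singular values
    rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    print(f"k={k:2d}  stored {stored:6d} of {m * n} values  relative error {rel_err:.3f}")
```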

# Personal Notes #