# Matrix Decomposition

Data is often represented in matrix form as well, e.g., a matrix whose rows
represent different people and whose columns describe different features of
those people, such as product rating, height, and socioeconomic status.
In this chapter, we present three aspects of matrices:
- how to summarize matrices
- how matrices can be decomposed
- how these decompositions can be used for matrix approximations.

An analogy for matrix decomposition is the factoring of numbers, such as the factoring of
21 into the prime numbers $7 \cdot 3$. For this reason, matrix decomposition is also
often referred to as _matrix factorization_.

## Determinant and Trace

A [determinant](https://www.mathsisfun.com/algebra/matrix-determinant.html) is
a mathematical object used in the analysis and solution of systems of linear
equations. Determinants are only defined for square matrices $A \in{\rm I\!R}^{n\times n}$,
i.e., matrices with the same number of rows and columns. We write the determinant as $det(A)$ or sometimes as $\vert A \vert$ (this notation must not be confused with the absolute value), so that

\begin{equation}
det(A) = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots &  & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}\\
\end{vmatrix}
\end{equation}

The determinant of a square matrix $A \in{\rm I\!R}^{n\times n}$ is a function that maps $A$ onto a real number.

We call a square matrix $T$ an upper-triangular matrix if $T_{ij} = 0$ for
$i > j$, i.e., the matrix is zero below its diagonal. Analogously, we define a
lower-triangular matrix as a matrix with zeros above its diagonal. For a triangular
matrix $T \in{\rm I\!R}^{n\times n}$, the determinant is the product of the diagonal
elements, i.e.,

\begin{equation}
det(T) =  \prod_{i=1}^{n}T_{ii}
\end{equation}
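As a quick numerical sanity check of this rule, the sketch below (assuming NumPy is available; the matrix `T` is an arbitrary example, not taken from the text) compares NumPy's general-purpose `np.linalg.det` with the product of the diagonal entries, for an upper-triangular matrix and for its transpose, which is lower-triangular:

```python
import numpy as np

# Arbitrary upper-triangular example: every entry below the diagonal is zero
T = np.array([[2.0, -1.0, 4.0],
              [0.0,  3.0, 5.0],
              [0.0,  0.0, 0.5]])

det_general = np.linalg.det(T)        # general-purpose determinant (LU-based)
det_diagonal = np.prod(np.diag(T))    # product of the diagonal elements T_ii

# The transpose is lower-triangular with the same diagonal, hence the same determinant
det_lower = np.linalg.det(T.T)

print(det_diagonal)                   # 3.0
print(np.isclose(det_general, det_diagonal), np.isclose(det_lower, det_diagonal))
```

The general routine agrees with the diagonal product only up to floating-point rounding, which is why the comparison uses `np.isclose`.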

For $n > 3$, computing the determinant of an $n\times n$ matrix requires a general algorithm.
The Laplace expansion reduces the problem of computing the determinant
of an $n \times n$ matrix to computing the determinants of $(n-1)\times(n-1)$
matrices. By recursively applying the Laplace expansion (Theorem 4.2),
we can therefore compute determinants of $n \times n$ matrices by ultimately
computing determinants of $2 \times 2$ matrices.

**Laplace Expansion**: Let us compute the determinant of:

\begin{equation}
det(A) = \begin{vmatrix}
1 & 2 & 3 \\
3 & 1 & 2 \\
0 & 0 & 1 \\
\end{vmatrix}
\end{equation}

Using the Laplace expansion along the first row ([see](https://www.mathsisfun.com/algebra/matrix-determinant.html)) yields:

\begin{equation}
\begin{vmatrix}
1 & 2 & 3 \\
3 & 1 & 2 \\
0 & 0 & 1 \\
\end{vmatrix}
= (-1)^{1+1} \cdot 1
\begin{vmatrix}
1 & 2 \\
0 & 1 \\
\end{vmatrix} + (-1)^{1+2} \cdot 2
\begin{vmatrix}
3 & 2 \\
0 & 1 \\
\end{vmatrix} + (-1)^{1+3} \cdot 3
\begin{vmatrix}
3 & 1 \\
0 & 0 \\
\end{vmatrix}
\end{equation}

We use the formula $ad - bc$ for the determinant of a $2 \times 2$ matrix with rows $(a, b)$ and $(c, d)$ to compute the determinants of all three $2 \times 2$ matrices and obtain
\begin{equation}
det(A) = 1(1 - 0) - 2(3 - 0) + 3(0 - 0) = -5
\end{equation}
For completeness, we can compare this result to computing the determinant with the rule of Sarrus for $3 \times 3$ matrices:
\begin{equation}
det(A) = 1\cdot1\cdot1+3\cdot0\cdot3+0\cdot2\cdot2-0\cdot1\cdot3-1\cdot0\cdot2-3\cdot2\cdot1 = 1-6 = -5
\end{equation}
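The recursion described above can be written out directly. This is a teaching sketch assuming NumPy; the function name `laplace_det` is ours, not a library API. It expands along the first row exactly as in the worked example and bottoms out at the $2 \times 2$ formula $ad - bc$. Its cost grows factorially with $n$, so in practice one would call `np.linalg.det`, which uses an LU factorization instead.

```python
import numpy as np

def laplace_det(M):
    """Determinant by recursive Laplace expansion along the first row.

    Cost grows like n!, so this is only suitable for small teaching examples.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if M.shape != (n, n):
        raise ValueError("determinants are only defined for square matrices")
    if n == 1:
        return M[0, 0]
    if n == 2:
        # Base case: |a b; c d| = ad - bc
        return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]
    total = 0.0
    for j in range(n):
        # Minor: delete the first row and column j
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)
        # Sign (-1)^(1 + (j+1)) with 1-based indices equals (-1)^j with 0-based j
        total += (-1) ** j * M[0, j] * laplace_det(minor)
    return total

A = np.array([[1, 2, 3],
              [3, 1, 2],
              [0, 0, 1]])

print(laplace_det(A))      # -5.0
print(np.linalg.det(A))    # agrees up to floating-point rounding
```

The three loop iterations correspond term by term to the three $2 \times 2$ determinants in the expansion above: $1 \cdot 1$, $-2 \cdot 3$, and $3 \cdot 0$.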

The pattern for calculating the determinant is:

![image.png](attachment:372aa962-efeb-46bb-a3fb-40147928ffb3.png)


**Trace of a square matrix**

The trace of a square matrix $A \in {\rm I\!R}^{n\times n}$ is defined as:
\begin{equation}
tr(A) := \sum_{i=1}^na_{ii}
\end{equation}
The trace is the sum of the diagonal elements of $A$.
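A minimal sketch of the definition, assuming NumPy: it sums the diagonal explicitly and compares the result with `np.trace`, reusing the $3 \times 3$ matrix from the Laplace expansion example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 1, 2],
              [0, 0, 1]])

# tr(A) = a_11 + a_22 + ... + a_nn, written out explicitly
trace_manual = sum(A[i, i] for i in range(A.shape[0]))

print(trace_manual, np.trace(A))   # 3 3
```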

## Eigenvalues and Eigenvectors

For a square matrix $A$, an eigenvector $v \neq 0$ and its eigenvalue $\lambda$ satisfy the equation:

\begin{equation}
Av = \lambda v
\end{equation}

- $v$ is the corresponding eigenvector
- $\lambda$ is the eigenvalue

For this matrix:
\begin{equation}
A = \begin{bmatrix}
-6 & 3 \\
4 & 5 \\
\end{bmatrix}
\end{equation}
an eigenvector is:
\begin{equation}
v = \begin{bmatrix}
1 \\
4 \\
\end{bmatrix}
\end{equation}
with a matching eigenvalue of $\lambda = 6$.

$Av$ gives us:

\begin{equation}
Av = \begin{bmatrix}
-6 & 3 \\
4 & 5 \\
\end{bmatrix}
\begin{bmatrix}
1 \\
4 \\
\end{bmatrix} =
\begin{bmatrix}
-6\times 1 + 3 \times 4 \\
4 \times 1 + 5 \times 4 \\
\end{bmatrix} = 
\begin{bmatrix}
6 \\
24  \\
\end{bmatrix}
\end{equation}

$\lambda v$ gives us:
\begin{equation}
\lambda v = 6
\begin{bmatrix}
1 \\
4  \\
\end{bmatrix} =
\begin{bmatrix}
6 \\
24  \\
\end{bmatrix}
\end{equation}
Notice how we multiply a matrix by a vector and get the same result as when we multiply a scalar by that vector.
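This check can be reproduced numerically. A minimal sketch, assuming NumPy: it forms $Av$ and $\lambda v$ for the example and also asks `np.linalg.eigvals` for all eigenvalues of $A$. Besides $6$, this matrix has a second eigenvalue $-7$, since $det(A - \lambda I) = (-6-\lambda)(5-\lambda) - 12 = \lambda^2 + \lambda - 42 = (\lambda - 6)(\lambda + 7)$.

```python
import numpy as np

A = np.array([[-6, 3],
              [ 4, 5]])
v = np.array([1, 4])
lam = 6

Av = A @ v          # matrix-vector product
lam_v = lam * v     # scalar-vector product
print(Av, lam_v)    # [ 6 24] [ 6 24]

# All eigenvalues of A; NumPy does not guarantee their order, so sort them
eigenvalues = np.linalg.eigvals(A)
print(np.sort(eigenvalues))
```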


### How do we find the eigenvalues?

**Eigenspace and Eigenspectrum**

For $A\in{\rm I\!R}^{n\times n}$, the set
of all eigenvectors of $A$ associated with an eigenvalue $\lambda$ spans a subspace
of ${\rm I\!R}^{n}$, which is called the eigenspace of $A$ with respect to $\lambda$ and is denoted
by $E_\lambda$. The set of all eigenvalues of $A$ is called the eigenspectrum, or just
spectrum, of $A$.

**Computing Eigenvalues, Eigenvectors, and Eigenspaces**:

Let us find the eigenvalues and eigenvectors of the $2 \times 2$ matrix

\begin{equation}
A = \begin{bmatrix}
4 & 2 \\
1 & 3 \\
\end{bmatrix}
\end{equation}

1. **Characteristic Polynomial**: By definition, an eigenvector $x \neq 0$ and eigenvalue $\lambda$ of $A$ satisfy $Ax=\lambda x$, i.e., $(A-\lambda I)x=0$. Since $x\neq 0$, the kernel (null space) of $A - \lambda I$ contains more elements than just $0$. This means that $A-\lambda I$ is not invertible and therefore $det(A-\lambda I)=0$. Hence, the eigenvalues of $A$ are the roots of the characteristic polynomial $p_A(\lambda) := det(A-\lambda I)$.
2. **Eigenvalues**: The characteristic polynomial is:
\begin{equation}
\begin{aligned}
p_A(\lambda) &= det(A-\lambda I)\\
&= det \left( \begin{bmatrix}
4 & 2 \\
1 & 3 \\
\end{bmatrix} - 
\begin{bmatrix}
\lambda & 0 \\
0 & \lambda \\
\end{bmatrix} \right) = 
\begin{vmatrix}
4 - \lambda & 2 \\
1 & 3 - \lambda \\
\end{vmatrix} \\
&= (4 - \lambda)(3 - \lambda) - 2 \cdot 1
\end{aligned}
\end{equation}
We factorize the characteristic polynomial and obtain
\begin{equation}
p_A(\lambda) = (4-\lambda)(3-\lambda)-2 \cdot 1 = 10 - 7\lambda + \lambda^2 = (2-\lambda)(5-\lambda)
\end{equation}
giving the roots $\lambda_1 = 2$ and $\lambda_2 = 5$.
3. **Eigenvectors and Eigenspaces**: We find the eigenvectors that correspond to these eigenvalues by looking at vectors $x$ such that
\begin{equation}
\begin{bmatrix}
4 - \lambda & 2 \\
1 & 3 - \lambda \\
\end{bmatrix}x=0
\end{equation}
For $\lambda=5$ we obtain
\begin{equation}
\begin{bmatrix}
4 - 5 & 2 \\
1 & 3 - 5 \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
\end{bmatrix} = 
\begin{bmatrix}
-1 & 2 \\
1 & -2 \\
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2 \\
\end{bmatrix} = 0
\end{equation}
Both rows reduce to $x_1 = 2x_2$, so solving this homogeneous system gives the solution space
\begin{equation}
E_5=span[
\begin{bmatrix}
2 \\
1 \\
\end{bmatrix}
]
\end{equation}
This eigenspace is one-dimensional as it possesses a single basis vector. Analogously, we find the eigenvector for $\lambda = 2$ by solving the homogeneous
system of equations
\begin{equation}
\begin{bmatrix}
4 - 2 & 2 \\
1 & 3 - 2 \\
\end{bmatrix} x =
\begin{bmatrix}
2 & 2 \\
1 & 1 \\
\end{bmatrix}
x = 0
\end{equation}

This means any vector $x= \begin{bmatrix}x_1 \\ x_2 \end{bmatrix}$ with $x_2=-x_1$, such as $x= \begin{bmatrix}1 \\ -1 \end{bmatrix}$, is an eigenvector with an eigenvalue of 2. The corresponding eigenspace is given as
\begin{equation}
E_2=span[
\begin{bmatrix}
1 \\
-1 \\
\end{bmatrix}
]
\end{equation}
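The three steps can be mirrored numerically. This is a sketch assuming NumPy. Note that `np.poly`, given a square array, returns the coefficients (highest power first) of $det(\lambda I - A)$, which for a $2 \times 2$ matrix coincides with $det(A - \lambda I)$ because $(-1)^2 = 1$. Since NumPy derives these coefficients from computed eigenvalues, they are exact only up to rounding.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
I = np.eye(2)

# Step 1-2: characteristic polynomial coefficients and their roots
coeffs = np.poly(A)                 # close to [1, -7, 10], i.e. lambda^2 - 7 lambda + 10
roots = np.sort(np.roots(coeffs))   # close to [2, 5]
print(coeffs, roots)

# Step 3: the basis vectors of E_5 and E_2 lie in the kernel of A - lambda*I
x5 = np.array([2.0, 1.0])
x2 = np.array([1.0, -1.0])
print((A - 5 * I) @ x5, (A - 2 * I) @ x2)   # both zero vectors
```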

The two eigenspaces $E_5$ and $E_2$ above are one-dimensional
as they are each spanned by a single vector. However, in other cases
we may have repeated eigenvalues, and the
eigenspace may then have more than one dimension.
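One way to check the dimension of an eigenspace numerically is the rank-nullity theorem: $\dim E_\lambda = n - \operatorname{rank}(A - \lambda I)$. The sketch below assumes NumPy; the helper `eigenspace_dim` and the matrices `B` and `S` are hypothetical illustrations, not from the text. `B` $= 2I$ has the eigenvalue 2 twice with a two-dimensional eigenspace, while the shear `S` also has a repeated eigenvalue (1) but only a one-dimensional eigenspace, so repetition alone does not guarantee a larger eigenspace.

```python
import numpy as np

def eigenspace_dim(M, lam):
    """dim(E_lam) = n - rank(M - lam*I), by the rank-nullity theorem."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))

A = np.array([[4.0, 2.0], [1.0, 3.0]])   # matrix from the example above
B = 2.0 * np.eye(2)                      # eigenvalue 2, repeated twice
S = np.array([[1.0, 0.5], [0.0, 1.0]])   # shear: eigenvalue 1, repeated twice

print(eigenspace_dim(A, 5.0), eigenspace_dim(A, 2.0))   # 1 1
print(eigenspace_dim(B, 2.0))                           # 2
print(eigenspace_dim(S, 1.0))                           # 1
```

`np.linalg.matrix_rank` uses an SVD-based tolerance, so this test is robust to small rounding errors but can misjudge eigenvalues that are only approximately known.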

### Graphical Intuition in Two Dimensions

Let us gain some intuition for determinants, eigenvectors, and eigenvalues
using different linear mappings. The figure below depicts five transformation matrices $A_1, \cdots, A_5$ and their impact on a square grid of points, centered at the origin:
- $A_1 = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 2\end{bmatrix}$. The vertical axis is extended by a factor of 2 (eigenvalue $\lambda_1 = 2$), and the horizontal axis is compressed by a factor of $\frac{1}{2}$ (eigenvalue $\lambda_2 = \frac{1}{2}$).
- $A_2 = \begin{bmatrix} 1 & \frac{1}{2} \\ 0 & 1\end{bmatrix}$ corresponds to a shearing mapping, i.e., it shears the points along the horizontal axis to the right if they are on the positive half of the vertical axis, and to the left if they are on the negative half.
- $A_3 = \begin{bmatrix} \cos(\frac{\pi}{6}) & -\sin(\frac{\pi}{6}) \\ \sin(\frac{\pi}{6}) & \cos(\frac{\pi}{6})\end{bmatrix}$. This matrix rotates the points by $\frac{\pi}{6}$ rad $= 30^{\circ}$ counter-clockwise.
- $A_4 = \begin{bmatrix} 1 & -1 \\ -1 & 1\end{bmatrix}$ represents a mapping in the standard basis that collapses a two-dimensional domain onto one dimension. Since one eigenvalue is zero, the space in the direction of the (blue) eigenvector corresponding to $\lambda_1 = 0$ collapses, while the orthogonal (red) eigenvector stretches space by a factor $\lambda_2 = 2$.
- $A_5 = \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 1\end{bmatrix}$ is a shear-and-stretch mapping that scales areas by a factor of $\frac{3}{4}$ (i.e., to 75%) since $|det(A_5)| = \frac{3}{4}$. It stretches space along the (red) eigenvector of $\lambda_2$ by a factor of 1.5 and compresses it along the orthogonal (blue) eigenvector by a factor of 0.5.
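The numbers quoted in this list can be checked directly. A minimal sketch, assuming NumPy (the dictionary name `mappings` is ours): for each matrix it prints the determinant, whose absolute value is the factor by which areas are scaled, and the eigenvalues. One should see, for example, $det(A_5) = 0.75$ with eigenvalues $1.5$ and $0.5$, $det(A_4) = 0$ with eigenvalues $0$ and $2$, and a pair of complex eigenvalues for the rotation $A_3$, which has no real eigenvector.

```python
import numpy as np

c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
mappings = {
    "A1": np.array([[0.5, 0.0], [0.0, 2.0]]),   # axis-aligned stretch/compress
    "A2": np.array([[1.0, 0.5], [0.0, 1.0]]),   # horizontal shear
    "A3": np.array([[c, -s], [s, c]]),          # rotation by pi/6
    "A4": np.array([[1.0, -1.0], [-1.0, 1.0]]), # collapse onto one dimension
    "A5": np.array([[1.0, 0.5], [0.5, 1.0]]),   # shear-and-stretch
}

for name, M in mappings.items():
    det = np.linalg.det(M)          # |det| = area scaling factor
    eigvals = np.linalg.eigvals(M)  # may be complex (e.g. for the rotation)
    print(f"{name}: det = {det:.4f}, eigenvalues = {np.round(eigvals, 4)}")
```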

![image.png](attachment:1b73dd4c-65f4-4a92-8019-c88e62221f87.png)

## Cholesky Decomposition

There are many ways to factorize special types of matrices that we encounter
often in machine learning. For positive real numbers, we have
the square-root operation that gives us a decomposition of the number
into identical components, e.g., $9 = 3 \cdot 3$. For matrices, we need to be
careful that we compute a square-root-like operation on positive quantities.
For symmetric, positive definite matrices (see Section 3.2.3), we can
choose from a number of square-root equivalent operations. The Cholesky
decomposition (or Cholesky factorization) provides a square-root equivalent operation on symmetric, positive definite matrices that is useful in practice: it factorizes $A = LL^\top$, where $L$ is a lower-triangular matrix with positive diagonal elements.
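A minimal NumPy sketch: `np.linalg.cholesky` returns the lower-triangular factor $L$ with $A = LL^\top$. The matrix below is an illustrative choice (not from the text) whose factor works out by hand to integer entries, $L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{bmatrix}$. The second part shows that the "square root" fails for a symmetric matrix that is not positive definite.

```python
import numpy as np

# Symmetric, positive definite example (chosen so that L has integer entries)
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])

L = np.linalg.cholesky(A)            # lower-triangular Cholesky factor
print(L)
print(np.allclose(L @ L.T, A))       # True: A = L L^T

# [[1, 2], [2, 1]] is symmetric but has eigenvalues 3 and -1,
# so it is not positive definite and NumPy raises LinAlgError.
try:
    np.linalg.cholesky(np.array([[1.0, 2.0], [2.0, 1.0]]))
except np.linalg.LinAlgError as err:
    print("not positive definite:", err)
```

Because $L$ is triangular, $det(A) = det(L)^2 = \left(\prod_i L_{ii}\right)^2$, which here gives $8^2 = 64$; this is one practical use of the triangular-determinant rule from earlier in the chapter.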

## Singular Value Decomposition

The singular value decomposition (SVD) of a matrix is a central matrix
decomposition method in linear algebra. It can be
applied to all matrices, not only to square matrices, and it always exists.
Moreover, as we will explore in the following, the SVD of a matrix $A$,
which represents a linear mapping $\Phi : V \rightarrow W$, quantifies the change
between the underlying geometry of these two vector spaces.

![image.png](attachment:475b2755-d31b-47da-aee6-7f935133651f.png)

The diagonal entries $\sigma_i, i=1, \cdots, r$, of $\Sigma$ are called singular values,
$u_i$ are called the left-singular vectors, and $v_j$ are called the right-singular vectors.
By convention, the singular values are ordered, i.e., $\sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant \sigma_r \geqslant 0$.
The singular value matrix $\Sigma$ is unique, but it requires some attention. Here $\Sigma$ is rectangular and is the same
size as $A$. This means $\Sigma$ has a diagonal submatrix that contains the singular values and needs additional zero
padding. Specifically, if $m > n$, then $\Sigma$ has a diagonal structure up to row $n$ and then consists of $\mathbf{0}^\top$
row vectors from $n+1$ to $m$ below, so that
\begin{equation}
\Sigma = 
\begin{bmatrix}
\sigma_1 & 0 & 0 \\
0 & \ddots & 0 \\
0 & 0 & \sigma_n \\
0 & \cdots & 0 \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{bmatrix}
\end{equation}
If $m < n$, the matrix $\Sigma$ has a diagonal structure up to column $m$ and columns that consist of $\mathbf{0}$ from $m+1$ to $n$:
\begin{equation}
\Sigma = 
\begin{bmatrix}
\sigma_1 & 0 & 0 & 0 & \cdots & 0 \\
0 & \ddots & 0 & 0 & & 0 \\
0 & 0 & \sigma_m & 0 & \cdots & 0 \\
\end{bmatrix}
\end{equation}
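To see the zero padding concretely, the sketch below (assuming NumPy) computes a full SVD with `np.linalg.svd(..., full_matrices=True)`, which returns $U$, the singular values as a 1-D array in descending order, and $V^\top$ (not $V$). The helper `full_sigma` is a hypothetical name for illustration: it places the singular values on the diagonal of an $m \times n$ zero matrix, for both the tall ($m > n$) and wide ($m < n$) case, and the loop checks that $U \Sigma V^\top$ reproduces $A$.

```python
import numpy as np

def full_sigma(s, m, n):
    """Embed the singular values s into a zero-padded m x n matrix Sigma."""
    Sigma = np.zeros((m, n))
    k = min(m, n)
    Sigma[:k, :k] = np.diag(s)
    return Sigma

rng = np.random.default_rng(0)
for m, n in [(4, 2), (2, 4)]:                           # tall and wide cases
    A = rng.standard_normal((m, n))
    U, s, Vt = np.linalg.svd(A, full_matrices=True)     # U: m x m, Vt: n x n
    Sigma = full_sigma(s, m, n)
    print((m, n), "singular values:", np.round(s, 4))
    print(Sigma.shape, np.allclose(U @ Sigma @ Vt, A))  # same shape as A; True
```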

**The SVD exists for any matrix $A \in {\rm I\!R}^{m\times n}$**

Example of the SVD for a text dataset:

![image.png](attachment:0909496f-fb6b-4721-b896-1eb8d7f4c1a6.png)

### Construction of SVD

From the above, the matrices $U$ and $V$ can be viewed as mixtures that, when multiplied together and scaled by the singular values in $\Sigma$, reproduce $A = U\Sigma V^\top$; keeping only the largest singular values gives an approximation of $A$.

- $U$ contains information about the column space of $A$
- $V$ contains information about the row space of $A$
- $\Sigma$ tells you how important each pair of singular vectors (columns of $U$ and $V$) is, ordered hierarchically from most important to least important
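As an example of this ordering, and of the matrix approximations mentioned at the start of the chapter, the sketch below keeps only the $k$ largest singular values and their singular vectors. It assumes NumPy; the $4 \times 3$ data matrix is invented for illustration (rows as people, columns as features), and `rank_k_approx` is a hypothetical helper name. The Frobenius-norm error of the truncation equals the square root of the sum of the squared discarded singular values, so dropping small $\sigma_i$ costs little.

```python
import numpy as np

# Invented data matrix: rows = people, columns = features
A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # "thin" SVD

def rank_k_approx(U, s, Vt, k):
    """Keep only the k largest singular values and their singular vectors."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in range(1, len(s) + 1):
    err = np.linalg.norm(A - rank_k_approx(U, s, Vt, k))   # Frobenius norm
    predicted = np.sqrt(np.sum(s[k:] ** 2))                # discarded energy
    print(f"rank {k}: error = {err:.4f}, sqrt(sum of dropped sigma^2) = {predicted:.4f}")
```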
