# Eigenvalues and Eigenvectors
Let $\mathbf{A}$ be a square matrix. If there is a scalar $\lambda$ and a vector $\vec{v}\neq\vec{0}$ such that

$$\mathbf{A}\vec{v} = \lambda\vec{v}$$

then we say that $\lambda$ and $\vec{v}$ are an eigenvalue/eigenvector pair for the matrix $\mathbf{A}$.

Rewrite the eigenvalue/eigenvector equation as follows:

$$(\mathbf{A}-\lambda\mathbf{I})\vec{v} = \vec{0}.$$

The only way for the above linear system to have a nonzero solution $\vec{v}$ is if the coefficient matrix is singular (i.e. not invertible), in which case there are an infinite number of nonzero solutions.
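A minimal numerical sketch of the definition, assuming NumPy is available. The $2\times2$ matrix is an arbitrary example (not from these notes); for each computed pair it checks that $\mathbf{A}\vec{v}=\lambda\vec{v}$ and that $\mathbf{A}-\lambda\mathbf{I}$ is numerically singular.

```python
import numpy as np

# Hypothetical example matrix; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose COLUMNS are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Residual of the defining equation A v = lambda v.
    residual = np.linalg.norm(A @ v - lam * v)
    # A - lambda I is singular, so its determinant is zero up to rounding.
    det = np.linalg.det(A - lam * np.eye(2))
    print(f"lambda = {lam:.4f}, residual = {residual:.2e}, det = {det:.2e}")
```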

Theorem: Every $\mathbf{A}\in\mathbb{C}^{n\times n}$ has (at least) one eigenvalue.

Proof: Let $\vec{v}\neq\vec{0}$ and consider the vectors $\vec{v},\mathbf{A}\vec{v},\ldots,\mathbf{A}^n\vec{v}$. Because they are $n+1$ vectors in an $n$-dimensional space, they must be linearly dependent, i.e. there are coefficients $c_i$ (not all zero) such that

$$c_0\vec{v} + c_1\mathbf{A}\vec{v} + \ldots + c_n\mathbf{A}^n\vec{v}=\vec{0}.$$

Simplify:

$$(c_0\mathbf{I} + c_1\mathbf{A} + \ldots + c_n\mathbf{A}^n)\vec{v}=p(\mathbf{A})\vec{v} = \vec{0}.$$

Factor the polynomial: $p(x) = c_n(x-x_1)\cdots(x - x_n)$.

$$c_n(\mathbf{A} - x_1\mathbf{I})\cdots(\mathbf{A} - x_n\mathbf{I})\vec{v} = \vec{0}.$$

Since $\vec{v}\neq\vec{0}$, the product on the left is singular, so at least one of its factors must be singular (non-invertible). That is, there is a number $x_i$ such that $\mathbf{A}-x_i\mathbf{I}$ is singular, so one of the $x_i$ is an eigenvalue of $\mathbf{A}$. (Note that if $c_n=0$ we can just replace $n$ by the largest index $k$ with $c_k\neq0$ and the argument still works. We must have $k\ge1$, because $c_0\vec{v}=\vec{0}$ with $\vec{v}\neq\vec{0}$ would force $c_0=0$.)
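The argument can be traced numerically. The sketch below (assuming NumPy; the matrix and starting vector are hypothetical examples) builds the vectors $\vec{v},\mathbf{A}\vec{v},\ldots,\mathbf{A}^n\vec{v}$, finds coefficients $c_i$ of a linear dependency, and checks that some root of $p$ is an eigenvalue.

```python
import numpy as np

# Hypothetical example: upper triangular, so the eigenvalues 2, 3, 5
# sit on the diagonal.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])
n = A.shape[0]
v = np.ones(n)  # any nonzero starting vector

# Columns v, Av, ..., A^n v: n+1 vectors in an n-dimensional space.
columns = [v]
for _ in range(n):
    columns.append(A @ columns[-1])
K = np.column_stack(columns)          # shape (n, n+1)

# K has more columns than rows, so the last right-singular vector lies
# in its null space; its entries are coefficients c_0, ..., c_n.
_, _, Vh = np.linalg.svd(K)
c = Vh[-1]

# np.roots expects the highest-degree coefficient first.
roots = np.roots(c[::-1])
eigenvalues = np.linalg.eigvals(A)

# Distance from each root of p to the nearest eigenvalue of A.
distances = [np.min(np.abs(eigenvalues - r)) for r in roots]
print("roots of p:", np.sort_complex(roots))
print("smallest distance to an eigenvalue:", min(distances))
```

For this particular $\vec{v}$ every root happens to be an eigenvalue, but the theorem only needs one.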

Theorem: If $\lambda_1,\ldots,\lambda_m$ are distinct eigenvalues of $\mathbf{A}$ corresponding to eigenvectors $\vec{v}_1,\ldots,\vec{v}_m$, then $\vec{v}_1,\ldots,\vec{v}_m$ are linearly independent.

Proof: Suppose that 
$$a_1\vec{v}_1+\ldots+a_m\vec{v}_m=\vec{0}.$$

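Multiply from the left by $(\mathbf{A} - \lambda_2\mathbf{I})\cdots(\mathbf{A} - \lambda_m\mathbf{I})$. These factors commute, the factor $\mathbf{A}-\lambda_j\mathbf{I}$ annihilates $\vec{v}_j$, and each factor multiplies $\vec{v}_1$ by $\lambda_1-\lambda_j\neq0$, so every term except the first vanishes: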

$$a_1(\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_m)\vec{v}_1 = \vec{0}\Rightarrow a_1=0.$$

Since the ordering was arbitrary, conclude that all $a_i$ must be zero, i.e. the vectors are linearly independent.

Corollary: An $n\times n$ matrix cannot have more than $n$ eigenvalues.

The **geometric multiplicity** of an eigenvalue is the dimension of the nullspace of $\mathbf{A} - \lambda\mathbf{I}$. (Sometimes called the 'number of eigenvectors'.)

Theorem: The eigenvalues of $\mathbf{A}$ are the same as the eigenvalues of $\mathbf{A}^T$.

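Proof: The rank of any matrix equals the rank of its transpose, so $\mathbf{A}-\lambda\mathbf{I}$ and $(\mathbf{A}-\lambda\mathbf{I})^T=\mathbf{A}^T-\lambda\mathbf{I}$ have nullspaces of the same dimension. The eigenvalues and their geometric multiplicities are therefore preserved under transposition.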

Note that the _eigenvectors_ of a matrix are in general not the same as the eigenvectors of its transpose, and for complex matrices the eigenvalues are preserved under the _transpose_ but not the _conjugate transpose_.
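A small check of the transpose versus conjugate-transpose distinction, assuming NumPy. The complex matrix is a made-up example.

```python
import numpy as np

# Hypothetical complex example; it is triangular, so its eigenvalues
# are the diagonal entries 1j and 2.
A = np.array([[1j, 1.0],
              [0.0, 2.0]])

eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_AT = np.sort_complex(np.linalg.eigvals(A.T))          # transpose
eig_AH = np.sort_complex(np.linalg.eigvals(A.conj().T))   # conjugate transpose

print("A  :", eig_A)
print("A^T:", eig_AT)   # same as A
print("A^*:", eig_AH)   # complex conjugates of the eigenvalues of A
```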

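Complex eigenvalues of real matrices come in complex-conjugate pairs. Suppose that $\mathbf{A}$ is real and $\mathbf{A}\vec{v} = \lambda\vec{v}$. Taking the complex conjugate of both sides and using $\bar{\mathbf{A}}=\mathbf{A}$ gives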

$$\mathbf{A}\vec{\bar{v}} = \bar{\lambda}\vec{\bar{v}}$$

i.e. $\bar{\lambda}$ is an eigenvalue with eigenvector $\vec{\bar{v}}$.

If the sum of the geometric multiplicities of all the eigenvalues of an $n\times n$ matrix equals $n$, then the matrix is **diagonalizable**.

Suppose that the matrix is diagonalizable. Then there are linearly independent eigenvectors

$$\mathbf{A}\vec{v}_1 = \lambda_1\vec{v}_1,\cdots,\mathbf{A}\vec{v}_n=\lambda_n\vec{v}_n$$

(where there might be repeats in the list $\lambda_1,\ldots,\lambda_n$.)

Define the matrix $\mathbf{S} = [\vec{v}_1\;\cdots\;\vec{v}_n]$. It is $n\times n$ and has linearly independent columns, so it is invertible. The list of vector equations above can be arranged as columns of a matrix equation:

$$\mathbf{AS} = \mathbf{S\Lambda}\quad\Rightarrow\quad \mathbf{A} = \mathbf{S\Lambda S}^{-1}.$$
where $\mathbf{\Lambda}$ is diagonal. 
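As a sketch (assuming NumPy, with a made-up $2\times2$ matrix), `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are eigenvectors, which plays the role of $\mathbf{S}$:

```python
import numpy as np

# Hypothetical example with distinct eigenvalues 2 and 5.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, S = np.linalg.eig(A)   # columns of S are eigenvectors
Lambda = np.diag(eigenvalues)

# Check AS = S Lambda, and rebuild A = S Lambda S^{-1}.
A_rebuilt = S @ Lambda @ np.linalg.inv(S)
print(np.allclose(A @ S, S @ Lambda), np.allclose(A, A_rebuilt))
```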

A nonzero vector $\vec{w}$ such that $\vec{w}^T\mathbf{A} = \lambda\vec{w}^T$ is called a **left eigenvector** of **A**. Equivalently, $\mathbf{A}^T\vec{w} = \lambda\vec{w}$.

If $\mathbf{A} = \mathbf{S\Lambda S}^{-1}$ then the rows of $\mathbf{S}^{-1}$ are left eigenvectors of **A**:

$$\mathbf{S}^{-1}\mathbf{A}= \mathbf{\Lambda S}^{-1}$$

The rows of the above matrix equation are in the form $\vec{w}_i^T\mathbf{A} = \lambda_i\vec{w}_i^T$.
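A quick numerical check that the rows of $\mathbf{S}^{-1}$ are left eigenvectors, assuming NumPy and an arbitrary diagonalizable example matrix:

```python
import numpy as np

# Hypothetical diagonalizable example (eigenvalues 2 and 5).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)

# Iterating over a 2-D array yields its rows, so w is a row of S^{-1}.
for lam, w in zip(eigenvalues, S_inv):
    # Left-eigenvector equation w^T A = lambda w^T.
    print(lam, np.allclose(w @ A, lam * w))
```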

Square matrices **A** and **B** are **similar** when there is an invertible matrix **T** such that

$$\mathbf{A} = \mathbf{TBT}^{-1}.$$

Similar matrices have the same eigenvalues (but not eigenvectors).

Proof: Suppose that $\mathbf{B} -\lambda\mathbf{I}$ is singular. Then $\mathbf{T}(\mathbf{B}-\lambda\mathbf{I})\mathbf{T}^{-1}=\mathbf{A} -\lambda\mathbf{I}$ is also singular. 
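A sketch of this fact, assuming NumPy; the matrices $\mathbf{B}$ and $\mathbf{T}$ are made-up examples. It also shows how the eigenvectors change: if $\mathbf{B}\vec{x}=\lambda\vec{x}$ then $\mathbf{A}(\mathbf{T}\vec{x})=\lambda(\mathbf{T}\vec{x})$.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])   # eigenvalues 1 and 3
T = np.array([[1.0, 1.0],
              [1.0, 2.0]])   # invertible (determinant 1)
A = T @ B @ np.linalg.inv(T)

eig_A = np.sort(np.linalg.eigvals(A))
eig_B = np.sort(np.linalg.eigvals(B))
print(eig_A, eig_B)          # same eigenvalues

# Eigenvectors of B, mapped through T, are eigenvectors of A.
lam_B, X = np.linalg.eig(B)
TX = T @ X
print(np.allclose(A @ TX, TX * lam_B))
```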

# Spectral theorem for normal matrices

A matrix **A** is **normal** when it commutes with its complex-conjugate transpose:

$$\mathbf{A}^*\mathbf{A} = \mathbf{AA}^*.$$

There exists a unitary matrix $\mathbf{U}$ that diagonalizes $\mathbf{A}$ iff **A** is normal. I.e. **A** has an orthonormal eigenvector basis iff it is normal.

Proof uses the Schur decomposition ([proof here](https://en.wikipedia.org/wiki/Schur_decomposition)); for every **A** there is a unitary matrix **U** such that

$$\mathbf{A} = \mathbf{UTU}^*$$

where **T** is upper-triangular. If **A** is normal then

$$\mathbf{T}^*\mathbf{T} = \mathbf{TT}^*$$

which is only possible if $\mathbf{T}$ is diagonal. The other direction is easy: diagonal matrices commute.
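A numerical illustration, assuming NumPy. Both matrices are hypothetical examples: the first is normal but not symmetric, the second is diagonalizable but not normal. The check relies on the fact that `np.linalg.eig` returns unit-length eigenvectors, and that eigenvectors of a normal matrix belonging to distinct eigenvalues are orthogonal.

```python
import numpy as np

# Normal but not symmetric; eigenvalues are 1 + 2i and 1 - 2i.
A = np.array([[1.0, 2.0],
              [-2.0, 1.0]])
A_is_normal = np.allclose(A.conj().T @ A, A @ A.conj().T)

eigenvalues, U = np.linalg.eig(A)
# The eigenvector matrix is unitary: U^* U = I.
U_is_unitary = np.allclose(U.conj().T @ U, np.eye(2))
A_rebuilt = U @ np.diag(eigenvalues) @ U.conj().T

# Diagonalizable (distinct eigenvalues 1 and 2) but not normal.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
B_is_normal = np.allclose(B.conj().T @ B, B @ B.conj().T)
_, V = np.linalg.eig(B)
V_is_unitary = np.allclose(V.conj().T @ V, np.eye(2))

print(A_is_normal, U_is_unitary)   # True True
print(B_is_normal, V_is_unitary)   # False False
```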

Real symmetric matrices and complex Hermitian matrices (are normal and) have real eigenvalues.
Suppose that $\mathbf{A}$ is Hermitian and has an eigenvalue $\lambda$, which a priori may be complex:
$$\mathbf{A}\vec{v} = \lambda\vec{v}.$$
Take the complex conjugate of both sides and multiply from the left by $\vec{v}^T$:
$$\vec{v}^T\bar{\mathbf{A}}\vec{\bar{v}} = \bar{\lambda}\|\vec{v}\|^2.$$
Now take the complex-conjugate transpose of both sides and use $\mathbf{A}^T=\bar{\mathbf{A}}$ (because $\mathbf{A}$ is Hermitian):
$$\vec{v}^T\mathbf{A}^T\vec{\bar{v}} =\vec{v}^T\bar{\mathbf{A}}\vec{\bar{v}} = \lambda\|\vec{v}\|^2.$$
Together these equations imply that $\lambda = \bar{\lambda}$, i.e. that $\lambda$ is real. When **A** is real symmetric, the eigenvectors can also be chosen to be real.

For a real symmetric matrix there is an orthogonal matrix **Q** such that $\mathbf{A} = \mathbf{Q\Lambda Q}^T$.
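In NumPy (an assumption here, the notes do not name a library) the routine `np.linalg.eigh` is intended for symmetric/Hermitian input and returns real eigenvalues in ascending order. A sketch with made-up matrices:

```python
import numpy as np

# Hypothetical Hermitian example (equal to its conjugate transpose);
# its eigenvalues are 1 and 4.
H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])

# The general solver returns complex-typed values with ~zero imaginary part.
general = np.linalg.eigvals(H)
# eigh returns a real array directly.
hermitian_eigs, U = np.linalg.eigh(H)
print(np.max(np.abs(general.imag)), hermitian_eigs)

# Real symmetric case: Q is real and orthogonal, and A = Q Lambda Q^T.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, Q = np.linalg.eigh(A)
print(np.allclose(Q.T @ Q, np.eye(2)),
      np.allclose(A, Q @ np.diag(lam) @ Q.T))
```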

Similarly, real skew-symmetric and complex anti-Hermitian matrices are normal and have pure imaginary eigenvalues (possibly including 0).

Real orthogonal and complex unitary matrices are normal and have eigenvalues on the unit circle in the complex plane.
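These two statements are easy to check numerically. A sketch assuming NumPy, with an arbitrary skew-symmetric matrix and an arbitrary rotation angle:

```python
import numpy as np

# Hypothetical skew-symmetric example: K^T = -K.
K = np.array([[0.0, 2.0, -1.0],
              [-2.0, 0.0, 3.0],
              [1.0, -3.0, 0.0]])

# Rotation by an arbitrary angle: a real orthogonal matrix.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eig_K = np.linalg.eigvals(K)
eig_R = np.linalg.eigvals(R)

print("real parts of eig(K):", eig_K.real)   # all ~0 (purely imaginary)
print("moduli of eig(R):   ", np.abs(eig_R)) # all ~1 (on the unit circle)
```

Because $\mathbf{K}$ is $3\times3$ and real, one of its eigenvalues is $0$ and the other two are a conjugate pair.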

# Eigenvalues and Symmetric Positive Definite Matrices
Suppose that $\mathbf{A}$ is a real symmetric matrix.
If $\vec{x}^T\mathbf{A}\vec{x}>0$ for every $\vec{x}\neq\vec{0}$ then $\mathbf{A}$ is *symmetric positive definite* (SPD). We can now give an alternative characterization of SPD matrices using eigenvalues:

A real symmetric matrix is SPD if and only if all its eigenvalues are positive:

$$\vec{x}^T\mathbf{A}\vec{x} = \vec{x}^T\mathbf{Q\Lambda Q}^T\vec{x} = \vec{y}^T\mathbf{\Lambda}\vec{y},\qquad\left(\vec{y} = \mathbf{Q}^T\vec{x}\right)$$
$$=\lambda_1 y_1^2+\ldots+\lambda_ny_n^2.$$

If $\lambda_i$ is negative or zero then we can make $\vec{x}^T\mathbf{A}\vec{x}\le0$ by setting $\vec{y} = \vec{e}_i$, i.e. $\vec{x} = \mathbf{Q}\vec{e}_i\neq\vec{0}$.
So the only way to get $\vec{x}^T\mathbf{A}\vec{x}>0$ is to have all the eigenvalues be positive.
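A minimal sketch of an eigenvalue-based SPD test, assuming NumPy. The helper `is_spd` and the example matrices are hypothetical, introduced only for illustration.

```python
import numpy as np

def is_spd(A, tol=0.0):
    """Return True if A is real symmetric with all eigenvalues > tol."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    # eigvalsh is meant for symmetric input and returns real eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])   # eigenvalues 1 and 3
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])    # eigenvalues -1 and 3
print(is_spd(A), is_spd(B))

# For B, x = Q e_1 (the eigenvector of the negative eigenvalue)
# makes the quadratic form negative.
lam, Q = np.linalg.eigh(B)    # ascending order, so lam[0] = -1
x = Q[:, 0]
print(x @ B @ x)
```

In floating point a small positive `tol` may be preferable to an exact comparison with zero; that choice is left to the caller.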

The connection between "Positive Definite" ($\vec{x}^T\mathbf{A}\vec{x}>0$ for all $\vec{x}\neq\vec{0}$) and "Real Positive Eigenvalues" _only works for real symmetric and complex Hermitian matrices_. 

If you have a non-symmetric matrix with real positive eigenvalues, it doesn't guarantee $\vec{x}^T\mathbf{A}\vec{x}>0$. Real positive eigenvalues don't even imply that the matrix is diagonalizable.

Similarly, if you have a non-symmetric matrix that satisfies $\vec{x}^T\mathbf{A}\vec{x}>0$ for all $\vec{x}\neq\vec{0}$, it doesn't imply that the eigenvalues are real and positive.
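Two small counterexamples make these warnings concrete. This is a sketch assuming NumPy; both matrices are made up for illustration.

```python
import numpy as np

# Positive eigenvalues (1 and 1) but not symmetric:
# the quadratic form can be negative.
A = np.array([[1.0, -10.0],
              [0.0,   1.0]])
x = np.array([1.0, 1.0])
print(np.linalg.eigvals(A), x @ A @ x)   # x^T A x = 1 - 10 + 1 = -8

# A - I has rank 1, so the eigenvalue 1 has geometric multiplicity
# 2 - 1 = 1 < 2, and A is not diagonalizable.
geometric_multiplicity = 2 - np.linalg.matrix_rank(A - np.eye(2))
print(geometric_multiplicity)

# Positive quadratic form (x^T B x = x_1^2 + x_2^2) but the
# eigenvalues are the complex pair 1 + i, 1 - i.
B = np.array([[1.0, -1.0],
              [1.0,  1.0]])
eig_B = np.linalg.eigvals(B)
print(eig_B)
```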

# Singular Values and the 2-norm
Recall that the formula for the matrix norm induced by a vector norm is

$$\|\mathbf{A}\| = \max_{\|\vec{u}\|=1}\|\mathbf{A}\vec{u}\|.$$

We have nice formulas for the 1-norm and $\infty$-norm of a matrix, but not yet for the 2-norm. There is a nice formula for the 2-norm in terms of singular values.

Consider the function 
$$\|\mathbf{A}\vec{u}\|_2^2 = \vec{u}^T\mathbf{A}^T\mathbf{A}\vec{u} = f(\vec{u}).$$
We want to optimize this function, but we also want to have $\|\vec{u}\|_2 = 1$.
This is an *equality-constrained* optimization problem. We will use the method of *Lagrange multipliers*.

![Lagrange Multipliers](LagrangeMultipliers2D.svg)

(Image from [Wikipedia](https://en.wikipedia.org/wiki/Lagrange_multiplier#/media/File:LagrangeMultipliers2D.svg).)

The idea of Lagrange multipliers is that you have some function $f(\vec{u})$ that you want to optimize but you also have some constraint on the allowable values of $\vec{u}$, given in the form $g(\vec{u}) = 0$.

The usual condition for a critical point is just $\nabla f = \vec{0}$; for the constrained problem the condition is

$$\nabla f = \lambda\nabla g$$

where $\lambda$ is some unknown number called a 'Lagrange multiplier.'

In our problem $f(\vec{u}) = \vec{u}^T\mathbf{A}^T\mathbf{A}\vec{u}$ and $g(\vec{u}) = \vec{u}^T\vec{u} -1$, so the critical points are defined by

$$2\mathbf{A}^T\mathbf{A}\vec{u} = 2\lambda\vec{u}.$$

Any pair $(\lambda,\vec{u})$ that solves this equation is a critical point, and clearly also an eigenvalue/vector pair for the matrix $\mathbf{A}^T\mathbf{A}$.
We want the critical point that gives us the largest function value $f(\vec{u})$; at a critical point we will have $f(\vec{u}) = \vec{u}^T\mathbf{A}^T\mathbf{A}\vec{u} = \lambda\vec{u}^T\vec{u} = \lambda$.
So the constrained maximum of $f$ is the largest eigenvalue $\lambda$ of $\mathbf{A}^T\mathbf{A}$.

Back to the 2-norm: 
$$\|\mathbf{A}\|_2 = \max_{\|\vec{u}\|_2=1}\|\mathbf{A}\vec{u}\|_2 = \sqrt{\lambda_{\text{max}}}$$
where $\lambda_{\text{max}}$ is the largest eigenvalue of $\mathbf{A}^T\mathbf{A}$. This is not the usual notation!
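The formula can be compared against NumPy's built-in matrix 2-norm (assuming NumPy; the matrix is an arbitrary example). The sketch also samples random unit vectors to confirm that $\|\mathbf{A}\vec{u}\|_2$ never exceeds $\sqrt{\lambda_{\text{max}}}$.

```python
import numpy as np

# Hypothetical example matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# A^T A is symmetric, so eigvalsh applies; it sorts ascending.
lam_max = np.linalg.eigvalsh(A.T @ A)[-1]
norm_from_eig = np.sqrt(lam_max)
norm_builtin = np.linalg.norm(A, 2)
print(norm_from_eig, norm_builtin)

# Random unit vectors u (columns of U): ||A u||_2 stays below the norm.
rng = np.random.default_rng(0)
U = rng.normal(size=(2, 1000))
U /= np.linalg.norm(U, axis=0)
sampled_max = np.max(np.linalg.norm(A @ U, axis=0))
print(sampled_max)
```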

# Singular Value Decomposition (SVD)
**Review** Start with a real $m\times n$ matrix $\mathbf{A}$, then form the Gram matrix $\mathbf{K} = \mathbf{A}^T\mathbf{A}$.

$\mathbf{K}$ is a symmetric positive (semi-)definite matrix, so it has real, non-negative eigenvalues and real eigenvectors. The eigenvectors can be chosen to be orthogonal, and in the context of SVD they are always chosen that way.

The square roots of the *nonzero* eigenvalues of $\mathbf{K}$ are called **singular values** of $\mathbf{A}$, and the eigenvectors of $\mathbf{K}$ are called **singular vectors** of $\mathbf{A}$. The singular vectors are always chosen to be orthonormal.

Denote the singular vectors by $\vec{q}_1,\ldots,\vec{q}_r$ and the singular values by $\sigma_1\ge\ldots\ge\sigma_r>0$.
Let
$$\mathbf{A}\vec{q}_i = \vec{w}_i.$$

The singular vectors are in general not eigenvectors of $\mathbf{A}$, so in general $\vec{w}_i\neq\sigma_i\vec{q}_i$.
But the $\vec{w}_i$ are orthogonal:

$$\vec{w}_i\cdot\vec{w}_j = \left(\mathbf{A}\vec{q}_i\right)\cdot\left(\mathbf{A}\vec{q}_j\right) = \vec{q}_j^T\mathbf{A}^T\mathbf{A}\vec{q}_i = \sigma_i^2\vec{q}_j\cdot\vec{q}_i$$

If $i\neq j$ then this is zero because the singular vectors are orthogonal. This shows that the $\vec{w}_i$ are orthogonal.

If $i=j$ then we have $\vec{w}_i\cdot\vec{w}_i = \|\vec{w}_i\|_2^2 = \sigma_i^2$, so $\|\vec{w}_i\|_2=\sigma_i$.
So we can write $\vec{w}_i = \sigma_i\vec{p}_i$ where $\vec{p}_i$ are unit vectors.

Putting it all together we have

$$\mathbf{A}\vec{q}_i = \sigma_i\vec{p}_i$$

where the $\vec{p}_i$ are orthonormal.
If we write these vector equations left to right as a matrix equation we have

$$\mathbf{AQ} = \mathbf{P\Sigma}$$
where $\mathbf{\Sigma}$ is an $r\times r$ diagonal matrix whose diagonal values are $\sigma_i$.

What the book (for APPM 3310) essentially does at this point is simply assert that this is equivalent to 

$$\mathbf{A} = \mathbf{P\Sigma Q}^T$$

which is not true: it's not equivalent, because $\mathbf{Q}$ is $n\times r$ and $\mathbf{QQ}^T\neq\mathbf{I}$ when $r<n$. But you can look up the extra steps (e.g. [on Wikipedia](https://en.wikipedia.org/wiki/Singular_value_decomposition#Proof_of_existence)) to prove that the above equation *is* true, and it's called the **singular value decomposition.**
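The construction above can be followed step by step in code. This is a sketch assuming NumPy, with a made-up rank-1 matrix and an assumed relative tolerance for deciding which eigenvalues of $\mathbf{K}$ count as nonzero. It is meant to mirror the derivation, not to replace a library SVD routine.

```python
import numpy as np

# Hypothetical 3x2 example of rank 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

K = A.T @ A                                     # Gram matrix
eigenvalues, eigenvectors = np.linalg.eigh(K)   # ascending, orthonormal

# Reorder to descending and keep the (numerically) nonzero eigenvalues.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
keep = eigenvalues > 1e-12 * eigenvalues[0]     # assumed tolerance

sigma = np.sqrt(eigenvalues[keep])   # singular values sigma_1 >= ... >= sigma_r
Q = eigenvectors[:, keep]            # n x r, columns q_i
P = (A @ Q) / sigma                  # m x r, columns p_i = A q_i / sigma_i

A_rebuilt = P @ np.diag(sigma) @ Q.T
print(sigma, np.allclose(A, A_rebuilt))
```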

The number of singular values equals the rank of the matrix.

The columns of **P** are an orthonormal basis for the range of **A**.

The columns of **Q** are an orthonormal basis for the corange of **A**.

The singular values of **A** are the same as the singular values of $\mathbf{A}^T$, and the SVD of $\mathbf{A}^T$ is

$$\mathbf{A}^T = \mathbf{Q\Sigma P}^T.$$

The 2-norm of a square matrix is its largest singular value $\sigma_1$. This definition extends to non-square matrices.

There are multiple versions of the SVD that look the same except that the matrices are of different size.

In the SVD we've just covered $\mathbf{A}$ is $m\times n$, $\mathbf{Q}$ is $n\times r$, $\mathbf{\Sigma}$ is $r\times r$, and $\mathbf{P}$ is $m\times r$.

There's also a 'full' version of the SVD that looks the same:

$$\mathbf{A}=\mathbf{P\Sigma Q}^T$$

but $\mathbf{P}$ is $m\times m$ (and orthogonal), $\mathbf{\Sigma}$ is $m\times n$, and $\mathbf{Q}$ is $n\times n$ (and orthogonal).

In this version the first $r$ diagonal elements of $\mathbf{\Sigma}$ are the singular values, and all other entries are zero.

The first $r$ columns of $\mathbf{Q}$ are an orthonormal basis for the corange, and the remaining columns are an orthonormal basis for the kernel.

The first $r$ columns of $\mathbf{P}$ are an orthonormal basis for the range, and the remaining columns are an orthonormal basis for the cokernel.
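A sketch of the different sizes using `np.linalg.svd` (assuming NumPy; the matrix is a made-up rank-1 example). Note that NumPy returns `(U, s, Vh)`, where `U` plays the role of $\mathbf{P}$ and `Vh` the role of $\mathbf{Q}^T$, and that its reduced form keeps $\min(m,n)$ columns rather than $r$.

```python
import numpy as np

# Hypothetical example: m = 3, n = 2, rank r = 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])

P_full, s, Qt_full = np.linalg.svd(A, full_matrices=True)
P_thin, _, Qt_thin = np.linalg.svd(A, full_matrices=False)
print(P_full.shape, Qt_full.shape)   # (3, 3) (2, 2)
print(P_thin.shape, Qt_thin.shape)   # (3, 2) (2, 2)

# Numerical rank: count singular values above an assumed tolerance.
r = int(np.sum(s > 1e-12 * s[0]))

# Columns of Q beyond the first r span the kernel: A q = 0.
kernel = Qt_full[r:].T
# Columns of P beyond the first r span the cokernel: A^T p = 0.
cokernel = P_full[:, r:]
print(r, np.allclose(A @ kernel, 0), np.allclose(A.T @ cokernel, 0))
```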

# SVD and least squares

Recall that when $\mathbf{A}$ has full column rank, the solution of the linear least squares problem $\min_{\vec{x}}\|\mathbf{A}\vec{x} - \vec{b}\|_2$ is

$$\vec{x} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\vec{b}.$$

Plugging in the SVD of $\mathbf{A}$ and simplifying yields

$$\vec{x} = \mathbf{Q\Sigma}^{-1}\mathbf{P}^T\vec{b}.$$

The matrix $\mathbf{A}^+ = \mathbf{Q\Sigma}^{-1}\mathbf{P}^T$ is called the Moore-Penrose pseudoinverse of $\mathbf{A}$. If $\mathbf{A}$ is invertible then $\mathbf{A}^+=\mathbf{A}^{-1}$.
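A sketch comparing the normal-equations solution with the SVD formula, assuming NumPy. The data are made up; `np.linalg.pinv` is NumPy's pseudoinverse routine and is used only as a cross-check.

```python
import numpy as np

# Hypothetical overdetermined system with full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: (A^T A) x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reduced SVD; NumPy returns Q^T as its third output.
P, s, Qt = np.linalg.svd(A, full_matrices=False)
A_plus = Qt.T @ np.diag(1.0 / s) @ P.T     # Q Sigma^{-1} P^T
x_svd = A_plus @ b

print(x_normal, x_svd)
print(np.allclose(A_plus, np.linalg.pinv(A)))
```

Inverting every entry of `s` is only safe here because $\mathbf{A}$ has full column rank, so all its singular values are nonzero.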

# Eckart-Young Theorem
Let $\mathbf{A}=\mathbf{P\Sigma Q}^T$ be $m\times n$. 

Define $\mathbf{A}_k=\mathbf{P}_k\mathbf{\Sigma}_k\mathbf{Q}_k^T$ by taking the first $k$ columns of $\mathbf{P}$ and $\mathbf{Q}$, and the leading $k\times k$ part of $\mathbf{\Sigma}$.

$\mathbf{A}_k$ is a minimizer of $\|\mathbf{A} - \hat{\mathbf{A}}\|_F$ over all matrices $\hat{\mathbf{A}}$ with rank $\le k$. (If $\sigma_k > \sigma_{k+1}$ then this is the unique minimizer.)

$\mathbf{A}_k$ is also a minimizer if we replace the Frobenius norm with the matrix 2-norm, although in that norm the minimizer need not be unique.
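A sketch of the truncation, assuming NumPy and a random test matrix. It uses two standard facts that are not stated in these notes: the Frobenius error of $\mathbf{A}_k$ is $\sqrt{\sigma_{k+1}^2+\cdots}$ and the 2-norm error is $\sigma_{k+1}$. The comparison matrix `B` is just one arbitrary rank-$k$ competitor, so this is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))    # hypothetical test matrix
k = 2

P, s, Qt = np.linalg.svd(A, full_matrices=False)
# First k columns of P and Q, leading k x k block of Sigma.
A_k = P[:, :k] @ np.diag(s[:k]) @ Qt[:k]

err_fro = np.linalg.norm(A - A_k, "fro")
err_two = np.linalg.norm(A - A_k, 2)
print(err_fro, np.sqrt(np.sum(s[k:] ** 2)))   # these agree
print(err_two, s[k])                          # these agree

# Any other matrix of rank <= k does no better in the Frobenius norm.
B = rng.normal(size=(6, k)) @ rng.normal(size=(k, 4))
print(np.linalg.norm(A - B, "fro") >= err_fro)
```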

# Normal matrices and diagonalizability
So far we can classify square matrices as
- Normal
- Diagonalizable but not normal
- Not diagonalizable

The distinction between the first two is a bit artificial for the following reason:

Update the definition of **normal** to mean **commutes with its adjoint**, where the adjoint is taken with respect to a chosen inner product.

Theorem: Every diagonalizable matrix is normal with respect to some inner product. (Proof: Exercise.)