# Non-diagonalizable matrices and generalized eigenvectors
Some square matrices do not have an eigenvector basis, so they're not diagonalizable.
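As a quick illustration, here is a minimal NumPy sketch. The shear matrix `A` is an example chosen here, not one taken from these notes. It has the eigenvalue 1 repeated twice but only a one-dimensional eigenspace.

```python
import numpy as np

# Example matrix (an assumption for illustration): eigenvalue 1 with multiplicity 2.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0
n = A.shape[0]

# Dimension of the eigenspace = n - rank(A - lam*I).
geometric_multiplicity = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geometric_multiplicity)  # 1: a single independent eigenvector, so no eigenvector basis
```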

To obtain a basis and associated factorization we need to define 'generalized eigenvectors.' If $\lambda$ is an eigenvalue, then a **generalized eigenvector** is a nonzero vector $\vec{v}$ such that

$$(\mathbf{A} - \lambda\mathbf{I})^k\vec{v} = \vec{0}$$

for some $k>0$. Eigenvectors are included ($k=1$). The index of a generalized eigenvector is the smallest $k$ such that $(\mathbf{A} - \lambda\mathbf{I})^k\vec{v} = \vec{0}$.
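The index can be found numerically by applying $\mathbf{A}-\lambda\mathbf{I}$ repeatedly until the result vanishes. The sketch below assumes a floating-point tolerance `tol` and a hypothetical helper name `generalized_index`. Stopping at $n$ is justified by the theorem that follows.

```python
import numpy as np

def generalized_index(A, lam, v, tol=1e-12):
    """Smallest k >= 1 with (A - lam*I)^k v = 0, or None if no k <= n works.

    v is assumed to be nonzero.
    """
    n = A.shape[0]
    B = A - lam * np.eye(n)
    w = np.asarray(v, dtype=float)
    for k in range(1, n + 1):
        w = B @ w
        if np.linalg.norm(w) <= tol:
            return k
    return None

# Example matrix (chosen here): a single 3x3 Jordan block with eigenvalue 2.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
print(generalized_index(A, 2.0, [1, 0, 0]))  # 1 (an ordinary eigenvector)
print(generalized_index(A, 2.0, [0, 0, 1]))  # 3
```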

Theorem: Let $\mathbf{A}$ be $n\times n$ and $\lambda$ be an eigenvalue of $\mathbf{A}$. The null space (kernel) of $(\mathbf{A} - \lambda\mathbf{I})^n$ contains all the generalized eigenvectors associated with $\lambda$.

This is basically telling us that the index of a generalized eigenvector is always $\le n$.

Proof: Suppose that $\vec{v}$ is a generalized eigenvector, i.e. $(\mathbf{A}-\lambda\mathbf{I})^k\vec{v} = \vec{0}$ for some $k>0$. 

Consider the smallest $k$ such that $(\mathbf{A}-\lambda\mathbf{I})^k\vec{v} = \vec{0}$. If $k\le n$ there is nothing to prove, so suppose that $k>n$, and consider the vectors

$$\vec{v},(\mathbf{A}-\lambda\mathbf{I})\vec{v},\ldots,(\mathbf{A}-\lambda\mathbf{I})^{k-1}\vec{v}.$$

These vectors are nonzero because $k$ is the smallest exponent that annihilates $\vec{v}$, and we will show that they are linearly independent.

Consider a linear combination yielding $\vec{0}$:

$$a_0\vec{v} + a_1(\mathbf{A}-\lambda\mathbf{I})\vec{v}+\ldots + a_{k-1}(\mathbf{A}-\lambda\mathbf{I})^{k-1}\vec{v}=\vec{0}.$$

Multiply by $(\mathbf{A}-\lambda\mathbf{I})^{k-1}$. This annihilates all terms but the first, so $a_0=0$. Then multiply by $(\mathbf{A}-\lambda\mathbf{I})^{k-2}$ and conclude that $a_1=0$. Continuing, we find that all $a_i$ must be zero.

We have $k$ linearly independent vectors in an $n$-dimensional space, so we must have $k\le n$. This contradicts the assumption $k>n$, which proves the theorem.
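One way to see the theorem at work is to watch the dimension of the null space of $(\mathbf{A}-\lambda\mathbf{I})^k$ grow with $k$ and then stop growing. The following is a sketch on an example matrix chosen here. It relies on `numpy.linalg.matrix_rank`, whose default tolerance is adequate for this small case but not for ill-conditioned ones.

```python
import numpy as np

# Example (an assumption): a 3x3 Jordan block for eigenvalue 2, plus a simple eigenvalue 5.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 5.0]])
lam, n = 2.0, A.shape[0]
B = A - lam * np.eye(n)

# Null space dimension of B^k for k = 1, ..., n + 1.
nullities = [int(n - np.linalg.matrix_rank(np.linalg.matrix_power(B, k)))
             for k in range(1, n + 2)]
print(nullities)  # [1, 2, 3, 3, 3]
```

The dimension stops increasing by $k=3$ here, so going beyond $k=n$ adds no new generalized eigenvectors.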

Theorem: Generalized eigenvectors associated with distinct eigenvalues are linearly independent. (Proof omitted; similar to the proof for eigenvectors.)

At this point we don't actually know that generalized eigenvectors with index $k>1$ exist. The following theorem implies that they do whenever the eigenvectors alone fail to span the space:

Theorem: The generalized eigenvectors of $\mathbf{A}$ span the whole space. (Proof omitted.)

# Jordan Form

Suppose that $\vec{u}$ is a generalized eigenvector with index $k>1$. Define $\vec{w} = (\mathbf{A}-\lambda\mathbf{I})^{k-1}\vec{u}$. Then

$$(\mathbf{A} - \lambda\mathbf{I})^k\vec{u} = (\mathbf{A} - \lambda\mathbf{I})\vec{w} = \vec{0}$$

so $\vec{w}$, which is nonzero because $k$ is the index of $\vec{u}$, must be an eigenvector.

Starting with an eigenvector $\vec{w}_1 = \vec{v}$, solve

$$(\mathbf{A} - \lambda\mathbf{I})\vec{w}_i = \vec{w}_{i-1}.$$

At some point ($i=k+1$) you won't be able to find a solution any more. The generalized eigenvectors $\vec{w}_1,\ldots,\vec{w}_k$ are a **Jordan chain**.

The vectors in a Jordan chain are linearly independent. (Proof omitted.)
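The chain-building recipe can be sketched numerically as follows. This is a minimal sketch, not a robust algorithm: the name `jordan_chain`, the tolerance, and the example matrix are assumptions made here. Each step takes the least-squares candidate from `numpy.linalg.lstsq` and keeps it only if it solves the system exactly. Floating-point decisions of this kind are fragile for Jordan structure in general.

```python
import numpy as np

def jordan_chain(A, lam, v, tol=1e-10):
    """Extend the eigenvector v to a Jordan chain w_1 = v, w_2, ... (returned as columns)."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    chain = [np.asarray(v, dtype=float)]
    while len(chain) < n:
        # Least-squares candidate for B w = w_prev.
        w, *_ = np.linalg.lstsq(B, chain[-1], rcond=None)
        if np.linalg.norm(B @ w - chain[-1]) > tol:
            break  # the system has no solution: the chain ends here
        chain.append(w)
    return np.column_stack(chain)

# Example matrix (chosen here): eigenvalue 2 with one chain of length 2 and one of length 1.
A = np.array([[ 3.0, 1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [ 0.0, 0.0, 2.0]])
W = jordan_chain(A, 2.0, [1.0, -1.0, 0.0])
print(W.shape)  # (3, 2): the chain has length 2
```

Solutions of $(\mathbf{A}-\lambda\mathbf{I})\vec{w}_i=\vec{w}_{i-1}$ are not unique. The least-squares call returns the minimum-norm one, which is one valid choice among many.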

Consider all the vectors in a Jordan chain and write the vector equations

$$\mathbf{A}\vec{w}_1 = \lambda\vec{w}_1,\;\mathbf{A}\vec{w}_2 = \lambda\vec{w}_2+\vec{w}_1,\cdots,\mathbf{A}\vec{w}_k = \lambda\vec{w}_k + \vec{w}_{k-1}.$$

Write this in matrix form as

$$\mathbf{AW} = \mathbf{WJ}(\lambda,k)$$

where $\mathbf{W} = [\vec{w}_1\cdots\vec{w}_k]$ and

$$\mathbf{J}(\lambda,k) = \left[\begin{array}{ccccc}\lambda&1&0&\cdots&0\\0&\ddots&\ddots&\ddots&\vdots\\\vdots&\ddots&\ddots&\ddots&0\\\vdots&&\ddots&\ddots&1\\0&\cdots&\cdots&0&\lambda\end{array}\right].$$

The matrix $\mathbf{W}$ has linearly independent columns, but unless $k=n$ it's not square, so we can't get to $\mathbf{A} = \mathbf{WJ}(\lambda,k)\mathbf{W}^{-1}$.

To get that kind of a factorization, we write the foregoing matrix equation for **all** the Jordan chains

$$\mathbf{AW}_1=\mathbf{W}_1\mathbf{J}(\lambda_1,k_1),\cdots,\mathbf{AW}_m=\mathbf{W}_m\mathbf{J}(\lambda_m,k_m)$$

Stacking these equations left to right into a single matrix equation yields

$$\mathbf{AS} = \mathbf{SJ}$$

where $\mathbf{S} = [\mathbf{W}_1\cdots\mathbf{W}_m]$ and

$$\mathbf{J} = \left[\begin{array}{cccc}\mathbf{J}(\lambda_1,k_1)&0&\cdots&0\\0&\mathbf{J}(\lambda_2,k_2)&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&\mathbf{J}(\lambda_m,k_m)\end{array}\right].$$

The columns of $\mathbf{S}$ are linearly independent and the matrix is square so we can write

$$\mathbf{A} = \mathbf{SJS}^{-1}.$$

This is called the **Jordan form** (or sometimes the Jordan canonical form). Sometimes just the **J** matrix is called the Jordan form.
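To make the stacking concrete, the sketch below assembles $\mathbf{S}$ and $\mathbf{J}$ from two hand-computed Jordan chains and checks the factorization. The example matrix, the chains, and the helper names `jordan_block` and `block_diag` are assumptions introduced here for illustration.

```python
import numpy as np

def jordan_block(lam, k):
    """k-by-k Jordan block J(lam, k): lam on the diagonal, ones on the superdiagonal."""
    return lam * np.eye(k) + np.eye(k, k=1)

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

A = np.array([[ 3.0, 1.0, 0.0],
              [-1.0, 1.0, 0.0],
              [ 0.0, 0.0, 5.0]])
W1 = np.array([[ 1.0, 0.5],
               [-1.0, 0.5],
               [ 0.0, 0.0]])           # chain of length 2 for lambda = 2
W2 = np.array([[0.0], [0.0], [1.0]])   # chain of length 1 for lambda = 5

S = np.hstack([W1, W2])
J = block_diag([jordan_block(2.0, 2), jordan_block(5.0, 1)])

print(np.allclose(A @ S, S @ J))                 # True
print(np.allclose(A, S @ J @ np.linalg.inv(S)))  # True
```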

# Functions of matrices

In the broadest sense, a 'function of a matrix' is a function that takes a matrix as an argument. That is not the _usual_ sense of the term though. 

The _usual_ meaning of 'function of a matrix' is that we have some function $f:\mathbb{C}\to\mathbb{C}$ and we want to define another function $g:\mathbb{C}^{n\times n}\to\mathbb{C}^{n\times n}$ that has special properties that connect it closely to $f$.

The two functions are so closely connected that we will abuse notation and say $f=g$, e.g. if $f(x) = e^x$ then we will say $g(\mathbf{A}) = f(\mathbf{A}) = e^\mathbf{A}$.

Consider an expression like

$$(a_0\mathbf{I} + a_1\mathbf{A} + \cdots + a_m\mathbf{A}^m)\vec{v}.$$

It seems very natural to write

$$p(\mathbf{A}) = a_0\mathbf{I} + a_1\mathbf{A} + \cdots + a_m\mathbf{A}^m$$

which connects a scalar function $p(x) = a_0 + a_1x + \cdots + a_mx^m$ with a matrix function. (Note that the scalar function is *not* applied element-by-element.)
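A short sketch of the difference between $p(\mathbf{A})$ and element-wise application follows. The helper name `polyvalm` and the example values are assumptions made here. The evaluation uses Horner's rule with matrix products.

```python
import numpy as np

def polyvalm(coeffs, A):
    """Evaluate p(A) = a_0 I + a_1 A + ... + a_m A^m by Horner's rule.

    coeffs = [a_0, a_1, ..., a_m], lowest degree first.
    """
    n = A.shape[0]
    P = np.zeros((n, n))
    for a in reversed(coeffs):
        P = P @ A + a * np.eye(n)
    return P

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
coeffs = [1.0, 0.0, 1.0]      # p(x) = 1 + x^2

print(polyvalm(coeffs, A))    # I + A @ A, the matrix polynomial
print(1.0 + A**2)             # p applied element-wise: a different matrix
```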

Continuing the foregoing example, suppose that **A** is diagonalizable, i.e.

$$\mathbf{A} = \mathbf{S\Lambda S}^{-1}.$$

Plugging this in and simplifying yields

$$p(\mathbf{A}) = \mathbf{S}\left(a_0\mathbf{I} + a_1\mathbf{\Lambda} + \cdots + a_m\mathbf{\Lambda}^m\right)\mathbf{S}^{-1}.$$

Applying $p$ to the diagonal matrix $\mathbf{\Lambda}$ just applies $p$ to the diagonal elements of $\mathbf{\Lambda}$, i.e. the eigenvalues of $\mathbf{A}$.
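The identity $p(\mathbf{A}) = \mathbf{S}\,p(\mathbf{\Lambda})\,\mathbf{S}^{-1}$ can be checked on a small case. The matrix and polynomial below are examples chosen here. `numpy.linalg.eig` supplies the eigenvalues and the columns of $\mathbf{S}$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # diagonalizable, eigenvalues 1 and 3

def p(x):
    """Scalar polynomial p(x) = 1 + 2x + x^3 (works element-wise on arrays)."""
    return 1.0 + 2.0 * x + x**3

eigvals, S = np.linalg.eig(A)
via_eig = S @ np.diag(p(eigvals)) @ np.linalg.inv(S)   # p applied to the eigenvalues only

direct = np.eye(2) + 2.0 * A + np.linalg.matrix_power(A, 3)
print(np.allclose(via_eig, direct))  # True
```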

Going further, suppose that we have a convergent power series representation of a function $f$

$$f(x) = \sum_{n=0}^\infty a_n (x - c)^n$$

If $\mathbf{A}$ is diagonalizable we can write

$$f(\mathbf{A}) = \mathbf{S}\left(\sum_{n=0}^\infty a_n(\mathbf{\Lambda} - c\mathbf{I})^n\right)\mathbf{S}^{-1} = \mathbf{S}f(\mathbf{\Lambda})\mathbf{S}^{-1}$$

where the final $f$ is applied to the diagonal elements of $\mathbf{\Lambda}$. 

This isn't a good _definition_ of a matrix function though, because
- It only works if the power series is convergent at all the eigenvalues of **A**, and
- It isn't defined for non-diagonalizable matrices.

What happens if we apply the power series approach to a non-diagonalizable matrix?

Consider a single Jordan block

$$\mathbf{J}(\lambda,k) = \lambda\mathbf{I} + \mathbf{N}_k$$

where

$$\mathbf{N}_k = \left[\begin{array}{cccc}0&1&&\\&\ddots&\ddots&\\&&\ddots&1\\&&&0\end{array}\right]$$

Notice that $\mathbf{N}_k^k = \mathbf{0}$.

So for $n\ge k-1$

$$(\lambda\mathbf{I} + \mathbf{N}_k)^n = \lambda^n\mathbf{I} + \left(\begin{array}{c}n\\1\end{array}\right)\lambda^{n-1}\mathbf{N}_k + \cdots + \left(\begin{array}{c}n\\k-1\end{array}\right)\lambda^{n-k+1}\mathbf{N}_k^{k-1}$$
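Both facts, the nilpotency of $\mathbf{N}_k$ and the truncated binomial expansion, can be checked directly. The values of `k`, `lam`, and `n` in this sketch are arbitrary choices made here.

```python
import math
import numpy as np

k, lam, n = 4, 0.5, 7                  # block size, eigenvalue, power (n >= k - 1)
N = np.eye(k, k=1)                     # ones on the first superdiagonal
J = lam * np.eye(k) + N

print(np.count_nonzero(np.linalg.matrix_power(N, k)))   # 0, i.e. N^k = 0

# Binomial expansion, truncated after the N^(k-1) term.
binomial_sum = sum(math.comb(n, j) * lam ** (n - j) * np.linalg.matrix_power(N, j)
                   for j in range(k))
print(np.allclose(np.linalg.matrix_power(J, n), binomial_sum))  # True
```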

If we sum the Taylor series we find

$$f(\mathbf{J}(\lambda,k)) = \left[\begin{array}{ccccc}f(\lambda)&f'(\lambda)&\frac{f''(\lambda)}{2!}&\cdots&\frac{f^{(k-1)}(\lambda)}{(k-1)!}\\0&\ddots&\ddots&\ddots&\vdots\\\vdots&\ddots&\ddots&\ddots&\frac{f''(\lambda)}{2!}\\\vdots&&\ddots&\ddots&f'(\lambda)\\0&\cdots&\cdots&0&f(\lambda)\end{array}\right].$$

Definition of a (primary) function of a matrix:

Suppose that the Jordan form $\mathbf{A} = \mathbf{SJS}^{-1}$ has blocks $\mathbf{J}(\lambda_i,k_i)$. Suppose that $f^{(j)}(\lambda_i)$ exists for all $i$ and for all $j=0,\ldots,k_i-1$. Then we define $f(\mathbf{A}) = \mathbf{S}f(\mathbf{J})\mathbf{S}^{-1}$, where $f(\mathbf{J})$ is obtained by applying the function $f$ to each Jordan block of $\mathbf{A}$ using the formula above.

If $f(\mathbf{A})$ and $g(\mathbf{A})$ are well-defined, then

$$f(\mathbf{A})g(\mathbf{A})=g(\mathbf{A})f(\mathbf{A})$$

and

$$f(\mathbf{A}^T) = f(\mathbf{A})^T.$$

If $\mathbf{B}$ commutes with $\mathbf{A}$ then $\mathbf{B}$ commutes with $f(\mathbf{A})$.

If $\mathbf{A}$ is upper triangular then $f(\mathbf{A})$ is also upper triangular and $f(\mathbf{A})_{ii} = f(a_{ii})$.

The **spectral radius** $\rho(\mathbf{A})$ of a matrix **A** is the largest absolute value of the eigenvalues of **A**.

The spectral radius is not a norm. But it has an interesting relation to norms.

Theorem: For any matrix operator norm $\|\cdot\|$ we have

$$\rho(\mathbf{A})\le\|\mathbf{A}\|.$$

Proof: Recall that $\|\mathbf{A}\| = \max_{\|\vec{u}\|=1}\|\mathbf{A}\vec{u}\|$.

Let $\vec{u}$ be a unit vector such that $\mathbf{A}\vec{u} = \lambda_{max}\vec{u}$ where $\rho(\mathbf{A}) = |\lambda_{max}|$.

Clearly

$$\rho(\mathbf{A}) = |\lambda_{max}| = \|\lambda_{max}\vec{u}\| = \|\mathbf{A}\vec{u}\|\le\|\mathbf{A}\|.$$
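A numerical illustration of the bound, using the induced 1-, 2-, and $\infty$-norms available through `numpy.linalg.norm`. The matrices are examples chosen here. The second one shows why the spectral radius is not a norm: it vanishes for a nonzero matrix.

```python
import numpy as np

A = np.array([[0.5, 2.0],
              [0.0, 0.25]])

rho = max(abs(np.linalg.eigvals(A)))                       # spectral radius
norms = {p: np.linalg.norm(A, p) for p in (1, 2, np.inf)}  # induced matrix norms

print(rho)
print(norms)   # every value is at least rho

# A nonzero nilpotent matrix has spectral radius zero.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
rho_N = max(abs(np.linalg.eigvals(N)))
print(rho_N)
```

Note how far the norms can sit above $\rho(\mathbf{A})$ when the matrix has a large off-diagonal entry.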

There is also an interesting result that for any $\epsilon>0$ and any **A** there is a matrix norm $\|\cdot\|$ such that 

$$\rho(\mathbf{A})\le \|\mathbf{A}\|\le\rho(\mathbf{A}) + \epsilon.$$

The spectral radius is thus the infimum of $\|\mathbf{A}\|$ over all matrix norms.

Proof (Horn & Johnson 5.6.10): Consider the Jordan decomposition

$$\mathbf{A} = \mathbf{SJS}^{-1}.$$

Define $\mathbf{D}_t = \operatorname{diag}(t,t^2,\ldots,t^n)$ for $t>0$, and then define the matrix norm

$$\|\mathbf{B}\|:= \|\mathbf{D}_t\mathbf{S}^{-1}\mathbf{BSD}_t^{-1}\|_1$$

for any $n\times n$ matrix $\mathbf{B}$. (This is a norm and is submultiplicative; therefore it is a matrix norm.) For $\mathbf{B}=\mathbf{A}$, the matrix whose 1-norm we are taking is $\mathbf{D}_t\mathbf{J}\mathbf{D}_t^{-1}$, which is block-diagonal with blocks in the form

$$\left[\begin{array}{ccccc}\lambda&t^{-1}&0&\cdots&0\\0&\ddots&\ddots&\ddots&\vdots\\\vdots&\ddots&\ddots&\ddots&0\\\vdots&&\ddots&\ddots&t^{-1}\\0&\cdots&\cdots&0&\lambda\end{array}\right].$$

The max absolute column sum is at most $\rho(\mathbf{A})+t^{-1}$, so by taking $t^{-1}<\epsilon$ we can make $\|\mathbf{A}\|<\rho(\mathbf{A})+\epsilon$.
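The construction in this proof can be tried out on a small matrix whose Jordan decomposition is known. In the sketch below, `scaled_norm` is a hypothetical helper name, and the matrices $\mathbf{S}$ and $\mathbf{J}$ are an example chosen here (one $2\times 2$ block with eigenvalue 2).

```python
import numpy as np

def scaled_norm(B, S, t):
    """|| D_t S^{-1} B S D_t^{-1} ||_1 with D_t = diag(t, t^2, ..., t^n)."""
    n = B.shape[0]
    D = np.diag(t ** np.arange(1, n + 1, dtype=float))
    M = D @ np.linalg.inv(S) @ B @ S @ np.linalg.inv(D)
    return np.linalg.norm(M, 1)        # maximum absolute column sum

# Example with a known Jordan decomposition A = S J S^{-1}.
S = np.array([[ 1.0, 0.5],
              [-1.0, 0.5]])
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])
A = S @ J @ np.linalg.inv(S)

for t in (1.0, 10.0, 100.0):
    print(t, scaled_norm(A, S, t))     # behaves like 2 + 1/t, approaching rho(A) = 2
```

For large $t$ the matrix $\mathbf{D}_t$ becomes badly scaled, so in floating point this is an illustration of the argument rather than a practical way to estimate $\rho(\mathbf{A})$.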