# Diagonalization

If we can find a solution $x \ne 0$ to

$$
Ax = \lambda x
$$

then, for this vector, the matrix $A$ **acts like a scalar**.  $x$ is called an **eigenvector** of $A$, and $\lambda$ is called an **eigenvalue**.

In fact, for an $m \times m$ matrix $A$, we typically find $m$ linearly independent eigenvectors $x_1,x_2,\ldots,x_m$ and $m$ corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$.   Such a matrix is called **diagonalizable**.  Most matrices are diagonalizable; we will deal with the rare "defective" (non-diagonalizable) case later.

Given such a **basis of eigenvectors**, the key idea for using them is:

1. Take any vector $x$ and expand it in this basis: $x = c_1 x_1 + \cdots + c_m x_m$, i.e. $x = Xc$ or $c = X^{-1}x$, where $X$ is the matrix whose *columns are the eigenvectors*.

2. For each eigenvector $x_k$, the matrix $A$ acts like a scalar $\lambda_k$.  Multiplying or dividing by $A$ corresponds to multiplying or dividing $x_k$ by $\lambda_k$.  **Solve your problem for each eigenvector by treating A as the scalar λ**.

3. Add up the solutions for the individual eigenvectors (sum the eigenvectors times their new coefficients).  That is, multiply the new coefficients by $X$.

This process of expanding in the eigenvectors, multiplying (or whatever) by λ, and then summing up the eigenvectors times their new coefficients, is expressed algebraically as the following **diagonalization** of the matrix $A$:

$$
A = X \Lambda X^{-1}
$$

where $\Lambda$ is the **diagonal matrix of the eigenvalues** and $X = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}$ from above.
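The three steps above can be sketched in Julia. This is only an illustrative sketch: the symmetric 2×2 matrix `A` and the right-hand side `b` are made-up inputs, and `eigen` (from Julia's `LinearAlgebra` standard library, Julia ≥ 0.7) is assumed as the way to obtain `λ` and `X`. The "problem" here is solving $Ay = b$, so step 2 *divides* each coefficient by $\lambda_k$.

```julia
using LinearAlgebra

# Hypothetical example inputs, chosen only for illustration
A = [2.0 1.0
     1.0 2.0]
b = [1.0, 0.0]

λ, X = eigen(A)      # eigenvalues λ and eigenvector matrix X (columns = eigenvectors)

c = X \ b            # step 1: expand b in the eigenvector basis, b = X*c
cnew = c ./ λ        # step 2: treat A as the scalar λₖ (divide, since we are solving Ay = b)
y = X * cnew         # step 3: sum the eigenvectors times their new coefficients

A * y ≈ b            # check that y solves Ay = b (≈ compares up to roundoff)
```

Dividing by $\lambda_k$ only makes sense when no eigenvalue is zero, i.e. when $A$ is invertible.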

## Expanding in an Eigenvector Basis

For example, consider the matrix

$$
A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}
$$

whose eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 3$ and whose eigenvectors are $x_1 = \begin{pmatrix}1\\1\end{pmatrix}$ and $x_2 = \begin{pmatrix}1\\2\end{pmatrix}$.

We put these eigenvectors into a matrix $X = \begin{pmatrix} x_1 & x_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$.  The matrix $X$ is invertible, i.e. the eigenvectors form a *basis*, so $A$ is *diagonalizable*.

In [1]:
using LinearAlgebra  # provides eigvals, det, diagm, etc. (required in Julia ≥ 0.7)
A = [1 1
    -2 4]
eigvals(A)

2-element Array{Float64,1}:
 2.0
 3.0

In [2]:
X = [1 1
     1 2]

2×2 Array{Int64,2}:
 1  1
 1  2

To write any vector $x$ in the basis of eigenvectors, we just want $x = Xc$ or $c = X^{-1} x$.  For example:

In [3]:
x = [1,0]
c = X \ x

2-element Array{Float64,1}:
  2.0
 -1.0

i.e. $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, which is obviously correct.
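To see where these coefficients come from, one can also solve the $2 \times 2$ system $Xc = x$ by hand (a worked check using only the $X$ and $x$ above):

$$
X^{-1} = \frac{1}{1\cdot 2 - 1\cdot 1}\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix},
\qquad
c = X^{-1} x = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.
$$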

Multiplying by $A$ then simply multiplies each eigenvector's coefficient by the corresponding eigenvalue: $Ax = \lambda_1 c_1 x_1 + \lambda_2 c_2 x_2$, or equivalently
$$
Ax = X \begin{pmatrix} \lambda_1 c_1 \\ \lambda_2 c_2 \end{pmatrix} =
     X \underbrace{\begin{pmatrix} \lambda_1 &  \\  &  \lambda_2  \end{pmatrix}}_\Lambda c
     = X \Lambda X^{-1} x
$$
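Plugging in the numbers from this example ($c_1 = 2$, $c_2 = -1$, $\lambda_1 = 2$, $\lambda_2 = 3$) gives a hand check of this formula:

$$
Ax = \lambda_1 c_1 x_1 + \lambda_2 c_2 x_2
   = (2)(2)\begin{pmatrix}1\\1\end{pmatrix} + (3)(-1)\begin{pmatrix}1\\2\end{pmatrix}
   = \begin{pmatrix}4\\4\end{pmatrix} - \begin{pmatrix}3\\6\end{pmatrix}
   = \begin{pmatrix}1\\-2\end{pmatrix},
$$

which is just the first column of $A$, as it must be for $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.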

In [4]:
A*x

2-element Array{Int64,1}:
  1
 -2

In [5]:
Λ = diagm([2, 3])

2×2 Array{Int64,2}:
 2  0
 0  3

In [6]:
X * Λ * inv(X) * x

2-element Array{Float64,1}:
  1.0
 -2.0

Since this is true for *every* $x$, it means $\boxed{A = X \Lambda X^{-1}}$:

In [7]:
X * Λ / X   # / X is equivalent to multiplying by inv(X), but is more efficient

2×2 Array{Float64,2}:
  1.0  1.0
 -2.0  4.0

Another way to see this is to consider $$AX = \begin{pmatrix} Ax_1 & Ax_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 x_1 & \lambda_2 x_2 \end{pmatrix} = X \Lambda$$

In [8]:
A*X

2×2 Array{Int64,2}:
 2  3
 2  6

In [9]:
X*Λ

2×2 Array{Int64,2}:
 2  3
 2  6

It follows that $A = X\Lambda X^{-1}$ or $\boxed{\Lambda = X^{-1} A X}$:

In [10]:
X \ A * X

2×2 Array{Float64,2}:
 2.0  0.0
 0.0  3.0

The key thing is that this works for **any matrix, as long as the eigenvectors form a basis**. Such a matrix is called **diagonalizable**, and it turns out that this is true for almost all matrices; we will deal with the rare exceptions later.

Thus, eigenproblems join LU factorization (Gaussian elimination) and QR factorization (Gram–Schmidt): the eigensolutions are **equivalent to a matrix factorization**.  This is extremely useful in helping us think *algebraically* about using eigenvalues and eigenvectors, because it lets us work with them *all at once*.

## Change of Basis and Similar Matrices

A key concept in linear algebra, *especially* for eigenproblems, is a *change of basis*.   If $S$ is an $m\times m$ invertible matrix, then we know its columns form a *basis* for $\mathbb{R}^m$.  Writing any $x$ in this basis is simply $x=Sc$, i.e. coordinates $c = S^{-1} x$ in the new basis (the new "coordinate system").

If we have a matrix $A$ representing a linear operation $y=Ax$, we can also try to write the *same* linear operation in a *new* coordinate system.  That is, if we write $x=Sc$ and $y=Sd$, then what is the matrix relating $c$ and $d$?  This is easy to compute:
$$
d = S^{-1} y = S^{-1} Ax = S^{-1} A S c,
$$
so the matrix $\boxed{B = S^{-1} A S}$ represents the same operation as $A$ but in the new coordinate system.

In linear algebra, we say that $A$ and $B$ are **similar matrices**.  That is, B is similar to A if and only if $B = S^{-1} A S$ for some invertible matrix S.   It also follows that $A = SBS^{-1} = (S^{-1})^{-1} B (S^{-1})$, i.e. if B is similar to A using S, then A is similar to B using $S^{-1}$.
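As a quick numerical sanity check of the change-of-basis formula, the following sketch takes the 2×2 matrix $A$ from above together with an arbitrarily chosen invertible `S` and coefficient vector `c` (both are illustrative choices, not part of the original computation) and confirms that $d = Bc$.

```julia
using LinearAlgebra

A = [1 1
    -2 4]
S = [1 1
     0 1]            # hypothetical invertible matrix (det = 1), chosen only for illustration
B = S \ A * S        # the same operation as A, in the coordinates defined by S

c = [1.0, 2.0]       # arbitrary coordinates in the new basis
x = S * c            # the corresponding vector in the original coordinates
y = A * x            # apply the operation in the original coordinates
d = S \ y            # coordinates of y in the new basis

d ≈ B * c            # check that B maps c to d (≈ compares up to roundoff)
```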

For example, we can choose a random 2×2 S to write our matrix A from above in a new coordinate system:

In [11]:
S = randn(2,2)

2×2 Array{Float64,2}:
 -0.639884  -0.210787
  1.39416    0.82622 

In [12]:
det(S) # a random S is almost certainly nonsingular

-0.23481474230161664

In [13]:
B = S \ A * S   # same as inv(S) * A * S, but more efficient since it avoids the explicit inverse

2×2 Array{Float64,2}:
 -8.80879  -5.5106
 23.1624   13.8088

A key fact is that **similar matrices have the same eigenvalues** (but, in general, different eigenvectors):

In [14]:
eigvals(B)

2-element Array{Float64,1}:
 2.0
 3.0

This is easy to show in a variety of ways.

For example, if $Ax=\lambda x$, then since $A=SBS^{-1}$ we have $SBS^{-1} x = \lambda x$.  Multiplying both sides on the left by $S^{-1}$ gives
$$
B(S^{-1}x) = \lambda (S^{-1}x)
$$
so **S⁻¹x is an eigenvector of B with the *same* eigenvalue λ**!
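As a hand-checkable illustration, take the $2 \times 2$ matrix $A$ from before and a simple invertible $S$ (chosen arbitrarily for this example; it is not the random $S$ used in the computation above):

$$
S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
S^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad
S^{-1} A S = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -2 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ -2 & 2 \end{pmatrix}.
$$

Calling this product $\tilde{B}$, the eigenvectors $x_1$ and $x_2$ of $A$ map to eigenvectors $S^{-1}x_1$ and $S^{-1}x_2$ of $\tilde{B}$ with the same eigenvalues 2 and 3:

$$
S^{-1} x_1 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad
\tilde{B} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} = 2 \begin{pmatrix} 0 \\ 1 \end{pmatrix};
\qquad
S^{-1} x_2 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \quad
\tilde{B} \begin{pmatrix} -1 \\ 2 \end{pmatrix} = \begin{pmatrix} -3 \\ 6 \end{pmatrix} = 3 \begin{pmatrix} -1 \\ 2 \end{pmatrix}.
$$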

In contrast, multiplying $A$ only on *one* side by some matrix $S$ will typically change the eigenvalues:

In [15]:
eigvals(A * S)

2-element Array{Float64,1}:
 -0.29501
  4.77574

Another way of seeing that similar matrices have the same eigenvalues is by looking at the characteristic polynomial:

$$
\det(A - \lambda I) = \det(SBS^{-1} - \lambda I) = \det \left[ S (B - \lambda I) S^{-1}   \right]
= \det(S) \det(B - \lambda I) \det(S^{-1}) = \det(B - \lambda I)
$$

i.e. **similar matrices have the same characteristic polynomial**.

(Here, we used the product rule for determinants and the fact that $\det(S^{-1}) = 1/\det(S)$.)

If we set λ=0, we see that $\det A = \det B$, i.e. **similar matrices have the same determinant**.
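For the $2 \times 2$ example $A$ from above, the characteristic polynomial can be worked out by hand as a concrete check:

$$
\det(A - \lambda I) = \det \begin{pmatrix} 1-\lambda & 1 \\ -2 & 4-\lambda \end{pmatrix}
= (1-\lambda)(4-\lambda) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3).
$$

Its roots are the eigenvalues 2 and 3, and its value at $\lambda = 0$ is $6 = \det A$.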

## Determinant = Product of λ’s

In the new language, we see that diagonalization $A = X \Lambda X^{-1}$ is the same thing as saying that **A is similar to a diagonal matrix (of eigenvalues)**.  That is, there is a basis (coordinate system) in which A is diagonal.

From above, $\det A = \det \Lambda = \lambda_1 \lambda_2 \cdots \lambda_m$.  That is, the **determinant is the product of the eigenvalues**.

(Technically, we have only shown this for diagonalizable matrices, but it turns out to be true in general.)

Let's try it:

In [16]:
det(A)

6.0

This is the same as the product of A's eigenvalues:

In [17]:
2 * 3

6

Equivalently, using Julia's `prod` function (which computes the product of the elements of a list):

In [18]:
prod(eigvals(A))

6.0

We can also try it for some large matrices:

In [19]:
Abig = randn(100,100)
det(Abig)

4.702690681622565e77

In [20]:
prod(eigvals(Abig))

4.702690681622322e77 - 3.9459888742460383e61im

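The two results agree to about 13 significant digits.  The tiny imaginary part (of relative size about $10^{-16}$) is roundoff error: the complex eigenvalues of a real matrix come in conjugate pairs, so the exact product is real.

It also follows that the log of the determinant is the sum of the logs of the eigenvalues.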

Julia has a built-in function `logabsdet` that returns both the log of the absolute value of the determinant and the sign of the determinant; the first of these should equal the sum of the logs of the absolute values of the eigenvalues.  (There is also a `logdet` function to compute the log without the absolute value, but then we need to deal with complex numbers.)

In [21]:
logabsdet(Abig) # the log of the absolute value of the determinant

(178.8471869909019, 1.0)

In [22]:
s = sum(log.(abs.(eigvals(Abig)))) # the sum of the logs of |λ|

178.84718699090183

If we try to compute the determinant of an even bigger matrix, we run into a problem: in the default precision, a computer can't represent numbers bigger than $10^{308}$ or so, and instead gives `Inf` (for "infinity"):

In [23]:
Abigger = randn(1000,1000)
det(Abigger)

-Inf

In [24]:
floatmax(Float64) # the largest finite value the computer can represent (floatmax was called realmax before Julia 0.7); beyond it we get "Inf"

1.7976931348623157e308

But the log is fine:

In [25]:
logabsdet(Abigger) # but the log is fine

(2952.3477303699215, -1.0)

In [26]:
s = sum(log.(abs.(eigvals(Abigger)))) # the sum of the logs of |λ|

2952.3477303699187

## Trace = Sum of λ’s

Another important quantity for a square matrix, which we haven't covered yet, is the [trace](https://en.wikipedia.org/wiki/Trace_(linear_algebra)).  The trace is defined as the **sum of the diagonal elements** of the matrix.  By plugging in the definition of matrix multiplication, one can quickly show that the trace has a crucial property:

$$
\operatorname{trace} (AB) = \operatorname{trace}(BA)
$$
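A sketch of that calculation, writing $a_{ij}$ and $b_{ij}$ for the entries of two $m \times m$ matrices $A$ and $B$ (notation introduced here only for this derivation):

$$
\operatorname{trace}(AB) = \sum_{i=1}^m (AB)_{ii} = \sum_{i=1}^m \sum_{j=1}^m a_{ij} b_{ji}
= \sum_{j=1}^m \sum_{i=1}^m b_{ji} a_{ij} = \sum_{j=1}^m (BA)_{jj} = \operatorname{trace}(BA).
$$

The only real step is swapping the order of the two sums.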

It follows that **similar matrices have the same trace**, since if $A=SBS^{-1}$ then 

$$
\operatorname{trace} (A) = \operatorname{trace}(SBS^{-1}) = \operatorname{trace}(S^{-1}SB) = \operatorname{trace}(B)
$$

In particular, since A and Λ are similar, this means that the **trace of a matrix is the sum of the eigenvalues**!

Let's check:

In [27]:
tr(A)  # tr computes the trace (it was called trace before Julia 0.7)

5

Yup, this is the sum of the eigenvalues:

In [28]:
2 + 3

5

In [29]:
sum(eigvals(A))

5.0

Let's try it for our bigger example:

In [30]:
tr(Abig)  # tr computes the trace (it was called trace before Julia 0.7)

10.688843348188461

In [31]:
sum(eigvals(Abig))

10.68884334818848 + 0.0im

## Diagonalization and Matrix Powers