---
title: 6.7 Similarity, Eigenbases, and Diagonalization
subject:  Eigenvalues
subtitle: 
short_title: 6.7 Similarity, Eigenbases, and Diagonalization
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Eigenvalues, Eigenvectors
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath-/05_Ch_6_Eigenvalues_and_Eigenvectors/077-diagonalization.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 12 - Eigvenvalues and Eigenvectors part 2 (complex eigenvalues and eigenvectors, similarity transformation, diagonalization and eigenbases).pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.3.

## Learning Objectives

By the end of this page, you should know:
- how to define eigenbases,
- how to define similar matrices,
- how to define the geometric and algebraic multiplicity of an eigenvalue,
- when and how a square matrix $A$ can be diagonalized as $A = PDP^{-1}$, where $D$ is diagonal.

# Eigenbases

Most of the vector space bases that are useful in applications are assembled from the eigenvectors of a particular matrix. In this section, we focus on matrices with a "complete" set of eigenvectors and show how these form a basis for $\mathbb{R}^n$ (or in the complex case, $\mathbb{C}^n$); in these cases, the set of eigenvectors is known as an eigenbasis:

:::{prf:definition} Eigenbasis
:label: eigenbasis-defn

Let $\vv{v_1}, ..., \vv{v_k}$ be the eigenvectors of a matrix $A \in \mathbb{R}^{n\times n}$. If $ \text{span} \{ \vv{v_1}, ..., \vv{v_k} \} = \mathbb{R}^n$, i.e., the eigenvectors of $A$ span the entire space $\mathbb{R}^n$, then its eigenvectors form an *eigenbasis* for $\mathbb{R}^n$.

If the eigenvectors of $A$ form a basis for $\mathbb{R}^n$, then $A$ is *complete*.
:::

Such *eigenbases* allow us to rewrite the linear transformation determined by a matrix in a simple diagonal form; matrices that allow us to do this are called *diagonalizable*, a definition which we will formalize shortly. We focus on matrices with real eigenvalues and eigenvectors to start, and will return to matrices with complex eigenvalues/eigenvectors in a few pages.

Our starting point is the following theorem, which we will state as a fact. It is a generalization of the pattern we saw in [an example before](#eigen-ex2) that the eigenvectors corresponding to distinct eigenvalues are linearly independent:

:::{prf:theorem} The eigenvectors of a matrix with distinct eigenvalues
:label: distinct-eigenvalue-thm

If the matrix $A \in \mathbb{R}^{n\times n}$ has $n$ distinct real eigenvalues $\lambda_1, ..., \lambda_n$, then the corresponding real eigenvectors $\vv{v_1}, ..., \vv{v_n}$ form a basis for $\mathbb{R}^n$.
:::

:::{prf:example} The eigenbasis from a $2\times 2$ matrix with distinct eigenvalues
:label: eigen-ex6

In a [previous example](#eigen-ex2), we saw that $A = \bm 3&1\\1&3 \em$ has eigenvalue/vector pairs

\begin{align*}
    \lambda_1 = 4, \vv{v_1} = \bm 1\\ 1\em, \quad \lambda_2 = 2, \vv{v_2} = \bm -1\\1\em
\end{align*}

Here, $\vv{v_1}$ and $\vv{v_2}$ are linearly independent, and hence form a basis for $\mathbb{R}^2$, since $\dim \mathbb{R}^2 = 2$.
:::
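As a quick numerical sanity check of this example, here is a minimal sketch, assuming NumPy is available (the variable names `pairs`, `V`, and `rank_V` are ours). It verifies that each pair satisfies $A\vv v = \lambda \vv v$, and that the matrix with the eigenvectors as columns has rank 2.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Eigenvalue/eigenvector pairs claimed in the example above
pairs = [(4.0, np.array([1.0, 1.0])),
         (2.0, np.array([-1.0, 1.0]))]

# Each pair should satisfy A v = lambda v
for lam, v in pairs:
    assert np.allclose(A @ v, lam * v)

# Stack the eigenvectors as columns; rank 2 means they are linearly
# independent, and hence form a basis for R^2
V = np.column_stack([v for _, v in pairs])
rank_V = np.linalg.matrix_rank(V)
print(rank_V)
```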

However, we also saw [an example](#eigen-ex3) where a $3\times 3$ matrix only had two distinct eigenvalues, but still had three linearly independent eigenvectors:

:::{prf:example} The eigenbasis from a $3\times 3$ matrix with a repeated eigenvalue
:label: eigen-ex7

Recall the $3\times 3$ matrix $A = \bm 2&-1&-1\\0&3&1\\0&1&3 \em$. We showed it had the following eigenvalue/vector pairs:

\begin{align*}
    \lambda_1 = 2&, \quad \vv{v_1} = \bm 1\\0\\0\em,\quad \vv{\hat {v_1}} = \bm 0\\1\\-1\em\\
    \lambda_2 = 4&, \quad \vv{v_2} = \bm -1\\1\\1\em
\end{align*}

The vectors $\vv{v_1}, \vv{\hat {v_1}}, \vv{v_2} \in \mathbb{R}^3$ are linearly independent, and hence form a basis for $\mathbb{R}^3$ since $\dim \mathbb{R}^3 = 3$.
:::

# Algebraic and Geometric Multiplicities

Notice that in this last example $\dim V_{\lambda_1} = 2$ (why?) for the double eigenvalue $\lambda_1 = 2$ (i.e., the [eigenspace](#eigenspace-defn) corresponding to $\lambda_1$ has dimension 2), and similarly, $\dim V_{\lambda_2} = 1$ for the simple eigenvalue $\lambda_2 = 4$, so that there is one linearly independent eigenvector for each time an eigenvalue appears as a root of the characteristic polynomial.

These notions can be captured in the idea of algebraic and geometric multiplicity:

:::{prf:definition} Algebraic and Geometric Multiplicity
:label: multiplicity-defn

The number of times an eigenvalue $\lambda_i$ appears as a root of the characteristic polynomial is called its *algebraic multiplicity*. For example, if a matrix has characteristic polynomial $(x - 2)^2(x - 3)$, then the eigenvalue $\lambda_1 = 2$ has algebraic multiplicity 2, while $\lambda_2 = 3$ has algebraic multiplicity 1.

For an eigenvalue $\lambda$ of a matrix $A$, its *geometric multiplicity* is the dimension of its [eigenspace](#eigenspace-defn) $V_\lambda$.

One key fact is that the geometric multiplicity of $\lambda$ is always at most the algebraic multiplicity.
:::

Our observation is that if the algebraic and geometric multiplicity match for each eigenvalue, then we can form a basis for $\mathbb{R}^n$.

:::{prf:theorem} Eigenbasis theorem
:label: eigenbasis-existence-thm

The eigenvectors of a matrix $A \in \mathbb{R}^{n\times n}$ form a basis for $\mathbb{R}^n$ if and only if, for each distinct eigenvalue $\lambda_i$, the algebraic multiplicity of $\lambda_i$ matches its geometric multiplicity $\dim V_{\lambda_i}$.
:::
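To make the two multiplicities concrete, here is a small numerical sketch for the $3\times 3$ matrix from the example above. It assumes NumPy and real eigenvalues. The helper `geometric_multiplicity`, the rounding to 6 decimals, and the rank tolerance are our own illustrative choices (numerical heuristics, not an exact symbolic computation). The geometric multiplicity is computed as $n - \text{rank}(A - \lambda I)$, i.e., the dimension of the null space of $A - \lambda I$.

```python
import numpy as np

A = np.array([[2.0, -1.0, -1.0],
              [0.0,  3.0,  1.0],
              [0.0,  1.0,  3.0]])

def geometric_multiplicity(M, lam, tol=1e-8):
    """dim of the eigenspace null(M - lam I), computed as n - rank(M - lam I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n), tol=tol)

# Round the numerically computed eigenvalues so that repeated roots are
# grouped together (a heuristic; assumes the eigenvalues are real)
evals = np.round(np.linalg.eigvals(A).real, 6)
values, algebraic = np.unique(evals, return_counts=True)

# Map each distinct eigenvalue to (algebraic, geometric) multiplicity
multiplicities = {
    float(lam): (int(alg), int(geometric_multiplicity(A, lam)))
    for lam, alg in zip(values, algebraic)
}
print(multiplicities)

# The eigenbasis theorem: an eigenbasis exists iff the two always match
has_eigenbasis = all(alg == geo for alg, geo in multiplicities.values())
print(has_eigenbasis)
```

For this matrix, both multiplicities of $\lambda_1 = 2$ equal 2 and both multiplicities of $\lambda_2 = 4$ equal 1, consistent with the eigenbasis we found.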

For the next little bit, we will assume that our matrix $A$ satisfies the condition of the above theorem, i.e., that its eigenvectors form a basis for $\mathbb{R}^n$. What does this buy us? To answer this question, we need to introduce the idea of *similarity transformations*.

# Similar Matrices

Given a vector $\vv x \in\mathbb{R}^n$ with coordinates $x_i$ with respect to the standard basis, i.e., $\vv x = x_1 \vv{e_1} + x_2 \vv{e_2} + ... + x_n \vv{e_n}$, we can find the coordinates $y_1,..., y_n$ of $\vv x$ with respect to a new basis $\vv{b_1}, ..., \vv{b_n}$ by solving the following linear system:

\begin{align*}
    y_1 \vv{b_1} + y_2 \vv{b_2} + ... + y_n \vv{b_n} = \vv x \iff B \vv y = \vv x
\end{align*}

where $B = \bm \vv{b_1} & \vv{b_2} & ... & \vv{b_n} \em$. Since the $\vv{b_i}$ form a basis of $\mathbb{R}^n$, they are linearly independent, which means that $B$ is nonsingular.

Now, suppose I have a matrix $A \in \mathbb{R}^{n\times n}$, which I use to define the linear transformation $f : \mathbb{R}^n \to \mathbb{R}^n$ given by $f(\vv x) = A \vv x$. Here, $f$'s inputs $\vv x \in \mathbb{R}^n$ and outputs $f(\vv x) \in \mathbb{R}^n$ are both expressed in the standard basis $\vv{e_1}, ..., \vv{e_n}$, and its matrix representative is $A$.

What if we would like to implement this linear transformation with respect to the basis $B$, that is, define a function $g : \mathbb{R}^n\to \mathbb{R}^n$ with inputs $\vv y\in \mathbb{R}^n$ in $B$-coordinates, and outputs $g(\vv y) \in \mathbb{R}^n$ in $B$-coordinates? To accomplish this, we need to convert both input $\vv{x}$ and output $f(\vv x)$ to $B$-coordinates.

* Relating inputs $\vv x$ to $B$-coordinate inputs $\vv y$ is easy: $\vv x = B\vv y$. 

* Relating outputs $f(\vv x)$ to $B$-coordinate outputs $g(\vv y)$ is easy too: $f(\vv x) = Bg(\vv y)$.

Putting these together, we see that

\begin{align*}
    f(\vv x) = A\vv x \iff Bg(\vv y) = AB \vv y
\end{align*}

which lets us solve for $g(\vv y) = B^{-1} A B \vv y$.

We conclude that if $A$ is the matrix representation of a linear transformation in the standard basis, then $B^{-1} A B$ is the matrix representation in the basis $B$.

![alt text](../figures/04-linear_transformation_basis.png)

:::{prf:example} Rewriting a linear transformation in another basis
:label: eigen-ex8

Consider $A = \bm 1&2\\0&1\em$ and $f(\vv x) = A\vv x$. This transformation maps $\bm x_1\\x_2\em \mapsto \bm x_1 + 2x_2 \\x_2\em$. Consider the basis $\vv{b_1} = \bm 1\\2\em, \vv{b_2} = \bm 0\\1\em$ illustrated in blue below:

![alt text](../figures/04-blue_basis.png)

The basis matrix $B$ is $B = \bm 1&0\\2&1\em$, and $B^{-1} = \bm 1&0\\-2&1 \em$. The matrix representation for $g(\vv y)$ is then:

\begin{align*}
    B^{-1}AB &= \bm 1&0\\-2&1\em \bm 1&2\\0&1\em \bm 1&0\\2&1\em\\
    &= \bm 5&2\\-8&-3\em
\end{align*}

and the map $g(\vv y) = B^{-1}AB\vv y$ takes $\bm y_1\\y_2\em \mapsto \bm 5y_1 + 2y_2\\-8y_1 - 3y_2\em$.
:::
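The change of basis in this example can be checked numerically. The sketch below assumes NumPy; the names `M`, `y`, `f_x`, and `g_y` are ours. It computes $B^{-1}AB$ by solving a linear system rather than forming $B^{-1}$ explicitly, and then confirms on one test vector that converting to standard coordinates, applying $f$, and converting back agrees with applying $g$ directly.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])   # columns are b_1 and b_2

# Matrix of the same linear map in B-coordinates: solve B M = A B for M,
# which gives M = B^{-1} A B
M = np.linalg.solve(B, A @ B)
print(M)

# Check g(y) = B^{-1} f(B y) on an arbitrary test vector y
y = np.array([1.0, -2.0])
x = B @ y                        # the same vector in standard coordinates
f_x = A @ x                      # apply f in standard coordinates
g_y = np.linalg.solve(B, f_x)    # convert the output back to B-coordinates
print(g_y, M @ y)
```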

# Diagonalization

In the above example, our change of basis didn't really help us understand what the linear transformation $f(\vv x)$ is doing any better than our starting point. However, we'll see that if we use the basis defined by the eigenvectors of a matrix, some magic happens! We'll start with an example, and then extract a general conclusion.

:::{prf:example} Rewriting a linear transformation in an eigenbasis
:label: eigen-ex9

Consider the linear transformation $h(x_1, x_2) = \bm x_1 - x_2 \\ 2x_1 + 4x_2\em$. It has matrix representation $A = \bm 1&-1\\2&4\em$ with respect to the standard basis of $\mathbb{R}^2$.

The eigenvalues of $A$ are computed by solving $\det(A - \lambda I) = 0$:

\begin{align*}
    \det \bm 1 - \lambda & -1\\2 & 4 - \lambda\em = (1 - \lambda)(4 - \lambda) + 2 = \lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3) = 0
\end{align*}

so that $\lambda_1 = 2$ and $\lambda_2 = 3$. Solving the appropriate eigenvector equations $(A - \lambda_i I)\vv{v_i} = \vv 0$, we obtain the following eigenvalue/eigenvector pairs:

\begin{align*}
\lambda_1 = 2, \vv{v_1} = \bm 1\\-1\em \quad\text{and}\quad \lambda_2 = 3, \vv{v_2} = \bm 1\\-2\em
\end{align*}

Let's see what happens if we express $A$ in the coordinate system defined by the [eigenbasis](#eigenbasis-defn) $V = \bm \vv{v_1} & \vv{v_2} \em = \bm 1&1\\-1&-2\em$.

First, we compute $V^{-1} = \frac{1}{1(-2) - 1(-1)}\bm -2&-1 \\1&1\em = \bm 2&1\\-1&-1\em$, 

and then find $V^{-1}AV = \bm 2&1\\-1&-1\em \bm 1&-1 \\2&4\em \bm 1&1\\-1&-2\em = \bm 2&0\\0&3\em$.

This matrix is diagonal! This means it applies a simple stretching action in the coordinates defined by the eigenvectors. The eigenvalues of this new matrix are also $\lambda_1 = 2$ and $\lambda_2 = 3$, but in this case, the eigenvectors are much simpler: $\vv{\hat{v_1}} = \bm 1\\0\em$ and $\vv{\hat{v_2}} = \bm 0\\1\em$.
:::
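Here is a short numerical check of this example, assuming NumPy (the name `D_computed` is ours). It first confirms $A\vv{v_k} = \lambda_k \vv{v_k}$ for both eigenvectors at once by comparing the columns of $AV$ with those of $V$ scaled by the eigenvalues, and then recomputes $V^{-1}AV$.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [2.0,  4.0]])
V = np.array([[ 1.0,  1.0],
              [-1.0, -2.0]])   # columns are v_1 and v_2
D = np.diag([2.0, 3.0])        # eigenvalues in the same order as the columns of V

# Column k of A V is A v_k, and column k of V D is lambda_k v_k
assert np.allclose(A @ V, V @ D)

# Equivalent statement: V^{-1} A V = D (computed by solving V X = A V)
D_computed = np.linalg.solve(V, A @ V)
print(np.round(D_computed, 10))
```

Note that the order of the eigenvalues on the diagonal has to match the order of the eigenvector columns in $V$; swapping the columns of $V$ swaps the diagonal entries.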

The above example illustrates a very important property of an eigenbasis: it *diagonalizes* the original matrix representative! Working with diagonal matrices is very convenient, and thus diagonalization is very useful when we can do it.

Although we only saw a $2\times 2$ example, the idea applies to general $n\times n$ matrices, and is captured by the notion of *diagonalizable* matrices.

:::{prf:definition} Diagonalizable
:label: diagonalizable-defn

We say that a square matrix $A$ is *diagonalizable* if there exists a nonsingular matrix $V$ and a diagonal matrix $D = \text{diag}(\lambda_1, ..., \lambda_n)$ such that

\begin{align*}\label{expr:diagonalizable}
V^{-1} A V = D \quad\text{or equivalently}\quad A = VDV^{-1} \tag{D}
\end{align*}
:::

Let's try to understand condition [(D)](expr:diagonalizable) a little bit more by writing it as 

\begin{align*}
AV = VD
\end{align*}

Now, for $V = \bm \vv{v_1} & \vv{v_2} & \dots & \vv{v_n} \em$, this becomes: 

\begin{align*}
\bm A\vv{v_1} & A\vv{v_2} & \dots & A\vv{v_n}\em = \bm \lambda_1 \vv{v_1} & \lambda_2 \vv{v_2} & \dots & \lambda_n \vv{v_n} \em
\end{align*}

Focusing on the $k^{th}$ column of this $n\times n$ matrix equation, we see something familiar:

\begin{align*}
A\vv{v_k} = \lambda_k \vv{v_k},
\end{align*}

that is, the columns of $V$ must be eigenvectors, and the diagonal elements $\lambda_i$ must be the corresponding eigenvalues! Therefore, we immediately get the following characterization of when a matrix is diagonalizable:

:::{prf:theorem} A necessary and sufficient condition for diagonalizability
:label: diagonalizable-thm

A matrix $A \in \mathbb{R}^{n\times n}$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors. In other words, $A$ is diagonalizable if and only if $A$ is [complete](#eigenbasis-defn).

Equivalently, $A$ is diagonalizable if and only if, for each eigenvalue $\lambda$, its [geometric multiplicity matches its algebraic multiplicity](#multiplicity-defn).

In this case, we can diagonalize $A$ as:

\begin{align*}
    A = \bm \vv{v_1} & \vv{v_2} & \dots & \vv{v_n} \em \bm \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \em \bm \vv{v_1} & \vv{v_2} & \dots & \vv{v_n} \em^{-1}
\end{align*}

where $\lambda_1$ is the eigenvalue corresponding to $\vv{v_1}$, and so on.
:::

Next, let's look at some examples of diagonalizable and nondiagonalizable matrices:

:::{exercise} Checking diagonalizability of $2\times 2$ matrices
:label: eigen-ex9

For each of the matrices below, determine whether or not they are [diagonalizable](#diagonalizable-defn).

1. $A = \bm 1&1 \\ 1&0 \em$

2. $B = \bm 3&-2\\2&-1 \em$

3. $C = \bm 0&0\\0&0 \em$

```{solution} eigen-ex9
:class: dropdown

First, let's check if $A$ is diagonalizable. Solving for the eigenvalues of $A$,

\begin{align*}
\det(A - \lambda I) = 0 \iff (1 - \lambda)(-\lambda) - 1 = 0 \\
\iff \lambda^2 - \lambda - 1 = 0 \iff \lambda = \frac{1\pm \sqrt 5}{2}
\end{align*}

Since $A$ has $2$ distinct real eigenvalues (as many as its dimension), it immediately follows from [this fact](#distinct-eigenvalue-thm) that the eigenvectors of $A$ span $\mathbb{R}^2$, hence $A$ is diagonalizable.

Next, let's check if $B$ is diagonalizable. Solving for the eigenvalues of $B$,

\begin{align*}
\det(B - \lambda I) = 0 \iff (3 - \lambda)(-1 - \lambda) - (-2)(2) = 0 \\
\iff \lambda^2 - 2\lambda + 1 = 0 \iff \lambda = 1
\end{align*}

Since $B$ has only $1$ distinct eigenvalue, we must check the dimension of the corresponding eigenspace. Solving the eigenvector equation for $\lambda_1 = 1$,

\begin{align*}
    (B - I) \vv x = \vv 0 \iff \bm 2&-2\\2&-2 \em \vv x = \vv 0 \iff \vv x = a\bm 1\\1 \em, a \in \mathbb{R}
\end{align*}

We see that the corresponding eigenspace only has a dimension of $1$, meaning that the eigenvectors of $B$ do not span $\mathbb{R}^2$, hence $B$ is not diagonalizable.

Next, let's check if $C$ is diagonalizable. Clearly, the only eigenvalue of $C$ is $0$, and the corresponding eigenspace is the entire space $\mathbb{R}^2$. Hence $C$ is diagonalizable, and one such diagonalization is given by:

\begin{align*}
    \bm 0&0\\0&0 \em = \bm 1&0\\0&1\em\bm 0&0\\0&0\em \bm 1&0\\0&1\em^{-1}
\end{align*}

As we see, there is no direct connection between invertibility and diagonalizability, in the sense that one does not imply the other. You can have invertible matrices which are not diagonalizable (like $B$) and diagonalizable matrices which are not invertible (like $C$).
```
:::
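The closing remark of the solution, that invertibility and diagonalizability are independent properties, can be illustrated numerically for $B$ and $C$. This is only a sketch under stated assumptions: it uses NumPy, the helper `geometric_multiplicity` and its tolerance are our own, and the eigenvalues ($1$ for $B$, $0$ for $C$) are taken from the solution rather than recomputed.

```python
import numpy as np

def geometric_multiplicity(M, lam, tol=1e-8):
    """dim of the eigenspace null(M - lam I), computed as n - rank(M - lam I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n), tol=tol)

B = np.array([[3.0, -2.0],
              [2.0, -1.0]])
C = np.zeros((2, 2))

# B: invertible (det = 1), but its double eigenvalue 1 has only a
# 1-dimensional eigenspace, so B is not diagonalizable
B_invertible = not np.isclose(np.linalg.det(B), 0.0)
B_geo = geometric_multiplicity(B, 1.0)

# C: singular (det = 0), but its double eigenvalue 0 has a 2-dimensional
# eigenspace, so C is diagonalizable
C_invertible = not np.isclose(np.linalg.det(C), 0.0)
C_geo = geometric_multiplicity(C, 0.0)

print(B_invertible, B_geo)
print(C_invertible, C_geo)
```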

#### Python Break!

Here, we'll show how to use `numpy.linalg` (or `scipy.linalg`) to diagonalize a matrix. 

```python
import numpy as np

# given a square, diagonalizable matrix A, returns a tuple of matrices (P, D) such that A = PDP^{-1}
def diagonalize(A):
    # eig returns the eigenvalues and a matrix whose columns are the
    # corresponding eigenvectors, in the same order
    evals, evecs = np.linalg.eig(A)
    return evecs, np.diag(evals)

A = np.array([
    [2, -1, -1],
    [0, 3, 1],
    [0, 1, 3]
])

P, D = diagonalize(A)

print('P:')
print(P, '\n')
print('D:')
print(D, '\n')
print('PDP^{-1}:')
print(P @ D @ np.linalg.inv(P))
```

```
P:
[[ 1.         -0.57735027  0.        ]
 [ 0.          0.57735027 -0.70710678]
 [ 0.          0.57735027  0.70710678]]

D:
[[2. 0. 0.]
 [0. 4. 0.]
 [0. 0. 2.]]

PDP^{-1}:
[[ 2. -1. -1.]
 [ 0.  3.  1.]
 [ 0.  1.  3.]]
```


As you can see, finding a diagonalization in Python is really easy! The `numpy.linalg.eig` function returns the eigenvalues in an array and the eigenvectors as the columns of a matrix (conveniently, the $i$-th eigenvalue corresponds to the $i$-th column).
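One caveat worth keeping in mind: `numpy.linalg.eig` does not tell us whether a matrix is diagonalizable. It returns a square matrix of "eigenvectors" even for a non-diagonalizable input. The sketch below uses the matrix $B$ from the exercise above; using the condition number of $P$ as a warning sign is our own heuristic, not a rigorous test. In floating point, the returned columns come out (nearly) parallel, so $P$ is singular or extremely ill-conditioned and $PDP^{-1}$ cannot be trusted.

```python
import numpy as np

B = np.array([[3.0, -2.0],
              [2.0, -1.0]])   # not diagonalizable (see the exercise above)

evals, P = np.linalg.eig(B)
print(evals)

# For a non-diagonalizable matrix the returned columns are numerically
# parallel, so the condition number of P is enormous (possibly inf)
cond_P = np.linalg.cond(P)
print(cond_P)
```

A very large condition number here is a hint to go back and compare the algebraic and geometric multiplicities by hand.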