# **18-The Matrix Exponential**

---

### **Introduction**

In this notebook we study general $n \times n$ systems.

---

### **Author**
**Junichi Koganemaru**  

---

### **Last Updated**
**April 9, 2025**

In this notebook we study general $n \times n$ constant coefficient systems. To be specific, we consider equations of the form $\boldsymbol{X}'(t) = \boldsymbol{A} \boldsymbol{X}(t)$, where $\boldsymbol{A}$ is a constant $n \times n$ matrix and $\boldsymbol{X}: I \to \mathbb{R}^n$ is a vector-valued function defined on an interval $I \subseteq \mathbb{R}$. Previously we studied the case $n = 2$; in this notebook we discuss the theory for general $\mathbb{N} \ni n > 2$. 

First, we will go over the definition of matrix-vector multiplication and matrix-matrix multiplication. 

### Matrix-vector and matrix-matrix multiplication

> **Definition (Matrix-vector multiplication):**  
> Let $A$ be an $m \times n$ matrix of the form  
> $$  
> A = \begin{pmatrix} 
> a_{11} & a_{12} & \ldots & a_{1n} \\ 
> a_{21} & a_{22} & \ldots & a_{2n} \\ 
> \vdots & \vdots & \ddots & \vdots \\ 
> a_{m1} & a_{m2} & \ldots & a_{mn} 
> \end{pmatrix},  
> $$  
> and let $\boldsymbol{x}$ be an $n \times 1$ vector of the form  
> $$  
> \boldsymbol{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.  
> $$  
> Then $A\boldsymbol{x}$ is defined to be a vector such that its entry in the $i$-th row is given by  
> $$  
> (A\boldsymbol{x})_i = a_{i1}\,x_1 + a_{i2}\,x_2 + \ldots + a_{in}\,x_n.  
> $$  

To visualize this, focus on the $i$-th row:
$$
\begin{pmatrix} 
\vdots & \vdots & \vdots & \vdots \\
a_{i1} & a_{i2} & \ldots & a_{in} \\
\vdots & \vdots & \vdots & \vdots 
\end{pmatrix} 
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} 
=
\begin{pmatrix} 
\vdots \\
a_{i1}\,x_1 + a_{i2}\,x_2 + \ldots + a_{in}\,x_n \\
\vdots 
\end{pmatrix}.
$$

**Remark:**  Note that we need  
$$
\text{the number of columns in } A = \text{the number of rows in } \boldsymbol{x},
$$  
otherwise this definition breaks down. In our original notation, each row has as many coefficients as there are variables. If these two numbers don't match up, we simply say that the matrix-vector product $A\boldsymbol{x}$ is undefined or that the dimensions are incompatible.

An alternative way to think about this is in terms of the "column picture."

> **Proposition** 
> Let $A \in \mathcal{M}_{m \times n}(\mathbb{R})$ and let $\boldsymbol{v} \in \mathbb{R}^n$. Suppose $(\boldsymbol{v})_i = v_i$, and think of the $i$-th column of $A$ as a vector in $\mathbb{R}^m$ denoted by $\boldsymbol{a}_i$ for all $1 \le i \le n$. Then  
> \begin{align}  
> A \boldsymbol{v} = v_1 \boldsymbol{a}_1 + \ldots + v_n \boldsymbol{a}_n.  
> \end{align}  


In other words, the entries of the vector $\boldsymbol{v}$ specify how to combine the columns of the matrix $A$ to form the matrix-vector product $A \boldsymbol{v}$.  

To visualize this, write  

$$\begin{pmatrix} \boldsymbol{a}_1 & \mid & \boldsymbol{a}_2 & \mid & \ldots & \mid & \boldsymbol{a}_n \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = v_1 \boldsymbol{a}_1 + \ldots + v_n \boldsymbol{a}_n.$$
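
The row picture and the column picture can be compared side by side in code. The following is a minimal sketch in plain Python, with matrices stored as lists of rows; the function names `matvec_rows` and `matvec_columns` and the sample data are illustrative choices, not part of the notes above.

```python
def matvec_rows(A, x):
    """Row picture: entry i is a_i1*x_1 + ... + a_in*x_n."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("columns of A must match the number of entries of x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """Column picture: A x = x_1 a_1 + ... + x_n a_n, with a_j the j-th column of A."""
    m, n = len(A), len(x)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match the number of entries of x")
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]  # a vector with m entries
        result = [r + x[j] * c for r, c in zip(result, column_j)]
    return result


A = [[1, 2, 3],
     [4, 5, 6]]       # a 2 x 3 matrix
x = [1, 0, -1]        # a vector with 3 entries

print(matvec_rows(A, x))     # [-2, -2]
print(matvec_columns(A, x))  # [-2, -2]
```

Both functions raise a `ValueError` when the number of columns of `A` differs from the number of entries of `x`, which mirrors the compatibility condition in the remark above.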



> **Definition:** Let $A \in \mathcal{M}_{m \times n}(\mathbb{R}), B \in \mathcal{M}_{n \times p}(\mathbb{R})$. The *matrix-matrix product* (or simply the matrix product) of $A$ and $B$, denoted by $AB$, is the matrix in $\mathcal{M}_{m \times p}(\mathbb{R})$ whose entries are given by  
> $$  
> (AB)_{ij} = (A)_{i1} (B)_{1j} + (A)_{i2}(B)_{2j} + \ldots + (A)_{in}(B)_{nj}  = \sum_{k=1}^n (A)_{ik}(B)_{kj}, \; \text{for all} \; 1 \le i \le m, 1 \le j \le p.  
> $$  

A few remarks before we move on.  

**Remark:** $k$ is a dummy variable in the equation above, while $i,j$ are not as they refer to the position of the entry we want to focus on.  

**Remark:** Note that there's a compatibility condition for a matrix-matrix product to be well-defined: the number of columns of the first matrix must match the number of rows of the second matrix.  

**Remark:**  While this definition is useful for proving identities, it is often cumbersome to apply directly when computing products by hand.  

To visualize what we've written down, focus on the $i$-th row of $A$ and the $j$-th column of $B$. This gives us the element in the $i$-th row and $j$-th column of $AB$:  
$$  
\begin{pmatrix} 
\vdots & \vdots & \vdots & \vdots \\
a_{i1} & a_{i2} & \ldots & a_{in}\\
\vdots & \vdots & \vdots & \vdots 
\end{pmatrix} 
\begin{pmatrix} 
\vdots & b_{1j} & \vdots \\ 
\vdots & \vdots & \vdots \\ 
\vdots & b_{nj} & \vdots 
\end{pmatrix} =  
\begin{pmatrix} 
\vdots & \vdots & \vdots \\ 
\vdots & a_{i1} b_{1j} + a_{i2} b_{2j} + \ldots + a_{in} b_{nj} & \vdots  \\ 
\vdots & \vdots & \vdots 
\end{pmatrix}.  
$$  

Next we give a few different ways of thinking about matrix multiplication.

> **Proposition:** Let $A \in \mathcal{M}_{m \times n}(\mathbb{R}), B \in \mathcal{M}_{n \times p}(\mathbb{R})$. For any $j$ between $1$ and $p$, we think of the $j$-th column of $B$ as a vector in $\mathbb{R}^n$ and denote it by the vector $\boldsymbol{b}_j$. Then the $j$-th column of the matrix product $AB$ is the vector $A \boldsymbol{b}_j$.  

In other words,  
$$  
A \begin{pmatrix} \boldsymbol{b}_1 & \mid & \boldsymbol{b}_2 & \mid & \ldots & \mid & \boldsymbol{b}_p \end{pmatrix} = \begin{pmatrix} A \boldsymbol{b}_1 & \mid & A \boldsymbol{b}_2 & \mid & \ldots & \mid & A \boldsymbol{b}_p \end{pmatrix}.  
$$  

**Remark:** Since $A \boldsymbol{b}_j$ is a matrix-vector product, this vector is formed by the entries of the vector $\boldsymbol{b}_j$ specifying how to combine the columns of $A$.  

**Remark:** This means that if we focus on the columns of the matrix $AB$, we can think of each column as a linear combination of the columns of $A$.  


Next we discuss the "row picture" of matrix multiplication.

> **Proposition:**  
> Let $A = (a_1 \; \ldots \; a_n)$ be a $1 \times n$ matrix (we call these *row vectors*) and let $B$ be an $n \times p$ matrix. Denote the rows of $B$ by the row vectors $\boldsymbol{b}_1^T, \ldots , \boldsymbol{b}_n^T$. Then the matrix product $AB$ is a row vector where  
> $$  
> AB = a_1 \boldsymbol{b}_1^T + \ldots + a_n \boldsymbol{b}_n^T.  
> $$  
> In other words, the entries of the row vector $A$ specify how to combine the rows of $B$ to form the matrix product $AB$.  

We can visualize this as follows:  
$$  
\begin{pmatrix}  
a_1 & \ldots & a_n  
\end{pmatrix} \begin{pmatrix}  
\boldsymbol{b}_1^T \\  
\hline \vdots \\  
\hline  
\boldsymbol{b}_n^T  
\end{pmatrix} = a_1 \boldsymbol{b}_1^T + \ldots + a_n \boldsymbol{b}_n^T.  
$$  

> **Proposition:**  
> Let $A \in \mathcal{M}_{m \times n}(\mathbb{R}), B \in \mathcal{M}_{n \times p}(\mathbb{R})$. For any $i$ between 1 and $m$, we think of the $i$-th row of $A$ as a row vector and denote it by $\boldsymbol{a}_i^T$. Then the $i$-th row of the matrix product $AB$ is the row vector $\boldsymbol{a}_i^T B$.  

In other words,  
$$  
\begin{pmatrix}  
\boldsymbol{a}_1^T \\  
\hline \vdots \\  
\hline  
\boldsymbol{a}_m^T  
\end{pmatrix} B = \begin{pmatrix}  
\boldsymbol{a}_1^T B \\  
\hline \vdots \\  
\hline  
\boldsymbol{a}_m^T B  
\end{pmatrix}.  
$$  


**Remark:** Since $\boldsymbol{a}_i^T B$ is a matrix product between a row vector and a matrix, this row vector is formed by the entries of the row vector $\boldsymbol{a}_i^T$ specifying how to combine the rows of $B$. Therefore the rows of the matrix product $AB$ are linear combinations of the rows of $B$.  

In summary, there are three equivalent ways of thinking about matrix multiplication.  

1. Entry wise: the element in the $i$-th row and the $j$-th column of $AB$ is formed by using elements in the $i$-th row of $A$ and the $j$-th column of $B$.  
2. Column wise: the $j$-th column of $AB$ is formed by using elements of the $j$-th column of $B$ specifying how to combine the columns of $A$.  
3. Row wise: the $i$-th row of $AB$ is formed by using elements of the $i$-th row of $A$ specifying how to combine the rows of $B$.  
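
The three viewpoints can be checked against each other on a small example. The sketch below uses plain Python lists of rows; the function names and the sample matrices are assumptions made for illustration only.

```python
def matmul_entrywise(A, B):
    """(AB)_ij = sum over k of A_ik * B_kj."""
    n, p = len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


def matmul_columnwise(A, B):
    """Column j of AB is A b_j: the entries of b_j combine the columns of A."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for j in range(p):
        for k in range(n):
            # add B[k][j] times column k of A to column j of C
            for i in range(m):
                C[i][j] += B[k][j] * A[i][k]
    return C


def matmul_rowwise(A, B):
    """Row i of AB is a_i^T B: the entries of a_i^T combine the rows of B."""
    p = len(B[0])
    C = []
    for row_a in A:
        row_c = [0] * p
        for a_ik, row_b in zip(row_a, B):
            row_c = [c + a_ik * b for c, b in zip(row_c, row_b)]
        C.append(row_c)
    return C


A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [-1, 1, 0]]    # 2 x 3

print(matmul_entrywise(A, B))   # [[-1, 2, 2], [-1, 4, 6], [-1, 6, 10]]
print(matmul_columnwise(A, B) == matmul_entrywise(A, B))  # True
print(matmul_rowwise(A, B) == matmul_entrywise(A, B))     # True
```

All three functions perform the same multiplications and additions, only grouped differently, which is why the results agree exactly.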


### The Matrix Exponential

Now we are ready to define the matrix exponential.

First we recall the power series representation of the exponential function $e^x$, 
$$
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{j=0}^\infty \frac{x^j}{j!}, \; x \in \mathbb{R}.
$$
The radius of convergence of this power series is infinite. We will use this to motivate the definition of the matrix exponential.

> **Definition:** Let $A \in \mathcal{M}_{n \times n}(\mathbb{R})$. The *matrix exponential* of $A$, denoted by $e^A$, is defined via the power series
> $$
> \mathcal{M}_{n \times n}(\mathbb{R}) \ni e^{A} = I + A + \frac{1}{2!}A^2 + \frac{1}{3!} A^3 + \cdots = \sum_{j=0}^\infty \frac{1}{j!} A^j.
> $$

**Remark:** Since the matrix exponential is defined in terms of an infinite series, one in principle needs to ask if the series converges and in what sense. This is beyond the scope of this course, but with some tools one can show that this series converges for any matrix $A$.
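
To get a feel for the definition, one can approximate $e^A$ by a partial sum of the series. The following is a rough sketch in plain Python; the helper names `matmul` and `expm_series` and the cutoff of 30 terms are arbitrary choices for illustration. A truncated series is not a robust numerical method for matrices with large entries; in practice one would typically reach for a library routine such as `scipy.linalg.expm`.

```python
import math


def matmul(A, B):
    """Entrywise matrix product of two matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # j = 0 term
    term = [row[:] for row in total]
    for j in range(1, terms):
        # A^j / j! is obtained from A^(j-1) / (j-1)! by multiplying by A and dividing by j
        term = [[entry / j for entry in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(row_s, row_t)]
                 for row_s, row_t in zip(total, term)]
    return total


# For A = [[0, 1], [-1, 0]] we have A^2 = -I, so e^{tA} = cos(t) I + sin(t) A.
t = 1.0
tA = [[0.0, t], [-t, 0.0]]
approx = expm_series(tA)
exact = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
error = max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(error)  # tiny, on the order of floating-point rounding
```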

With the notion of the matrix exponential, we can define the matrix-valued function $\Phi: \mathbb{R} \to \mathcal{M}_{n \times n}(\mathbb{R})$ via $\Phi(t) = e^{t A}$ for a given matrix $A$. 

The next proposition records some fundamental properties of the matrix-valued function $\Phi$.

> **Proposition:** Let $A \in \mathcal{M}_{n \times n}(\mathbb{R})$ and let the function $\Phi: \mathbb{R} \to \mathcal{M}_{n \times n}(\mathbb{R})$ be defined as above. The following hold. 
> 1. The derivative of $\Phi$ is $A e^{tA}$ or $e^{tA} A$, i.e. $\frac{d}{dt} e^{tA} = e^{tA} A = A e^{tA}$ (this is the product of two matrices).  
> 2. $\Phi$ is equal to the identity matrix at $t = 0$, i.e. $e^{0 \cdot A} = I$.  
> 3. $e^{(t+s) A} = e^{tA} e^{sA}$ for all $t,s \in \mathbb{R}$.  
> 4. The inverse of $e^{tA}$ is the matrix $e^{-tA}$: $( e^{tA} )^{-1} = e^{-tA}$.  
> 5. If $A = P D P^{-1}$ where $\det P \neq 0$ and $D$ is a diagonal matrix, then $e^{A} = P e^{D} P^{-1}$.  
> 6. **If $AB = BA$**, then $e^{A+B} = e^A e^B$. Note that equality can fail when $A$ and $B$ do not commute, and matrices in general do not commute.  
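
To see that the commutativity assumption in item 6 cannot simply be dropped, here is a short worked example. The matrices are chosen purely for illustration and do not come from the notes above. Take
$$
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = BA.
$$
Since $A^2 = B^2 = 0$, both series terminate after two terms, so $e^A = I + A$ and $e^B = I + B$, which gives
$$
e^A e^B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.
$$
On the other hand $(A+B)^2 = I$, so splitting the series into even and odd powers yields
$$
e^{A+B} = \cosh(1)\, I + \sinh(1)\,(A+B) = \begin{pmatrix} \cosh 1 & \sinh 1 \\ \sinh 1 & \cosh 1 \end{pmatrix}.
$$
The top-left entries already differ, since $\cosh 1 \approx 1.543 \neq 2$, so $e^{A+B} \neq e^A e^B$ for this pair.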

The matrix exponential is important because of the following proposition.

> **Proposition:**  The unique solution satisfying the IVP  
> $$
> \begin{cases}
> \boldsymbol{X}'(t) = A \boldsymbol{X}(t), \; t \in \mathbb{R} \\
> \boldsymbol{X}(t_0) = \boldsymbol{X}_0
> \end{cases}
> $$
> is given by the function $\boldsymbol{X} : \mathbb{R} \to \mathbb{R}^n$ defined via
> $$
> \boldsymbol{X}(t) = e^{(t-t_0)A} \boldsymbol{X}_0.
> $$
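
The formula can be sanity-checked numerically. The sketch below approximates $e^{(t-t_0)A}$ by a truncated series and then compares a central-difference derivative of $\boldsymbol{X}$ with $A\boldsymbol{X}$. The matrix, initial data, step size `h`, and the function names `expm_series` and `solve_linear_ivp` are all assumptions made for illustration.

```python
def matmul(A, B):
    """Entrywise matrix product of two matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm_series(A, terms=40):
    """Truncated power series for e^A (adequate only for small matrices with modest entries)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for j in range(1, terms):
        term = [[entry / j for entry in row] for row in matmul(term, A)]
        total = [[s + u for s, u in zip(rs, ru)] for rs, ru in zip(total, term)]
    return total


def solve_linear_ivp(A, X0, t0, t):
    """Evaluate X(t) = e^{(t - t0) A} X0."""
    scaled = [[(t - t0) * a for a in row] for row in A]
    E = expm_series(scaled)
    return [sum(e * x for e, x in zip(row, X0)) for row in E]


A = [[0.0, 1.0], [-2.0, -3.0]]   # illustrative matrix with eigenvalues -1 and -2
X0 = [1.0, 0.0]
t0 = 1.0

# Check X'(t) = A X(t) at one point using a central difference.
t, h = 1.5, 1e-5
X_t = solve_linear_ivp(A, X0, t0, t)
forward = solve_linear_ivp(A, X0, t0, t + h)
backward = solve_linear_ivp(A, X0, t0, t - h)
derivative = [(f - b) / (2 * h) for f, b in zip(forward, backward)]
rhs = [sum(a * x for a, x in zip(row, X_t)) for row in A]
residual = max(abs(d - r) for d, r in zip(derivative, rhs))

print(solve_linear_ivp(A, X0, t0, t0))  # reproduces the initial condition X0
print(residual)                         # small, limited by the finite-difference error
```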

The proposition above shows that one can solve the IVP if one can compute $e^{tA}$ for a given matrix $A$. Unfortunately, computing $e^{tA}$ explicitly is in general quite cumbersome, as it involves the *Jordan canonical form* of $A$ and the notion of generalized eigenvectors. This is outside the scope of this course, so we'll only consider a few special cases.

If $A$ has $n$ real distinct eigenvalues and $n$ corresponding real linearly independent eigenvectors, we can make the following claim.

> **Proposition:**  If $A$ has real distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and $n$ corresponding real linearly independent eigenvectors $\boldsymbol{v}_1, \ldots, \boldsymbol{v}_n$, then we can **diagonalize** $A = P D P^{-1}$ and  
> $$
> e^{tA} = P e^{tD} P^{-1},  
> $$
> where $P = \begin{pmatrix} \boldsymbol{v}_1 & \mid & \ldots & \mid & \boldsymbol{v}_n \end{pmatrix}$, $D$ is the diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$, and  
> $$
> e^{tD} = \begin{pmatrix}
> e^{t \lambda_1} & 0 & \cdots & 0\\
> 0 & e^{t \lambda_2} & \cdots & 0 \\
> \vdots & \vdots  & \ddots & \vdots \\
> 0 & 0 & \cdots & e^{t \lambda_n}
> \end{pmatrix}.
> $$
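
A brief sketch of why this holds, using only the series definition. Here $\operatorname{diag}(d_1, \ldots, d_n)$ is shorthand, introduced for this sketch, for the diagonal matrix with diagonal entries $d_1, \ldots, d_n$. Since $A = P D P^{-1}$, the inner factors $P^{-1} P$ cancel in every power, so $(tA)^j = P (tD)^j P^{-1}$ for all $j \ge 0$. Taking for granted that multiplication by $P$ and $P^{-1}$ can be interchanged with the infinite sum, we get
$$
e^{tA} = \sum_{j=0}^\infty \frac{1}{j!} P (tD)^j P^{-1} = P \left( \sum_{j=0}^\infty \frac{1}{j!} (tD)^j \right) P^{-1} = P e^{tD} P^{-1}.
$$
Powers of a diagonal matrix are computed entry by entry, $(tD)^j = \operatorname{diag}\big( (t\lambda_1)^j, \ldots, (t\lambda_n)^j \big)$, so summing the series along the diagonal gives
$$
e^{tD} = \operatorname{diag}\left( \sum_{j=0}^\infty \frac{(t\lambda_1)^j}{j!}, \ldots, \sum_{j=0}^\infty \frac{(t\lambda_n)^j}{j!} \right) = \operatorname{diag}\big( e^{t\lambda_1}, \ldots, e^{t\lambda_n} \big).
$$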

An equivalent (and perhaps more familiar) way of writing the general solution of $\boldsymbol{X}'(t) = A \boldsymbol{X}(t)$ is  
$$
\boldsymbol{X}(t) = c_1 e^{t\lambda_1} \boldsymbol{v}_1  + c_2 e^{t \lambda_2} \boldsymbol{v}_2 + \ldots + c_n e^{t \lambda_n} \boldsymbol{v}_n, 
$$
where $c_1, \ldots, c_n$ are constants determined by the initial condition. 
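
The two ways of writing the solution can be compared on a small example. The sketch below uses an illustrative $2 \times 2$ matrix (not taken from the notes) with eigenvalues $3$ and $-1$ and eigenvectors $(1, 1)^T$ and $(1, -1)^T$, worked out by hand. With $t_0 = 0$ the constants are $\boldsymbol{c} = P^{-1} \boldsymbol{X}_0$, because $e^{tA} \boldsymbol{X}_0 = P e^{tD} (P^{-1} \boldsymbol{X}_0)$. All function and variable names are hypothetical.

```python
import math


def matmul(A, B):
    """Entrywise matrix product of two matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1.0, 2.0], [2.0, 1.0]]
eigenvalues = [3.0, -1.0]
P = [[1.0, 1.0], [1.0, -1.0]]        # columns are the eigenvectors (1, 1) and (1, -1)
P_inv = [[0.5, 0.5], [0.5, -0.5]]    # inverse of P, computed by hand
D = [[3.0, 0.0], [0.0, -1.0]]


def exp_tA(t):
    """e^{tA} = P e^{tD} P^{-1}, with e^{tD} diagonal."""
    exp_tD = [[math.exp(t * eigenvalues[0]), 0.0],
              [0.0, math.exp(t * eigenvalues[1])]]
    return matmul(matmul(P, exp_tD), P_inv)


def solution_from_eigenpairs(t, X0):
    """X(t) = c_1 e^{t lambda_1} v_1 + c_2 e^{t lambda_2} v_2 with c = P^{-1} X0."""
    c = [sum(p * x for p, x in zip(row, X0)) for row in P_inv]
    X = [0.0, 0.0]
    for k in range(2):
        v_k = [P[0][k], P[1][k]]
        X = [x + c[k] * math.exp(t * eigenvalues[k]) * v for x, v in zip(X, v_k)]
    return X


X0 = [2.0, 0.0]
t = 0.5
via_exponential = [sum(e * x for e, x in zip(row, X0)) for row in exp_tA(t)]
via_eigenpairs = solution_from_eigenpairs(t, X0)
print(via_exponential)
print(via_eigenpairs)   # agrees with the line above up to rounding
```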

#### The inhomogeneous problem
For the inhomogeneous problem  
$$
\begin{cases}
\boldsymbol{X}'(t) = A \boldsymbol{X}(t) + \boldsymbol{F}(t), \; t \in \mathbb{R} \\
\boldsymbol{X}(t_0) = \boldsymbol{X}_0,
\end{cases}
$$
one can show that the unique solution is given by  
$$
\boldsymbol{X}(t) = \underbrace{e^{(t-t_0)A} \boldsymbol{X}_0}_{= \boldsymbol{X}_c} + \underbrace{ e^{tA} \int_{t_0}^t e^{-s A} \boldsymbol{F}(s) \; ds }_{ = \boldsymbol{X}_p}. 
$$
This is sometimes referred to as the variation of parameters formula or Duhamel's formula. One can readily check (using the product rule, the fundamental theorem of calculus and the properties of $e^{tA}$ listed in the proposition above) that  
$$
\boldsymbol{X}'(t) = A e^{(t-t_0) A} \boldsymbol{X}_0 + A e^{t A} \int_{t_0}^t e^{-s A} \boldsymbol{F}(s) \; ds + e^{t A} e^{-tA} \boldsymbol{F}(t)
$$
$$
= A \left( e^{(t-t_0) A} \boldsymbol{X}_0 + e^{t A} \int_{t_0}^t e^{-s A} \boldsymbol{F}(s) \; ds \right) + e^{(t-t) A} \boldsymbol{F}(t)  
$$
$$
= A \boldsymbol{X} + \boldsymbol{F}(t), \; t \in \mathbb{R}
$$
and  
$$
\boldsymbol{X}(t_0) = e^{ (t_0 - t_0) A}  \boldsymbol{X}_0  + e^{t_0 A} \int_{t_0}^{t_0} e^{-s A} \boldsymbol{F}(s) \; ds = I \boldsymbol{X}_0 + e^{t_0 A} \boldsymbol{0} = \boldsymbol{X}_0.
$$
Therefore, as long as one can calculate the matrix exponential $e^{tA}$ explicitly, one can write down the unique solution for the inhomogeneous problem. 


> **Example:**  Consider the inhomogeneous problem  
> $$  
> \begin{cases}  
> \boldsymbol{X}'(t) = A \boldsymbol{X}(t) + \boldsymbol{F}(t), \; t \in \mathbb{R} \\  
> \boldsymbol{X}(0) = \boldsymbol{X}_0,  
> \end{cases}  
> $$  
> for  
> $$  
> A = \begin{pmatrix}  
> 0 & 1 \\  
> -1 & 0  
> \end{pmatrix},\quad  \boldsymbol{F} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \boldsymbol{X}_0 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.  
> $$  
> According to the discussion from the previous section, the unique solution to the system is given by  
> $$  
> \boldsymbol{X}(t) = e^{tA} \boldsymbol{X}_0 + e^{tA} \int_0^t e^{-sA} \boldsymbol{F}(s) \; ds.  
> $$  
> From the previous example (or directly from the series, using $A^2 = -I$), we've found that  
> $$  
> e^{tA} = \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix},  
> $$  
> therefore the solution to the system is $\boldsymbol{X}: \mathbb{R} \to \mathbb{R}^2$ given by
> $$  
> \begin{align*}  
> \boldsymbol{X}(t) &= e^{tA} \boldsymbol{X}_0 + e^{tA} \int_0^t e^{-sA} \boldsymbol{F}(s) \; ds\\  
> &= \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix}  + \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \int_0^t \begin{pmatrix}  
> \cos (-s)  & \sin (-s) \\  
> -\sin (-s) & \cos (-s)  
> \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \; ds \\  
> &= \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix}  + \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \int_0^t \begin{pmatrix}  
> \cos s  & -\sin s \\  
> \sin s & \cos s  
> \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \; ds \\  
> &=\begin{pmatrix} \sin t \\ \cos t \end{pmatrix} + \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \int_0^t \begin{pmatrix} \cos s \\ \sin s \end{pmatrix} \; ds \\  
> &=\begin{pmatrix} \sin t \\ \cos t \end{pmatrix} + \begin{pmatrix}  
> \cos t  & \sin t \\  
> -\sin t & \cos t  
> \end{pmatrix} \begin{pmatrix}  
> \sin t \\  
> -\cos (t) + 1  
> \end{pmatrix} \\  
> &= \begin{pmatrix} \sin t \\ \cos t \end{pmatrix} + \begin{pmatrix}  
> \cos t  \sin t - \sin t \cos t + \sin t  \\  
> -\sin^2 t - \cos^2 t + \cos t  
> \end{pmatrix} \\  
> &= \begin{pmatrix}  
> 2\sin t \\  
> 2\cos t - 1  
> \end{pmatrix}.  
> \end{align*}
> $$
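
The closed-form answer can be cross-checked by evaluating Duhamel's formula numerically. The sketch below uses the closed form of $e^{tA}$ from the example and approximates the integral with composite Simpson's rule. The quadrature rule, the number of steps, and the function names are choices made here for illustration.

```python
import math


def exp_tA(t):
    """Closed form of e^{tA} for A = [[0, 1], [-1, 0]], as in the example."""
    return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]


def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def duhamel(t, X0, F, steps=200):
    """e^{tA} X0 + e^{tA} * integral_0^t e^{-sA} F(s) ds (steps must be even)."""
    h = t / steps
    integral = [0.0, 0.0]
    for k in range(steps + 1):
        s = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)  # Simpson weights
        integrand = apply(exp_tA(-s), F(s))
        integral = [acc + weight * g for acc, g in zip(integral, integrand)]
    integral = [h / 3 * acc for acc in integral]
    homogeneous = apply(exp_tA(t), X0)
    particular = apply(exp_tA(t), integral)
    return [a + b for a, b in zip(homogeneous, particular)]


def F(s):
    """Constant forcing term from the example."""
    return [1.0, 0.0]


X0 = [0.0, 1.0]
errors = []
for t in (0.0, 0.5, 1.0, 2.0):
    numeric = duhamel(t, X0, F)
    closed_form = [2 * math.sin(t), 2 * math.cos(t) - 1]
    errors.append(max(abs(a - b) for a, b in zip(numeric, closed_form)))

print(max(errors))  # small, limited by the quadrature error
```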