# **Higher Order Linear Equations**

---

### **Introduction**

This notebook introduces concepts and techniques useful for studying higher order equations. 

---

### **Author**
**Junichi Koganemaru**  

---

### **Last Updated**
**January 28, 2025**

## Introduction

In this notebook we aim to study general $n$-th order linear differential equations of the form

$$
a_n(t) y^{(n)}(t) + a_{n-1}(t) y^{(n-1)}(t) + \cdots + a_1(t) y'(t) + a_0(t) y(t) = f(t), \; t \in \mathbb{R}
$$
where $a_0, ..., a_n, f: \mathbb{R} \to \mathbb{R}$ are assumed to be continuous. 

The focus for us will be the case when $n = 2$, as many physical systems are modeled via second order differential equations. However, we will see that the theory for higher order equations is completely analogous.

First, we go over an example to illustrate why the concepts we introduce in the next section are useful.

> **Example:**
> Consider the second order linear differential equation
> $$
> y''(t) + 2y'(t) + 2y(t) = f(t), \; t \in \mathbb{R},
> $$
> where $f$ is assumed to be continuous. If $y: \mathbb{R} \to \mathbb{R}$ is a solution to the equation above, consider the **vector-valued function** $\boldsymbol{Y}: \mathbb{R} \to \mathbb{R}^2$ defined via
> $$
> \boldsymbol{Y}(t) = \begin{pmatrix} y(t) \\ y'(t) \end{pmatrix}.
> $$
> Here $\boldsymbol{Y}$ takes in a value $t \in \mathbb{R}$ and gives back a **column vector** in $\mathbb{R}^2$. We will carefully introduce operations that can be performed on vector-valued functions later; for now we just need to know how to differentiate them. Derivatives of vector-valued functions are defined **componentwise**, i.e. 
> $$
> \boldsymbol{Y}'(t) = \begin{pmatrix} \frac{d}{dt} [y(t)] \\ \frac{d}{dt} [y'(t)] \end{pmatrix} = \begin{pmatrix} y'(t) \\ y''(t) \end{pmatrix}, \; t \in \mathbb{R}.
> $$
> Note that since $y$ is a solution to the differential equation, we have that $y''(t) = -2y'(t) - 2y(t) + f(t)$. Therefore, we have 
> $$
> \boldsymbol{Y}'(t) = \begin{pmatrix} y'(t) \\ y''(t) \end{pmatrix} = \begin{pmatrix} y'(t) \\ -2y'(t) - 2y(t) + f(t) \end{pmatrix} =  \begin{pmatrix} 0 \cdot y(t) + 1 \cdot y'(t) \\ - 2 \cdot y(t) -2 \cdot y'(t)  \end{pmatrix} + \begin{pmatrix} 0 \\ f(t) \end{pmatrix}.
> $$
> Note that the first term on the right hand side only depends on the entries of $\boldsymbol{Y}$. Using some notation that we will go over later, we can write the above equation as
> $$
> \boldsymbol{Y}'(t) =  \begin{pmatrix}  0 & 1 \\ -2 & -2 \end{pmatrix} \begin{pmatrix} y(t) \\ y'(t) \end{pmatrix} + \begin{pmatrix} 0 \\ f(t) \end{pmatrix},
> $$
> where 
> $$ 
> A = \begin{pmatrix}  0 & 1 \\ -2 & -2 \end{pmatrix}
> $$
> is a **matrix** recording the coefficients of the components of $\boldsymbol{Y}$. We thus obtain a **first order vector-valued** equation  
> $$
> \boldsymbol{Y}'(t) = A \boldsymbol{Y}(t) + \boldsymbol{F}(t) \Longleftrightarrow \boldsymbol{Y}'(t) - A  \boldsymbol{Y}(t) =\boldsymbol{F}(t), \; t \in \mathbb{R},
> $$
> where $\boldsymbol{F}: \mathbb{R} \to \mathbb{R}^2$ is defined via $\boldsymbol{F}(t) = \begin{pmatrix} 0 \\ f(t) \end{pmatrix}$. This is a **first order linear differential equation** for the vector-valued function $\boldsymbol{Y}$.

The upshot of this example is that we can rewrite a second order scalar differential equation as a first order vector-valued differential equation. 
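
As a quick sanity check of this reduction, the sketch below verifies numerically that $\boldsymbol{Y}'(t) = A \boldsymbol{Y}(t)$ in the homogeneous case $f = 0$. It assumes the particular solution $y(t) = e^{-t}\cos(t)$, which does not appear in the example but can be checked by hand to satisfy $y'' + 2y' + 2y = 0$. All helper names are illustrative.

```python
import math

# Coefficient matrix from the example: Y' = A Y + F, here with f = 0 so F = 0.
A = [[0.0, 1.0],
     [-2.0, -2.0]]

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Assumed solution of y'' + 2y' + 2y = 0 and its derivatives (computed by hand).
def y(t):
    return math.exp(-t) * math.cos(t)

def dy(t):
    return -math.exp(-t) * (math.cos(t) + math.sin(t))

def d2y(t):
    return 2.0 * math.exp(-t) * math.sin(t)

def Y(t):
    return [y(t), dy(t)]

def Y_prime(t):
    # Vector-valued functions are differentiated componentwise.
    return [dy(t), d2y(t)]

def residual(t):
    """Largest entry of |Y'(t) - A Y(t)|; zero up to rounding for a solution."""
    return max(abs(a - b) for a, b in zip(Y_prime(t), mat_vec(A, Y(t))))

for t in (0.0, 0.5, 2.0):
    print(t, residual(t))
```

The residuals should be zero up to floating-point rounding, which is the numerical counterpart of the identity derived in the example.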

In fact, this can be done for general $n$-th order linear equations as well, which means that we can always trade the order of the differential equation for the number of components of the vector-valued function $\boldsymbol{Y}$. 

While this is extremely powerful, as it shows that any $n$-th order scalar equation can be studied as a first order vector-valued equation, it requires one to be comfortable with the language of linear algebra. We will take this approach later as it gives us a unified framework for studying linear equations.

For now, we will only study second order equations as scalar equations. However, the example shows that the theory of $n$-th order linear equations is closely connected to concepts from linear algebra. We introduce some preliminary notions below. 

## Preliminaries 

### Linear Independence
Linear independence is a notion from linear algebra that encodes the "dependencies" among a collection of objects.

We begin by borrowing some terminology from linear algebra.

> **Definition (Linear combination of functions)**
> Given $n$ continuous functions $y_1, \ldots, y_n: I \to \mathbb{R}$ on an interval $I$, a linear combination of these functions is another continuous function $y: I \to \mathbb{R}$ which can be written as 
> $$
> y(t) = c_1 y_1(t) + c_2 y_2(t) + \ldots + c_n y_n(t), \; t \in I
> $$
> where $c_1, \ldots, c_n \in \mathbb{R}$ are constants.

To motivate our discussion, let's first consider an example.

> **Example**
> Consider the functions $y_1, y_2, y_3: \mathbb{R} \to \mathbb{R}$ defined via $y_1(t) = t, y_2(t) = t+1, y_3(t) = 2t+1$ for $t \in \mathbb{R}$, and suppose that $y$ is a linear combination of $y_1, y_2, y_3$: 
> $$
> y(t) = c_1 y_1(t) + c_2 y_2(t) + c_3 y_3(t), \; t \in \mathbb{R} 
> $$
> for some constants $c_1, c_2, c_3$. Notice that here we have 
> $$
> y_1 + y_2 = y_3,
> $$
> so in the equation above, we can replace $y_3$ and write 
> $$
> y(t) = c_1 y_1(t) + c_2 y_2(t) + c_3(y_1(t) + y_2(t)) = C_1 y_1(t) + C_2 y_2(t), \; t \in \mathbb{R}
> $$
> where $C_1 = c_1 + c_3, C_2 = c_2 + c_3$.
   

What this shows us is that because $y_3$ is "dependent" on $y_1$ and $y_2$, even though in the definition of $y$ there seem to be three "building blocks", in reality only two "building blocks" are required to create $y$. In other words, any linear combination of $y_1, y_2, y_3$ can always be written as a linear combination of $y_1, y_2$.

In this sense, the appearance of $y_3$ is redundant.
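
The redundancy can also be checked mechanically. The sketch below compares the three-term combination with the two-term combination using $C_1 = c_1 + c_3$ and $C_2 = c_2 + c_3$; the specific constants and sample points are arbitrary choices made for illustration.

```python
def y1(t):
    return t

def y2(t):
    return t + 1

def y3(t):
    return 2 * t + 1

def combo_three(c1, c2, c3, t):
    """Linear combination using all three functions."""
    return c1 * y1(t) + c2 * y2(t) + c3 * y3(t)

def combo_two(c1, c2, c3, t):
    """Same function written with y1 and y2 only, using y3 = y1 + y2."""
    C1, C2 = c1 + c3, c2 + c3
    return C1 * y1(t) + C2 * y2(t)

# Arbitrary constants; integer inputs keep the comparison exact.
c1, c2, c3 = 2, -3, 5
print(all(combo_three(c1, c2, c3, t) == combo_two(c1, c2, c3, t)
          for t in range(-5, 6)))
```

Agreement at sample points is only a check, not a proof; the proof is the algebraic identity in the example.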

We will see later that the general solution to a homogeneous linear differential equation is given as a linear combination of functions. What we want is a way to capture the notion of "non-redundancy". This is where the notion of **linear independence** comes in. First, let us give an intuitive definition of what it means for a set of functions to be linearly independent. 

> **Definition (Linear independence I)**
> A set of continuous functions defined over an interval $I$ is said to be *linearly independent* if none of the functions can be written as a linear combination of the other functions in the set, on the interval $I$.

In the previous example, the set $\{y_1, y_2, y_3\}$ is not linearly independent over $\mathbb{R}$ because $y_3$ is a linear combination of $y_1$ and $y_2$. 


The previous definition, while conceptually clear, is hard to apply in practice. Instead we will often use the following alternative definition.

> **Definition (Linear independence II)**
> A set of continuous functions $y_1, \ldots, y_n: I \to \mathbb{R}$ is said to be linearly independent on an interval $I$ if 
> $$
> c_1 y_1(t) + \ldots + c_n y_n(t) = 0 \; \text{for all} \; t \in I
> $$
> implies $c_1 = c_2 = \ldots = c_n = 0$.


Let's see why the two definitions are equivalent to each other. Suppose there exist constants $c_1, \ldots, c_n$, not all zero, such that 
$$
c_1 y_1(t) + \ldots + c_n y_n(t)= 0, \; t \in I.
$$
Up to relabeling of the indices, we can suppose without loss of generality that $c_1 \neq 0$. Then we can immediately write 
$$
y_1(t) = -\frac{1}{c_1} \left( c_2 y_2(t) + \ldots + c_n y_n(t) \right), \; t \in I,
$$
meaning that $y_1$ is a linear combination of the other functions in the set. Conversely, if one of the functions is a linear combination of the others, say $y_1 = b_2 y_2 + \ldots + b_n y_n$ on $I$, then $y_1 - b_2 y_2 - \ldots - b_n y_n = 0$ on $I$, and the coefficient of $y_1$ is $1 \neq 0$. So the two definitions are equivalent to each other.
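
The second definition also suggests a simple numerical check, sketched below for two functions. If $c_1 y_1 + c_2 y_2 = 0$ on $I$, then the equation holds in particular at any two sample points $t_1, t_2 \in I$, which gives two linear equations for $(c_1, c_2)$. When the quantity $y_1(t_1) y_2(t_2) - y_2(t_1) y_1(t_2)$ is nonzero, the only solution is $c_1 = c_2 = 0$, so the functions are linearly independent. The functions $e^t, e^{-t}$ and the sample points are assumptions chosen for illustration.

```python
import math

def sample_test(y1, y2, t1, t2):
    """Return y1(t1)*y2(t2) - y2(t1)*y1(t2).

    A nonzero value forces c1 = c2 = 0 in c1*y1 + c2*y2 = 0, i.e. independence.
    A zero value is inconclusive: other sample points might still work.
    """
    return y1(t1) * y2(t2) - y2(t1) * y1(t2)

# e^t and e^{-t}, sampled at t = 0 and t = 1.
independent_pair = sample_test(math.exp, lambda t: math.exp(-t), 0.0, 1.0)

# t and 2t are multiples of each other, so the test value is zero.
dependent_pair = sample_test(lambda t: t, lambda t: 2 * t, 1.0, 3.0)

print(independent_pair, dependent_pair)
```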

**Remark:** According to this definition, any set containing the zero function is automatically linearly dependent: give the zero function the coefficient $1$ and every other function the coefficient $0$.


> **Example**
> Consider the functions $y_1, y_2, y_3$ defined via $y_1(t) = e^t, y_2(t) = e^{-t}, y_3(t) = e^{t} + e^{-t}$ for $t \in \mathbb{R}$. Notice that
> $$
> y_3(t) = y_1(t) + y_2(t) \Longleftrightarrow y_1(t) + y_2(t) - y_3(t) = 0 \; \text{for all} \; t \in \mathbb{R}.
> $$
> So $c_1 y_1(t) + c_2 y_2(t) + c_3 y_3(t) = 0$ holds for all $t \in \mathbb{R}$ with $c_1 = c_2 = 1$ and $c_3 = -1$, which are not all zero. Therefore these three functions are linearly dependent.

> **Example**
> Consider the functions $y_1, y_2, y_3$ defined via $y_1(t) = \sin^2(t), y_2(t) = \cos^2(t), y_3(t) = 1$ for $t \in \mathbb{R}$. Since 
> $$
> y_1(t) + y_2(t) = \sin^2(t) + \cos^2(t) = 1 = y_3(t)
> $$
> for all $t \in \mathbb{R}$, we see that these functions are linearly dependent.
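
Both dependence relations above use the coefficients $(c_1, c_2, c_3) = (1, 1, -1)$. As a rough numerical illustration, the sketch below evaluates $c_1 y_1 + c_2 y_2 + c_3 y_3$ at a few arbitrarily chosen sample points for each family. Vanishing at finitely many points does not prove dependence; the identities in the examples do.

```python
import math

def combination(coeffs, funcs, t):
    """Evaluate c_1*y_1(t) + ... + c_n*y_n(t)."""
    return sum(c * f(t) for c, f in zip(coeffs, funcs))

exp_family = [math.exp,
              lambda t: math.exp(-t),
              lambda t: math.exp(t) + math.exp(-t)]
trig_family = [lambda t: math.sin(t) ** 2,
               lambda t: math.cos(t) ** 2,
               lambda t: 1.0]

coeffs = (1.0, 1.0, -1.0)
sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]

max_exp = max(abs(combination(coeffs, exp_family, t)) for t in sample_points)
max_trig = max(abs(combination(coeffs, trig_family, t)) for t in sample_points)
print(max_exp, max_trig)  # both zero up to floating-point rounding
```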


## The Wronskian

In practice, it is usually cumbersome to check linear independence via either of the previous definitions when the set contains more than two functions. So instead, we'll again borrow some ideas from linear algebra and encode the dependence/independence of these functions in a single object: the Wronskian.

> **Definition**
> Let $I \subseteq \mathbb{R}$ be an interval and let $y_1, ..., y_n \in C^{n-1}(I ; \mathbb{R})$. Then the *Wronskian* of this collection of functions is a function $W: I \to \mathbb{R}$ defined as a **determinant** of a matrix,
> $$
> W(y_1, y_2, ..., y_n) (t) := \det \begin{pmatrix}
> y_1(t) & y_2(t) & \cdots & y_n(t) \\
> y_1'(t) & y_2'(t) & \cdots & y_n'(t) \\
> \vdots & \vdots & \ddots &\vdots  \\
> y_1^{(n-1)}(t) & y_2^{(n-1)}(t) & \cdots & y_n^{(n-1)}(t) 
> \end{pmatrix}, \; t \in I.
> $$

The following proposition establishes a connection between the Wronskian and linear independence.

> **Proposition**
> Let $y_1, \ldots , y_n: I \to \mathbb{R}$ be $n$ **analytic functions** over an interval $I$. Then the set of functions is linearly dependent on $I$ if and only if $W(y_1,\ldots ,y_n)$ is identically zero on $I$.

In other words,
1. If the Wronskian of $n$ analytic functions is identically zero over an interval $I$, then the set of functions is linearly dependent on $I$.
2. Otherwise, i.e. if the Wronskian is nonzero at even one point of $I$, the set of functions is linearly independent on $I$.

This means that as long as we can compute the Wronskian of a set of analytic functions, we can determine whether they are linearly independent or not.

**Remark:** Analytic functions are smooth functions that admit a local power series representation at every point in their respective domains. In practice, many functions that we encounter are analytic, so this proposition is quite useful. Constant functions, trigonometric functions, polynomials, and exponential functions are all examples of analytic functions.


If the functions in question are solutions to a linear differential equation, we can say something even stronger. 

> **Theorem (Cauchy–Kovalevskaya)** 
> Let $y_1, ..., y_n \in C^{n}(I ; \mathbb{R})$ be $n$ solutions to an $n$-th order homogeneous linear differential equation on an interval $I$ of the form 
> $$
> y^{(n)}(t) + a_{n-1}(t) y^{(n-1)}(t) + \ldots  + a_1(t)y'(t) + a_0(t) y(t) = 0, \; t \in I
> $$
> where $\{a_i\}_{i=0}^{n-1}$ are *analytic* over $I$. Then $y_1, ..., y_n$ are analytic, and the set of solutions is linearly dependent if and only if $W(y_1, y_2, ..., y_n)(t) = 0$ for some $t \in I$. 

**Remark**: This theorem is stated only for functions that are solutions to homogeneous equations of a specific form (the leading coefficient must be 1 and the coefficients must be analytic), which is why the statement only requires the Wronskian to vanish at one point instead of on the entire interval. It turns out that in this situation, vanishing at one point immediately implies that the Wronskian is identically zero on the entire interval.


## Determinants 
Next we discuss how to compute the Wronskian by going over how to compute determinants. In this class we will only introduce how to compute determinants of $2 \times 2$ and $3 \times 3$ matrices.

### Determinants of $2 \times 2$ matrices

We first consider the determinant of $2 \times 2$ matrices.

> **Definition**  
> Consider the square matrix  
> $$  
> A = \begin{pmatrix}  
> a & b \\  
> c & d  
> \end{pmatrix}.  
> $$  
> The *determinant* of $A$ is a number associated to the matrix $A$, defined by  
> $$  
> \det A := ad - bc.  
> $$  

**Remark** 
The determinant is sometimes denoted with vertical bars: 
$$
\det(A) = \begin{vmatrix}
a & b \\
c & d
\end{vmatrix}.
$$

> **Example:**
> Let 
> $$
> A = \begin{pmatrix} 
> 1 & 2 \\
> 3 & 4 
> \end{pmatrix}, B = \begin{pmatrix} 
> 2 & 3 \\
> 4 & 5
> \end{pmatrix}. 
> $$
> Then
> $$
> \det A = 1 \times 4 - 2 \times 3 = -2, \quad \det B = 2 \times 5 - 3 \times 4 = -2.
> $$
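
The $2 \times 2$ formula translates directly into code. The following minimal sketch recomputes the two determinants from the example; the function name `det2` is an illustrative choice.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

A = [[1, 2], [3, 4]]
B = [[2, 3], [4, 5]]
print(det2(A), det2(B))  # -2 -2
```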



### Determinants of $3 \times 3$ matrices

Suppose $A$ is a $3 \times 3$ matrix of the form
$$
A = \begin{pmatrix}
a & b & c\\
d & e & f\\
g & h & i
\end{pmatrix}.
$$
The simplest way to calculate $\det(A)$ is to use
$$
\det(A) = \begin{vmatrix}
a & b & c\\
d & e & f\\
g & h & i
\end{vmatrix} = a  \begin{vmatrix}
e & f \\
h & i
\end{vmatrix} - d \begin{vmatrix}
b & c \\
h & i
\end{vmatrix}  + g\begin{vmatrix}
b & c \\
e & f
\end{vmatrix}
= a(ei - fh) - d(bi - ch) + g( bf - ce).
$$

One way to interpret this is that we are expanding along the first column: we multiply each element in the column by the determinant of the matrix obtained by deleting the row and column containing that element. Here is a way to visualize it (the determinants of the smaller matrices are called *minors*):
$$
\begin{vmatrix}
a & \cdots & \cdots \\
\vdots & \textcolor{red}{e} & \textcolor{red}{f}\\
\vdots & \textcolor{red}{h} & \textcolor{red}{i}
\end{vmatrix}  \quad \quad \begin{vmatrix}
\vdots & \textcolor{red}{b} & \textcolor{red}{c}\\
d & \cdots & \cdots\\
\vdots & \textcolor{red}{h} & \textcolor{red}{i}
\end{vmatrix} \quad \quad 
\begin{vmatrix}
\vdots & \textcolor{red}{b} & \textcolor{red}{c}\\
\vdots & \textcolor{red}{e} & \textcolor{red}{f}\\
g & \cdots & \cdots
\end{vmatrix}
$$

Notice that the signs in the expansion alternate: $+a$, then $-d$, then $+g$. This is important. 

In general one can expand along any row or any column, as long as each element in the expansion carries the right sign. The sign associated with each element is given in the matrix on the right:
$$
\begin{pmatrix}
a & b & c\\
d & e & f\\
g & h & i
\end{pmatrix} \quad \quad \begin{pmatrix}
+ & - & +\\
- & +& -\\
+ & - & +
\end{pmatrix}
$$

So in the equation above
$$
\det(A) = \textcolor{red}{+a}  \begin{vmatrix}
e & f \\
h & i
\end{vmatrix} \textcolor{red}{-d} \begin{vmatrix}
b & c \\
h & i
\end{vmatrix}  \textcolor{red}{+g} \begin{vmatrix}
b & c \\
e & f
\end{vmatrix},
$$
we used the entries of the first column, each with the sign in the same position of the sign matrix:
$$
\begin{pmatrix}
\textcolor{red}{a} & b & c\\
\textcolor{red}{d} & e & f\\
\textcolor{red}{g} & h & i
\end{pmatrix} \quad \quad \begin{pmatrix}
\textcolor{red}{+} & - & +\\
\textcolor{red}{-} & +& -\\
\textcolor{red}{+} & - & +
\end{pmatrix}.
$$

If we were to expand along the second column, then the minors would be 
$$
\begin{pmatrix}
\cdots & b & \cdots \\
\textcolor{blue}{d} & \vdots  & \textcolor{blue}{f}\\
\textcolor{blue}{g} & \vdots & \textcolor{blue}{i}
\end{pmatrix}  \quad \quad \begin{pmatrix}
\textcolor{blue}{a} & \vdots &  \textcolor{blue}{c} \\
\cdots & e  & \cdots\\
\textcolor{blue}{g} & \vdots & \textcolor{blue}{i}
\end{pmatrix} \quad \quad 
\begin{pmatrix}
\textcolor{blue}{a} & \vdots &  \textcolor{blue}{c} \\
\textcolor{blue}{d} & \vdots  & \textcolor{blue}{f}\\
\cdots& h & \cdots
\end{pmatrix}
$$
and thus 
$$
\det(A) =  \textcolor{red}{-b}  \begin{vmatrix}
d & f \\
g & i
\end{vmatrix}  \textcolor{red}{+e} \begin{vmatrix}
a & c \\
g & i
\end{vmatrix} \textcolor{red}{-h} \begin{vmatrix}
a & c \\
d & f
\end{vmatrix}.
$$
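
To see that every row and every column gives the same answer, here is a small sketch of the expansion with the sign pattern written as $(-1)^{i+j}$. Indices are 0-based in the code, which yields the same checkerboard of signs as the 1-based convention. The matrix `M` and the function names are assumptions made for illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def minor(m, i, j):
    """Determinant of the 2x2 matrix left after deleting row i and column j."""
    sub = [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
    return det2(sub)

def expand_row(m, i):
    """Expand the 3x3 determinant along row i with signs (-1)^(i+j)."""
    return sum((-1) ** (i + j) * m[i][j] * minor(m, i, j) for j in range(3))

def expand_col(m, j):
    """Expand the 3x3 determinant along column j with signs (-1)^(i+j)."""
    return sum((-1) ** (i + j) * m[i][j] * minor(m, i, j) for i in range(3))

# Hypothetical test matrix.
M = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

values = [expand_row(M, i) for i in range(3)] + [expand_col(M, j) for j in range(3)]
print(values)  # all six expansions agree: [39, 39, 39, 39, 39, 39]
```

In hand computations it usually pays to expand along the row or column with the most zeros, since those terms drop out.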

> **Example:**
> Let 
> $$
> A = \begin{pmatrix} 
> 1 & 2 & 3 \\
> 4 & 5 & 6 \\
> 7 & 8 & 9
> \end{pmatrix}.
> $$
> According to the discussion above, we can evaluate $\det A$ by expanding along the second row:
> $$
> \det A = -4 \times \det \begin{pmatrix} 
> 2 & 3 \\
> 8 & 9
> \end{pmatrix} + 5 \times \det \begin{pmatrix} 
> 1 & 3 \\
> 7 & 9
> \end{pmatrix} - 6 \times  \det \begin{pmatrix} 
> 1 & 2 \\
> 7 & 8
> \end{pmatrix}.
> $$
> We can also evaluate along the second column:
> $$
> \det A = -2 \times \det \begin{pmatrix} 
> 4 & 6 \\
> 7 & 9
> \end{pmatrix} + 5 \times \det  \begin{pmatrix} 
> 1 & 3 \\
> 7 & 9
> \end{pmatrix} - 8 \times \det \begin{pmatrix} 
> 1 & 3 \\
> 4 & 6
> \end{pmatrix}.  
> $$
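
The two expansions in the example can be evaluated with a few lines of arithmetic. The sketch below does so; the helper `det2` is an illustrative name.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

# Expansion along the second row (signs -, +, -).
along_row = -4 * det2(2, 3, 8, 9) + 5 * det2(1, 3, 7, 9) - 6 * det2(1, 2, 7, 8)

# Expansion along the second column (signs -, +, -).
along_col = -2 * det2(4, 6, 7, 9) + 5 * det2(1, 3, 7, 9) - 8 * det2(1, 3, 4, 6)

print(along_row, along_col)  # 0 0
```

Both expansions give $\det A = 0$, as they must, since the value of the determinant does not depend on the row or column used.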


## Examples calculating Wronskians

> **Example**
> Consider the functions $y_1, y_2: \mathbb{R} \to \mathbb{R}$ defined via 
> $$
> y_1(x) = e^x, \; y_2(x) = e^{2x}.
> $$
> Then 
> $$
> W(y_1,y_2) (x) = \det \begin{pmatrix}
> e^x & e^{2x} \\
> e^x & 2e^{2x}
> \end{pmatrix} = 2e^{3x} - e^{3x} = e^{3x} \neq 0 \; \text{for all} \; x \in \mathbb{R}.
> $$
> Therefore the two functions are linearly independent over any interval $I \subseteq \mathbb{R}$.  
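
A numerical cross-check of this computation is sketched below. The derivatives are entered by hand, and the $2 \times 2$ Wronskian is compared with $e^{3x}$ at a few arbitrarily chosen points; the function names are illustrative.

```python
import math

def wronskian2(y1, dy1, y2, dy2, x):
    """W(y1, y2)(x) = y1(x) * y2'(x) - y2(x) * y1'(x)."""
    return y1(x) * dy2(x) - y2(x) * dy1(x)

def y1(x):
    return math.exp(x)

def dy1(x):
    return math.exp(x)

def y2(x):
    return math.exp(2 * x)

def dy2(x):
    return 2 * math.exp(2 * x)

for x in (-1.0, 0.0, 2.0):
    w = wronskian2(y1, dy1, y2, dy2, x)
    # Compare with the closed form e^{3x} from the example.
    print(x, w, math.isclose(w, math.exp(3 * x), rel_tol=1e-12))
```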

> **Example**
> Consider the functions $y_1, y_2: \mathbb{R} \to \mathbb{R}$ defined via 
> $$
> y_1(t) = t^2, \; y_2(t) = t^3.
> $$
> Then 
> $$
> W(y_1,y_2) (t) = \det \begin{pmatrix}
> t^2 & t^3 \\
> 2t & 3t^2
> \end{pmatrix} = 3t^4 - 2t^4 = t^4 \neq 0 \; \text{for all} \; t \neq 0.
> $$
> Since $W(y_1, y_2)$ vanishes only at $t = 0$, it is not identically zero on any interval. Therefore the two functions are linearly independent on any interval $I \subseteq \mathbb{R}$, even if it includes $t = 0$.  

**Remark:**
The Wronskian in the previous example vanishes at $t = 0$ but does not vanish identically on any interval $I$ containing $0$. This means that on any such interval $I$ there is no equation of the form 
$$
y''(t) + a_1(t) y'(t) + a_0(t)y(t) = 0, \; t \in I
$$
with analytic coefficients that has these two functions as solutions, as otherwise the Wronskian would either never be zero or vanish identically on $I$. 

However, if $I$ does not contain 0, then the Wronskian never vanishes on $I$ and this is possible. One can check that $y_1, y_2: (0,\infty) \to \mathbb{R}$ are solutions to 
$$
y''(t) - \frac{4}{t} y'(t) + \frac{6}{t^2} y(t) = 0, \; t > 0,
$$
on the interval $I = (0,\infty)$. Indeed, substituting $y_1(t) = t^2$ gives $2 - 8 + 6 = 0$, and substituting $y_2(t) = t^3$ gives $6t - 12t + 6t = 0$. 
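
One way to find such an equation, offered here as a sketch rather than as the method used above, is to treat the two conditions $y_i'' + a_1 y_i' + a_0 y_i = 0$, $i = 1, 2$, as a $2 \times 2$ linear system for the unknowns $a_1(t), a_0(t)$ at each fixed $t$. The system is uniquely solvable exactly when $W(y_1, y_2)(t) \neq 0$, which is why $t = 0$ has to be excluded. The function names below are hypothetical.

```python
def derivatives_y1(t):
    """y1(t) = t^2 together with its first and second derivatives."""
    return t ** 2, 2 * t, 2.0

def derivatives_y2(t):
    """y2(t) = t^3 together with its first and second derivatives."""
    return t ** 3, 3 * t ** 2, 6 * t

def coefficients(t):
    """Solve y_i'' + a1*y_i' + a0*y_i = 0 (i = 1, 2) for (a1, a0) at the point t."""
    y1, dy1, d2y1 = derivatives_y1(t)
    y2, dy2, d2y2 = derivatives_y2(t)
    w = y1 * dy2 - y2 * dy1  # Wronskian W(y1, y2)(t)
    if w == 0:
        raise ValueError("Wronskian vanishes; the coefficients are not determined")
    # Cramer's rule applied to the 2x2 system.
    a1 = -(y1 * d2y2 - y2 * d2y1) / w
    a0 = (dy1 * d2y2 - dy2 * d2y1) / w
    return a1, a0

def residuals(t):
    """Plug both functions back into y'' + a1*y' + a0*y at the point t."""
    a1, a0 = coefficients(t)
    return [d2y + a1 * dy + a0 * y
            for (y, dy, d2y) in (derivatives_y1(t), derivatives_y2(t))]

for t in (0.5, 1.0, 2.0):
    print(t, coefficients(t), residuals(t))
```

Comparing the output of `coefficients(t)` with the coefficients of the displayed equation at a few values of $t > 0$ is a quick way to double-check it.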


> **Example**
> Consider the functions $y_1, y_2, y_3: \mathbb{R} \to \mathbb{R}$ defined via 
> $$
> y_1(t) = \sin^2(t), \; y_2(t) = \cos^2(t), \; y_3(t) = 1.
> $$
> Then 
> $$
> W(y_1,y_2, y_3) (t) = \det \begin{pmatrix}
> \sin^2(t) & \cos^2(t) & 1 \\
> 2 \sin(t) \cos(t) & -2 \sin(t)\cos(t) & 0 \\
> 2 \cos(2t) & -2 \cos(2t) & 0
> \end{pmatrix}
> = \det \begin{pmatrix}
> 2 \sin(t) \cos(t) & -2 \sin(t)\cos(t) \\
> 2 \cos(2t) & -2 \cos(2t)
> \end{pmatrix}
> = 0 \; \text{for all} \; t \in \mathbb{R}.
> $$
> Therefore the three functions are linearly dependent on any interval $I \subseteq \mathbb{R}$.
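
As a final numerical check, the sketch below builds the $3 \times 3$ Wronskian matrix from this example at a few arbitrary sample points and evaluates its determinant by expanding along the first column. The derivatives are entered by hand and the function names are illustrative.

```python
import math

def det3(m):
    """Expand a 3x3 determinant along the first column."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - d * (b * i - c * h) + g * (b * f - c * e)

def wronskian_matrix(t):
    """Rows: the functions sin^2, cos^2, 1, then first and second derivatives."""
    s, c = math.sin(t), math.cos(t)
    return [
        [s * s, c * c, 1.0],
        [2 * s * c, -2 * s * c, 0.0],
        [2 * math.cos(2 * t), -2 * math.cos(2 * t), 0.0],
    ]

values = [det3(wronskian_matrix(t)) for t in (-1.0, 0.0, 0.3, 2.5)]
print(values)  # each entry is zero up to floating-point rounding
```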