---
title: 9.1 Eigenvalues of Symmetric Matrices
subject:  Symmetric Matrices
subtitle: 
short_title: 9.1 Eigenvalues of Symmetric Matrices
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/08_Ch_9_Symmetric_Matrices/101-eigen_symm.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 16 - Eigenvalues of Symmetric Matrices, Spectral Theorem, Quadratic Forms and Positive Definite Matrices, Optimization Principles for Eigenvalues of Symmetric Matrices.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.5.

## Learning Objectives

By the end of this page, you should know:
- the key properties of the eigenvalues and eigenvectors of symmetric matrices
- the Spectral Theorem

## Symmetric Matrix

A square matrix $A$ is said to be symmetric if $A = A^T$. For example, all $2\times 2$ and $3\times 3$ symmetric matrices are of the form:

$$
\begin{bmatrix}
a & b \\
b & c
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix}
a & b & c \\
b & d & e \\
c & e & f
\end{bmatrix}
$$

Symmetric matrices arise in many practical contexts: an important example, which we will spend time on next lecture, is the family of _covariance matrices_. For now, we simply take them as a family of interesting matrices.

Symmetric matrices enjoy many interesting properties, including the following one which will be the focus of this lecture:

:::{prf:theorem}
Let $A = A^T \in \mathbb{R}^{n\times n}$ be a symmetric $n\times n$ matrix. Then:
1. All eigenvalues of $A$ are real.
2. Eigenvectors corresponding to distinct eigenvalues of $A$ are orthogonal.
3. There is an orthonormal basis of $\mathbb{R}^n$ consisting of $n$ eigenvectors of $A$.

In particular, all real symmetric matrices are complete and real diagonalizable.
:::

We'll spend the rest of this lecture exploring the consequences of this remarkable theorem, before diving into applications over the next few lectures.

First, we work through a few simple examples to see this theorem in action.

:::{prf:example}
$A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$. We've seen this matrix in previous examples. It has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 2$ with corresponding eigenvectors $\vv v_1 = (1,1)$ and $\vv v_2 = (-1,1)$. We easily verify that $\vv v_1^T \vv v_2 = 0$, and hence the eigenvectors are orthogonal. We construct an orthonormal basis by dividing each eigenvector by its Euclidean norm:

$$
\vv u_1 = \frac{\vv v_1}{\|\vv v_1\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
\vv u_2 = \frac{\vv v_2}{\|\vv v_2\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix}
$$

:::
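The example above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it uses `np.linalg.eigh`, NumPy's eigensolver for symmetric matrices, which returns eigenvalues in ascending order (so $2$ comes before $4$) and may return eigenvectors with flipped signs relative to the hand computation.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# eigh assumes a symmetric (Hermitian) input and returns real eigenvalues
# in ascending order, with unit-norm eigenvectors as the columns of U.
eigvals, U = np.linalg.eigh(A)

print("eigenvalues:", eigvals)

# The columns of U are orthonormal, so U^T U should be the identity.
print("U^T U = I ?", np.allclose(U.T @ U, np.eye(2)))

# Each column satisfies A u = lambda u (checked column by column via broadcasting).
print("A U = U diag(eigvals) ?", np.allclose(A @ U, U * eigvals))
```

The signs of the computed eigenvectors are not unique: if $\vv u$ is a unit eigenvector, so is $-\vv u$.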

:::{prf:example} 
Consider the symmetric matrix $A = \begin{bmatrix} 5 & -4 & 2 \\ -4 & 5 & 2 \\ 2 & 2 & -1 \end{bmatrix}$. Computing the eigenvalues/eigenvectors of $A$ (e.g., using `np.linalg.eig`) we see that

$$
\lambda_1 = 9, \vv v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad
\lambda_2 = 3, \vv v_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad
\lambda_3 = -3, \vv v_3 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.
$$

You can check that these vectors are pairwise orthogonal: $\vv v_i^T \vv v_j = 0$ for $i \neq j$, and hence form an orthogonal basis for $\mathbb{R}^3$. An orthonormal basis is obtained by normalizing each eigenvector to unit norm:

$$
\vv u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad
\vv u_2 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad
\vv u_3 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.
$$
:::
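As a sanity check on the $3\times 3$ example, the sketch below verifies the stated eigenpairs directly from the definition $A \vv v = \lambda \vv v$ and confirms pairwise orthogonality through the Gram matrix $V^T V$. The variable names (`eigenpairs`, `V`, `gram`, `U`) are illustrative choices, not part of the notes.

```python
import numpy as np

A = np.array([[ 5.0, -4.0,  2.0],
              [-4.0,  5.0,  2.0],
              [ 2.0,  2.0, -1.0]])

# Eigenpairs as stated in the example.
eigenpairs = [
    ( 9.0, np.array([1.0, -1.0,  0.0])),
    ( 3.0, np.array([1.0,  1.0,  1.0])),
    (-3.0, np.array([1.0,  1.0, -2.0])),
]

# Check A v = lambda v for each pair.
for lam, v in eigenpairs:
    print(lam, np.allclose(A @ v, lam * v))

# Stack the eigenvectors as columns; off-diagonal entries of V^T V are the
# pairwise inner products, so a diagonal Gram matrix means orthogonality.
V = np.column_stack([v for _, v in eigenpairs])
gram = V.T @ V
print(gram)

# Normalizing each column gives an orthonormal basis.
U = V / np.linalg.norm(V, axis=0)
print(np.allclose(U.T @ U, np.eye(3)))
```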

## The Spectral Theorem

The theorem above tells us that every real, symmetric matrix admits an eigenvector basis, and hence is diagonalizable. Furthermore, we can always choose eigenvectors that form an orthonormal basis—hence, the diagonalizing matrix takes a particularly simple form.

Remember that a matrix $Q \in \mathbb{R}^{n \times n}$ is **orthogonal** if and only if its columns form an orthonormal basis of $\mathbb{R}^n$. Alternatively, we can characterize orthogonal matrices by the condition that $Q^T Q = Q Q^T = I$, i.e., $Q^{-1} = Q^T$.

If we use this orthonormal eigenbasis when diagonalizing a symmetric matrix $A$, we obtain its _spectral factorization_:

:::{prf:theorem}
:label: spectral_thm
Let $A$ be a real symmetric matrix. Then there exists an orthogonal matrix $Q$ such that
\begin{equation}
\label{ST_eqn}
A = Q \Lambda Q^{-1} = Q \Lambda Q^T \qquad (\text{ST})
\end{equation}
where $\Lambda$ is a real diagonal matrix. The eigenvalues of $A$ appear on the diagonal of $\Lambda$, while the columns of $Q$ are the corresponding orthonormal eigenvectors.
:::

:::{note} Historical Remark
The term "spectrum" refers to the eigenvalues of a matrix, or more generally, a linear operator. This terminology originates in physics: the spectral energy lines of atoms, molecules, and nuclei are characterized as the eigenvalues of the governing quantum mechanical Schrödinger operator.
:::

:::{prf:example}
For $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$ seen above, we build $Q = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$, and write

$$
\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} = A = Q \Lambda Q^T = 
\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ 
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.
$$
:::
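The factorization in this example can be reproduced in a few lines. This is a small sketch, assuming NumPy; it rebuilds $A$ from $Q$ and $\Lambda$, and also writes the same product as a sum of rank-one terms $\lambda_i \vv u_i \vv u_i^T$, which is just the matrix product $Q \Lambda Q^T$ expanded column by column.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Orthonormal eigenvectors as columns, eigenvalues on the diagonal.
Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)
Lam = np.diag([4.0, 2.0])

# Q is orthogonal: Q^T Q = I, so Q^{-1} = Q^T.
print(np.allclose(Q.T @ Q, np.eye(2)))

# Spectral factorization A = Q Lam Q^T.
A_rebuilt = Q @ Lam @ Q.T
print(np.allclose(A_rebuilt, A))

# Equivalent rank-one expansion: A = sum_i lambda_i u_i u_i^T.
A_rank_one = sum(Lam[i, i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
print(np.allclose(A_rank_one, A))
```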

## Geometric Interpretation 

You can always choose $Q$ to have $\det Q = 1$ (if $\det Q = -1$, negate one of its columns, which is still a unit eigenvector); such a $Q$ represents a rotation. Thus, the diagonalization of a symmetric matrix can be interpreted as a rotation of the coordinate system so that the orthogonal eigenvectors align with the coordinate axes. Therefore, a linear transformation $L(\vv x) = A\vv x$ for which $A$ has all positive eigenvalues can be interpreted as a combination of stretches in $n$ mutually orthogonal directions.

One way to visualize this is to consider what $L$ does to the unit Euclidean sphere $S = \{ \vv x \in \mathbb{R}^n \mid \|\vv x\| = 1\}$: stretching it in orthogonal directions transforms it into an ellipsoid $E = L(S) = \{ A\vv x \mid \|\vv x\| = 1\}$ whose principal axes are the directions of stretch, i.e., the eigenvectors of $A$.

:::{figure} ../figures/09-ellipse.jpg
:label: ellipse
:alt: Ellipse
:width: 400px
:align: center
:::
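To make the sphere-to-ellipsoid picture concrete, here is a minimal numerical sketch (assuming NumPy, and reusing the matrix $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$ with the $Q$ built earlier). It maps sample points on the unit circle through $A$, expresses the images in eigenvector coordinates $\vv y = Q^T (A \vv x)$, and checks that they lie on the ellipse $(y_1/4)^2 + (y_2/2)^2 = 1$, whose semi-axes are the eigenvalues.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)   # columns: unit eigenvectors for 4 and 2

# This Q has determinant +1, so it is a rotation.
print(np.isclose(np.linalg.det(Q), 1.0))

# Sample the unit circle; each column of S is a unit vector.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
S = np.vstack([np.cos(theta), np.sin(theta)])

# Image of the circle under x -> A x, then coordinates in the eigenbasis.
E = A @ S
Y = Q.T @ E

# In eigen-coordinates the image satisfies (y1/4)^2 + (y2/2)^2 = 1.
ellipse_eq = (Y[0] / 4.0) ** 2 + (Y[1] / 2.0) ** 2
print(np.allclose(ellipse_eq, 1.0))

# The lengths of the image vectors stay between the two eigenvalues.
lengths = np.linalg.norm(E, axis=0)
print(lengths.min(), lengths.max())
```

Plotting the columns of `S` and `E` (for example with a plotting library of your choice) reproduces the circle and the ellipse.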



## Quadratic Forms & Positive Definite Matrices (ALA 3.4, LAA 7.2)

One common place where symmetric matrices arise in applications is in defining quadratic forms, which pop up in engineering design (in design criteria and optimization), signal processing (as output noise power), physics (as potential and kinetic energy), differential geometry (as normal curvature of surfaces), economics (as utility functions), and statistics (in confidence ellipsoids).

A quadratic form is a function mapping $\mathbb{R}^n$ to $\mathbb{R}$ of the form

$$
q(x) = x^T k x \qquad (\text{QF})
$$

where $k = k^T \in \mathbb{R}^{n \times n}$ is an $n \times n$ symmetric matrix. Such quadratic forms arise frequently in applications of linear algebra. For example, setting $k = I_n$ and evaluating $q$ at the residual $Ax - b$, we recover the least-squares objective

$$
q(Ax-b) = (Ax-b)^T(Ax-b) = \|Ax-b\|^2.
$$

:::{prf:example}
For $x \in \mathbb{R}^3$, let $q(x) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1x_2 + 6x_2x_3$. Find a matrix $k = k^T \in \mathbb{R}^{3\times3}$ such that $q(x) = x^T k x$.

The approach is to recognize that the coefficients of $x_1^2$, $x_2^2$, and $x_3^2$ go on the diagonal of $k$. To make $k$ symmetric, the coefficient of each cross term $x_ix_j$, $i\neq j$, should be evenly split between the $(i,j)$ and $(j,i)$ entries of $k$.

Using this strategy, we obtain:

$$
q(x) = x^T k x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^T
\begin{bmatrix}
5 & -\frac{1}{2} & 0 \\
-\frac{1}{2} & 3 & 3 \\
0 & 3 & 2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.
$$
:::
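One way to gain confidence in the matrix found above is to compare the polynomial and the matrix expression at a few random points. The sketch below assumes NumPy; the function name `q` and the random test points are illustrative choices.

```python
import numpy as np

def q(x):
    """The quadratic form from the example, written out as a polynomial."""
    x1, x2, x3 = x
    return 5 * x1**2 + 3 * x2**2 + 2 * x3**2 - x1 * x2 + 6 * x2 * x3

# Squared-term coefficients on the diagonal; each cross-term coefficient
# is split evenly between the (i, j) and (j, i) entries.
k = np.array([[ 5.0, -0.5, 0.0],
              [-0.5,  3.0, 3.0],
              [ 0.0,  3.0, 2.0]])

print("symmetric:", np.allclose(k, k.T))

# Compare q(x) with x^T k x at a handful of random points.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
matches = [bool(np.isclose(q(x), x @ k @ x)) for x in X]
print(matches)
```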

## The Geometry of Quadratic Forms

We'll focus on understanding the geometry of quadratic forms on $\mathbb{R}^2$. Let $k = k^T \in \mathbb{R}^{2\times2}$ be an invertible $2\times2$ symmetric matrix, and consider the quadratic form

$$
q(x) = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T
\begin{bmatrix} k_{11} & k_{12} \\ k_{12} & k_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = k_{11}x_1^2 + 2k_{12}x_1x_2 + k_{22}x_2^2. \qquad (\text{2D})
$$

What kinds of functions do these define? We study this question by looking at the level sets of $q(x)$. The $\alpha$-level set of $q(x)$ is the set of all $x \in \mathbb{R}^2$ such that $q(x) = \alpha$:

$$
C_\alpha = \{x \in \mathbb{R}^2 : q(x) = \alpha\}. \qquad (a)
$$

It is possible to show that such level sets correspond to either an ellipse, a hyperbola, two intersecting lines, a single point, or no points at all. If $k$ is a diagonal matrix, the graph of (a) is in standard position, as seen below:

[Figure 2: An ellipse and a hyperbola in standard position.]

If $k$ is not diagonal, the graph of (a) is rotated out of standard position, as shown below:

[Figure 3: An ellipse and a hyperbola rotated out of standard position.]

The principal axes of these rotated graphs are defined by the eigenvectors of $k$, and amount to a new coordinate system (or change of basis) with respect to which the graph is in standard position.

:::{prf:example}
The ellipse in Fig. 3a is the graph of the equation $5x_1^2 - 4x_1x_2 + 5x_2^2 = 48$. Its left-hand side is the quadratic form

$$
q(x) = x^T k x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
$$

The eigenvalues and eigenvectors of $k$ are

$$
\lambda_1 = 3, \quad v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda_2 = 7, \quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
$$

We define the orthonormal basis

$$
u_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
u_2 = \frac{v_2}{\|v_2\|} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix}
$$

and set $Q = [u_1 \quad u_2] = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$. Then $q(x) = x^T Q \Lambda Q^T x$ with $\Lambda = \text{diag}(3, 7)$, and the change of variables $y = Q^T x$ produces the quadratic form

$$
q(y) = y^T \Lambda y = 3y_1^2 + 7y_2^2,
$$

so in the new coordinates the ellipse is $3y_1^2 + 7y_2^2 = 48$, which is in standard position. The $y_1$ and $y_2$ axes, which point along $u_1$ and $u_2$, are shown in Fig. 3a.
:::
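The change of variables in this example can be checked numerically. In the sketch below (assuming NumPy), we parametrize the standard-position ellipse $3y_1^2 + 7y_2^2 = 48$, map its points back to the original coordinates via $x = Qy$, and confirm that they satisfy $5x_1^2 - 4x_1x_2 + 5x_2^2 = 48$. The parametrization and variable names are illustrative.

```python
import numpy as np

k = np.array([[ 5.0, -2.0],
              [-2.0,  5.0]])
Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)   # columns u1, u2
Lam = np.diag([3.0, 7.0])

# Q diagonalizes k: Q^T k Q = Lam.
print(np.allclose(Q.T @ k @ Q, Lam))

# Points on the standard-position ellipse 3 y1^2 + 7 y2^2 = 48.
t = np.linspace(0.0, 2.0 * np.pi, 100)
Y = np.vstack([np.sqrt(48.0 / 3.0) * np.cos(t),
               np.sqrt(48.0 / 7.0) * np.sin(t)])

# Map back to the original coordinates: x = Q y.
X = Q @ Y

# Every mapped point lies on the rotated ellipse 5 x1^2 - 4 x1 x2 + 5 x2^2 = 48.
vals = 5 * X[0] ** 2 - 4 * X[0] * X[1] + 5 * X[1] ** 2
print(np.allclose(vals, 48.0))
```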

## Classifying Quadratic Forms

Depending on the eigenvalues of the symmetric matrix $k$ defining a quadratic form $q(x) = x^T k x$, the resulting function can look very different. The figure below shows four different quadratic forms plotted as functions with domain $\mathbb{R}^2$, i.e., we are plotting $(x_1, x_2, q(x))$.

Notice that except for $x = 0$, the values $q(x)$ are all positive in Fig. 4(b) and all negative in Fig. 4(a). If we take horizontal cross-sections of these two plots, we get ellipses (these are the level sets $C_\alpha$ we saw earlier), while vertical cross-sections through the origin are parabolas.

This simple 2D example illustrates the following definitions. A quadratic form $q$ is:
1. positive definite if $q(x) > 0$ for all $x \neq 0$,
2. negative definite if $q(x) < 0$ for all $x \neq 0$,
3. indefinite if $q(x)$ assumes both positive and negative values.

Also, $q(x)$ is said to be positive (negative) semidefinite if $q(x) \geq 0$ ($\leq 0$) for all $x$. In particular, we now allow $q(x) = 0$ for nonzero $x$.

The following theorem leverages the spectral factorization of a symmetric matrix to characterize quadratic forms in terms of the eigenvalues of $k$.

:::{prf:theorem}
Let $k = k^T \in \mathbb{R}^{n \times n}$ be a symmetric $n \times n$ matrix. Then the quadratic form $q(x) = x^T k x$ is
1. positive definite if and only if all eigenvalues of $k$ are positive,
2. negative definite if and only if all eigenvalues of $k$ are negative,
3. indefinite if and only if $k$ has both positive and negative eigenvalues.
:::

:::{prf:proof}
Let $k = Q \Lambda Q^T$ be the spectral factorization of $k$: $Q$ has columns defining an orthonormal eigenbasis for $\mathbb{R}^n$ and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ is a diagonal matrix of eigenvalues of $k$. Then:

$$
q(x) = x^T k x = x^T Q \Lambda Q^T x = \underbrace{(Q^T x)^T}_{y^T} \Lambda \underbrace{(Q^T x)}_{y} = y^T \Lambda y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2. \qquad (*)
$$

Since $Q$ is orthogonal, there is a one-to-one correspondence between all nonzero $x$ and nonzero $y$ ($y = Q^T x$, $x = Qy$).

Thus, the values of $q(x)$ for $x \neq 0$ coincide with the values of $(*)$ for $y \neq 0$: the sign of $q(x)$ is therefore controlled by the signs of the eigenvalues $\lambda_1, \ldots, \lambda_n$ as described in the theorem.
:::

The classification of a quadratic form is also used to classify the matrix $k$ defining it. Thus, a positive definite matrix is a symmetric matrix for which the quadratic form $q(x) = x^T k x$ is positive definite or, equivalently, for which all eigenvalues are positive (by the theorem above). Analogous definitions hold for negative definite, positive/negative semidefinite, and indefinite matrices.

:::{warning}
The condition that $k$ is positive definite, i.e., that $k > 0$, does NOT mean that all of the entries of $k$ are positive. In fact, many matrices with all positive entries are NOT positive definite!
:::
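The eigenvalue test for definiteness translates directly into code. The following is a minimal sketch, assuming NumPy; the helper name `classify` and the tolerance `tol` (used to decide when a computed eigenvalue counts as zero) are hypothetical choices for illustration, not part of the notes.

```python
import numpy as np

def classify(k, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues.

    `tol` is an assumed threshold below which an eigenvalue is treated as zero.
    """
    k = np.asarray(k, dtype=float)
    if not np.allclose(k, k.T):
        raise ValueError("k must be symmetric")
    lam = np.linalg.eigvalsh(k)   # real eigenvalues, ascending order
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam < -tol):
        return "negative definite"
    if lam.min() < -tol and lam.max() > tol:
        return "indefinite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "negative semidefinite"

print(classify([[5, -2], [-2, 5]]))   # eigenvalues 3 and 7
print(classify([[1, 2], [2, 1]]))     # all entries positive, eigenvalues -1 and 3
print(classify([[1, 0], [0, 0]]))     # eigenvalues 0 and 1
```

The second call illustrates the warning: every entry of the matrix is positive, yet one eigenvalue is negative, so the matrix is indefinite. Note that the zero matrix is both positive and negative semidefinite; this sketch reports it as positive semidefinite.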

:::{prf:example}
Is $q(x) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3$ a positive definite quadratic form? We construct $k$ as before:

$$
q(x) = x^T k x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^T \begin{bmatrix} 3 & 2 & 0 \\ 2 & 2 & 2 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.
$$

The eigenvalues of $k$ are $5$, $2$, and $-1$, so $k$ is an indefinite matrix and $q(x)$ is an indefinite quadratic form. This is the case even though all entries of $k$ are nonnegative!
:::
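To see the indefiniteness of this example concretely, the sketch below (assuming NumPy) computes the eigenvalues and then evaluates $q$ at two specific vectors: $(1, -2, 2)$, which is an eigenvector for the eigenvalue $-1$, and $(1, 0, 0)$. The vector names `x_neg` and `x_pos` are illustrative.

```python
import numpy as np

k = np.array([[3.0, 2.0, 0.0],
              [2.0, 2.0, 2.0],
              [0.0, 2.0, 1.0]])

# Real eigenvalues of the symmetric matrix, in ascending order.
eigvals = np.linalg.eigvalsh(k)
print(eigvals)

def q(x):
    """Quadratic form q(x) = x^T k x."""
    return x @ k @ x

# An eigenvector for eigenvalue -1: q(x) = -1 * ||x||^2 < 0.
x_neg = np.array([1.0, -2.0, 2.0])
print(np.allclose(k @ x_neg, -x_neg), q(x_neg))

# A direction in which q is positive.
x_pos = np.array([1.0, 0.0, 0.0])
print(q(x_pos))
```

Since $q$ takes both a negative and a positive value, it is indefinite, in agreement with the eigenvalue test.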

## Optimization Principles for Eigenvalues of Symmetric Matrices

For symmetric matrices, we've seen that we can interpret eigenvalues as stretching of a vector in the directions specified by the eigenvectors. This is most clearly visualized in terms of a unit ball being mapped to an ellipsoid, as we illustrated earlier.

We can use this observation to answer questions such as: which direction is stretched the most by a matrix? Or the least? Understanding these questions is essential in areas such as machine learning (which directions are most sensitive to measurement noise or estimation error), control theory (which directions are easiest/hardest to move my system in), and dimensionality reduction (which directions "explain" most of my data).

These questions all have a flavor of optimization to them: we are looking for directions with the "most" or "least" effect. This motivates a study of eigenvalues of symmetric matrices from an optimization perspective.

We'll start with the simple case of a real diagonal matrix $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$. We assume that the diagonal entries, which are the eigenvalues of $\Lambda$ (why?), appear in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n
$$

so that $\lambda_1$ is the largest and $\lambda_n$ is the smallest.

The effect of $\Lambda$ on a vector $y$ is to multiply its entries by the corresponding diagonal elements: $\Lambda y = (\lambda_1 y_1, \ldots, \lambda_n y_n)$. Clearly, the maximal stretch occurs in the $e_1$ direction, while the minimal (or least positive) stretch occurs in the $e_n$ direction.

The key idea of the optimization principle for extremal (smallest or biggest) eigenvalues is the following geometric observation. Let's look at the associated quadratic form

$$
q(y) = y^T \Lambda y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2.
$$

Suppose that we are asked to pick a vector $y$ on the unit sphere, i.e., a $y$ satisfying $\|y\|^2 = 1$, that makes $q(y)$ as big or small as possible. Because $\|y\|^2 = 1$, the value $q(y)$ equals the ratio $\frac{q(y)}{\|y\|^2} = \frac{y^T \Lambda y}{y^T y}$, which we use to measure how much $y$ has been stretched.

So let's first look at the maximal direction: this means we are looking for $y$ with $\|y\| = 1$ that maximizes $q(y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$. Since $\lambda_1 \geq \lambda_i$ for all $i$ and $y_1^2 + \cdots + y_n^2 = 1$, we have that

$$
q(y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1 (y_1^2 + \cdots + y_n^2) = \lambda_1
$$

and $q(e_1) = \lambda_1$. This means that

$$
\lambda_1 = \max \{ q(y) \mid \|y\| = 1 \}. \qquad (\text{Max})
$$

We can use the same reasoning, with $\lambda_n \leq \lambda_i$ for all $i$ and $q(e_n) = \lambda_n$, to find that

$$
\lambda_n = \min \{ q(y) \mid \|y\| = 1 \}. \qquad (\text{Min})
$$
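The (Max) and (Min) characterizations can be explored empirically. The sketch below, assuming NumPy, draws random unit vectors for an illustrative diagonal matrix $\Lambda = \text{diag}(5, 2, -1)$ (a choice made here for demonstration) and checks that every sampled value of $q(y)$ lies between $\lambda_n$ and $\lambda_1$, with the bounds attained at $e_n$ and $e_1$.

```python
import numpy as np

# Illustrative eigenvalues in decreasing order: lambda_1 >= ... >= lambda_n.
lam = np.array([5.0, 2.0, -1.0])
Lam = np.diag(lam)

# Sample random points on the unit sphere by normalizing Gaussian vectors.
rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# For diagonal Lam, q(y) = sum_i lambda_i * y_i^2, computed for every row of Y.
q_vals = (Y ** 2) @ lam

print("sampled range of q:", q_vals.min(), q_vals.max())

# The extreme values are attained at the standard basis vectors e_1 and e_n.
e1 = np.array([1.0, 0.0, 0.0])
en = np.array([0.0, 0.0, 1.0])
print("q(e_1) =", e1 @ Lam @ e1, " q(e_n) =", en @ Lam @ en)
```

Random sampling only approaches the extreme values from inside the interval $[\lambda_n, \lambda_1]$; the exact maximum and minimum come from the inequality argument above, not from the samples.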
