---
title: 9.3 Optimization Principles for Eigenvalues of Symmetric Matrices
subject:  Symmetric Matrices
subtitle: the extreme eigenvalues
short_title: 9.3 Optimization Principles for Eigenvalues of Symmetric Matrices
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/08_Ch_9_Symmetric_Matrices/103-opt_princ.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 16 - Eigenvalues of Symmetric Matrices, Spectral Theorem, Quadratic Forms and Positive Definite Matrices, Optimization Principles for Eigenvalues of Symmetric Matrices.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.5.

## Learning Objectives

By the end of this page, you should know:
- the effect of the extreme eigenvalues on a vector
- the directions in which a vector is stretched the most and the least
- the maximum and minimum values of a quadratic form and where they are achieved
- how to compute the (general) eigenvalues and eigenvectors via the optimization of quadratic forms

## Introduction

For symmetric matrices, we've seen that we can interpret eigenvalues as stretching of a vector in the directions specified by the eigenvectors. This is most clearly visualized in terms of a unit ball being mapped to an ellipsoid, as we illustrated [earlier](./101-eigen_symm.ipynb#ellipse).

We can use this observation to answer questions such as: which direction is stretched the most by a matrix? Which is stretched the least? These questions are essential in areas such as machine learning (which directions are most sensitive to measurement noise or estimation error), control theory (which directions are easiest or hardest to move a system in), and dimensionality reduction (which directions "explain" most of the data).

These questions all have a flavor of _optimization_ to them: we are looking for directions with the "most" or "least" effect. This motivates a study of eigenvalues of symmetric matrices from an optimization perspective.


## Diagonal Matrix

We'll start with the simple case of a real diagonal matrix $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$. We assume that the diagonal entries, which are the eigenvalues of $\Lambda$ (why?), appear in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n,
$$

so that $\lambda_1$ is the largest and $\lambda_n$ is the smallest.

The effect of $\Lambda$ on a vector $\vv y$ is to multiply its entries by the corresponding diagonal elements: $\Lambda \vv y = \bm \lambda_1 y_1 \\ \vdots \\ \lambda_n y_n \em$. Clearly, the maximal stretch occurs in the $\vv e_1$ direction, while the minimal (or least positive) stretch occurs in the $\vv e_n$ direction.

The key idea of the optimization principle for extremal (smallest or biggest) eigenvalues is the following geometric observation. Let's look at the associated quadratic form

$$
q(\vv y) = \vv y^T \Lambda \vv y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2
$$

Suppose that we are asked to pick a vector $\vv y$ on the unit sphere, i.e., a $\vv y$ satisfying $\|\vv y\|^2 = 1$, that makes $q(\vv y)$ as big or as small as possible. The ratio $\frac{q(\vv y)}{\|\vv y\|^2}$ measures how much or how little $\vv y$ is stretched, and on the unit sphere it reduces to $q(\vv y)$ itself. When all $\lambda_i \geq 0$, we can also write $q(\vv y) = \|\Lambda^{\frac{1}{2}} \vv y\|^2$.

So let's first look at the maximal direction: we are looking for a unit vector $\vv y$, i.e., $\|\vv y\| = 1$, that maximizes $q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$. Since $\lambda_1 \geq \lambda_i$ for all $i$, we have that

$$
q(\vv y) = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1 (y_1^2 + \cdots + y_n^2) = \lambda_1
$$

and $q(\vv e_1) = \lambda_1$. This means that

\begin{equation}
\label{max_eqn}
\lambda_1 = \max \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Max})
\end{equation}

We can use the same reasoning to find that

\begin{equation}
\label{min_eqn}
\lambda_n = \min \{ q(\vv y) \mid \|\vv y\| = 1 \}. \qquad (\text{Min})
\end{equation}
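The two bounds (Max) and (Min) can be checked numerically. The snippet below is a minimal sketch, assuming NumPy is available; the eigenvalues `5, 2, -1`, the sample count, and the random seed are illustrative choices that do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for reproducibility

# Illustrative eigenvalues in decreasing order (an assumption, not from the text).
lam = np.array([5.0, 2.0, -1.0])
Lam = np.diag(lam)

# Draw random vectors and normalize each row so that it lies on the unit sphere.
Y = rng.standard_normal((10_000, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# q(y) = y^T Lam y, evaluated for every sample (row of Y) at once.
q = np.einsum("ij,jk,ik->i", Y, Lam, Y)

tol = 1e-12  # small allowance for floating-point rounding
print("all samples <= lambda_1:", q.max() <= lam[0] + tol)
print("all samples >= lambda_n:", q.min() >= lam[-1] - tol)

# The bounds are attained at the first and last standard basis vectors.
e1 = np.array([1.0, 0.0, 0.0])
en = np.array([0.0, 0.0, 1.0])
print("q(e_1) =", e1 @ Lam @ e1, " q(e_n) =", en @ Lam @ en)
```

Random sampling only gets close to the extreme values; the exact maximum and minimum are reached at $\vv e_1$ and $\vv e_n$.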


## Generic Symmetric Matrix

Now, can we make a similar statement for a generic symmetric matrix $K=K^{\top}\in\mathbb{R}^{n\times n}$? Perhaps not surprisingly, using the [spectral factorization](./101-eigen_symm.ipynb#spectral_thm) provides an affirmative answer.

In particular, let $K = Q\Lambda Q^T$ be the spectral factorization of $K$. Then
$$
q(\vv x) = \vv x^TK \vv x = \vv x^TQ\Lambda Q^T \vv x = \vv y^T\Lambda \vv y, \quad \text{where } \vv y = Q^T\vv x.
$$

According to our previous discussion, the maximum of $\vv y^T\Lambda \vv y$ over all unit vectors $\|\vv y\|=1$ is $\lambda_1$, which is the **same** as the largest eigenvalue of $K$. Moreover, since $Q$ is an orthogonal matrix, it does not change the length of a vector when it acts on it:
$$
\|\vv y\|^2 = \vv y^T\vv y = \vv x^TQQ^T \vv x = \vv x^T \vv x = \|\vv x\|^2,
$$

so that $\|\vv y\| = 1$ exactly when $\|\vv x\| = 1$, and the maximum of $q(\vv x)$ over all $\|\vv x\|=1$ is again $\lambda_1$! Further, the vector $\vv x$ achieving the maximum is $Q \vv e_1 = \vv u_1$, the corresponding (normalized) eigenvector of $K$. This is consistent with our prior geometric discussion: when $K$ is positive definite, $\vv u_1$ is the longest semi-axis of the ellipsoid that $K$ maps the unit ball to. Correspondingly, $\vv u_1$ is the shortest semi-axis, of length $\sqrt{c/\lambda_1}$, of the level set $q(\vv x) = \vv x^TK \vv x = c$, as in [this ellipse](./102-quad_PSD.ipynb#out_standard), because $q$ grows fastest in that direction.
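The change of variables $\vv y = Q^T \vv x$ can be verified numerically. This is a minimal sketch, assuming NumPy; the matrix `K` and the test vector `x` are illustrative choices, not taken from the text. Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are flipped to match the decreasing order used here.

```python
import numpy as np

# Illustrative symmetric matrix (an assumption, not from the text).
K = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# eigh returns eigenvalues in ascending order; flip both outputs so that
# lam[0] is the largest eigenvalue and column 0 of Q is u_1.
evals, evecs = np.linalg.eigh(K)
lam = evals[::-1]
Q = evecs[:, ::-1]

# Spectral factorization K = Q Lam Q^T.
print("K == Q Lam Q^T:", np.allclose(K, Q @ np.diag(lam) @ Q.T))

# For a unit vector x, y = Q^T x has the same norm and x^T K x = y^T Lam y.
x = np.array([1.0, -2.0, 2.0]) / 3.0
y = Q.T @ x
print("norms:", np.linalg.norm(x), np.linalg.norm(y))
print("quadratic forms:", x @ K @ x, y @ np.diag(lam) @ y)

# The maximizer is x = Q e_1 = u_1, where q attains lambda_1.
u1 = Q[:, 0]
print("q(u_1) =", u1 @ K @ u1, " lambda_1 =", lam[0])
```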

We can apply the same reasoning to compute $\lambda_n$. We summarize our discussion in the following theorem:

:::{prf:theorem}
:label: opt_eig_thm
Let $K$ be a symmetric matrix with real eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then
$$
\lambda_1 = \max\{\vv x^TK \vv x \mid \|\vv x\|=1\} \quad \text{and} \quad \lambda_n = \min\{\vv x^TK \vv x \mid \|\vv x\|=1\},
$$
are respectively its largest and smallest eigenvalues. The maximal value is achieved when $\vv x = \pm \vv u_1$, where $\vv u_1$ is a unit eigenvector associated with the largest eigenvalue $\lambda_1$. The minimal value is achieved when $\vv x = \pm \vv u_n$, where $\vv u_n$ is a unit eigenvector associated with the smallest eigenvalue $\lambda_n$.
:::

:::{prf:example}
Maximize the quadratic form $q(x,y) = 3x^2 + 2xy + 3y^2$ over all $(x,y)$ satisfying $x^2+y^2=1$. The quadratic form is defined by the matrix $K = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$, which has eigenvalues $\lambda_1 = 4$ and $\lambda_2 = 2$. Therefore, the maximum is $\lambda_1 = 4$, and is achieved at $\begin{bmatrix} x \\ y \end{bmatrix} = \vv u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (or its negative). Indeed, $q\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) = \tfrac{3}{2} + 1 + \tfrac{3}{2} = 4$.
:::
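As a quick check of this example, the sketch below compares the eigenvalue computation with a brute-force search over the unit circle. It assumes NumPy; the grid resolution of 3601 angles is an arbitrary illustrative choice.

```python
import numpy as np

K = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# eigh returns eigenvalues in ascending order, so the last entry is lambda_1.
evals, evecs = np.linalg.eigh(K)
lam_max = evals[-1]
u1 = evecs[:, -1]  # unit eigenvector for lambda_1, determined up to sign

# Brute force: parametrize the unit circle as (cos t, sin t) and evaluate q.
theta = np.linspace(0.0, 2.0 * np.pi, 3601)
x, y = np.cos(theta), np.sin(theta)
q = 3 * x**2 + 2 * x * y + 3 * y**2
best = np.argmax(q)

print("largest eigenvalue:", lam_max)
print("grid maximum of q: ", q[best])
print("eigenvector u_1:   ", u1)
print("grid maximizer:    ", (x[best], y[best]))
```

The grid maximizer agrees with $\pm \vv u_1$; which sign appears depends on the eigen-solver and on where the grid search lands.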

Finally, we note that the [above theorem](#opt_eig_thm) can be generalized to compute the remaining eigenvalues by first eliminating the directions of the larger (or smaller) eigenvalues. For example, we can compute the second largest eigenvalue of $K$ by solving
$$
\lambda_2 = \max\{\vv x^TK \vv x \mid \|\vv x\|=1, \, \vv x^T \vv u_1 = 0\}.
$$

The key constraint is $\vv x^T\vv u_1=0$, which says that we only search over vectors orthogonal to $\vv u_1$, the eigenvector associated with $\lambda_1$.

:::{prf:example}
Find the maximum value of $q(x_1,x_2,x_3) = 9x_1^2 + 4x_2^2 + 3x_3^2$ subject to the constraints $\|\vv x\|=1$ and $\vv x^T\vv u_1=0$, where $\vv u_1=\bm 1 \\ 0 \\ 0 \em$ is the eigenvector corresponding to the largest eigenvalue $\lambda_1=9$ of $K=\text{diag}(9,4,3)$. The constraint $\vv x^T\vv u_1=0$ means $x_1=0$, and so we need to find $(x_2,x_3)$ satisfying $x_2^2+x_3^2=1$ that maximizes $4x_2^2+3x_3^2$. This happens at $(x_2,x_3)=(\pm 1,0)$, leading to a value of 4, which is the second largest eigenvalue $\lambda_2$ of $K$. The corresponding eigenvector is $\vv u_2=(0,1,0)$.
:::
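The constrained problem in this example can also be checked by brute force. The sketch below assumes NumPy and uses the parametrization $\vv x = (0, \cos t, \sin t)$ of the unit vectors with $x_1 = 0$; the grid of 3601 angles is an illustrative choice.

```python
import numpy as np

K = np.diag([9.0, 4.0, 3.0])
u1 = np.array([1.0, 0.0, 0.0])

# Unit vectors orthogonal to u1 have x_1 = 0, so they can be written as
# x = (0, cos t, sin t) for some angle t.
t = np.linspace(0.0, 2.0 * np.pi, 3601)
X = np.stack([np.zeros_like(t), np.cos(t), np.sin(t)], axis=1)

# Every row satisfies the orthogonality constraint by construction.
print("constraint satisfied:", np.allclose(X @ u1, 0.0))

# q(x) = x^T K x for every row of X.
q = np.einsum("ij,jk,ik->i", X, K, X)
best = np.argmax(q)
print("constrained maximum:", q[best])
print("achieved at x =", X[best])
```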


We can extend this logic (where we found the second largest eigenvalue by optimizing only over unit vectors orthogonal to $\vv u_1$) to obtain a characterization of the $j^{\text{th}}$ largest eigenvalue of a symmetric matrix $K$.

:::{prf:theorem} The general variational characterization of eigenvalues
Let $K$ be a symmetric matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ and corresponding orthogonal eigenvectors $\vv u_1, \dots, \vv u_n$. Then the maximal value of the quadratic form $q(\vv x) = \vv x^\top K\vv x$ over all unit vectors that are orthogonal to the first $j - 1$ eigenvectors is its $j^{\text{th}}$ eigenvalue:

$$
\lambda_j = \max \{ \vv x^\top K\vv x \mid \|\vv x\| = 1, \quad \langle \vv x, \vv u_1 \rangle = \dots = \langle \vv x, \vv u_{j-1} \rangle = 0 \}.
$$
:::
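One way to check this characterization numerically is to restrict the quadratic form to the orthogonal complement of the first $j-1$ eigenvectors and take the largest eigenvalue of the restricted matrix. The sketch below assumes NumPy. The helper name `constrained_max`, the random $5 \times 5$ test matrix, and the seed are hypothetical choices made for illustration, and the complement basis is obtained from a complete QR factorization, which is one option among several.

```python
import numpy as np

def constrained_max(K, V):
    """Maximum of x^T K x over unit vectors x orthogonal to the columns of V.

    Hypothetical helper: V is n-by-k with linearly independent columns (k may be 0).
    """
    n, k = K.shape[0], V.shape[1]
    if k == 0:
        B = np.eye(n)  # no constraints: search over the whole unit sphere
    else:
        # In a complete QR factorization, the last n - k columns of the
        # orthogonal factor form an orthonormal basis of the complement of col(V).
        Qfull, _ = np.linalg.qr(V, mode="complete")
        B = Qfull[:, k:]
    # Writing x = B z with ||z|| = 1 gives x^T K x = z^T (B^T K B) z, whose
    # maximum is the largest eigenvalue of the restricted matrix B^T K B.
    return np.linalg.eigvalsh(B.T @ K @ B)[-1]

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility
M = rng.standard_normal((5, 5))
K = (M + M.T) / 2  # random symmetric test matrix

# Eigenvalues in decreasing order, with matching eigenvector columns.
evals, evecs = np.linalg.eigh(K)
lam = evals[::-1]
U = evecs[:, ::-1]

for j in range(1, 6):
    value = constrained_max(K, U[:, : j - 1])
    print(f"j = {j}: constrained max = {value:.6f}, lambda_j = {lam[j - 1]:.6f}")
```

The check is partly circular, since it uses the eigenvectors to build the constraints, but it illustrates that removing the first $j-1$ eigen-directions leaves $\lambda_j$ as the largest remaining value of the quadratic form.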
